Kernel density estimation is a non-parametric method for estimating the probability density function of a random variable from a finite data sample. If \((x_{1}, x_{2}, \cdots, x_{n})\) is an independent and identically distributed sample drawn from an unknown distribution, the kernel density estimator approximates the shape of that distribution's density function.

The general form of a kernel-smoothed density function can be represented as:

$$ \hat f(x) =\sum_{x_{i}} k_{x_{i}}(x)f_{n}(x_{i}), $$

where \(f_{n}(x_{i})\) is the probability of point \(x_{i}\) in the empirical distribution (usually \(\frac {1}{n}\)).

What follows are a few key definitions which will be useful throughout the remainder of the post:

  • \(K_{x_{i}}(x)\) represents the cumulative distribution function of the distribution used for point \(x_{i}\) and evaluated at \(x\).

  • \(k_{x_{i}}(x)\) represents the probability density function of the distribution used for point \(x_{i}\) and evaluated at \(x\).

  • \(\hat{F}(x) = \sum_{i=1}^{n}\Big(\frac{1}{n}\Big)K_{x_{i}}(x)\) represents the kernel-smoothed distribution function.

  • \(\hat{f}(x) = \sum_{i=1}^{n}\Big(\frac{1}{n}\Big)k_{x_{i}}(x)\) represents the kernel-smoothed density function.

Kernel smoothing consists of selecting a distribution to use for each point \(x_{i}\) in \((x_{1}, x_{2}, \cdots, x_{n})\). Then the kernel-smoothed density estimator is an equally-weighted mixture of the \(n\) densities.

The kernel (or basis function) is the distribution for which \(k_{x_{i}}(x)\) is the density function and \(K_{x_{i}}(x)\) is the distribution function. The kernel can be one of a wide range of distributions, but in this post, discussion will be limited to:

  • Uniform kernel
  • Triangular kernel
  • Gaussian kernel

Of these, only the Gaussian kernel has support over the entire real line; the uniform and triangular kernels each have compact (finite) support.

For each kernel, it is necessary to specify a bandwidth, which controls the level of smoothing in the kernel density estimate. Generally, as the bandwidth increases, finer detail is smoothed over, which can obscure some of the underlying structure. Later we'll see how changing the bandwidth affects the overall appearance of a kernel density estimate.

If Gaussian kernel functions are used to approximate a set of discrete data points, the optimal choice for bandwidth is:

$$ h = \left(\frac{4\hat{\sigma}^5}{3n}\right)^{\frac{1}{5}} \approx 1.06 \hat{\sigma} n^{-1/5} $$

where \(\hat{\sigma}\) is the sample standard deviation. This approximation is termed the normal distribution approximation, Gaussian approximation, or Silverman's rule of thumb [1].
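
As a quick illustration, Silverman's rule can be translated directly into Python. The helper below, silverman_bandwidth, is a name introduced here for demonstration only (it isn't used elsewhere in this post), and it assumes the sample standard deviation (ddof=1) is an acceptable estimate of \(\hat{\sigma}\):

# ============================================
# Silverman's rule-of-thumb bandwidth        |
# (illustrative helper, not used below)      |
# ============================================
import numpy as np


def silverman_bandwidth(data):
    """Return the rule-of-thumb bandwidth (4*sigma^5 / (3n))^(1/5)."""
    a = np.asarray(data, dtype=float)
    sigma = a.std(ddof=1)    # sample standard deviation
    n = a.size
    return (4 * sigma**5 / (3 * n)) ** 0.2


For the sample points used later in this post, [5, 12, 15, 20], this yields a bandwidth of roughly 5.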

Note that the most common method of assessing the optimality of a particular bandwidth selection is the mean integrated squared error. However, this post is focused on implementing kernel density estimators rather than determining an optimal bandwidth selection. Therefore, the only bandwidth evaluation we will perform will be qualitative and visual in nature.
In what follows, we demonstrate how each of the previously mentioned kernels can be implemented in Python. For each kernel, we’ll construct a function that takes for arguments 1) the data point of interest, and 2) a bandwidth specification. We begin with the specification for the uniform kernel density.

Uniform Kernel

The uniform kernel density estimator is given by:

$$ k_{x_{i}}(x) = \begin{cases} \frac{1}{2b}, & x_{i} -b \leq x \leq x_{i} + b \\ 0, & \text{otherwise} \end{cases} $$



For the uniform kernel distribution function we have:

$$ K_{x_{i}}(x) = \begin{cases} 0, & x \lt x_{i} - b \\ \frac{x - (x_{i} - b)}{2b}, & x_{i} -b \leq x \leq x_{i} + b \\ 1, & x \gt x_{i} + b \end{cases} $$


Translating these specifications into Python results in the following:

# ============================================
# Uniform Kernel PDF & CDF                   |
# ============================================

def uniform_pdf(x_i, bandwidth):
    """Return uniform kernel density estimator."""
    lowerb = (x_i - bandwidth)
    upperb = (x_i + bandwidth)
    def evaluate(x):
        """Evaluate the uniform kernel density at x."""
        if x < lowerb or x > upperb:
            pdf = 0
        else:
            pdf = 1 / (2 * bandwidth)
        return pdf
    return evaluate



def uniform_cdf(x_i, bandwidth):
    """Return uniform kernel distribution estimator."""
    lowerb = (x_i - bandwidth)
    upperb = (x_i + bandwidth)
    def evaluate(x):
        """Evaluate the uniform kernel distribution function at x."""
        if x < lowerb:
            cdf = 0
        elif x > upperb:
            cdf = 1
        else:
            cdf = (x - lowerb) / (2 * bandwidth)
        return cdf
    return evaluate



Note that uniform_pdf is a closure: When called, it returns a function, which can then be passed an arbitrary input. The kernel density estimate of the input will be returned, and when combined with the kernel density estimators for all other points in the dataset of interest, we obtain a rough estimate of the distribution’s underlying density.

For the uniform and each of the remaining kernel estimates, we require a function that combines the kernel estimates for each data point and carries out the density averaging. We introduce kde_pdf and kde_cdf, both of which take for arguments an array of data points, a kernel function and a bandwidth specification. Like uniform_pdf and uniform_cdf, kde_pdf and kde_cdf each return a function, which can then be used to evaluate the kernel estimate at any point or, as in our case, to generate visualizations:


# ===================================================
# kde_pdf and kde_cdf are used for compiling kernel |
# density and distribution estimates.               |
# ===================================================


def kde_pdf(data, kernel_func, bandwidth):
    """Generate kernel density estimator over data."""
    kernels = dict()
    n = len(data)
    for d in data:
        kernels[d] = kernel_func(d, bandwidth)
    def evaluate(x):
        """Evaluate `x` using kernels above."""
        pdfs = list()
        for d in data: pdfs.append(kernels[d](x))
        return(sum(pdfs)/n)
    return(evaluate)



def kde_cdf(data, kernel_func, bandwidth):
    """Generate kernel distribution estimator over data."""
    kernels = dict()
    n = len(data)
    for d in data:
        kernels[d] = kernel_func(d, bandwidth)

    def evaluate(x):
        """Evaluate x using kernels above."""
        cdfs = list()
        for d in data: cdfs.append(kernels[d](x))
        return (sum(cdfs)/n)
    return(evaluate)


An example using these functions would be the following:

Suppose you have the points \([5, 12, 15, 20]\) and you're interested in obtaining a kernel density estimate based on these data points using a uniform kernel. You would pass uniform_pdf to kde_pdf's kernel_func argument along with the desired bandwidth, then pass any value to the function returned by kde_pdf to obtain the kernel density estimate at that point for the uniform kernel and specified bandwidth. For example, using a bandwidth of 1:

vals = [5, 12, 15, 20]

eval_kde = kde_pdf(data=vals, kernel_func=uniform_pdf, bandwidth=1)

# pass `eval_kde` points to evaluate =>
eval_kde(10)    # 0
eval_kde(15.5)  # .125
eval_kde(20)    # .125
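
The distribution-function analogue works the same way: pass uniform_cdf to kde_cdf and evaluate the returned function at any point. The values shown below assume the uniform_cdf implementation given earlier:

eval_kde_cdf = kde_cdf(data=vals, kernel_func=uniform_cdf, bandwidth=1)

# pass `eval_kde_cdf` points to evaluate =>
eval_kde_cdf(10)    # 0.25
eval_kde_cdf(15)    # 0.625
eval_kde_cdf(25)    # 1.0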



We now produce visualizations that demonstrate how changing bandwidth alters the appearance of the kernel density estimator. We will generate a density estimate based on the same sample data points using a uniform kernel with bandwidths 1, 2, 3 and 4, and then plot them side by side:

# ===============================================
# Logic for producing kernel density estimate   |
# visualizations.                               |
# ===============================================
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
plt.rcParams["figure.figsize"] = (15,10)


vals  = [5, 12, 15, 20]
avals = np.array(vals)
xvals = np.arange(min(vals), max(vals), .01)

fig = plt.figure()

# bandwidth=1:
ax1 = fig.add_subplot(2, 2, 1)
dist_1 = kde_pdf(data=vals, kernel_func=uniform_pdf, bandwidth=1)
y1 = [dist_1(i) for i in xvals]
ys1 = [dist_1(i) for i in vals]
ax1.scatter(vals, ys1)
ax1.plot(xvals, y1)


# bandwidth=2:
ax2 = fig.add_subplot(2, 2, 2)
dist_2 = kde_pdf(data=vals, kernel_func=uniform_pdf, bandwidth=2)
y2 = [dist_2(i) for i in xvals]
ys2 = [dist_2(i) for i in vals]
ax2.scatter(vals, ys2)
ax2.plot(xvals, y2)

# bandwidth=3:
ax3 = fig.add_subplot(2, 2, 3)
dist_3 = kde_pdf(vals, kernel_func=uniform_pdf, bandwidth=3)
y3 = [dist_3(i) for i in xvals]
ys3 = [dist_3(i) for i in vals]
ax3.scatter(vals, ys3)
ax3.plot(xvals, y3)

# bandwidth=4:
ax4 = fig.add_subplot(2, 2, 4)
dist_4 = kde_pdf(vals, kernel_func=uniform_pdf, bandwidth=4)
y4     = [dist_4(i) for i in xvals]
ys4    = [dist_4(i) for i in vals]
ax4.scatter(vals, ys4)
ax4.plot(xvals, y4)


# display gridlines 
g1 = ax1.grid(True)
g2 = ax2.grid(True)
g3 = ax3.grid(True)
g4 = ax4.grid(True)

# set title
t1=ax1.set_title(r"Uniform Kernel")
t2=ax2.set_title(r"Uniform Kernel")
t3=ax3.set_title(r"Uniform Kernel")
t4=ax4.set_title(r"Uniform Kernel")


# display legend in each subplot
leg1 = mpatches.Patch(color=None, label='bandwidth=1')
leg2 = mpatches.Patch(color=None, label='bandwidth=2')
leg3 = mpatches.Patch(color=None, label='bandwidth=3')
leg4 = mpatches.Patch(color=None, label='bandwidth=4')

ax1.legend(handles=[leg1])
ax2.legend(handles=[leg2])
ax3.legend(handles=[leg3])
ax4.legend(handles=[leg4])

plt.tight_layout()
plt.show()


Running the code above produces the following:


[Figure: uniform kernel density estimates for bandwidths 1 through 4]



Notice that although the plot is discontinuous regardless of bandwidth selection, the estimated density appears more reasonable with bandwidth=4 than it does with bandwidth=1: much of the random noise has been suppressed, and we get a pretty good idea of the shape of the approximate distribution. Next we'll introduce the triangular kernel, which tempers some of the sharp discontinuities associated with the uniform kernel's density estimate.


Triangular Kernel

The triangular kernel density estimator is given by:

$$ k_{x_{i}}(x) = \begin{cases} \frac{b - |x - x_{i}|}{b^{2}} & x_{i} -b \leq x \leq x_{i} + b \\ 0 & \text{otherwise} \end{cases} $$



For the triangular kernel distribution function we have:

$$ K_{x_{i}}(x) = \begin{cases} 0 & x \lt x_{i} - b \\ \frac{(x - x_{i} + b)^{2}}{2b^{2}} & x_{i} -b \leq x \leq x_{i} \\ 1 - \frac {(x_{i} + b -x)^{2}}{2b^{2}} & x_{i} \leq x \leq x_{i} + b \\ 1 & x \geq x_{i} + b \end{cases} $$


Translating these specifications into Python results in the following:


# ============================================
# Triangular Kernel PDF                      |
# ============================================

def triangular_pdf(x_i, bandwidth):
    """Return triangular kernel density estimator."""
    lowerb = (x_i - bandwidth)
    upperb = (x_i + bandwidth)
    def evaluate(x):
        """Evaluate x."""
        if  x <= lowerb: pdf=0
        elif x > upperb: pdf=0
        else: pdf = ((bandwidth-abs(x-x_i))/bandwidth**2)
        return (pdf)
    return (evaluate)
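

The triangular distribution function given above can also be translated into Python using the same closure pattern. The sketch below introduces a triangular_cdf helper (a name not used elsewhere in this post) that can be passed to kde_cdf just like uniform_cdf:

# ============================================
# Triangular Kernel CDF (illustrative sketch)|
# ============================================

def triangular_cdf(x_i, bandwidth):
    """Return triangular kernel distribution estimator."""
    lowerb = (x_i - bandwidth)
    upperb = (x_i + bandwidth)
    def evaluate(x):
        """Evaluate the triangular kernel distribution function at x."""
        if x < lowerb:
            cdf = 0
        elif x <= x_i:
            cdf = ((x - x_i + bandwidth)**2) / (2 * bandwidth**2)
        elif x <= upperb:
            cdf = 1 - ((x_i + bandwidth - x)**2) / (2 * bandwidth**2)
        else:
            cdf = 1
        return cdf
    return evaluate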


Just as we produced the uniform density visualizations for varying bandwidth, we can produce triangular density visualizations for varying bandwidth. Here’s the code parameterized for triangular kernels:


# ===============================================
# Logic for producing kernel density estimate   |
# visualizations.                               |
# ===============================================
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
plt.rcParams["figure.figsize"] = (15,10)


vals  = [5, 12, 15, 20]
avals = np.array(vals)
xvals = np.arange(min(vals), max(vals), .01)

fig = plt.figure()

# bandwidth=1:
ax1 = fig.add_subplot(2, 2, 1)
dist_1 = kde_pdf(data=vals, kernel_func=triangular_pdf, bandwidth=1)
y1 = [dist_1(i) for i in xvals]
ys1 = [dist_1(i) for i in vals]
ax1.scatter(vals, ys1)
ax1.plot(xvals, y1)


# bandwidth=2:
ax2 = fig.add_subplot(2, 2, 2)
dist_2 = kde_pdf(data=vals, kernel_func=triangular_pdf, bandwidth=2)
y2 = [dist_2(i) for i in xvals]
ys2 = [dist_2(i) for i in vals]
ax2.scatter(vals, ys2)
ax2.plot(xvals, y2)

# bandwidth=3:
ax3 = fig.add_subplot(2, 2, 3)
dist_3 = kde_pdf(vals, kernel_func=triangular_pdf, bandwidth=3)
y3 = [dist_3(i) for i in xvals]
ys3 = [dist_3(i) for i in vals]
ax3.scatter(vals, ys3)
ax3.plot(xvals, y3)

# bandwidth=4:
ax4 = fig.add_subplot(2, 2, 4)
dist_4 = kde_pdf(vals, kernel_func=triangular_pdf, bandwidth=4)
y4     = [dist_4(i) for i in xvals]
ys4    = [dist_4(i) for i in vals]
ax4.scatter(vals, ys4)
ax4.plot(xvals, y4)


# display gridlines 
g1 = ax1.grid(True)
g2 = ax2.grid(True)
g3 = ax3.grid(True)
g4 = ax4.grid(True)

# set title
t1=ax1.set_title(r"Triangular Kernel")
t2=ax2.set_title(r"Triangular Kernel")
t3=ax3.set_title(r"Triangular Kernel")
t4=ax4.set_title(r"Triangular Kernel")


# display legend in each subplot
leg1 = mpatches.Patch(color=None, label='bandwidth=1')
leg2 = mpatches.Patch(color=None, label='bandwidth=2')
leg3 = mpatches.Patch(color=None, label='bandwidth=3')
leg4 = mpatches.Patch(color=None, label='bandwidth=4')

ax1.legend(handles=[leg1])
ax2.legend(handles=[leg2])
ax3.legend(handles=[leg3])
ax4.legend(handles=[leg4])

plt.tight_layout()
plt.show()


As before, running the code above produces the following:


[Figure: triangular kernel density estimates for bandwidths 1 through 4]



Although the triangular kernel is a definite improvement over the uniform kernel, the resulting estimate is still not smooth: its derivative is discontinuous at the transition points, which produces visible kinks. The Gaussian kernel, by contrast, is smooth and has support over the entire real line, so its density estimate is free of the sharp edges characteristic of the uniform and triangular estimates. We cover the Gaussian kernel next.


Gaussian Kernel

The primary difference between the uniform and triangular density estimates and the Gaussian is in the interpretation of the bandwidth. For the uniform and triangular kernels, the bandwidth specifies an effective distance from the point of interest in either direction for which the kernel in question has influence on the density estimate. In the Gaussian case, the bandwidth is interpreted as the standard deviation of a normal distribution with mean equal to the data point of interest. With this parameterization, the density estimate is continuous and has infinite support.

The Gaussian kernel density estimator is given by:

$$ k_{x_{i}}(x) = \frac {1}{\sqrt {2 \pi b^{2}}}\, e^{-\frac {(x - x_{i})^{2}}{2b^{2}}}, \qquad -\infty \lt x \lt +\infty $$

This can be easily implemented in Python as follows:

# ============================================
# Gaussian Kernel PDF                        |
# ============================================
import numpy as np


def gaussian_pdf(x_i, bandwidth):
    """Return Gaussian kernel density estimator."""
    x_bar  = x_i
    def evaluate(x):
        """Evaluate x."""
        pdf = (np.sqrt(2*np.pi*bandwidth**2)**-1) * np.exp(-((x - x_bar)**2)/(2*bandwidth**2))
        return(pdf)
    return(evaluate)
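

Although this post focuses on density estimates, a Gaussian kernel distribution function can be sketched as well. The normal CDF has no elementary closed form, but it can be expressed with the error function via \(\Phi(z) = \tfrac{1}{2}\big(1 + \operatorname{erf}(z/\sqrt{2})\big)\). The gaussian_cdf helper below is a name introduced here (not used elsewhere in this post) and can be dropped into kde_cdf:

# ============================================
# Gaussian Kernel CDF (illustrative sketch)  |
# ============================================
import math


def gaussian_cdf(x_i, bandwidth):
    """Return Gaussian kernel distribution estimator."""
    def evaluate(x):
        """Evaluate the normal CDF with mean x_i and std dev `bandwidth` at x."""
        z = (x - x_i) / (bandwidth * math.sqrt(2))
        return 0.5 * (1 + math.erf(z))
    return evaluate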


The gaussian_pdf function is nothing more than a normal PDF with mean \(x_{i}\) and standard deviation \(b\) (the bandwidth). Let's update our plotting logic to produce a Gaussian kernel density estimate:

# ===============================================
# Logic for producing kernel density estimate   |
# visualizations.                               |
# ===============================================
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import numpy as np
import seaborn as sns
sns.set(color_codes=True)
plt.rcParams["figure.figsize"] = (15,10)


vals  = [5, 12, 15, 20]
avals = np.array(vals)
xvals = np.arange(min(vals), max(vals), .01)

fig = plt.figure()

# bandwidth=1:
ax1 = fig.add_subplot(2, 2, 1)
dist_1 = kde_pdf(data=vals, kernel_func=gaussian_pdf, bandwidth=1)
y1 = [dist_1(i) for i in xvals]
ys1 = [dist_1(i) for i in vals]
ax1.scatter(vals, ys1)
ax1.plot(xvals, y1)


# bandwidth=2:
ax2 = fig.add_subplot(2, 2, 2)
dist_2 = kde_pdf(data=vals, kernel_func=gaussian_pdf, bandwidth=2)
y2 = [dist_2(i) for i in xvals]
ys2 = [dist_2(i) for i in vals]
ax2.scatter(vals, ys2)
ax2.plot(xvals, y2)

# bandwidth=3:
ax3 = fig.add_subplot(2, 2, 3)
dist_3 = kde_pdf(vals, kernel_func=gaussian_pdf, bandwidth=3)
y3 = [dist_3(i) for i in xvals]
ys3 = [dist_3(i) for i in vals]
ax3.scatter(vals, ys3)
ax3.plot(xvals, y3)

# bandwidth=4:
ax4 = fig.add_subplot(2, 2, 4)
dist_4 = kde_pdf(vals, kernel_func=gaussian_pdf, bandwidth=4)
y4     = [dist_4(i) for i in xvals]
ys4    = [dist_4(i) for i in vals]
ax4.scatter(vals, ys4)
ax4.plot(xvals, y4)


# display gridlines 
g1 = ax1.grid(True)
g2 = ax2.grid(True)
g3 = ax3.grid(True)
g4 = ax4.grid(True)

# set title
t1=ax1.set_title(r"Gaussian Kernel")
t2=ax2.set_title(r"Gaussian Kernel")
t3=ax3.set_title(r"Gaussian Kernel")
t4=ax4.set_title(r"Gaussian Kernel")


# display legend in each subplot
leg1 = mpatches.Patch(color=None, label='bandwidth=1')
leg2 = mpatches.Patch(color=None, label='bandwidth=2')
leg3 = mpatches.Patch(color=None, label='bandwidth=3')
leg4 = mpatches.Patch(color=None, label='bandwidth=4')

ax1.legend(handles=[leg1])
ax2.legend(handles=[leg2])
ax3.legend(handles=[leg3])
ax4.legend(handles=[leg4])

plt.tight_layout()
plt.show()



Yet again, running the code above produces the following:


[Figure: Gaussian kernel density estimates for bandwidths 1 through 4]



As expected, the Gaussian kernel density estimate is continuous and smooth across the entire range of inputs.

Conclusion

Note that it is also possible to estimate the distribution function for a collection of data points using any of the kernels covered here. This technique will be explored further in a future post.
If your project requires production-quality density estimation functionality, be sure to check out scikit-learn, which exposes robust, thoroughly tested density estimation routines.
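
As a rough sanity check (an illustrative sketch, not the implementation developed in this post), scikit-learn's KernelDensity can reproduce the Gaussian estimate above. Its score_samples method returns log-densities, so we exponentiate before comparing with kde_pdf:

# ====================================================
# Cross-check against scikit-learn's KernelDensity   |
# (illustrative sketch only)                         |
# ====================================================
import numpy as np
from sklearn.neighbors import KernelDensity

vals = [5, 12, 15, 20]
X = np.array(vals, dtype=float).reshape(-1, 1)   # KernelDensity expects a 2-D array

kde = KernelDensity(kernel='gaussian', bandwidth=1.0).fit(X)

# score_samples returns log-densities; exponentiate to recover densities
query = np.array([[10.0], [15.5], [20.0]])
print(np.exp(kde.score_samples(query)))

For a Gaussian kernel these values should agree, up to floating point error, with kde_pdf(vals, gaussian_pdf, bandwidth=1) evaluated at the same points, since both are equally-weighted mixtures of normals centered at the data.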

Until next time, happy coding!





Footnotes:

  1. https://en.wikipedia.org/wiki/Kernel_density_estimation#Bandwidth_selection
  2. Weishaus, Abraham. ASM Exam C/4 Study Manual, 17th edition, 2014