Bootstrapping is a resampling method used to estimate the variability of a statistic by repeatedly sampling from a dataset with replacement. As the name implies, the empirical bootstrap makes no assumptions about the distribution of the sample; it requires only that the sample be a good approximation of the underlying population distribution. Although for most problems it is impossible to know a statistic’s true confidence interval, the bootstrap method is asymptotically more accurate than the standard intervals obtained using sample variance and assumptions of normality.

In Bootstrap Methods for Standard Errors, Confidence Intervals and other Measures of Statistical Accuracy, Efron and Tibshirani present four methods for constructing approximate confidence intervals for a statistic of interest. The method described here is analogous to the second approach, the bootstrap percentile method.

If the original sample of size \(n\) is represented as

$$ x_{1}, x_{2}, \cdots, x_{n}, $$

then a bootstrap sample of size \(n\) is given by

$$ x_{1}^{*}, x_{2}^{*}, \cdots, x_{n}^{*} $$

where \(x_{1}^{*}, x_{2}^{*}, \cdots, x_{n}^{*}\) is a resample with replacement from the original dataset.
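As a concrete illustration, a single bootstrap resample can be drawn with NumPy's `np.random.choice` (a minimal sketch; the dataset and seed here are arbitrary, chosen only for demonstration):

```python
import numpy as np

np.random.seed(0)  # arbitrary seed, for reproducibility
x = [2.1, 3.4, 3.9, 5.6, 7.8, 9.0]

# draw a single bootstrap sample: n observations, with replacement
x_star = np.random.choice(x, size=len(x), replace=True)

# same size as the original; every resampled value comes from x
assert len(x_star) == len(x)
assert set(x_star).issubset(set(x))
```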

Bootstrapping allows estimation of the sampling distribution of almost any statistic, including the mean, standard deviation, median or any other percentiles of interest.

Carrying out the following steps results in computing the empirical bootstrap 90% confidence interval for the mean of an arbitrary sample:

1. Compute the sample mean of the dataset, denoted as \(\bar{x}\).

2. Sample the initial dataset with replacement (the size of the resample should be the same as the initial dataset).

3. Compute the statistic of interest based on the resampling. For this example, the resampled sample mean is denoted as \(\bar{x}_{i}^{*}\).

4. Repeat steps 2-3 many times, preserving \(\bar{x}_{i}^{*}\) from each iteration for later use.

After the data has been resampled, the resulting collection of \(\bar{x}_{i}^{*}\)s is sorted in ascending order. Assume that in our example we generated 1000 samples with replacement. To obtain the 90% bootstrap confidence interval from the ranked collection of resampled sample means, extract the 50th and 950th elements, \(\bar{x}_{50}^{*}\) and \(\bar{x}_{950}^{*}\), which correspond to the 5th and 95th percentiles of the distribution of sample means. More generally, for \(n\) bootstrap samples and a confidence level \(\alpha\), the positions in the ordered collection of resampled means corresponding to the bounds of the empirical bootstrap confidence interval are given by

$$ [n(1-\alpha)/2, n(1+\alpha)/2] $$

For our example, plugging in \(n=1000\) and \(\alpha=.9\) yields \([1000*(1-.9)/2, 1000*(1+.9)/2]=[50, 950]\), as indicated above.
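The index calculation can be sketched as a small helper (note the rounding: in floating point, 1000*(1-.9)/2 evaluates to 49.999…, so we round rather than truncate):

```python
def ci_indices(n, alpha):
    """Positions of the interval bounds in the ordered
    collection of n resampled means."""
    return (round(n * (1 - alpha) / 2), round(n * (1 + alpha) / 2))

print(ci_indices(1000, .90))  # (50, 950)
```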


What follows is a closure written in Python that encapsulates the logic described above. First, the dataset, along with the desired number of bootstrap samples to run, is passed to the bootstrap function. bootstrap returns a function which can then be called with a confidence level, returning a tuple containing the lower and upper bounds of the specified interval (for example, passing .90 returns the 5th and 95th percentiles). Note that bootstrap is conceptually similar to scipy’s interpolate.interp1d, which also returns a function to be called later:

import numpy as np

def bootstrap(data, n=1000, func=np.mean):
    """
    Generate `n` bootstrap samples, evaluating `func`
    at each resampling. `bootstrap` returns a function,
    which can be called to obtain confidence intervals
    of interest.
    """
    simulations = list()
    sample_size = len(data)
    for c in range(n):
        itersample = np.random.choice(data, size=sample_size, replace=True)
        simulations.append(func(itersample))
    simulations.sort()
    def ci(p):
        """
        Return 2-sided symmetric confidence interval specified
        by p.
        """
        u_pval = (1+p)/2.
        l_pval = (1-u_pval)
        l_indx = int(np.floor(n*l_pval))
        u_indx = int(np.floor(n*u_pval))
        return (simulations[l_indx], simulations[u_indx])
    return ci

To demonstrate the functionality of bootstrap, we calculate the 90%, 95%, 99% & 99.5% bootstrap confidence intervals for the mean based on the following sample:

$$ 10.3, 10.6, 11.7, 14.0, 14.2, 15.0, 16.8, 18.2, 21.3, 21.0 $$

>>> v = [10.3, 10.6, 11.7, 14.0, 14.2, 15.0, 16.8, 18.2, 21.3, 21.0]
>>> boot = bootstrap(v, n=5000)
>>> cintervals = [boot(i) for i in (.90, .95, .99, .995)]
>>> print(cintervals)

[(13.41, 17.36),
 (13.02, 17.73),
 (12.47, 18.45),
 (12.22, 18.66)]

Notice that unlike confidence intervals obtained from a normal or t-distribution, the bootstrapped confidence interval is not symmetric about the mean, which provides an indication of the degree of skewness of the population in question.
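The asymmetry can be checked directly by comparing the distances from the sample mean to each interval endpoint (a minimal inline version of the percentile bootstrap; the seed is arbitrary, and the exact distances will vary with it):

```python
import numpy as np

np.random.seed(42)  # arbitrary seed, for reproducibility
v = [10.3, 10.6, 11.7, 14.0, 14.2, 15.0, 16.8, 18.2, 21.3, 21.0]
xbar = np.mean(v)

means = sorted(np.mean(np.random.choice(v, size=len(v), replace=True))
               for _ in range(5000))
lower, upper = means[125], means[4875]  # 95% interval endpoints

# equal distances would indicate a symmetric interval
print("lower distance:", xbar - lower)
print("upper distance:", upper - xbar)
```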

To gain insight into the shape of the distribution of bootstrapped sample means, we can plot a histogram of the observations centered on the initial sample mean. Using matplotlib and seaborn, this is readily accomplished in just a few lines of code:

# Generate a histogram of the distribution of sample means for
# 5000 bootstrap resamplings.
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

v = [10.3, 10.6, 11.7, 14.0, 14.2, 15.0, 16.8, 18.2, 21.3, 21.0]
xbar_init = np.round(np.mean(v), 2)

# generate 5000 resampled sample means
means = [np.mean(np.random.choice(v, size=len(v), replace=True)) for i in range(5000)]
sns.distplot(means, color='r', kde=True, hist_kws=dict(edgecolor="b", linewidth=.675))
plt.xlabel("Initial Sample Mean: {}".format(xbar_init))
plt.title("Distribution of Sample Mean")
plt.axvline(x=xbar_init)  # vertical line at xbar_init
plt.show()

Resulting in:


It should come as no surprise that the distribution of the sample mean tends toward normality as the number of observations increases (see the Central Limit Theorem for more information).

In terms of bootstrap sophistication, the percentile method ranks low. Other variations, such as the Bias-Corrected and Accelerated (BCa) method, return meaningful results even when the distribution in question deviates significantly from normality. But the core approach remains the same.
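For reference, recent versions of SciPy (1.7+) implement the BCa interval directly via scipy.stats.bootstrap; a minimal sketch applied to the sample above (the endpoints will vary from run to run, since no seed is fixed here):

```python
import numpy as np
from scipy.stats import bootstrap as scipy_bootstrap

v = np.array([10.3, 10.6, 11.7, 14.0, 14.2, 15.0, 16.8, 18.2, 21.3, 21.0])

# data is passed as a sequence of samples; method="BCa" applies the
# bias-corrected and accelerated adjustment to the percentile interval
res = scipy_bootstrap((v,), np.mean, confidence_level=0.90, method="BCa")
print(res.confidence_interval)
```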

As mentioned earlier, bootstrap confidence intervals can be obtained for just about any statistic of interest. In a regression setting, resampling with replacement from the residuals can be used to estimate the standard error of model coefficients. In a future post, we’ll demonstrate estimating the standard error for both standard linear and generalized linear model coefficients. Until then, happy coding!