A while back I began working on a project tasked with producing large loss expectations and ranges for our insured lines of business. The targeted expectations of interest were counts (i.e., how many claims will a line of business incur in excess 1M USD in the next 12 months) and aggregate losses (the sum of the loss dollars associated with the large count projection). For modeling large count expectations, we would employ one of the following distributions based on the data’s variance to mean ratio:

- If \(Var[N]/E[N] \gt 1\), model large counts using the Negative Binomial Distribution
- If \(Var[N]/E[N] \lt 1\), model large counts using the Binomial Distribution
- If \(Var[N]/E[N] = 1\), model large counts using the Poisson Distribution

Once the details of the modeling approach were agreed upon, the next step was to create a prototype in Excel prior to developing a more robust implementation intended for production. It was at this point that we learned Excel does not have a function to compute the cumulative distribution function for the negative binomial distribution. Digging deeper, we found that the negative binomial CDF can be obtained using the regularized incomplete beta function. At this time, I was familiar with the beta function, but not the incomplete beta function, or what regualrized meant in this context.
This post is an attempt to highlight variants of the beta function, and reinforce that understanding by implementing a fuctiion that reproduces CDF output from scipy.stats.nbinom for a given parameterization.

(Note: The Wikipedia page for the beta function contains a section titled Software implementation that contains an example of how calculate the complete beta value in VBA using Excel’s builtin GammaLn).


The Gamma function

No discussion of the Beta function would be complete without first introducing the gamma function. The gamma function is an extension of the factorial function, with its argument shifted down by 1, to real and complex numbers:

$$ \Gamma(x) = \int_{0}^{\infty} t^{x-1} e^{-t} dt $$


If \(n\) is a positive integer, the gamma function reduces to:

$$ \Gamma(x) = (n-1)! $$


We make a point to mention the gamma function since the Beta function can be represented in terms of it, which we demonstrate next.

The Beta function

The beta function is given by:

$$ B(a, b) = \int_{0}^{1} t^{a-1} (1-t)^{b-1} dt, $$


for real \((a,b) \gt 0\).
We can represent the beta function in terms of the gamma function as follows:

$$ B(a, b) = \frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} $$


When \(a\) and \(b\) are positive integers, the expression becomes:

$$ B(a, b) = \frac{(a-1)!(b-1)!}{(a+b-1)!} $$


Notice that in the first representation of the beta function the limits of integration were \((0,1)\). The incomplete beta function is a generalization of the beta function which allows the upper limit of integration to take on values in the range \([0,1]\). Symbolically, the incomplete beta function can be represented as:

$$ B(x; a, b) = \int_{0}^{x} t^{a-1} (1-t)^{b-1} dt $$


When \(x=1\), the incomplete beta function and the beta function are identical. Put another way, the beta function is the incomplete beta function evaluated at \(x=1\).
Having described the beta function and the incomplete beta function, we are now in a position to introduce the regularized incomplete beta function, also referred to as the regularized beta function. It is defined as the ratio of the incomplete beta function to the beta function, evaluated at \(x\):

$$ I_{x}(a,b) = \frac{B(x; a,b)}{B(a,b)} $$


The regularized beta function is the cumulative distribution function for the beta distribution, and as already mentioned, can be used to calculate the CDF for both the negative binomial and binomial distributions.
For a binomial random variable \(X\), to determine the probability of \(k\) or fewer successes in \(n\) independent trials, \(k \leq n\), the CDF is given by:

$$ F(x) = \sum_{i=0}^{n} \binom{n}{i} p^{i} (1-p)^{n-i} = I_{1-p}(n-k,1+k) $$


To demonstrate, assume \(p=.25\), \(n=5\) and we’re interested in determining the probability of 3 or fewer successes. Expansion of the binomial summation results in:

$$ P[X \leq 3] = \binom{5}{0} .25^{0} .75^{5} + \binom{5}{1} .25^{1} .75^{4} + \binom{5}{2} .25^{2} .75^{3} + \binom{5}{3} .25^{3} .75^{2} = 0.984375 $$


Alternatively, calculation with the regularized incomplete beta function yields:

$$ P[X \leq 3] = I_{.75}(2, 4) = 0.984375. $$


For the negative binomial distribution, assuming the common \(r,k\) parameterization in which the PDF is given by

$$ P(X=k) = \binom{k+r-1}{k} p^{k} (1-p)^{r} \quad for k = 0, 1, 2, \cdots $$


the expression for the negative binomial CDF can be reduced to:

$$ P(X \leq k) = 1 - I_{p}(k+1, r) $$


Implementation

We now have all the necessary pieces to implement a function to compute the CDF of the negative binomial distribution using functions available in scipy.special. We will compare our results to scipy.stats.nbinom.cdf for a set of inputs to assess the correctness of the implementation.

"""
Calculation of negative binomial CDF using regularized incomplete 
beta function.
"""
from scipy.special import gamma, beta, betainc
from scipy.stats import nbinom
import numpy as np

# The call signature for nbinom.cdf is:
# nbinom.cdf(<nbr_failures>, <nbr_successes>, <prob_success>)

nbr_successes = 10 
nbr_failures  = 10 
prob_success  = .5 

# Vectorize nbinom.pmf.
nb_pdfs = np.vectorize(nbinom.pmf)

# Determine sum of nbinom PDF from 0-10.
sum_pdfs = nbinom_pdf(np.arange(nbr_failures+1),nbr_successes,prob_success).sum() # 0.58809852600097756

# Determine nbinom CDF using nbinom.cdf.
actual_cdf = nbinom.cdf(nbr_failures,nbr_successes,prob_success) # 0.58809852600097678


def nb_cdf(k, n,  p)
    """
    Implementation of negative binomial CDF using the 
    regularized incomplete beta function.
    """
    I = betainc(n+1, k, p)
    return(1-I)


nb_cdf(nbr_failures, nbr_successes, prob_success)
# returns: 0.58809852600097678


As expected, we find that all three approaches arrive at the same negative binomial CDF for a given set of inputs.