(Note: All source files referenced in what follows can be obtained here)

ctypes is part of the Python Standard Library which provides C compatible data types and allows calling functions in shared libraries. It can be used to wrap libraries written in compiled languages in pure Python.

In this post, we’ll demonstrate how to call functions written in C from Python using ctypes. We’ll call functions that take as input an array of normal variates with specified mean and variance, and computes the normal CDF for each input array element. The sample code also demonstrates how to link to the C math library during compilation, from which we’ll leverage the error function, \(erf\), which can be used to compute the normal CDF. Note that if following along on Windows, it is assumed that all development targets Python3, and a relatively recent version of gcc is available (either distributed along with Cygwin or MinGW).


Normal Cumulative Distribution Function

Background

Our goal is to be able to call a C-implemented normal cumulative distribution function (CDF) from Python, which take as input a value \(x\), mean \(\mu\) and standard deviation \(\sigma\), and returns the corresponding CDF of \(x\) (which ranges from 0 to 1), and a second function that populates an array of evaluated normal quantiles for a given input array.

The standard C math library doesn’t expose functionality to calcuate the normal CDF directly, however, it does expose both the error function \(erf\) and the complementary error function, \(erfc\). \(erf\) is given by

$$ erf(x) = \frac{2}{\sqrt{\pi}}\int_{0}^{x} e^{-t^{2}}dt, $$


and \(erfc\) is the complement of \(erf\):

$$ erfc(x) = 1 - erf(x). $$




Recall that the normal probability density function (PDF) is given by


$$ f(x| \mu,\sigma) = \frac{1}{\sqrt{2 \pi \sigma^{2}}} e^{-\frac{(x-\mu)^{2}}{2\sigma^{2}}}. $$



Given the similiarity between \(erf\) and the normal PDF, it’s possible to use \(erf\)/\(erfc\) to calculate any normal quantiles of interest.

Let \(\Phi(x)\) represent the normal CDF evaluated at \(x\). \(\Phi(x)\) can be obtained using the following identity:


$$ \begin{align*} \Phi(x)&=\frac{1}{\sqrt{2 \pi}}\int_{-\infty}^{x} e^{\frac{-t^{2}}{2}}dt \\ &=\frac{1}{2}\Big[1 + erf \Big(\frac{x}{\sqrt{2}}\Big)\Big] \\ &=\frac{1}{2}\Big[erfc\Big(-\frac{x}{\sqrt{2}}\Big)\Big]. \end{align*} $$


Note that when x is large, \(erfc\) should be used in place of \(erf\)[1].

We’ll leverage \(erf\)/\(erfc\) from the standard C math library along with the identities above to create a function that computes normal quantiles for a given sequence of normal variates.

One final point: The identity relating the normal CDF and \(erf\)/\(erfc\) returns quantiles from the standard normal distribution, which has zero mean and unit variance (\(\mu=0\), \(\sigma^{2}=1\)). To transform a normal distribution with arbitrary mean and variance into standard form, we use the familiar transformation:

$$ z = \frac{x - \mu}{\sigma}. $$


We can then evaluate \(z\) to obtain the corresponding normal quantile. We’ll use this point in our implementation as well.

C Implementation

The actual implementation of our normal CDF computator will consist of a single .c file norm.c. What follows are the contents of norm.c:

/*
Determine the Normal CDF of x given mean `mu`
and standard deviation `sigma`.

    `x`     => value for which the normal CDF will be determined
    `mu`    => mean of the corresponding normal distribution
    `sigma` => standard deviation of the corresponding normal distribution
*/
#include <math.h>


double norm_cdf(double x, double mu, double sigma)
{
    double cdf;  // variable to hold result
    double z;    // transformation to standard normal
    const double SQRT_2 = 1.4142135623730951;
    z = (x - mu) / sigma;

    // if x > 3, call erfc; otherwise call erf 
    if (x >= 3) {
        cdf = .5 * erfc(- z / SQRT_2);
    }
    else {
        cdf = .5 + .5 * erf(z / SQRT_2);    
    }
    return(cdf);
}


/*
For a given array of arbitrary normal variates, calculate 
the corresponding quantiles using `norm_cdf`.

    `mu`           => mean of the corresponding normal distribution
    `sigma`        => standard deviation of the corresponding normal distribution
    `n`            => the length of input_array/output_array
    `input_array`  => array of inputs (doubles)
    `output_array` => array of computed normal CDFs
*/
void cdf_array(double mu, double sigma, int n, 
                 double* input_array, double* output_array)
{
    int i;

    // For each element of input_array, call norm
    for (i=0; i<n; i++) {

        output_array[i] = norm_cdf(input_array[i], mu, sigma);

    }    
}


In the next section, we demonstrate how to compile norm.c into a shared library which can then be called from Python via ctypes.

Compilation

(Note: Calls to gcc refer to the gcc distributed with Cygwin.)

Our ultimate goal is to create a shared library that contains the C functions norm_cdf and cdf_array which can be accessed using ctypes and called from within a Python session. The first step is to compile norm.c into an object file. For gcc, the -c flag is used to compile a source file into an object file:

$ gcc -Wall -fPIC -lm -c norm.c    


We do not need to include the -o flag to specify the name of the output file in the command above, since compiling with -c automatically creates an object file with the same name as our source file, but with an .o extension.

The -lm flag precludes us from having to include the full path and name of the C math library, /cygdrive/c/cygwin64/lib/libm.a. Upon completion, the object file norm.o will be output to the same directory in which norm.c resides.

In the next step we create the shared library. Linux shared libraries have .so extensions, and since we’re using Cygwin, our library will also have an .so extension. The following command creates norm.so in the same directory as norm.c and norm.o:

$ gcc -shared -o norm.so norm.o  


A brief aside: If we had a collection of files that we wanted to compile into a single shared library, we’d list them on the command line one after the other. If instead of norm.o we had norm1.o, norm2.o and norm3.o, the command would become:

$ gcc -shared -o norm.so norm1.o norm2.o norm3.o


This would also produce a single norm.so file in the working directory, same as our original command specification referencing a single object file.

In the final step, we’ll leverage ctypes to make our C-compiled functions norm_cdf and cdf_array available within Python.

Calling C Functions from Python

Prior to calling our C functions from Python, we need to specify the parameter and return types of norm_cdf and cdf_array. In addition, we need to coerce any Python datatypes that are passed to the library functions into C-compatible datatypes. This is demonstrated below, with each section commented to make it easier to follow along. This is a Python file, which we’ll identify as norm_main.py:

#!/usr/bin/env python
"""
norm_main.py

Calls 2 functions from the compiled C library `norm.so`.
The function prototypes are:

    [+] double norm(double x, double mu, double sigma)

    [+] void cdf_array(double mu, double sigma, int n,
                     double* input_array, double* output_array)

"""
import ctypes
import numpy as np
from scipy.stats import norm

# Suppress scientific notation.
np.set_printoptions(suppress=True)


# Provide full path to shared library.
LIB_PATH = "norm.so"


# Bind reference to shared library `norm.so`.
normlib = ctypes.cdll.LoadLibrary(LIB_PATH)


# Specify argument datatypes for norm_cdf and cdf_array.
normlib.norm_cdf.argtypes  = [
    ctypes.c_double, ctypes.c_double, ctypes.c_double
    ]
normlib.cdf_array.argtypes = [
    ctypes.c_double, ctypes.c_double, ctypes.c_int,
    ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double)
    ]


# Specify return datatypes for norm_cdf and cdf_array (cdf_array declared as void).
normlib.norm_cdf.restype  = ctypes.c_double
normlib.cdf_array.restype = None


# Use scipy.stats to generate 10 standard normal random variates. This will 
# be `input_arr`. We also initialize `output_arr` to all zeros, and set the 
# random seed in numpy for reproducibility.
np.random.seed(516)
mu, sigma, n = 0., 1., 10
input_arr    = norm.rvs(loc=mu, scale=sigma, size=n)
output_arr   = np.zeros(n, np.float_)


# Initialize ctypes-compatible versions of mu, sigma, n, input_arr and output_arr.
ct_mu         = ctypes.c_double(mu)
ct_sigma      = ctypes.c_double(sigma)
ct_n          = ctypes.c_int(n)
ct_input_arr  = np.ctypeslib.as_ctypes(input_arr)
ct_output_arr = np.ctypeslib.as_ctypes(output_arr)


print("\nNormal variates w/ mean {} and standard deviation {}:\n".format(mu, sigma))
print(input_arr)


print("\nOutput_arr before passing to cdf_array:\n\t")
print(output_arr)

# Call `normlib.cdf_array` from C library.
normlib.cdf_array(ct_mu, ct_sigma, ct_n, ct_input_arr, ct_output_arr)


print("\nOutput_arr after passing to cdf_array:\n\t")
print(output_arr)


# Compare results returned by cdf_array to scipy's norm.cdf.
spcdfs = norm.cdf(input_arr, loc=mu, scale=sigma)
print("\nscipy-evaluated CDFs:\n\t")
print(spcdfs)


To summarize, we read in norm.so, specify the parameter and return datatypes for the library functions, then call cdf_array. In the last few lines, we compare the output of cdf_array with norm.cdf from scipy.stats, and find the results to be identical.

Note that we are not copying data, but simply passing the pointers to the data from Python to C. In C, the data pointed to is operated on, which means we do not need to pass any data back[3]. This explains why cdf_arrays return type is specified as void.

Also note that calculating normal CDFs for a sequence of normal variates can be accomplished much more efficiently using the scipy.stats API. This particular example was chosen to demonstrate non-trival ctypes extensibility, but the example itself should be considered a discourse on the method, and not necessarily a preferred or optimal approach for computing normal CDFs.

The following terminal capture verifies that CDFs calculated with cdf_array and scipy are the same:

norm_main1



Finally, we compare CDFs for normal variates generated from a non-standard normal distribution. The only change we need to make in norm_main.py is to update mu and sigma. Setting mu=2.5 and sigma=5 yields:

norm_main2



Again verifying that the calcuated CDFs are identical.

Conclusion

ctypes exposes a great deal of functionality, and it’s also used in other standard library modules (for example, multiprocessing). Having a good understanding of how ctypes works will improve your understanding of Python, and knowing how to natively interface with foreign function libraries in your language of choice is always a plus. Until next time, happy coding!

Footnotes:

[1]. https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rtref/sc415607.pdf, pg. 2:90-91 [2]. https://tfetimes.com/wp-content/uploads/2015/09/An_Introduction_to_GCC-Brian_Gough.pdf, pg. 11 [3]. http://www.ifnamemain.com/posts/2013/Dec/10/c_structs_python/