(Note: All source files referenced in what follows can be obtained here)

ctypes is part of the Python
Standard Library which provides C compatible data types and allows calling
functions in shared libraries. It can be used to wrap libraries written in
compiled languages in pure Python.

In this post, we’ll demonstrate how to call functions written in C from Python
using `ctypes`

. We’ll call functions that take as input an array of normal
variates with specified mean and variance, and computes the normal CDF for
each input array element. The sample code also demonstrates how to link to the
C math library during compilation, from which we’ll leverage the error
function, \(erf\), which can be used to compute the normal CDF. Note that if
following along on Windows, it is assumed that all development targets
Python3, and a relatively recent version of gcc is available (either
distributed along with Cygwin or MinGW).

## Normal Cumulative Distribution Function

### Background

Our goal is to be able to call a C-implemented normal cumulative distribution function (CDF) from Python, which take as input a value \(x\), mean \(\mu\) and standard deviation \(\sigma\), and returns the corresponding CDF of \(x\) (which ranges from 0 to 1), and a second function that populates an array of evaluated normal quantiles for a given input array.

The standard C math library doesn’t expose functionality to calcuate the
normal CDF directly, however, it does expose both the error function \(erf\) and
the complementary error function, \(erfc\). \(erf\) is given by

and \(erfc\) is the complement of \(erf\):

Recall that the normal probability density function (PDF) is given by

Given the similiarity between \(erf\) and the normal PDF, it’s possible to use
\(erf\)/\(erfc\) to calculate any normal quantiles of interest.

Let \(\Phi(x)\) represent the normal CDF evaluated at \(x\). \(\Phi(x)\) can be
obtained using the following identity:

Note that when `x`

is large, \(erfc\) should be used in place of \(erf\)[1].

We’ll leverage \(erf\)/\(erfc\) from the standard C math library along with the
identities above to create a function that computes normal quantiles for a
given sequence of normal variates.

One final point: The identity relating the normal CDF and \(erf\)/\(erfc\) returns
quantiles from the standard normal distribution, which has zero mean and unit
variance (\(\mu=0\), \(\sigma^{2}=1\)). To transform a normal distribution with
arbitrary mean and variance into standard form, we use the familiar
transformation:

We can then evaluate \(z\) to obtain the corresponding normal quantile. We’ll
use this point in our implementation as well.

### C Implementation

The actual implementation of our normal CDF computator will consist of a
single `.c`

file *norm.c*. What follows are the contents of *norm.c*:

```
/*
Determine the Normal CDF of x given mean `mu`
and standard deviation `sigma`.
`x` => value for which the normal CDF will be determined
`mu` => mean of the corresponding normal distribution
`sigma` => standard deviation of the corresponding normal distribution
*/
#include <math.h>
double norm_cdf(double x, double mu, double sigma)
{
double cdf; // variable to hold result
double z; // transformation to standard normal
const double SQRT_2 = 1.4142135623730951;
z = (x - mu) / sigma;
// if x > 3, call erfc; otherwise call erf
if (x >= 3) {
cdf = .5 * erfc(- z / SQRT_2);
}
else {
cdf = .5 + .5 * erf(z / SQRT_2);
}
return(cdf);
}
/*
For a given array of arbitrary normal variates, calculate
the corresponding quantiles using `norm_cdf`.
`mu` => mean of the corresponding normal distribution
`sigma` => standard deviation of the corresponding normal distribution
`n` => the length of input_array/output_array
`input_array` => array of inputs (doubles)
`output_array` => array of computed normal CDFs
*/
void cdf_array(double mu, double sigma, int n,
double* input_array, double* output_array)
{
int i;
// For each element of input_array, call norm
for (i=0; i<n; i++) {
output_array[i] = norm_cdf(input_array[i], mu, sigma);
}
}
```

In the next section, we demonstrate how to compile *norm.c* into a shared
library which can then be called from Python via `ctypes`

.

### Compilation

(Note: Calls to gcc refer to the gcc distributed with Cygwin.)

Our ultimate goal is to create a shared library that contains the C functions
`norm_cdf`

and `cdf_array`

which can be accessed using `ctypes`

and called
from within a Python session. The first step is to compile *norm.c* into an
object file. For gcc, the `-c`

flag is used to compile a source file into an
object file:

```
$ gcc -Wall -fPIC -lm -c norm.c
```

We do not need to include the `-o`

flag to specify the name of the output file
in the command above, since compiling with `-c`

automatically creates an
object file with the same name as our source file, but with an `.o`

extension.

The `-lm`

flag precludes us from having to include the full path and name of
the C math library, */cygdrive/c/cygwin64/lib/libm.a*. Upon completion, the
object file *norm.o* will be output to the same directory in which *norm.c* resides.

In the next step we create the shared library. Linux shared libraries have
`.so`

extensions, and since we’re using Cygwin, our library will also have an
`.so`

extension. The following command creates *norm.so* in the same directory
as *norm.c* and *norm.o*:

```
$ gcc -shared -o norm.so norm.o
```

A brief aside: If we had a collection of files that we wanted to compile into
a single shared library, we’d list them on the command line one after the
other. If instead of *norm.o* we had *norm1.o*, *norm2.o* and *norm3.o*, the
command would become:

```
$ gcc -shared -o norm.so norm1.o norm2.o norm3.o
```

This would also produce a single *norm.so* file in the working directory, same
as our original command specification referencing a single object file.

In the final step, we’ll leverage ctypes to make our C-compiled functions
`norm_cdf`

and `cdf_array`

available within Python.

### Calling C Functions from Python

Prior to calling our C functions from Python, we need to specify the parameter
and return types of `norm_cdf`

and `cdf_array`

. In addition, we need to coerce
any Python datatypes that are passed to the library functions into
C-compatible datatypes. This is demonstrated below, with each section
commented to make it easier to follow along. This is a Python file, which
we’ll identify as *norm_main.py*:

```
#!/usr/bin/env python
"""
norm_main.py
Calls 2 functions from the compiled C library `norm.so`.
The function prototypes are:
[+] double norm(double x, double mu, double sigma)
[+] void cdf_array(double mu, double sigma, int n,
double* input_array, double* output_array)
"""
import ctypes
import numpy as np
from scipy.stats import norm
# Suppress scientific notation.
np.set_printoptions(suppress=True)
# Provide full path to shared library.
LIB_PATH = "norm.so"
# Bind reference to shared library `norm.so`.
normlib = ctypes.cdll.LoadLibrary(LIB_PATH)
# Specify argument datatypes for norm_cdf and cdf_array.
normlib.norm_cdf.argtypes = [
ctypes.c_double, ctypes.c_double, ctypes.c_double
]
normlib.cdf_array.argtypes = [
ctypes.c_double, ctypes.c_double, ctypes.c_int,
ctypes.POINTER(ctypes.c_double), ctypes.POINTER(ctypes.c_double)
]
# Specify return datatypes for norm_cdf and cdf_array (cdf_array declared as void).
normlib.norm_cdf.restype = ctypes.c_double
normlib.cdf_array.restype = None
# Use scipy.stats to generate 10 standard normal random variates. This will
# be `input_arr`. We also initialize `output_arr` to all zeros, and set the
# random seed in numpy for reproducibility.
np.random.seed(516)
mu, sigma, n = 0., 1., 10
input_arr = norm.rvs(loc=mu, scale=sigma, size=n)
output_arr = np.zeros(n, np.float_)
# Initialize ctypes-compatible versions of mu, sigma, n, input_arr and output_arr.
ct_mu = ctypes.c_double(mu)
ct_sigma = ctypes.c_double(sigma)
ct_n = ctypes.c_int(n)
ct_input_arr = np.ctypeslib.as_ctypes(input_arr)
ct_output_arr = np.ctypeslib.as_ctypes(output_arr)
print("\nNormal variates w/ mean {} and standard deviation {}:\n".format(mu, sigma))
print(input_arr)
print("\nOutput_arr before passing to cdf_array:\n\t")
print(output_arr)
# Call `normlib.cdf_array` from C library.
normlib.cdf_array(ct_mu, ct_sigma, ct_n, ct_input_arr, ct_output_arr)
print("\nOutput_arr after passing to cdf_array:\n\t")
print(output_arr)
# Compare results returned by cdf_array to scipy's norm.cdf.
spcdfs = norm.cdf(input_arr, loc=mu, scale=sigma)
print("\nscipy-evaluated CDFs:\n\t")
print(spcdfs)
```

To summarize, we read in `norm.so`

, specify the parameter and return datatypes
for the library functions, then call `cdf_array`

. In the last few lines, we
compare the output of `cdf_array`

with `norm.cdf`

from `scipy.stats`

, and find
the results to be identical.

Note that we are not copying data, but simply passing the pointers to the data
from Python to C. In C, the data pointed to is operated on, which means we do
not need to pass any data back[3]. This explains why `cdf_array`

‘s return type
is specified as `void`

.

Also note that calculating normal CDFs for a sequence of normal variates can
be accomplished much more efficiently using the `scipy.stats`

API. This
particular example was chosen to demonstrate non-trival ctypes extensibility,
but the example itself should be considered a discourse on the method, and not
necessarily a preferred or optimal approach for computing normal CDFs.

The following terminal capture verifies that CDFs calculated with `cdf_array`

and scipy are the same:

Finally, we compare CDFs for normal variates generated from a non-standard normal distribution. The only change we need to make in

*norm_main.py*is to update

`mu`

and `sigma`

. Setting `mu=2.5`

and `sigma=5`

yields:

Again verifying that the calcuated CDFs are identical.

## Conclusion

`ctypes`

exposes a great deal of functionality, and it’s also used in other
standard library modules (for example, `multiprocessing`

). Having a good
understanding of how `ctypes`

works will improve your understanding of Python,
and knowing how to natively interface with foreign function libraries in your
language of choice is always a plus. Until next time, happy coding!

### Footnotes:

[1]. https://www.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rtref/sc415607.pdf, pg. 2:90-91 [2]. https://tfetimes.com/wp-content/uploads/2015/09/An_Introduction_to_GCC-Brian_Gough.pdf, pg. 11 [3]. http://www.ifnamemain.com/posts/2013/Dec/10/c_structs_python/