Minimalistic and maintainable Python extensions

The last 10 years have shown that Python is versatile enough and fast enough for general-purpose computations. An exception may be problems that require juggling a very large number of intricately connected objects, e.g. when implementing a CAD program or a rendering engine, which are better done in a language with more direct access to the computer’s hardware. In Python, data-heavy computations can often be accomplished efficiently with numpy and other specialized libraries. Occasionally, however, one runs into a specific problem that can’t be solved efficiently with the available libraries, or where adding an extra dependency is an unfavorable option (e.g. for licensing, security, or maintenance reasons). In that case, building a C extension module is often a good way to optimize the slow code.

Python provides a large number of mechanisms for interfacing C code with Python code: ctypes, Cython, and cffi for C, boost::python and its successors for C++. Differences between the approaches are discussed in a recent c’t article series. Here I want to focus on the security and maintainability aspects that affect the choice of extension mechanism.

In any case, adding an extension module is a break in the development environment. Current and future developers not only need to be familiar with the project’s main language (Python) but also with the language of the extension (C) and the inter-language interface. In a corporate setting, all members of the development team might have been replaced over the years, and it’s unclear whether a given inter-language interface will remain core knowledge of future generations of programmers or will become rather arcane.

This leads to some problems in the maintenance of the extension module. While very simple extensions can easily reach a bug-free state and don’t require any maintenance of the extension code itself, the environment, such as the Python version, the numpy version (if used), and the compiler version, will change more often over the history of the project. Security issues in widespread libraries, such as the recently witnessed zlib bug, affect the Python interpreter and other central components and may require unanticipated updates. (If I’m allowed to speculate: 20 years after Java became mainstream, we’re still flooded with critical security issues in mainstream Java applications. If the same is bound to happen for Python, we can expect a similar flood that will last the next 10 years.)

To address the problem of maintaining extension modules I suggest two possible approaches, which I describe in more detail below:

  • compiling automatically and often
  • compiling as rarely as possible with the absolute least amount of dependencies

Compiling automatically and often

This means building the extension module in a build pipeline and automating all the required steps. This option is particularly appealing if the software product can be built using a single pipeline (as opposed to building independent packages which require their own versioning and are combined into the final product by a downstream build pipeline).

I suggest using Cython for building extension modules that are compiled as part of the build pipeline. The Cython setup contains an extra section in the setup.py file which instructs setuptools to automatically build the extension.

Even though it’s possible to use fewer files, I recommend splitting the implementation, the low-level glue part of the binding, and the high-level part of the binding into three files as follows:

The C file contains the implementation of the extension. For the sake of simplicity, our example function xxx_c copies its input, given by the float array data, into the result array out. The array work serves as temporary work memory that is allocated outside of xxx_c, in the spirit of FORTRAN 77. For simple extensions, I think that is the preferable route. Calling malloc on the C side is possible, but I wouldn’t recommend it without a thorough understanding of the malloc implementation on the target platform and Python’s memory management.

file xxx_c.c

#include <stddef.h>
int xxx_c(const float *restrict data,
          float *restrict work,
          float *restrict out,
          size_t len)
{
    for(size_t i=0; i<len; i++) {
       work[i] = data[i]; 
    }
    for(size_t i=0; i<len; i++) {
       out[i] = work[i];
    }
    return 0;
}

A corresponding header file is required to satisfy Cython.

file xxx_c.h

#ifndef XXX_C_H
#define XXX_C_H
#include <stddef.h>
int xxx_c(const float *restrict data,
          float *restrict work,
          float *restrict out,
          size_t len);
#endif

The Cython pyx file contains the low-level (and possibly some high-level) glue code between C and Python. It starts with a repetition of the C prototype from the header. Cython doesn’t seem to support the restrict keyword, which is why we omit it here (we have to keep its aliasing implications in mind, though, when writing the memory allocation parts that come later). The funny [::1] brackets are specific to Cython and instruct it to accept only 1-D, memory-contiguous arrays. In more detail, they ask Cython to accept data structures that expose their data via the buffer protocol interface. An example of such data structures are numpy arrays, but other implementations exist. The reason why I prefer to use the buffer protocol over Cython’s more direct numpy interface is that some types of build errors are ruled out by not making numpy a build-time requirement at all but only a run-time requirement.

file xxx_cython.pyx

cdef extern from "xxx_c.h":
    int xxx_c(const float *data,
              float *work,
              float *out,
              size_t len)

# keep this as minimalistic as possible
def xxx_cython_wrapper(float[::1] data not None,
                       float[::1] out not None,
                       float[::1] work not None):
    length = len(data)
    if len(out) != length:
        raise IndexError("Length of data and out arrays has to match.")
    if len(work) != length:
        raise IndexError("Length of data and work arrays has to match.")
    if length == 0:
        raise RuntimeError("data array has to have at least one element.")
        # return  # returning here and doing nothing could also be a meaningful implementation
    result = xxx_c(&data[0], &work[0], &out[0], length)
    if result != 0:
        raise RuntimeError("An error occurred inside the extension module xxx.")
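
As a side remark on the buffer protocol argument above: the wrapper is not tied to numpy at all. The following sketch (assuming the compiled module is importable as mypackage.xxx_cython) feeds it plain array.array objects instead of numpy arrays:

from array import array
from mypackage.xxx_cython import xxx_cython_wrapper

# array.array('f', ...) exposes the buffer protocol just like a contiguous float32 numpy array
data = array('f', [1.0, 2.0, 3.0])
out = array('f', [0.0] * len(data))
work = array('f', [0.0] * len(data))
xxx_cython_wrapper(data, out, work)
print(list(out))  # prints [1.0, 2.0, 3.0]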

I like to put the public interface and the high-level parts of the glue code in the python file xxx.py. The high-level glue code in this case handles the memory allocation via numpy’s allocator.

file xxx.py

import numpy as np
from .xxx_cython import xxx_cython_wrapper
def xxx(data: np.ndarray) -> np.ndarray:
    """Computes the xxxed version of the given data array
    
        Parameters
        ----------
        data: np.ndarray((T,), dtype=float)
            the input data to xxx

        Returns
        -------
        np.ndarray((T,), dtype=float)
            the xxxed data
    """
    data = np.ascontiguousarray(data, dtype=np.float32)
    length = len(data)
    out = np.empty((length,), dtype=np.float32)
    work = np.empty((length,), dtype=np.float32)
    xxx_cython_wrapper(data, out, work)
    return out

The setup.py file allows defining the C extension, including the compilation options and link options that are necessary for the build. Compilation happens automatically when the package is installed with pip install . Under the Windows operating system, the Microsoft Visual C++ compiler needs to be installed. Integration of Visual C++ with setuptools can be tricky at times, which is why I recommend doing the build on a dedicated build server and not on your unsuspecting team member’s computer (or worse yet, on the customer’s computer). The alternative clang compiler works under the Windows operating system but isn’t yet integrated with setuptools. Development is underway though (as of spring 2022).

file setup.py

from setuptools import Extension
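
The import above is only a stub. Under the assumption that the package is called mypackage (as in the test file below) and that the three files live inside the mypackage directory, a minimal complete version might look like the following sketch:

from setuptools import Extension, setup
from Cython.Build import cythonize

# one extension module built from the Cython glue code plus the plain C implementation
extensions = [
    Extension(name="mypackage.xxx_cython",
              sources=["mypackage/xxx_cython.pyx", "mypackage/xxx_c.c"]),
]

setup(
    name="mypackage",
    packages=["mypackage"],
    ext_modules=cythonize(extensions, language_level="3"),
)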

No software is complete without a good unit test:

file test_xxx.py

import numpy as np
from mypackage.xxx import xxx

def test_xxx():
    test_data = np.random.randn(100)
    np.testing.assert_allclose(xxx(test_data), test_data)

def test_xxx_empty():
    with np.testing.assert_raises(RuntimeError):
        # passing a list is a slight misuse of the public interface
        xxx([])

One more note about the advantage of the buffer protocol over the numpy C interface. In recent times, conda builds seem to be getting replaced by the more convenient pip builds. Whereas conda provided an automated way to match Python packages to numpy versions, this isn’t true for pip packages (or only in a quite cumbersome way). The current dominance of pip over conda can possibly be explained by the fact that building pip packages in dockerized pipelines and posting them to corporate package repositories is easier than building and posting conda packages.

Compiling as rarely as possible with the absolute least amount of dependencies

In the Cython recipe above, the build process produced a dynamic library (a .pyd file, which is a stealth .dll file, under the Windows operating system, or a .so file under the Linux and macOS operating systems). This raises the question of whether it isn’t possible to build such a dynamic library directly, without Cython. The answer is yes, and interfacing such dynamic libraries can be done using the built-in Python package ctypes.

I find this approach especially appealing if the product can’t be built using a single pipeline and if the extension module is distributed independently, so that there is a risk that it becomes unmaintained. A similar situation may occur when developing for the Windows operating system and the extension module is the only piece of C code in an otherwise Python-only project. To mitigate the effects of an unmaintained extension, I recommend keeping the module so simple that it can be developed to total completion, and absolutely minimizing its dependencies. As long as Cython’s support for the stable Python API isn’t yet complete, and as long as neither Cython nor the stable Python API has been proven bug-free, I recommend compiling the extension code without any bindings to Python and placing all the glue code on the Python side.

To demonstrate the build process, we replicate the C file from above:

file xxx_c.c

#include <stddef.h>
int xxx_c(const float *restrict data,
          float *restrict work,
          float *restrict out,
          size_t len)
{
    for(size_t i=0; i<len; i++) {
       work[i] = data[i]; 
    }
    for(size_t i=0; i<len; i++) {
       out[i] = work[i];
    }
    return 0;
}

The compilation step is no longer defined in the setup.py file (though with some ugly hack it could be added as an extension to setup.py, which I don’t recommend). Instead, I recommend compiling the code as a step of the build pipeline or build script with the following command:

file compile.sh

# clang under Linux / macOS (the -undefined dynamic_lookup flag is meant for the macOS linker)
clang -O3 -Wall -shared -undefined dynamic_lookup -nostdlib -o xxx_c.so xxx_c.c
# MSVC under Windows, run from a developer command prompt
cl.exe /D_USRDLL /D_WINDLL xxx_c.c /NODEFAULTLIB /MT /link /DLL /OUT:xxx_c.dll
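
Before writing any glue code, it can be worth checking that the freshly compiled library actually exports the xxx_c symbol. A tiny ctypes probe does the job (just a sketch; the library is assumed to sit in the current directory):

import ctypes

lib = ctypes.CDLL("./xxx_c.so")  # use "xxx_c.dll" under the Windows operating system
print(lib.xxx_c)                 # raises an AttributeError if the symbol is missing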

The glue code can be placed in a Python file that also contains the public Python interface of the extension. The glue code in this case consists of loading the dynamic library with ctypes, declaring the argument types of xxx_c, and converting the numpy arrays into raw pointers.

file xxx.py

import ctypes
import sys
import pathlib
import numpy as np

here = pathlib.Path(__file__).absolute()
if sys.platform.startswith('win'):
    dynamic_lib_ext = '.dll'
else:
    dynamic_lib_ext = '.so'
xxx_lib = ctypes.CDLL(str(here.with_name('xxx_c').with_suffix(dynamic_lib_ext)))
xxx_lib.xxx_c.argtypes = (ctypes.POINTER(ctypes.c_float),
                          ctypes.POINTER(ctypes.c_float),
                          ctypes.POINTER(ctypes.c_float),
                          ctypes.c_size_t,)


def xxx(data: np.ndarray) -> np.ndarray:
    c_float_p = ctypes.POINTER(ctypes.c_float)
    data = np.ascontiguousarray(data, dtype=np.float32)
    length = len(data)
    out = np.empty((length,), dtype=np.float32)
    work = np.empty((length,), dtype=np.float32)
    # argument order must match the C prototype: data, work, out, len
    result = xxx_lib.xxx_c(data.ctypes.data_as(c_float_p),
                           work.ctypes.data_as(c_float_p),
                           out.ctypes.data_as(c_float_p),
                           ctypes.c_size_t(length))
    if int(result) != 0:
        raise RuntimeError("An error occurred inside the extension module xxx.")
    else:
        return out
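
A unit test analogous to the one from the Cython section exercises the ctypes path just as well (a sketch, assuming the same package layout as above):

import numpy as np
from mypackage.xxx import xxx

def test_xxx_ctypes():
    test_data = np.random.randn(100)
    # the conversion to float32 inside xxx limits the achievable precision
    np.testing.assert_allclose(xxx(test_data), test_data, rtol=1e-6)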

file setup.py

import sys
from setuptools import setup

if sys.platform.startswith('win'):
    dynamic_lib_ext = '.dll'
else:
    dynamic_lib_ext = '.so'
setup(name='mypackage', packages=['mypackage'],
      # ship the precompiled library as package data next to xxx.py
      package_data={'mypackage': ['xxx_c' + dynamic_lib_ext]})
Written on April 21, 2022