Skip main navigation

Avoiding function call overheads

Function calls in Python are relatively expensive compared to some other languages. In this article we show how Cython can reduce these overheads.

Function calls in Python are relatively expensive in comparison to for example C. Cython allows one to reduce this overhead significantly.

Main reason for the large function call overhead in Python is the “boxing” and “unboxing” of function call arguments and return values. Once again, dynamic typing makes programming more flexible but with a cost in performance.

One can investigate the function call overhead in Python with the following example. Let’s make a simple code that does not perform much else than calls a function in a loop:

def inner(i):
return i+1

def outer_1(n):
x = 0
for i in range(n):
x = inner(x)

We can create also a version which does not have the function call:

def outer_2(n):
x = 0
for i in range(n):
x = x + 1

If one measures the performance of the two outer functions, one observes
that the one with the function call in the loop is typically 3-4 times slower.

Using pure C functions

If a function is used only within a Cython module, one can get rid of a large
part of Python’s function call overhead by declaring the function as a pure C
function, once again using the cdef keyword:

cdef int add (int x, int y):
cdef int result
result = x + y
return result

When a function is defined as a pure C function, it can be called only from
the corresponding Cython module, but not from a Python code. If a function is
called both from Cython and Python, Cython can generate an additional Python
wrapper by declaring the function with cpdef instead of cdef:

cpdef int add (int x, int y):
cdef int result
result = x + y
return result

This adds some overhead, so if the function is not called from Python it is
better to use just cdef.

Mandelbrot kernel as pure C function

The kernel function is called only within the mandelbrot.pyx module,
so we can make it a pure C function:

cdef int kernel(zr, zi, cr, ci, lim, cutoff):
''' Computes the number of iterations `n` such that
|z_n| > `lim`, where `z_n = z_{n-1}**2 + c`.
'''

count = 0
while ((zr*zr + zi*zi) < (lim*lim)) and count < cutoff:
zr, zi = zr * zr - zi * zi + cr, 2 * zr * zi + ci
count += 1
return count

When continuing with the performance investigation, we obtain the following results:

  • Pure Python: 0.57 s
  • Static type declarations in the kernel: 14 ms
  • Kernel as C-function: 9.6 ms

In this case the speed-up is not so drastic, but still a respectable 1.5.

© CC-BY-NC-SA 4.0 by CSC - IT Center for Science Ltd.
This article is from the free online

Python in High Performance Computing

Created by
FutureLearn - Learning For Life

Reach your personal and professional goals

Unlock access to hundreds of expert online courses and degrees from top universities and educators to gain accredited qualifications and professional CV-building certificates.

Join over 18 million learners to launch, switch or build upon your career, all at your own pace, across a wide range of topic areas.

Start Learning now