This content is taken from the Partnership for Advanced Computing in Europe (PRACE)'s online course, Python in High Performance Computing. Join the course to learn more.

Parallel programming with Python

Python also has a rich ecosystem for parallel computing: both the standard library and third-party packages provide tools for different parallel programming approaches.

In this course we focus on the message passing approach (with the mpi4py package), which is normally the most appropriate solution for tightly coupled parallel problems. In this article we briefly review some other parallel programming packages available for Python, which can be useful if the parallel problem is not communication intensive.

The Python standard library contains some modules that can be used for parallel programming on a single shared-memory computer, i.e. they are not suitable for distributed computing. They are also mainly meant for launching concurrent tasks with no (or moderate) dependencies and communication needs between the tasks. The threading module provides a thread-based approach, whereas multiprocessing is based on child processes. There is also a higher-level concurrent.futures module, which can utilize either threads or processes.

Using multiple processes has a higher memory overhead than using threads. However, the standard CPython interpreter has a Global Interpreter Lock (GIL), which allows only one thread to execute Python code at a time (although some performance-oriented libraries can work around this limitation). CPU-intensive Python code therefore does not run in parallel with threading, but threading can still be an appropriate model if you want to run multiple I/O-bound tasks simultaneously.
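As a rough illustration of this trade-off, the sketch below uses concurrent.futures with both kinds of worker pool. The function names (count_primes, fake_download, run_cpu_tasks, run_io_tasks) and the worker counts are made up for this example, and time.sleep only stands in for a real I/O wait such as a network request. It is a minimal sketch rather than a recommended design.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def count_primes(limit):
    """CPU-bound toy task: count the primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def fake_download(seconds):
    """I/O-bound stand-in (hypothetical): sleeping releases the GIL,
    much like waiting for a file or a socket would."""
    time.sleep(seconds)
    return seconds


def run_cpu_tasks(limits, workers=2):
    # Each worker process runs its own interpreter with its own GIL,
    # so CPU-bound Python code can run in parallel.
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(count_primes, limits))


def run_io_tasks(delays, workers=4):
    # Threads share one interpreter, which is fine while the tasks
    # spend most of their time waiting rather than computing.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fake_download, delays))
```

Both helpers return results in the same order as their inputs, since map preserves ordering. On platforms where worker processes are started as fresh interpreters (for example Windows and macOS), run_cpu_tasks should be called from under an if __name__ == "__main__": guard in a script.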

For parallel data analytics that needs to scale over multiple computers, one can use the third-party packages Dask or PySpark. Dask is a Python library providing advanced parallelism with easy-to-use interfaces (e.g. a NumPy-like parallel Dask array). PySpark is a Python interface to Apache Spark, a general distributed cluster-computing framework for big data processing.

We will now move on to learn message passing with Python in more detail, but please comment if you have experience with these other approaches!
