
Parallel programming with Python

In this article we give an overview of parallel programming approaches in the Python ecosystem.
© CC-BY-NC-SA 4.0 by CSC - IT Center for Science

Python has a rich ecosystem for parallel computing as well: both the standard library and third-party packages provide tools for different parallel programming approaches.

In this course we focus on the message passing approach (with the mpi4py package), which is normally the most appropriate solution for tightly coupled parallel problems. In this article we briefly review some other parallel programming packages available for Python, which can be useful when the parallel problem is not communication intensive.

The Python standard library contains modules that can be used for parallel programming on a single shared-memory computer, i.e. they are not suitable for distributed computing. They are also mainly meant for launching concurrent tasks with no (or moderate) dependencies and communication between the tasks. The threading module provides a thread-based approach, whereas multiprocessing is based on child processes; the higher-level concurrent.futures module can utilize either threads or processes. Using multiple processes has a higher memory overhead. Threads are lighter, but in the standard CPython interpreter the Global Interpreter Lock (GIL) allows only one thread to execute Python code at a time (although certain performance-oriented libraries can work around this limitation). Consequently, CPU-intensive tasks cannot run in parallel with threading, but it can still be an appropriate model for running multiple I/O-bound tasks simultaneously.
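As an illustration, a small sketch using concurrent.futures, where a sleep stands in for an I/O-bound task (the function and parameter names here are our own, not from the course):

```python
import time
from concurrent.futures import ThreadPoolExecutor

def io_task(n):
    """Simulated I/O-bound task: sleeping releases the GIL,
    so several of these can wait concurrently."""
    time.sleep(0.1)
    return n * n

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=4) as pool:
    # map distributes the inputs over the worker threads
    results = list(pool.map(io_task, range(8)))
elapsed = time.perf_counter() - start

print(results)             # [0, 1, 4, 9, 16, 25, 36, 49]
print(f"{elapsed:.2f} s")  # roughly 0.2 s, versus 0.8 s for a serial loop
```

Swapping ThreadPoolExecutor for ProcessPoolExecutor would let CPU-bound work run in parallel as well, at the cost of the extra memory and pickling overhead of child processes.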

For parallel data analytics that needs to scale over multiple computers, one can use the third-party packages Dask or PySpark. Dask is a Python library providing advanced parallelism with easy-to-use interfaces (e.g. the NumPy-like parallel Dask array). PySpark is a Python interface to Apache Spark, a general distributed cluster-computing framework for big data processing.
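For instance, a Dask array looks much like a NumPy array but is split into chunks that can be processed in parallel. A minimal sketch (assuming Dask is installed; the array and chunk sizes are arbitrary):

```python
import dask.array as da

# A 1000x1000 array of ones, split into 100x100 chunks that can be
# processed in parallel (and, with dask.distributed, across machines)
x = da.ones((1000, 1000), chunks=(100, 100))

# Operations build a task graph; evaluation is deferred until .compute()
total = x.sum().compute()
print(total)  # 1000000.0
```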

We will now move on to learn message passing with Python in more detail, but please comment if you have experience with these other approaches!

This article is from the free online course Python in High Performance Computing.
