Duration
5 weeks
Weekly study
4 hours
GPU Programming for Scientific Computing and Beyond
Optimise GPU programming to accelerate scientific computing and other operations
Accelerators such as graphics processing units (GPUs) and co-processors are highly effective for high-performance computing (HPC). A typical GPU performs many operations in parallel, breaking large problems into smaller, simpler ones that can be executed at the same time.
To get the most from these resources, end-users and scientific developers need to program GPUs in parallel at maximum efficiency. This five-week course from the Partnership for Advanced Computing in Europe (PRACE) teaches you how to do that.
Achieve accelerated parallel programming
Faster parallel programming means better performance and quicker scientific computing and other HPC results, making it a top priority. On this course, you’ll learn how to accelerate parallel programming using a GPU.
With this enhanced computational power, you’ll be able to run scientific and engineering simulations, perform the matrix and vector operations that underpin artificial intelligence, or get the best out of end-user applications like games.
Get all you need to kick-start your GPGPU programming
The course covers all aspects of general-purpose graphics processing unit (GPGPU) programming. You’ll get comprehensive instruction on GPU architecture, programming languages, code optimisation and tuning, and everything else required for any kind of HPC.
Learn GPGPU programming with a world-class team
PRACE is dedicated to enabling high-impact scientific and engineering discovery and research, and to strengthening HPC usage across Europe. This mission, along with the Partnership’s combined expertise, makes it ideally suited to helping you take your GPGPU programming game further.
Syllabus
Week 1
Course Organization, Parallel Programming Concepts and GPU Architecture
Course Introduction and Welcome by Pascal Bouvry
Prof. Pascal Bouvry gives an overview of the course, describes how GPUs have evolved over the years, and explains how they are used for science, engineering, and AI/ML computations.
Course Introduction by Ezhilmathi Krishnasamy
Dr. Ezhilmathi Krishnasamy gives an overview of this GPU programming course and its use for scientific computing, describes how the course is organized, and outlines what you will learn as a participant.
Introduction to Week 1 Activities
This video briefly describes the week's content and summarizes the main ideas of each article, so you know what to expect before you begin.
Introduction (Article 1, Quiz and Discussion)
This section will show the importance of GPU computing in scientific computing and artificial intelligence, and how GPUs are incorporated into modern supercomputers.
Course Organization and GPU Access (Article 2, Quiz and Discussion)
This section gives an overview of the course outcomes, prerequisites, and structure. It also suggests ways of accessing GPU hardware and software.
Parallel Computer Architectures (Article 3, Quiz and Discussion)
In this section, we study the basics of parallel computer architecture, with an overview of parallel CPU architecture. This background will later help you understand GPU architecture.
General Parallel Programming Concepts (Article 4, Quiz and Discussion)
This section introduces general parallel programming concepts and highlights the differences between parallel programming on CPUs and on GPUs (accelerators).
GPU Architecture (Article 5, Quiz and Discussion)
In this section, we study the generic GPU architecture, its memory and cores, and how GPU data communication has improved over the years.
Week 2
CUDA (basic): Introduction to CUDA Programming
Introduction to Week 2 Activities
This video briefly describes the week's content and summarizes the main ideas of each article, so you know what to expect before you begin.
Basic Programming (Article 1, Quiz and Discussion)
This section presents our first "hello world" program on the GPU. We then go through how device code, kernels, thread synchronization, and device synchronization are organized in a CUDA program.
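A first program of this kind might look like the following minimal sketch (not the course's own material): a kernel launched on the device, followed by a device synchronization on the host.

```cuda
#include <cstdio>

// Kernel: runs on the device; each thread prints its own indices.
__global__ void hello_kernel(void) {
    printf("Hello from thread %d of block %d\n", threadIdx.x, blockIdx.x);
}

int main(void) {
    hello_kernel<<<2, 4>>>();   // launch 2 blocks of 4 threads each
    cudaDeviceSynchronize();    // wait for the kernel (and its printf) to finish
    return 0;
}
```

Compiled with `nvcc`, this prints one line per thread; the order varies between runs because the threads execute concurrently.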
Understanding the CUDA Threads (Article 2, Quiz and Discussion)
This section will give you a detailed overview of how CUDA threads are organized and how they are mapped onto Nvidia GPUs. It also shows how to convert between different thread and block index layouts.
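As an illustration of the index arithmetic involved (a sketch, not the course's exact code), here is how a unique global index can be derived from the CUDA thread hierarchy in the 1-D and 2-D cases:

```cuda
// 1-D grid of 1-D blocks: the common case for vector-style kernels.
__global__ void index_1d(int *out) {
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    out[gid] = gid;
}

// 2-D grid of 2-D blocks, flattened to a single linear index.
__global__ void index_2d(int *out) {
    int block_id = blockIdx.y * gridDim.x + blockIdx.x;           // which block
    int local_id = threadIdx.y * blockDim.x + threadIdx.x;        // which thread in it
    int gid = block_id * (blockDim.x * blockDim.y) + local_id;
    out[gid] = gid;
}
```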
CUDA API for C/C++ (Article 3, Quiz and Discussion)
In this section, we go through some of the APIs available in CUDA programming, which instruct the compiler and runtime to carry out the assigned work.
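A few of the core runtime calls appear in almost every CUDA program; the following sketch shows memory allocation, host-device transfer, and error checking:

```cuda
#include <cstdio>

int main(void) {
    const int n = 1024;
    float h_a[1024] = {0};
    float *d_a = nullptr;

    // Allocate device memory and move data between host and device.
    cudaMalloc(&d_a, n * sizeof(float));
    cudaMemcpy(d_a, h_a, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(h_a, d_a, n * sizeof(float), cudaMemcpyDeviceToHost);
    cudaFree(d_a);

    // Every CUDA call reports errors; checking them is good practice.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess)
        printf("CUDA error: %s\n", cudaGetErrorString(err));
    return 0;
}
```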
Vector Operations (Article 4, Quiz and Discussion)
In this section, we study how to implement vector operations from numerical linear algebra in CUDA.
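The canonical example is vector addition; this sketch (under the usual one-thread-per-element pattern, not the course's own code) shows the full host-device round trip:

```cuda
#include <cstdio>
#include <cstdlib>

__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)                       // guard against out-of-range threads
        c[i] = a[i] + b[i];
}

int main(void) {
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; i++) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int block = 256;
    int grid = (n + block - 1) / block;   // round up so every element is covered
    vec_add<<<grid, block>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```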
Matrix Operations (Article 5, Quiz and Discussion)
In this section, we go through how to write a CUDA program for matrix multiplication.
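A naive version assigns one thread per output element; this sketch assumes square row-major matrices:

```cuda
// Naive matrix multiplication: each thread computes one element of C.
__global__ void matmul(const float *A, const float *B, float *C, int n) {
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < n && col < n) {
        float sum = 0.0f;
        for (int k = 0; k < n; k++)
            sum += A[row * n + k] * B[k * n + col];
        C[row * n + col] = sum;
    }
}

// Typical launch for an n x n problem:
//   dim3 block(16, 16);
//   dim3 grid((n + 15) / 16, (n + 15) / 16);
//   matmul<<<grid, block>>>(d_A, d_B, d_C, n);
```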
Week 3
CUDA (advanced): Numerical Algebra, Advanced Topics, Profiling and Tuning
Introduction to Week 3 Activities
This video briefly describes the week's content and summarizes the main ideas of each article, so you know what to expect before you begin.
Shared Memory Matrix Operations (Article 1, Quiz and Discussion)
This section studies optimized matrix operations using shared memory and the tiled matrix concept.
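The idea can be sketched as follows (assuming, for brevity, that n is a multiple of the tile size): each thread block stages tiles of A and B through fast on-chip shared memory, so each global-memory value is loaded once per tile instead of once per multiplication.

```cuda
#define TILE 16

// Tiled matrix multiplication: each block computes one TILE x TILE tile of C.
__global__ void matmul_tiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];

    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < n / TILE; t++) {
        // Each thread loads one element of each tile into shared memory.
        As[threadIdx.y][threadIdx.x] = A[row * n + t * TILE + threadIdx.x];
        Bs[threadIdx.y][threadIdx.x] = B[(t * TILE + threadIdx.y) * n + col];
        __syncthreads();             // wait until the whole tile is loaded

        for (int k = 0; k < TILE; k++)
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();             // wait before overwriting the tile
    }
    C[row * n + col] = sum;
}
```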
Unified Memory (Article 2, Quiz and Discussion)
This section studies the unified memory concept in CUDA programming on Nvidia GPUs. Unified memory reduces the effort programmers spend on explicit data handling in CUDA programs.
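With unified memory, a single allocation is visible to both host and device, so the explicit `cudaMemcpy` calls disappear; a minimal sketch:

```cuda
__global__ void scale(float *x, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= 2.0f;
}

int main(void) {
    const int n = 1 << 20;
    float *x;
    // One allocation accessible from both host and device; the CUDA
    // runtime migrates pages on demand, so no explicit copies are needed.
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; i++) x[i] = 1.0f;

    scale<<<(n + 255) / 256, 256>>>(x, n);
    cudaDeviceSynchronize();    // make the results visible to the host

    cudaFree(x);
    return 0;
}
```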
CUDA Streams (Article 3, Quiz and Discussion)
CUDA streams are used to run CUDA API calls concurrently. They are especially useful for overlapping computation and communication on the device.
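A common pattern splits the data into chunks and pipelines each chunk through its own stream; the following fragment is a sketch that assumes `d_x` and `h_x` are already allocated (with `h_x` in pinned memory via `cudaMallocHost`, which is required for copies to truly overlap) and that a `scale` kernel is defined:

```cuda
const int n_chunks = 4;
cudaStream_t streams[n_chunks];
for (int s = 0; s < n_chunks; s++) cudaStreamCreate(&streams[s]);

// Copies and kernels issued to different streams may overlap,
// hiding transfer time behind computation.
int chunk = n / n_chunks;
for (int s = 0; s < n_chunks; s++) {
    int off = s * chunk;
    cudaMemcpyAsync(d_x + off, h_x + off, chunk * sizeof(float),
                    cudaMemcpyHostToDevice, streams[s]);
    scale<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_x + off, chunk);
    cudaMemcpyAsync(h_x + off, d_x + off, chunk * sizeof(float),
                    cudaMemcpyDeviceToHost, streams[s]);
}
cudaDeviceSynchronize();
for (int s = 0; s < n_chunks; s++) cudaStreamDestroy(streams[s]);
```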
CUDA Application Profiling (Article 4, Quiz and Discussion)
This section shows how to profile Nvidia CUDA code. GPUs have many cores and several memory options, and profiling helps you write optimized GPU code.
Performance Optimization and Tuning (Article 5, Quiz and Discussion)
This section shows some performance-tuning options for CUDA programming. Untuned code can perform badly, sometimes even worse than the existing code or its CPU equivalent.
Week 4
OpenACC (basic): Introduction to OpenACC Programming Model
Introduction to Week 4 Activities
This video briefly describes the week's content and summarizes the main ideas of each article, so you know what to expect before you begin.
Introduction to OpenACC (Article 1, Quiz and Discussion)
In this section, we introduce OpenACC, considering both the C/C++ and Fortran programming languages.
Functionality of OpenACC (Article 2, Quiz and Discussion)
In this section, we give an overview of the functionality of OpenACC.
OpenACC Compute Constructs (Article 3, Quiz and Discussion)
This section introduces the OpenACC compute constructs, which are essential for parallelizing serial code.
The Data Environment in OpenACC (Article 4, Quiz and Discussion)
In this section, we show some of the data-handling clauses available in OpenACC. These are important for controlling data transfer between host and device.
Programming in OpenACC (Article 5, Quiz and Discussion)
This section shows a gentle introduction to the OpenACC programming model for C/C++ and Fortran programming languages.
Week 5
OpenACC (advanced): Numerical Algebra, Advanced Topics, Profiling and Tuning
Introduction to Week 5 Activities
This video briefly describes the week's content and summarizes the main ideas of each article, so you know what to expect before you begin.
Vector Operations (Article 1, Quiz and Discussion)
In this section, we will go through simple vector addition using both C/C++ and Fortran programming languages.
Matrix Operations (Article 2, Quiz and Discussion)
In this section, we study how to do matrix multiplication using OpenACC for the C/C++ and Fortran languages. We also use some of the essential OpenACC clauses, with examples.
Shared Memory and Async (Article 3, Quiz and Discussion)
In this section, we show how to use GPU shared memory from OpenACC, and how to enable asynchronous execution, similar to CUDA streams.
Profiling (Article 4, Quiz and Discussion)
In this section, we show how to profile OpenACC code. Profiling reveals how much time each function takes, as well as the time spent on memory transfers between CPU and GPU and in OpenACC API calls.
Tuning and Optimization (Article 5, Quiz and Discussion)
By now you will be familiar with OpenACC and will have noticed that it is easy to parallelize serial code with it. Most of the remaining work, however, lies in code tuning and optimization, and that is what we focus on now.
Learning on this course
On every step of the course you can meet other learners, share your ideas and join in with active discussions in the comments.
What will you achieve?
By the end of the course, you'll be able to...
- Apply GPU parallel programming using the CUDA and OpenACC programming models
- Describe GPU architecture
- Efficiently implement computational linear algebra routines
- Run scientific applications on GPUs
- Optimise and fine-tune code for different architectures
Who is the course for?
This course is designed for anyone who needs to use GPGPU programming, from end-users playing complex video games to researchers involved with artificial intelligence and scientific computing.
Who will you learn with?
I am a postdoctoral researcher at Luxembourg University, working with the Parallel Computing and Optimization group. My research interests are scientific and quantum computing.
Dr Pascal Bouvry is a full professor at the University of Luxembourg.
Learning on FutureLearn
Your learning, your rules
- Courses are split into weeks, activities, and steps to help you keep track of your learning
- Learn through a mix of bite-sized videos, long- and short-form articles, audio, and practical activities
- Stay motivated by using the Progress page to keep track of your step completion and assessment scores
Join a global classroom
- Experience the power of social learning, and get inspired by an international network of learners
- Share ideas with your peers and course educators on every step of the course
- Join the conversation by reading, @ing, liking, bookmarking, and replying to comments from others
Map your progress
- As you work through the course, use notifications and the Progress page to guide your learning
- Whenever you’re ready, mark each step as complete; you’re in control
- Complete 90% of course steps and all of the assessments to earn your certificate
Want to know more about learning on FutureLearn?
Do you know someone who'd love this course? Tell them about it...
You can use the hashtag #GPUforscientificcomputing to talk about this course on social media.