This course provides an introduction to CUDA and to programming parallel hardware architectures such as today's GPUs. We will show how to program with CUDA and which problems can be solved efficiently on modern GPUs. The algorithms discussed are not necessarily related to computer graphics. The course is accompanied by practical exercises, and students must complete a small project to pass.

The format of the course will change midway through the term: two-hour lectures and one-hour tutorials will be replaced by practical work on larger projects. The course focuses entirely on parallel programming on modern GPUs. CUDA will be used to implement all practical assignments, which will include common parallel primitives such as parallel prefix sum, parallel reduction, and parallel sorting algorithms (e.g., radix sort). In addition to the training material available from NVIDIA and other sources, we will also use recent scientific papers for up-to-date results and programming methods.
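To give a feel for the primitives named above, here is a minimal CPU sketch in C++ of two of them: a tree-style reduction and a step-doubling (Hillis-Steele) inclusive prefix sum. It is not course material and not CUDA code. The function names `tree_reduce_sum` and `inclusive_scan_hillis_steele` are made up for illustration. Each inner loop stands in for work that a GPU would distribute over many threads, with a synchronization between the outer steps.

```cpp
#include <cstddef>
#include <vector>

// Tree-style sum reduction (hypothetical helper, CPU emulation only).
// In each step, element i accumulates element i + stride. On a GPU, every
// iteration of the inner loop would be handled by its own thread.
int tree_reduce_sum(std::vector<int> data) {
    if (data.empty()) return 0;
    const std::size_t n = data.size();
    for (std::size_t stride = 1; stride < n; stride *= 2) {
        for (std::size_t i = 0; i + stride < n; i += 2 * stride) {
            data[i] += data[i + stride];
        }
    }
    return data[0];  // the total ends up in the first element
}

// Inclusive prefix sum using the step-doubling (Hillis-Steele) scheme.
// Two buffers are used so that each step reads only values from the
// previous step, mirroring what a parallel version has to guarantee.
std::vector<int> inclusive_scan_hillis_steele(const std::vector<int>& input) {
    std::vector<int> in = input;
    std::vector<int> out(input.size());
    for (std::size_t offset = 1; offset < in.size(); offset *= 2) {
        for (std::size_t i = 0; i < in.size(); ++i) {
            out[i] = (i >= offset) ? in[i] + in[i - offset] : in[i];
        }
        in.swap(out);  // the result of this step is the input of the next
    }
    return in;
}
```

Both functions need about log2(n) outer steps. That step count is why these patterns map well to GPUs, even though the scan variant shown here performs more additions in total than a sequential loop would.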


Teaching Assistants




  • Programming experience with C++


Register for the course via Microsoft Teams.


Date Lecture - Instructor Slides Videos Assignments
12.11.2020 Assignment 1
Development Environment

19.11.2020 Toolchain
26.11.2020 Assignment 2

03.12.2020 Questions
10.12.2020 Assignment 3

17.12.2020 Assignment 4

Christmas Break
14.01.2021 Questions
21.01.2021 Questions
28.01.2021 Questions
04.02.2021 Assignment Presentations


The assignments will be posted under the course schedule.

The projects are expected to compile and work out of the box on the machines in the CIP-pool students’ lab. This gives the tutors a guarantee that the code will run on machines that both they and the students have access to.




10% Performance Challenge
50% Assignments
40% Final exam

Course Schedule

Date Lecture - Instructor Slides Videos Assignments
03.11.2020 Introduction and History of GPU Programming
10.11.2020 CUDA Programming Model
17.11.2020 CUDA API in Detail
24.11.2020 Parallel Computation Patterns
01.12.2020 Hardware Scheduling
08.12.2020 Memory Hierarchy
15.12.2020 Performance Optimization Case Study
Christmas Break
12.01.2021 Advanced Programming Techniques
19.01.2021 Related Programming Models
26.01.2021 AnyDSL Compiler Framework
02.02.2021 Wrap-up

Performance Competition



The course does not follow a particular book, but suggested readings include: