[Lecture] High Performance Computing for Machine Learning
High Performance Computing (HPC) is concerned with the efficient and fast execution of large simulations on modern supercomputers. It makes use of state-of-the-art technologies (such as GPUs, low-latency interconnects, etc.) to efficiently solve complex scientific and data-driven problems. One of the key factors behind the current success of machine learning models is the ability to train models with many parameters on large amounts of training data using modern computers. However, in their simplest form, current machine learning libraries make only limited efficient use of the available HPC resources. The aim of this lecture is therefore to examine theoretical and practical aspects of the efficient training of machine learning models, and in particular deep learning models, on modern HPC resources.
With this in mind, the first part of the lecture covers techniques that are typically used for the performance optimization of software on supercomputers. After a short introduction to HPC, we will focus specifically on GPUs (graphics processing units), various memory models, and performance optimization models, followed by a practical introduction to CUDA, a programming interface developed by Nvidia for GPU programming.
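To give a flavour of the programming model such a CUDA introduction starts from, the following is a minimal, hypothetical sketch of a vector-addition kernel. It is written with Numba's CUDA bindings as a Python stand-in for CUDA C (not the lecture's own material), but it shows the same thread/block launch hierarchy and explicit host-device data transfers.

```python
# Hypothetical illustration: one GPU thread computes one output element.
import numpy as np
from numba import cuda

@cuda.jit
def vector_add(a, b, out):
    i = cuda.grid(1)        # global thread index (blockIdx.x * blockDim.x + threadIdx.x)
    if i < out.size:        # guard threads that fall past the end of the array
        out[i] = a[i] + b[i]

n = 1 << 20
a = np.random.rand(n).astype(np.float32)
b = np.random.rand(n).astype(np.float32)

d_a, d_b = cuda.to_device(a), cuda.to_device(b)   # explicit host-to-device copies
d_out = cuda.device_array_like(a)

threads_per_block = 256
blocks = (n + threads_per_block - 1) // threads_per_block
vector_add[blocks, threads_per_block](d_a, d_b, d_out)   # launch configuration

np.testing.assert_allclose(d_out.copy_to_host(), a + b)  # copy result back and verify
```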
In the second part of the lecture, the techniques and concepts learnt in the first part will be applied to the efficient training of machine learning and deep learning models. Different data- and model-parallel training methods for efficient training on GPUs, both algorithmic and practice-oriented, will be demonstrated using various examples from applications.
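As a minimal sketch of the data-parallel idea (an illustrative example under assumed names, not the lecture's own material), the following uses PyTorch's DistributedDataParallel to replicate a toy model across two worker processes and average gradients after every backward pass. The "gloo" backend is chosen so the script also runs without GPUs.

```python
# Hypothetical illustration of data-parallel training with PyTorch DDP.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def worker(rank: int, world_size: int) -> None:
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("gloo", rank=rank, world_size=world_size)

    model = torch.nn.Linear(16, 1)   # toy model, replicated on every rank
    ddp_model = DDP(model)           # wraps gradient all-reduce around it
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=0.01)

    # Each rank trains on its own (here: random) shard of the data.
    x = torch.randn(32, 16)
    y = torch.randn(32, 1)
    for _ in range(5):
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(ddp_model(x), y)
        loss.backward()              # gradients are averaged across ranks here
        optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    mp.spawn(worker, args=(2,), nprocs=2)   # two processes emulate two workers
```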
[Lecture] Compute Continuum
Modern computing has moved away from the desktop computer to the cloud, where resources and data alike are shared. Yet the cloud computing paradigm struggles with the scope and complexity of modern compute scenarios, where data may reside anywhere, be produced and consumed at any time in any amount, and where users are mobile and distributed all over the world. To reduce the load on servers and the network, Fog and Edge computing were introduced: forms of distributed computing with flexible and variable allocation and load. This course will introduce the concept of the compute continuum, which aims at executing distributed applications flexibly over any infrastructure, with the goal of adapting immediately to different usage contexts. The compute continuum targets scenarios arising from connected smart homes, smart cities, global logistics networks, etc.
Within this lecture, we will investigate the technologies needed to realise such an environment, when it can be used, and which obstacles it faces. The lecture is essentially segmented into three parts: The first part focuses on the hardware layer, including the types of processors, embedded system architectures, and their connectivity. In the second part, we will discuss the main principles of distributed computing, including how data is distributed and processed and how the criteria of different use cases are met. The third part focuses on adaptive execution in the compute continuum, which includes embedded operating systems, virtualisation, and containerisation.
[Seminar] High-Performance Computing with GPUs
GPUs are ubiquitous in High-Performance Computing, delivering the majority of performance in the fastest supercomputers around the world. The platform is enabled by highly parallel applications, suitable programming models, close co-design of software and hardware, and advanced hardware designs. The seminar covers topics relevant to all components of the HPC GPU ecosystem, such as the effective implementation of GPU algorithms, investigations of programming models, performance analysis, benchmarking of applications, and understanding of hardware features.
[Seminar] Programming Principles of Distributed Systems
This seminar covers emerging topics in parallel and distributed computing. The scope spans from tightly coupled high performance computing systems to loosely coupled cloud and edge
computing systems. Special emphasis is placed on advanced system architectures with heterogeneous processor and memory technologies.
During the seminar, students will work together in small groups to reproduce the results of a previously published research paper from the above-mentioned scientific domains. Relevant publications must have been peer-reviewed at a major conference or in a journal and feature an open-source codebase (see the literature below for examples of representative papers). To build, run, and reproduce the results of the paper, both local and remote resources can be used; these are provided by the seminar lecturer as needed. Optionally, students may also improve or optimize the solution proposed in the chosen paper. If sufficient interest is expressed, we plan to send students who have demonstrated outstanding quality and dedication to a European Reproducibility Challenge.