Parallel Application Programming for Embedded Systems: Performance Characterization

M2 internship proposal.

Context and Proposed Work:

Parallel programming models are ubiquitous in high-performance computing (HPC), where they usually target supercomputers. These are sets of machines (often called clusters) that are dedicated to compute-intensive scientific workloads and linked by a fast interconnection network, i.e., one with high throughput and very low latency.

Chip multiprocessors (CMPs), a.k.a. multicore chips, have spread from servers and workstations down to high-end embedded systems, and this has made parallel and concurrent programming relevant to a much wider audience. However, parallel and distributed programming that ties together multiple workstations or embedded devices has received less attention. This is in part because of the assumptions usually made by parallel applications: homogeneous latency and throughput, identical compute nodes, etc.

This internship aims to leverage high-end embedded systems, for example Raspberry Pi 4 boards (RPi4). After connecting them to the same network, the intern will distribute computations across the resulting parallel embedded computer. Applications of interest include linear algebra kernels, which underpin many tasks in computer vision, image processing, machine learning, and neural network processing.
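As an illustration of what distributing a linear algebra kernel over the boards can mean, here is a minimal sketch. It assumes a 1-D row-block distribution of a dense matrix-vector product y = A x, where each board (MPI rank) computes only its own contiguous slice of rows. The helper names `block_partition` and `local_matvec` are hypothetical and do not come from any existing code base for this project.

```c
#include <assert.h>

/* Hypothetical helper: split n matrix rows into nprocs contiguous blocks
 * (1-D row-block distribution). The first (n % nprocs) ranks receive one
 * extra row, so block sizes differ by at most one row. */
void block_partition(int n, int nprocs, int rank, int *first, int *count)
{
    assert(n >= 0 && nprocs > 0 && rank >= 0 && rank < nprocs);
    int base  = n / nprocs;   /* rows every rank gets */
    int extra = n % nprocs;   /* leftover rows, one each to the lowest ranks */

    *count = base + (rank < extra ? 1 : 0);
    /* All lower ranks hold 'base' rows, plus one extra row for each
     * lower rank that is below 'extra'. */
    *first = rank * base + (rank < extra ? rank : extra);
}

/* Hypothetical helper: compute this rank's slice of y = A x, where A is an
 * n x n row-major matrix. Only rows [first, first + count) are read, and
 * y_local must have room for 'count' entries. */
void local_matvec(int n, const double *A, const double *x,
                  int first, int count, double *y_local)
{
    for (int i = 0; i < count; ++i) {
        const double *row = A + (long)(first + i) * n;
        double acc = 0.0;
        for (int j = 0; j < n; ++j)
            acc += row[j] * x[j];
        y_local[i] = acc;
    }
}
```

In an MPI program, these per-rank `first`/`count` values map directly onto the `displs` and `sendcounts` arrays of `MPI_Scatterv` (multiplied by `n` when scattering whole rows of A), and onto `MPI_Gatherv` when collecting y. Equal-sized blocks suit identical boards. A hybrid wired/wireless setup might instead call for weighted shares.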

The intern will have several tasks to perform. These include setting up multiple network configurations (wired-only, wireless-only, hybrid), running the kernels on each one, and comparing the results with those obtained on a single high-end workstation. The comparison will cover both raw performance (FLOP/s) and energy/power efficiency (FLOP/s per watt).
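The two metrics could be computed as in the following sketch. It assumes that the benchmark kernel is a dense n × n matrix-matrix multiply with the conventional 2n³ operation count, and that an average power figure in watts comes from some external measurement. The function names are illustrative only. Note that FLOP/s per watt is the same quantity as FLOP per joule: total operations divided by total energy.

```c
#include <assert.h>

/* Conventional operation count for a classical n x n x n matrix multiply
 * (C = A * B): n^3 multiplications plus n^3 additions. */
double gemm_flops(double n)
{
    return 2.0 * n * n * n;
}

/* Raw performance in FLOP/s, given the wall-clock run time in seconds. */
double flop_rate(double flops, double seconds)
{
    assert(seconds > 0.0);
    return flops / seconds;
}

/* Energy efficiency in FLOP/s per watt. 'avg_watts' is assumed to be the
 * mean power drawn by the whole setup during the run. Equivalent to
 * flops / (avg_watts * seconds), i.e. FLOP per joule. */
double flops_per_watt(double flops, double seconds, double avg_watts)
{
    assert(avg_watts > 0.0);
    return flop_rate(flops, seconds) / avg_watts;
}
```

Here is an example with purely illustrative numbers, not measurements. A 1000 × 1000 multiply (2 × 10⁹ FLOP) that finishes in 2 s at an average of 10 W gives 10⁹ FLOP/s and 10⁸ FLOP/s per watt. For the cluster configurations, `avg_watts` should arguably include every board as well as the network equipment (switch or access point), so that the comparison with a single workstation stays fair.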

Technical skills required:

C or C++ programming; some prior knowledge of MPI and/or OpenMP would be greatly appreciated. A basic understanding of UNIX/Linux would go a long way.

Contacts: