Building an implicitly parallel language for the Dataflow Software Stack

Printer-friendly version

You should take this topic if you:

  • like working on programming languages and their inner workings.
  • want to learn about modern and future processor hardware.
  • have an open mind about unconventional ways to program.
  • followed the "Functional Programming" course (recommended but not required)
  • followed the "Compilers" course (recommended but not required)

Parallel programming is hard. Programmers have been avoiding it as much as possible over the years, relying on the processor hardware getting faster and faster to speed up their code. This has come to an end however, single processor speeds have stagnated and nearly all desktop processors have spawned multiple processing cores. Programmers are now forced to take on parallel programming with languages, techniques and tools created for sequential programming.

This thesis subject tries to avoid the common approach to this problem; adapting the existing languages to fit in this parallel world. Instead, we go back to the root of the problem and propose a programming style that is better suited to the current trends in processing hardware. Almost all popular programming languages use the so-called von Neumann model of computation, where a processor runs through a program sequentially and can uniformly access a large global memory store. This is a simple model that works well on a single processor, but it quickly becomes a problem when used on programs that have to execute in parallel.

In programming languages using the dataflow model of computation, there is no global memory and no forced sequential execution of code. Operations are executed if the data it depends on becomes available. This leaves open exactly when each operation is executed, and allows compilers or interpreters to distribute the computation automatically. Optimizing compilers convert their source language to the dataflow model for this reason, but are severely restricted by the source language. For example we find successful parallelizing compilers for FORTRAN, but not for C or Java. Similarly, modern processors do dataflow execution deep down inside, but the input 'language' (x86) is still sequential.

Disregarding market forces, this is a strange situation: compilers convert to a nicely parallel dataflow representation for ease of reasoning over the program, and transform that back to a sequential language to talk to the processor. That processor's hardware then transforms the input program again for parallel dataflow execution. Our proposal is to cut away this middle man and establish dataflow as the language interface between software and hardware.

In order to achieve this, work has been started on a dataflow software stack. The various components of this stack talk to each other through the use of various dataflow program representations. Work by earlier thesis students has already produced prototypical versions of such a stack, but the refinement of these components still presents a lot of interesting research opportunities.

Goal & Research Activities: 

The goal of this thesis is to build an implicitly parallel language that will run on top of the existing dataflow execution engine. In a first stage you will have to familiarize yourself with the related research literature. You have a large body of related research to fall back on, in both compiler technology research, dataflow processor hardware and parallel language implementations. This will help you build up know-how and clearly outline your research contribution later on. More importantly, it will help you develop the strategy for the next stage.

As a second stage, you will design an implicitly parallel programming language and its compiler. You will be free to choose the actual technologies and platforms you use to reach this goal, within reason. Throughout this phase, you are free to focus on either the design of the programming language or on the dataflow-specific optimizations that your compiler provides.