Mathijs Saey

Office: 10 F 722
Phone: +32 2 629 34 91
Vrije Universiteit Brussel
Faculty of Sciences, DINF – SOFT
Pleinlaan 2
B-1050 Brussels, Belgium

Job Description

I am a PhD student at the Software Languages Lab, which is part of the Computer Science Department of the Faculty of Sciences at the Vrije Universiteit Brussel.

Research Description

The ubiquity of smartphones and the advent of the "Internet of Things" have made it possible for the average company to access an enormous amount of real-time, heterogeneous data. Processing this data with a workflow composed of existing components offers several advantages. Firstly, code can easily be reused and shared across applications. Secondly, this approach makes it easier to distribute workflow components over a cluster, which facilitates scalable data processing. Finally, non-programmers can build such workflows in a graphical environment; this is especially attractive for adding (a chain of) pre-processing steps, such as data cleansing, to a data-processing pipeline.

Previous work already makes it possible to combine arbitrary data processing components into a workflow. However, these tools tend to be query-driven, which makes it difficult for them to deal with real-time data. To work in such a real-time context, we propose an approach inspired by reactive programming: workflows and their components are automatically activated when data arrives from an external data source (such as a smartphone or IoT device). To support this approach, we are designing a framework, called Skitter, which provides the programming language abstractions needed to wrap existing software inside reactive components. In turn, these components can be composed into reactive workflows, which are executed by the runtime of our framework.
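To make the idea of a reactive workflow more concrete, the following is a minimal Python sketch. It is not Skitter's API: the `Component` class, its `connect` and `push` methods, and the example functions are all hypothetical names chosen for illustration. The sketch only shows the general pattern of wrapping existing functions in components that run when data arrives and forward their results downstream.

```python
from typing import Any, Callable, List


class Component:
    """Hypothetical wrapper: runs existing code whenever data arrives."""

    def __init__(self, func: Callable[[Any], Any]) -> None:
        self.func = func
        self.downstream: List["Component"] = []

    def connect(self, other: "Component") -> "Component":
        """Link this component's output to another component's input."""
        self.downstream.append(other)
        return other  # returning `other` allows chaining: a.connect(b).connect(c)

    def push(self, value: Any) -> None:
        """Called when data arrives: run the wrapped code, forward the result."""
        result = self.func(value)
        if result is None:  # convention of this sketch: None means "drop the value"
            return
        for component in self.downstream:
            component.push(result)


# Existing, non-reactive code that we want to reuse.
def clean(reading: dict):
    """Data cleansing step: discard readings without a temperature."""
    return reading if reading.get("temp") is not None else None


def to_fahrenheit(reading: dict) -> dict:
    """Convert the temperature of a reading from Celsius to Fahrenheit."""
    return {**reading, "temp": reading["temp"] * 9 / 5 + 32}


received: list = []

# Compose the workflow: source -> clean -> to_fahrenheit -> sink.
source = Component(lambda reading: reading)
sink = Component(received.append)  # append returns None, so the chain ends here
source.connect(Component(clean)).connect(Component(to_fahrenheit)).connect(sink)

# Data arriving from an external source activates the workflow.
source.push({"device": "phone-1", "temp": 20.0})
source.push({"device": "phone-2", "temp": None})  # dropped by `clean`
```

In this sketch nothing polls or queries for data; each `push` call stands in for a new reading arriving from a device, and the workflow reacts to it.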

Thus, our research focuses on two main objectives: finding the right language abstractions for writing reactive workflow components, and investigating techniques to efficiently orchestrate the execution of these components while accounting for partial failure. Our language abstractions are based on the notion of effects, which describe the effect that the execution of a component may have on its own internal state and on the external world. The execution model of our framework is based on the dataflow model, combined with a component orchestration technique that uses this effect information to handle partial failure automatically. Together, these allow non-experts to combine reactive components into a workflow, enabling them to build data-processing pipelines at scale.
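As an illustration of how declared effects could inform failure handling, here is a small Python sketch. The text above does not specify the actual recovery strategy, so everything here is an assumption: the `Effect` flags, the `Component` dataclass, and the policy in `run_with_recovery` (retry freely, restore state before retrying a state-changing component, never retry blindly when the outside world is affected) are hypothetical and are not taken from Skitter.

```python
import copy
from dataclasses import dataclass, field
from enum import Flag, auto
from typing import Any, Callable


class Effect(Flag):
    NONE = 0
    STATE_CHANGE = auto()     # the component updates its own internal state
    EXTERNAL_EFFECT = auto()  # the component affects the external world


@dataclass
class Component:
    func: Callable[[dict, Any], Any]  # func(state, value) -> result
    effects: Effect = Effect.NONE
    state: dict = field(default_factory=dict)


def run_with_recovery(component: Component, value: Any, retries: int = 3) -> Any:
    """Hypothetical recovery policy driven by the declared effects."""
    if Effect.EXTERNAL_EFFECT in component.effects:
        # Re-running could repeat the external action, so do not retry blindly.
        return component.func(component.state, value)

    snapshot = copy.deepcopy(component.state)
    for attempt in range(retries):
        try:
            return component.func(component.state, value)
        except Exception:
            if Effect.STATE_CHANGE in component.effects:
                # Undo partial state updates before trying again.
                component.state = copy.deepcopy(snapshot)
            if attempt == retries - 1:
                raise


# A component that updates its state and then fails on its first execution.
calls = {"n": 0}


def flaky_count(state: dict, value: int) -> int:
    state["total"] = state.get("total", 0) + value  # partial state update
    calls["n"] += 1
    if calls["n"] == 1:
        raise RuntimeError("simulated partial failure")
    return state["total"]


counter = Component(flaky_count, effects=Effect.STATE_CHANGE)
result = run_with_recovery(counter, 5)
```

Because the component declares `STATE_CHANGE`, the sketch rolls its state back before the retry, so the value 5 is counted once rather than twice. The person composing the workflow never writes this recovery logic; it follows from the declared effects.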


My research output can be found here.