Welcome to my website. I am a PhD student enrolled at both the WISE and the SOFT lab of the Vrije Universiteit Brussel. I am also a member of the Ambient group. My research investigates programming language support for dealing with the massive number of concurrent events generated by various input devices. By providing adequate software abstractions for correlating multiple input devices, we try to encourage multimodal gestural interaction and ease its implementation.
Keywords: Multimodal Interaction, Gesture Recognition, Rule language, Expert Systems, Complex Event Processing, Programming Languages
Affiliations:
Latest news:
We present SpeeG, a multimodal speech- and body gesture-based text input system targeting media centres, set-top boxes and game consoles. Our controller-free zoomable user interface combines speech input with a gesture-based real-time correction of the recognised voice input. While the open source CMU Sphinx voice recogniser transforms speech input into written text, Microsoft's Kinect sensor is used for the hand gesture tracking. A modified version of the zoomable Dasher interface combines the input from Sphinx and the Kinect sensor. In contrast to existing speech error correction solutions with a clear distinction between a detection and correction phase, our innovative SpeeG text input system enables continuous real-time error correction. An evaluation of the SpeeG prototype has revealed that low error rates for a text input speed of about six words per minute can be achieved after a minimal learning phase. Moreover, in a user study SpeeG has been perceived as the fastest of all evaluated user interfaces and therefore represents a promising candidate for future controller-free text input.
YouTube video coming soon…
In recent years, multimodal interfaces have gained momentum as an alternative to traditional WIMP interaction styles. Existing multimodal fusion engines and frameworks range from low-level data stream-oriented approaches to high-level semantic inference-based solutions. However, there is a lack of multimodal interaction engines offering native fusion support across different levels of abstraction to fully exploit the power of multimodal interactions. We present Mudra, a unified multimodal interaction framework supporting the integrated processing of low-level data streams as well as high-level semantic inferences. Our solution is based on a central fact base in combination with a declarative rule-based language to derive new facts at different abstraction levels. Our innovative architecture for multimodal interaction encourages the use of software engineering principles such as modularisation and composition to support a growing set of input modalities as well as to enable the integration of existing or novel multimodal fusion engines.
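To make the idea of a central fact base with rules at several abstraction levels more concrete, here is a minimal sketch in Python. It is not Mudra's actual rule language or engine: the fact names (`head_y`, `hand_y`, `speech`), both rules and the `run` helper are hypothetical, and the sketch assumes a coordinate system in which y grows upward.

```python
# Facts are plain tuples; the first element names the kind of fact.
# All fact kinds and rule names below are hypothetical illustrations.

def rule_hand_raised(facts):
    """Low level: derive a pose fact from raw joint heights."""
    heads = {f[1]: f[2] for f in facts if f[0] == "head_y"}
    hands = [f for f in facts if f[0] == "hand_y"]
    return {("hand_raised", user)
            for _, user, y in hands
            if user in heads and y > heads[user]}

def rule_confirm(facts):
    """High level: fuse a derived pose with a recognised spoken word."""
    return {("command", f[1], "confirm")
            for f in facts
            if f[0] == "speech" and f[2] == "ok"
            and ("hand_raised", f[1]) in facts}

def run(facts, rules):
    """Apply all rules repeatedly until no new facts can be derived."""
    facts = set(facts)
    while True:
        derived = set().union(*(rule(facts) for rule in rules)) - facts
        if not derived:
            return facts
        facts |= derived

initial = {
    ("head_y", "alice", 1.6),   # raw sensor data (metres, assumed)
    ("hand_y", "alice", 1.8),
    ("speech", "alice", "ok"),  # output of a speech recogniser
}
result = run(initial, [rule_hand_raised, rule_confirm])
print(sorted(f for f in result if f[0] in ("hand_raised", "command")))
```

The second rule only fires once the first one has added its derived fact, which is the sense in which low-level stream data and high-level inferences share one fact base in this sketch.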
My Kinect presentation movie is now on YouTube! (Short version)
Currently I'm working around licensing issues to release my framework as open source. In the meantime, I would like to evaluate my language constructs by implementing the hottest Kinect gesture anyone proposes. Feel free to post your own or to vote on Reddit.
Multi-touch technology allows users to manipulate digital information with their hands. We have observed that mainstream software frameworks do not offer support for dealing with the complexity of these new devices. Current multi-touch frameworks only provide a narrow range of hardcoded functionality. As a result, developing new multi-touch gestures and integrating them with other gestures is notoriously hard. The main goal of this framework is to provide developers with adequate software engineering abstractions to close the gap between the evolution of multi-touch technology and software detection mechanisms.
Current frameworks force the programmer into an event-driven programming model in which event handlers have to be registered and composed manually. As a result, the control flow of the application is driven by external events and no longer by the sequential structure of the program. Reuse, composition and understanding are hampered when using such frameworks.
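As an illustration of the handler-based style, here is a minimal Python sketch of a double-tap detector. The class, the `handle_touch_up` callback and both thresholds are hypothetical and do not correspond to any particular multi-touch framework.

```python
import math

class DoubleTapDetector:
    """Hypothetical handler-based detector: state lives across callbacks."""

    def __init__(self, on_double_tap, max_gap=0.3, max_dist=20.0):
        self.on_double_tap = on_double_tap
        self.max_gap = max_gap      # seconds between taps (assumed threshold)
        self.max_dist = max_dist    # pixels between taps (assumed threshold)
        self._last_tap = None       # (x, y, t) of the previous tap, if any

    def handle_touch_up(self, x, y, t):
        """Callback a toolkit would invoke on every finger-up event."""
        prev = self._last_tap
        if prev is not None:
            px, py, pt = prev
            close_in_time = (t - pt) <= self.max_gap
            close_in_space = math.hypot(x - px, y - py) <= self.max_dist
            if close_in_time and close_in_space:
                self._last_tap = None   # consume both taps
                self.on_double_tap(x, y)
                return
        self._last_tap = (x, y, t)      # remember this tap for the next event

detected = []
detector = DoubleTapDetector(lambda x, y: detected.append((x, y)))
detector.handle_touch_up(100, 100, 0.00)
detector.handle_touch_up(104, 103, 0.20)   # close in time and space
detector.handle_touch_up(300, 300, 1.00)   # too late and too far: starts over
print(detected)
```

Even for this simple gesture, the definition is spread over hidden state and control flow inside the callback. Combining it with a second gesture would mean coordinating several such state machines by hand.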
In this work we propose a solution based on research from the Complex Event Processing domain. We advocate the use of a rule language which allows programmers to express gestures in a declarative way. The advantage of such an approach is that the programmer no longer needs to be concerned with how a gesture is derived, only with describing the gesture itself. We present a first step in that direction in the form of a domain-specific language supporting spatio-temporal operators.
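The declarative style can be approximated in plain Python as follows. This is only a sketch of the idea and not the syntax of the domain-specific language itself: the operator names `within` and `near`, the `Tap` fact type and the naive `match` function are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass
from itertools import permutations

@dataclass(frozen=True)
class Tap:
    x: float
    y: float
    t: float

# Hypothetical spatio-temporal operators: each returns a constraint
# over an ordered pair of facts (a, b).
def within(seconds):
    return lambda a, b: 0 < b.t - a.t <= seconds

def near(pixels):
    return lambda a, b: math.hypot(b.x - a.x, b.y - a.y) <= pixels

# The gesture is described only by what must hold, not by how to detect it.
DOUBLE_TAP = ("double_tap", [within(0.3), near(20.0)])

def match(rule, facts):
    """Naive matcher: test every ordered pair of facts against the rule."""
    name, constraints = rule
    return [(name, a, b)
            for a, b in permutations(facts, 2)
            if all(holds(a, b) for holds in constraints)]

facts = [Tap(100, 100, 0.0), Tap(104, 103, 0.2), Tap(300, 300, 1.0)]
matches = match(DOUBLE_TAP, facts)
print(matches)
```

The rule contains no bookkeeping state; how and when the pairs are examined is left to the matcher, which a real engine would implement far more efficiently than this quadratic scan.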
Complex gestures which are extremely hard to implement with traditional approaches can be expressed in one or more rules which are easy to understand. The use of a rule language has the benefit that the developed gestures are reusable and easy to compose. Furthermore, a strong connection to application-level entities allows developers to activate and deactivate gestures depending on their graphical context.