Multimodal Interaction for Information Appliances



Multimodal interaction

The challenge is to develop a multimodal dialogue system that supports various users performing a variety of tasks, from simple information retrieval to complex co-operative problem solving. For many action tasks, multimodal interaction including speech input and output is crucial, since the user will often have his or her hands busy performing the task and therefore cannot use typed or direct-manipulation interaction. Multimodal dialogue thus involves a number of research issues:
  • Integration and interpretation. In the e-Home we have various users with different information needs. This means that we need to allow each user to utilise any modality and combination of modalities to formulate their information needs, for instance speech only, graphical input only, or graphical input combined with language (spoken or written). Furthermore, output must also utilise the available modalities to present the results in such a way that they can be easily interpreted. Not all modalities are always at hand, e.g. there is not a screen everywhere. Important research issues therefore include how to best combine modalities in various situations, how to interpret input from different modalities, and the development of an architecture allowing multimodal interaction when the available modalities are constrained in various ways. Most approaches to co-ordinating multimodal input utilise temporal information from the input modalities (Johnston et al., 1997, Johnston, 1998, Johansson, 2000) to form one interpretation from the various modalities. This works well in many situations, but there may be other clues as well. One such clue is prosodic information: Horne et al. (1999) present results on accentuation and domain-related information, finding relations between focal accents and focus (Cb) that could be used to help integrate information from different modalities.
  • Dialogue. Dialogue management involves controlling the flow of the dialogue by deciding how the system should respond to a user move. This is done by inspecting and contextually specifying the information structure produced by an interpretation module. If some information is missing or a request is ambiguous, clarification questions are specified by the Dialogue Manager and posed to the user. The dialogue manager developed in previous projects (Jönsson, 1997, Dahlbäck, 1999) will be adapted to the situations and tasks in the e-Home, especially considering issues of co-operativity and domain knowledge representation as discussed below.
  • Co-operativity. Many of the problems addressed in e-Home communication cannot be solved through a simple dialogue. Problem solving has instead to be viewed as a joint activity involving co-operation between human and computer (cf. Allwood, Traum & Jokinen, 2000). Different kinds of families and different housing conditions make the tasks differ between families, but many of these tasks have a number of interesting features in common:
    • There is a sequence of actions to be performed, often including decision points where alternative action sequences need to be selected
    • The tasks are seldom performed, and hence the level of expertise is low
    • They include both aspects of diagnosis and aspects of treatment
    There is another important difference: this kind of action knowledge, or practical knowledge, is often learned in some kind of apprentice situation, i.e. it is learned by doing rather than through theoretical studies.
  • Representation. A multimodal dialogue system requires representations of domain knowledge in order to perform its task. There are several problems related to this. For example, when the background system is distributed and consists of several domain and application knowledge sources, the dialogue system must know which of them to access, in what order, and how the results should be integrated into one answer. This type of knowledge can be represented in a domain task model (Flycht-Eriksson, 2000, Flycht-Eriksson & Jönsson, 2000). Furthermore, co-operative dialogue requires representations that facilitate sequences of actions based on knowledge of how things are carried out; simple tasks such as fixing a flat tire, changing a fuse or stopping a bleeding nose, or more complex tasks such as video programming or LAN configuration.
  • Multiple participants. One novel feature of e-Home communication is the heterogeneous nature of its users. Families, the most common group of users, consist of individuals with varying interests and needs. Adaptive interfaces, however, violate many principles of human-computer interaction such as control, transparency and predictability (Höök, 2000), and thus require careful design based on evaluations with users of the service.
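The temporal co-ordination of input modalities discussed under integration and interpretation can be illustrated with a minimal sketch. The event format, the grouping-by-overlap strategy and the one-second window are assumptions for illustration, not the project's actual algorithm:

```python
from dataclasses import dataclass

@dataclass
class InputEvent:
    modality: str   # e.g. "speech" or "gesture" (illustrative labels)
    content: str
    start: float    # seconds
    end: float

def overlaps(a: InputEvent, b: InputEvent, window: float = 1.0) -> bool:
    """True if two events overlap in time or fall within `window` seconds."""
    return a.start <= b.end + window and b.start <= a.end + window

def integrate(events):
    """Group events from different modalities that are temporally close,
    yielding one combined interpretation per group."""
    groups = []
    for ev in sorted(events, key=lambda e: e.start):
        for g in groups:
            if any(overlaps(ev, other) for other in g):
                g.append(ev)
                break
        else:
            groups.append([ev])
    return [{e.modality: e.content for e in g} for e in groups]

# A deictic utterance and a pointing gesture fuse into one interpretation:
fused = integrate([
    InputEvent("speech", "turn that lamp off", 0.0, 1.2),
    InputEvent("gesture", "point:lamp_livingroom", 0.8, 1.0),
])
# fused holds a single group combining both modalities
```

Prosodic clues of the kind Horne et al. (1999) describe could enter such a scheme as an additional grouping criterion alongside the temporal one.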
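The dialogue-management behaviour described above, posing a clarification question when information is missing, can be sketched as simple slot filling. The task name, slot names and example request are invented for the sketch and do not describe the actual Dialogue Manager:

```python
# Required information for each task type (illustrative assumption).
REQUIRED_SLOTS = {"record": ["channel", "start_time", "duration"]}

def next_move(task: str, filled: dict):
    """Decide the system's next dialogue move: ask a clarification
    question for the first missing slot, or execute when complete."""
    for slot in REQUIRED_SLOTS[task]:
        if slot not in filled:
            return ("clarify", f"Which {slot.replace('_', ' ')} would you like?")
    return ("execute", filled)

# The user has only specified a channel, so the manager asks for more:
move, payload = next_move("record", {"channel": "SVT1"})
```

A real dialogue manager would of course also contextually specify the interpretation against the dialogue history rather than treat each move in isolation.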
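The domain task model mentioned under representation, knowing which knowledge sources to access, in what order, and how to merge their results, can be sketched as follows. The source names and the merge-by-dictionary-update strategy are illustrative assumptions, not the model of Flycht-Eriksson (2000):

```python
# Illustrative domain task model: for each request type, the knowledge
# sources to consult, in order (names invented for the sketch).
TASK_MODEL = {
    "tv_schedule": ["programme_db", "user_preferences"],
    "fix_fuse":    ["device_manual", "safety_rules"],
}

def answer(request_type: str, sources: dict):
    """Consult the sources named by the task model in order and merge
    their partial results into one answer."""
    result = {}
    for name in TASK_MODEL[request_type]:
        result.update(sources[name](request_type))
    return result

# Each source is a callable returning a partial answer:
sources = {
    "programme_db":     lambda q: {"programmes": ["News at 19:30"]},
    "user_preferences": lambda q: {"preferred_channel": "SVT1"},
}
merged = answer("tv_schedule", sources)
```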


Jens Allwood, David Traum and Kristiina Jokinen, Cooperation, dialogue and ethics, International Journal on Human-Computer Studies, 53, pp 871-914, 2000.

Nils Dahlbäck and Arne Jönsson: Knowledge Sources In Spoken Dialogue Systems, Proceedings of Eurospeech'99, Budapest, Hungary, 1999, pp. 1523-1526.

Nils Dahlbäck, Annika Flycht-Eriksson, Arne Jönsson, and Pernilla Qvarfordt: An Architecture for Multi-Modal Natural Dialogue Systems, Proceedings of ESCA Tutorial and Research Workshop (ETRW) on Interactive Dialogue in Multi-Modal Systems, Germany, 1999.

Annika Flycht-Eriksson, A Domain Knowledge Manager for Dialogue Systems, Proceedings of the 14th European Conference on Artificial Intelligence (ECAI’2000), IOS Press Amsterdam, 2000.

Annika Flycht-Eriksson: A Survey of Knowledge Sources in Dialogue Systems, Proceedings of the IJCAI'99 Workshop on Knowledge and Reasoning in Practical Dialogue Systems, Stockholm, Sweden, 1999, pp. 41-48.

Annika Flycht-Eriksson and Arne Jönsson: Dialogue and Domain Knowledge Management in Dialogue Systems, 1st SIGdial Workshop on Discourse and Dialogue, Hong Kong, 7-8 October 2000.

Arne Jönsson, A model for habitable and efficient dialogue management for natural language interaction, Natural Language Engineering 3(2/3), pp 103-122, Cambridge University Press, 1997.

Kristina Höök: Steps to Take Before Intelligent User Interfaces Become Real, Interacting with Computers, 12(4), pp. 409-426, 2000.

Pernilla Qvarfordt and Arne Jönsson: Evaluating the Dialogue Component in the Gulan Educational System, Proceedings of Eurospeech'99, Budapest, Hungary, 1999, pp. 643-646.
