The challenge is to develop a multi-modal
dialogue system that supports various users performing a variety of tasks,
from simple information retrieval to complex co-operative problem solving.
For instance, for many action tasks multi-modal interaction including speech
input and output is crucial, since the user in many cases will have his
or her hands busy performing the task, and therefore cannot use typed or
direct manipulation interaction. Multimodal dialogue thus involves
a number of research issues:
Integration and interpretation. In
the e-Home we have various users with different information needs. This
means that we need to allow each user to utilise any modality and combination
of modalities to formulate their information needs. This includes for instance
speech only, graphical input only or combining graphical input with language
(spoken or written). Furthermore, output must also utilise available modalities
to present the results in such a way that they can be easily interpreted.
Not all modalities are always at hand, e.g. there is not a screen everywhere.
Thus, important research issues include investigations on how to best combine
modalities for various situations, how to interpret input from different
modalities and development of an architecture allowing multimodal interaction
when the available modalities are constrained in various ways. Most approaches
for co-ordinating multimodal input utilise temporal information from the
input modalities (Johnston et al., 1997; Johnston, 1998; Johansson, 2000)
to form one interpretation from the various modalities. This works well
in many situations, but there might be other clues as well. One such clue is
prosodic information. Horne et al. (1999) present results on accentuation
and domain-related information, finding relations between focal accents
and focus (Cb), which could be used to help integrate information from
the different modalities.
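The temporal co-ordination of input described above can be illustrated with a minimal sketch. It is not the method of any of the cited systems; the class InputEvent, the functions gap and fuse, the modality labels and the one-second window are all assumptions made for illustration. Each spoken utterance is paired with the gesture that lies closest in time, provided the two are no more than max_gap seconds apart.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class InputEvent:
    modality: str   # hypothetical labels: "speech" or "gesture"
    content: str    # recognised words, or the id of the object pointed at
    start: float    # start time in seconds
    end: float      # end time in seconds


def gap(a: InputEvent, b: InputEvent) -> float:
    """Temporal distance between two events; 0.0 if they overlap."""
    return max(0.0, max(a.start, b.start) - min(a.end, b.end))


def fuse(events: List[InputEvent],
         max_gap: float = 1.0) -> List[Tuple[InputEvent, Optional[InputEvent]]]:
    """Pair each speech event with the closest gesture within max_gap seconds."""
    speech = [e for e in events if e.modality == "speech"]
    gestures = [e for e in events if e.modality == "gesture"]
    pairs = []
    for s in speech:
        candidates = [g for g in gestures if gap(s, g) <= max_gap]
        # None when no gesture is close enough (speech-only input).
        best = min(candidates, key=lambda g: gap(s, g), default=None)
        pairs.append((s, best))
    return pairs


events = [
    InputEvent("speech", "turn that one off", 0.0, 1.2),
    InputEvent("gesture", "lamp_3", 0.8, 1.0),
    InputEvent("gesture", "tv_1", 5.0, 5.2),
]
for s, g in fuse(events):
    print(s.content, "->", g.content if g else None)
# prints: turn that one off -> lamp_3
```

A purely temporal rule like this is exactly where additional clues such as prosody could help, for instance when two gestures fall inside the same window.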
Dialogue. Dialogue management involves
controlling the flow of the dialogue by deciding how the system should
respond to a user move. This is done by inspecting and contextually specifying
the information structure produced by an interpretation module. If some
information is missing or a request is ambiguous, clarification questions
are specified by the Dialogue Manager and posed to the user. The dialogue
manager developed in previous projects (Jönsson, 1997; Dahlbäck
et al., 1999) will be adapted to the situations and tasks in the e-Home,
especially considering issues of co-operativity and domain knowledge representation
as discussed below.
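The control step described above, inspecting the interpreted user move, specifying it from context, and asking for clarification when information is missing, can be sketched as follows. This is a simplified illustration and not the dialogue manager from the cited projects; the task names, the slot names and the function next_move are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical task definitions: the slots each task needs before it can run.
REQUIRED_SLOTS: Dict[str, List[str]] = {
    "record_programme": ["channel", "start_time"],
    "find_recipe": ["dish"],
}


def next_move(interpretation: Dict[str, Optional[str]],
              context: Dict[str, str]) -> str:
    """Decide how the system should respond to one user move."""
    # Contextual specification: merge the new information into the dialogue
    # context, so that values given in earlier moves are inherited.
    context.update({k: v for k, v in interpretation.items() if v is not None})

    task = context.get("task")
    if task not in REQUIRED_SLOTS:
        return "clarify: which task do you want to perform?"

    missing = [slot for slot in REQUIRED_SLOTS[task] if slot not in context]
    if missing:
        # Information is missing: pose a clarification question.
        return f"clarify: please specify {missing[0]}"
    return f"execute: {task}"


context: Dict[str, str] = {}
print(next_move({"task": "record_programme", "channel": "2"}, context))
# prints: clarify: please specify start_time
print(next_move({"start_time": "20:00"}, context))
# prints: execute: record_programme
```

The second user move contains only a start time; the task and channel are taken from the context built up by the first move.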
Co-operativity. Many of the problems
addressed in the e-Home communication cannot be solved based on a simple
dialogue. Problem solving instead has to be viewed as a joint activity
involving co-operation between human and computer (cf. Allwood, Traum
& Jokinen, 2000). The different kinds of families and different housing
conditions make the tasks differ between families. But many of these tasks have
a number of interesting features in common:
There is a sequence of actions to be performed, often including decision points where alternative action sequences need to be selected.
The tasks are seldom performed, and hence the level of expertise is low.
They include both aspects of diagnosis and aspects of treatment.
There is another important difference:
this kind of action knowledge or practical knowledge is often learned through
some kind of apprentice situation, i.e. it is learned by doing instead of
through theoretical studies.
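A task of the kind characterised above, a sequence of actions with decision points that combines diagnosis and treatment, could be represented roughly as in the following sketch. The Step class, the run function and the fuse-changing procedure are hypothetical; the step texts are illustrative only and are not taken from the project.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional


@dataclass
class Step:
    instruction: str
    # A decision point maps the user's answer to the name of the next step.
    branches: Dict[str, str] = field(default_factory=dict)
    # For ordinary steps: the name of the following step, or None at the end.
    next_step: Optional[str] = None


# Hypothetical, simplified procedure with diagnosis and treatment steps.
PROCEDURE: Dict[str, Step] = {
    "check": Step("Is the fuse indicator missing or discoloured?",
                  branches={"yes": "replace", "no": "other_cause"}),
    "replace": Step("Switch off the main power and replace the fuse.",
                    next_step="verify"),
    "verify": Step("Switch the power back on. Does the light work?",
                   branches={"yes": "done", "no": "other_cause"}),
    "other_cause": Step("The fuse is probably not the cause."),
    "done": Step("The problem is solved."),
}


def run(procedure: Dict[str, Step], start: str,
        answer: Callable[[str], str]) -> List[str]:
    """Walk the procedure, asking `answer` at each decision point.

    Returns the names of the steps visited, in order.
    """
    visited: List[str] = []
    current: Optional[str] = start
    while current is not None:
        step = procedure[current]
        visited.append(current)
        if step.branches:
            current = step.branches[answer(step.instruction)]
        else:
            current = step.next_step
    return visited


print(run(PROCEDURE, "check", lambda question: "yes"))
# prints: ['check', 'replace', 'verify', 'done']
```

In a co-operative dialogue the answer function would be the user's reply, so the system and the user work through the procedure together, one step at a time.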
Representation: A multimodal dialogue
system requires representations of domain knowledge in order to perform
its task. There are several problems related to this. For example, in
cases where the background system is distributed and consists of several
domain and application system knowledge sources, the dialogue system must
know which of them to access, in what order, and how the results should
be integrated into one answer. This type of knowledge can be represented
in a domain task model (Flycht-Eriksson, 2000, Flycht-Eriksson & Jönsson,
2000). Furthermore, co-operative dialogue requires representations that
facilitate sequences of actions based on knowledge of how things are carried
out, from simple tasks such as fixing a flat tire, changing a fuse or stopping
a bleeding nose to more complex tasks such as video programming or LAN configuration.
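The role of a domain task model, recording which knowledge sources to access, in what order, and how to integrate the results into one answer, can be sketched as below. This is only an illustration under stated assumptions and not the model of the cited papers; the sources tv_guide and vcr_status, their return values, and the names TASK_MODEL and consult are hypothetical.

```python
from typing import Callable, Dict, List

Facts = Dict[str, object]


# Hypothetical knowledge sources. Each takes the facts gathered so far and
# returns new facts; the returned values are made up for illustration.
def tv_guide(facts: Facts) -> Facts:
    return {"programme": "News", "duration_minutes": 30}


def vcr_status(facts: Facts) -> Facts:
    free_minutes = 45
    # Depends on a fact supplied by tv_guide, so the order of access matters.
    return {"fits_on_tape": facts["duration_minutes"] <= free_minutes}


SOURCES: Dict[str, Callable[[Facts], Facts]] = {
    "tv_guide": tv_guide,
    "vcr_status": vcr_status,
}

# The domain task model: for each task, the sources to consult, in order.
TASK_MODEL: Dict[str, List[str]] = {
    "record_programme": ["tv_guide", "vcr_status"],
}


def consult(task: str, request: Facts) -> Facts:
    """Access the sources listed for the task and merge their results."""
    facts: Facts = dict(request)
    for name in TASK_MODEL[task]:
        facts.update(SOURCES[name](facts))
    return facts


print(consult("record_programme", {"channel": "2"}))
```

Here integration is a plain dictionary merge; a real system would also have to deal with conflicting or missing results from the individual sources.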
Multiple participants. One novel
feature in e-Home communication is the heterogeneous nature of its users.
Families, being the most common group of users, consist of individuals
with varying interests and needs, which suggests interfaces that adapt to
the individual user. Adaptive interfaces, however, violate
many principles of human-computer interaction such as control, transparency
and predictability (Höök, 2000), and thus require careful design
based on evaluations with users of the service.
Jens Allwood, David Traum and Kristiina Jokinen, Cooperation, dialogue and ethics, International Journal of Human-Computer Studies, 53, pp. 871-914, 2000.
Nils Dahlbäck and Arne Jönsson, Knowledge Sources in Spoken Dialogue Systems, Proceedings of Eurospeech'99, Budapest, Hungary, pp. 1523-1526, 1999.
Nils Dahlbäck, Annika Flycht-Eriksson, Arne Jönsson and Pernilla Qvarfordt, An Architecture for Multi-Modal Natural Dialogue Systems, Proceedings of ESCA Tutorial and Research Workshop (ETRW) on Interactive Dialogue in Multi-Modal Systems, Germany, 1999.
Annika Flycht-Eriksson, A Domain Knowledge Manager for Dialogue Systems, Proceedings of the 14th European Conference on Artificial Intelligence (ECAI'2000), IOS Press, Amsterdam, 2000.
Annika Flycht-Eriksson, A survey of knowledge sources in dialogue systems, Proceedings of IJCAI'99 workshop on Knowledge and Reasoning in Practical Dialogue Systems, Stockholm, Sweden, pp. 41-48, 1999.
Annika Flycht-Eriksson and Arne Jönsson, Dialogue and Domain Knowledge Management in Dialogue Systems, 1st SIGdial Workshop on Discourse and Dialogue, Hong Kong, 7-8 October 2000.
Kristina Höök, Steps to take before Intelligent User Interfaces become real, Interacting with Computers, 2000.
Arne Jönsson, A model for habitable and efficient dialogue management for natural language interaction, Natural Language Engineering 3(2/3), pp. 103-122, Cambridge University Press, 1997.
Pernilla Qvarfordt and Arne Jönsson, Evaluating the Dialogue Component in the Gulan Educational System, Proceedings of Eurospeech'99, Budapest, Hungary, pp. 643-646, 1999.