Multimodal Interaction for Information Services



Research Issues

The project can be divided into sub-projects, each addressing important issues needed to achieve the goals presented above: we need further knowledge and software on document processing; we need knowledge on multimodal interaction; the MALIN dialogue system framework must be modified; and the use of common knowledge sources must be investigated, including the development of techniques and methods for building and maintaining them.

During the project, we will adopt an incremental development strategy: the dialogue system will grow progressively more advanced as we iteratively add capabilities.

Investigations on multimodal interaction

One important area of research concerns multimodal interaction. With improved technology in areas such as speech recognition and gesture recognition, and with new areas of computer use, both in public and private spaces, new possibilities for multimodal interaction emerge.

However, little is known about how users experience multimodal interaction. In the area of Human-Computer Interaction, researchers have been sceptical about using speech or natural language for interaction with computers. Most arguments are based on the observation that the user loses control when using interaction techniques based on natural language. When speech is used as an interaction technique, the system makes a statistical interpretation of the user's input; the interpretation, and hence the system response, can therefore vary. This inconsistency means that the user never knows how the system will respond. A more philosophical argument for loss of control in multimodal interaction is that natural language requests are less direct than direct manipulation. A third argument is that a speech or multimodal system can give the impression of understanding more than it actually does, and it is not evident what the system really understands. A consequence of this is again a loss of control, since users often do not know how to adapt to what the system understands. Even without previous misunderstandings from the system, users often find it hard to know what to say to a system.
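One common way to mitigate the varying-interpretation problem is to expose the system's uncertainty rather than silently acting on the top recognition hypothesis. The following sketch is purely illustrative (the function, thresholds, and example utterances are invented, not part of the MALIN framework): it chooses between accepting, confirming, or rejecting an interpretation based on recognition confidence.

```python
# Illustrative sketch, not project code: pick a dialogue move from an
# ASR n-best list of (text, confidence) pairs. Threshold values are
# invented for the example.

def choose_response(hypotheses, confirm_threshold=0.75, reject_threshold=0.40):
    """Select a dialogue move given recognition hypotheses."""
    best_text, best_score = max(hypotheses, key=lambda h: h[1])
    if best_score >= confirm_threshold:
        return ("accept", best_text)      # act on the interpretation
    if best_score >= reject_threshold:
        return ("confirm", best_text)     # ask "Did you mean ...?"
    return ("reject", None)               # ask the user to rephrase

print(choose_response([("show buses to Linkoping", 0.82),
                       ("show houses in Linkoping", 0.11)]))
```

Explicit confirmation moves of this kind give the user back some of the control discussed above, at the cost of longer dialogues.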

The most common objection to these arguments is that humans use speech and multimodal interaction every day; it is their natural way of communicating. Furthermore, studies show that multimodal interaction is in fact more efficient.

In this sub-project the aim is to investigate which properties are important for user acceptance of multimodal interaction. Is control an important property of a multimodal system? If so, is control manifested in the choice of interaction techniques or in dialogue strategies? Is it important to be able to freely choose an interaction technique at any point in time, or to use several simultaneously, or should the use of interaction techniques be restricted at various points in the dialogue? How should the dialogue between user and system be designed to maintain the user's sense of control: should it be very open and unrestricted, or clearly restricted to certain input from the user?

Development and maintenance of shared knowledge sources

This project, like any language technology project, needs knowledge sources of various types. Two important issues must be addressed here: the dynamic nature of the information sources, which demands means for automatic updates, and the fact that some knowledge sources are shared.

The lexicon is the main knowledge source for the Interpreter in our framework. Fortunately, much of the information needed to create the lexicon can be acquired automatically by information extraction from the documents. Furthermore, if the grammar is based on partial information, as is the case in our framework, many auxiliary words are not needed, which also makes automatic extraction and lexicon development easier. Ontologies have a less clearly defined role. In our view, an ontology is a knowledge base with information about concepts, their properties and relations. In an open domain with unstructured information, a number of research issues concerning ontologies must be addressed.

The first question to address is how the ontology is to be created. Hand-crafting ontologies is a cumbersome and time-consuming task; in an open and unstructured domain it can even be unmanageable. Techniques for automatic construction of ontologies are therefore a highly prioritised issue to investigate. One possible approach is to use existing resources such as thesauri, dictionaries, and taxonomies. A thesaurus or taxonomy provides a hierarchical structure, which can be extended with semantic relations derived from a dictionary. One such resource for Swedish could be SWordNet, the Swedish version of WordNet.
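The resource-based approach can be sketched in a few lines. The example below is a minimal, invented illustration (the taxonomy, glosses, and relation names are all made up): a small is-a hierarchy is taken as the backbone, and relations are mined from simplified dictionary glosses by naive pattern matching.

```python
# Illustrative sketch: seed an ontology from a hand-made taxonomy and
# extend it with relations taken from dictionary-style glosses.
# All data and relation names here are invented for the example.

taxonomy = {                 # child -> parent (is-a hierarchy)
    "bus": "vehicle",
    "train": "vehicle",
    "vehicle": "artifact",
}

glosses = {                  # word -> simplified dictionary gloss
    "bus": "a large vehicle that carries passengers",
    "train": "a connected line of railway cars that carries passengers",
}

def build_ontology(taxonomy, glosses, known_relations=("carries",)):
    """Combine is-a links with relations mined from glosses."""
    ontology = {child: {("is_a", parent)} for child, parent in taxonomy.items()}
    for word, gloss in glosses.items():
        tokens = gloss.split()
        for relation in known_relations:
            if relation in tokens:                    # naive pattern match
                obj = tokens[tokens.index(relation) + 1]
                ontology.setdefault(word, set()).add((relation, obj))
    return ontology

onto = build_ontology(taxonomy, glosses)
print(sorted(onto["bus"]))
```

A real resource such as SWordNet would of course provide far richer structure; the point is only the combination of a hierarchical backbone with gloss-derived relations.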

Another approach to automatic construction of ontologies is to identify and extract concepts and relations from text corpora, using various IE techniques and machine learning. Although not much work has been done here, there are promising results, for instance utilising genetic programming to discover hyponymy relations between concepts.
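As a much simpler, pattern-based alternative to the machine-learning approaches mentioned above, hyponymy pairs can be harvested with hand-written lexico-syntactic patterns in the style of Hearst. The patterns and corpus sentence below are invented for illustration.

```python
import re

# Sketch of lexico-syntactic patterns for finding hyponymy relations in
# raw text. Patterns and the example sentence are illustrative only.

PATTERNS = [
    re.compile(r"(\w+) such as (\w+)"),     # "vehicles such as buses"
    re.compile(r"(\w+) including (\w+)"),   # "services including ferries"
]

def extract_hyponyms(text):
    """Return (hypernym, hyponym) pairs matched by the patterns."""
    pairs = set()
    for pattern in PATTERNS:
        for hypernym, hyponym in pattern.findall(text.lower()):
            pairs.add((hypernym, hyponym))
    return pairs

print(extract_hyponyms("Vehicles such as buses run daily."))
```

Such patterns are precise but sparse; corpus-based learning methods aim to generalise beyond the handful of constructions a human can enumerate.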

Once an ontology has been constructed it has to be maintained. Since new information can be added, the ontology must be able to change accordingly. This implies that techniques for automatic updates of an ontology have to be developed.
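A minimal update routine might look as follows. This is an invented sketch (the data structure mirrors the earlier ontology example, and the idea of flagging new concepts for review is an assumption, not a project decision): newly extracted facts are merged in, and previously unseen concepts are recorded so a maintainer could inspect them.

```python
# Illustrative sketch: merge newly extracted (concept, relation, target)
# triples into an ontology and report which concepts are new.

def update_ontology(ontology, new_facts):
    """new_facts: iterable of (concept, relation, target) triples."""
    added_concepts = []
    for concept, relation, target in new_facts:
        if concept not in ontology:
            ontology[concept] = set()
            added_concepts.append(concept)     # flag for manual review
        ontology[concept].add((relation, target))
    return added_concepts

onto = {"bus": {("is_a", "vehicle")}}
new = update_ontology(onto, [("tram", "is_a", "vehicle"),
                             ("bus", "carries", "passengers")])
print(new)
```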

As the ontology will be a central resource in the system, shared by various modules, its format and properties must serve different purposes. To discover these, an investigation of how the domain ontology can be used by the different modules in a dialogue system has to be conducted.

Information extraction and document processing

Multimodal communication needs access to structured information, and as most information is represented in unstructured documents, processes to identify different types of information at different granularities need to be developed further. Information extraction (IE) has attracted interest beyond the NLP field in the last decade. It has also been widely accepted that, to improve current information retrieval techniques, technology from NLP must be incorporated. Not only should the relevant documents be retrieved (as in IR); the process moves on to first single out passages, or extracts from documents, which contain the desired pieces of information, and second, "transforms them into information that is more readily digested and analysed". The goal of IE is to find and link relevant information while ignoring extraneous and irrelevant information. IE is performed in several steps: i) identifying and marking up instances of objects, events and relations in documents, ii) extracting the available information based on the mark-up, and iii) presenting the requested information in a format suitable for the user. In NLP terms, IE requires a number of separate techniques that are used together, e.g. part-of-speech tagging, functional and sense disambiguation, and pronoun resolution. In practice, many IE approaches use agents or wrappers that concentrate on a particular task, such as identifying entities in the documents that can be classified as places, persons, organisations, times, positions, etc.
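The three IE steps above can be illustrated with a toy wrapper. Everything in this sketch is invented for the example (the entity lexicon, the tag format, and the sentence); it only shows the shape of the mark-up, extract, present pipeline, not the project's actual components.

```python
import re

# Toy illustration of the three IE steps: (i) mark up entity instances,
# (ii) extract them from the mark-up, (iii) present the result.
# The place lexicon and example sentence are invented.

PLACES = {"Linkoping", "Stockholm"}
TIME_PATTERN = re.compile(r"\b\d{1,2}:\d{2}\b")

def mark_up(text):
    """Step i: tag known places and clock times in raw text."""
    for place in PLACES:
        text = text.replace(place, f"<place>{place}</place>")
    return TIME_PATTERN.sub(lambda m: f"<time>{m.group()}</time>", text)

def extract(marked):
    """Step ii: pull tagged spans out of the marked-up text."""
    return dict(re.findall(r"<(place|time)>([^<]+)</\1>", marked))

def present(fields):
    """Step iii: format the extracted fields for the user."""
    return ", ".join(f"{key}: {value}" for key, value in sorted(fields.items()))

marked = mark_up("The bus leaves Linkoping at 14:30")
print(present(extract(marked)))
```

Real wrappers of the kind discussed in this sub-project would rely on tagging, disambiguation and domain knowledge rather than string lookup, but the division into the three steps is the same.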

Information sources are often in several languages, and future information systems will therefore need to handle multilingual information. Multilinguality is not a primary objective in this project, but the topic is still of natural interest to the group, as two of the project members are also involved in projects concerning machine translation and multilingual processing within the Natural Language Laboratory at the department.

Furthermore, there is also an interest in investigating how ontology-based document processing can promote advances in automatic summarisation. These research areas are closely connected, and we plan to do some preliminary investigations in this direction.

In this subproject, emphasis will be put on developing a wide set of semantic wrappers (both general and domain-specific) that will be tested and evaluated on several Swedish information domains. Naturally, this is tightly connected to the ontologies and how these are used as a common resource for several components of the project.

Dialogue Systems development

The basis for the dialogue system is the MALIN framework, which will be iteratively refined by adding further functionality to its modules.

The MALIN interpreter will be refined with a capable component for query analysis. This will reduce the complexity of formulating information requests by allowing intuitive questions in natural language. Correctly deciphering the intended meaning of a user question is one of the critical issues in a Q&A system: vague, ambiguous and partial input has to be dealt with. By using domain knowledge from the ontology together with syntactic surface analysis and dialogue history information, a catalogue of question categories on different levels will be developed. The question category selected for a particular query will then form the basis for the query that is fed into the extraction module. Furthermore, the choice of question type tells the system how the answer should be presented to the user.
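The link between question category and answer presentation can be sketched as follows. This is a heavily simplified, invented example (the categories, the toy ontology, and the wh-word heuristics are not the project's catalogue): a surface cue plus an ontology lookup selects a category, which carries a presentation hint.

```python
# Illustrative sketch: categorise a question from its wh-word and from
# ontology types of the concepts it mentions. Categories, presentation
# hints, and the mini-ontology are invented for the example.

ONTOLOGY = {"bus": "vehicle", "linkoping": "place"}

def categorise(question):
    """Return (category, presentation hint) for a user question."""
    words = question.lower().rstrip("?").split()
    concepts = [w for w in words if w in ONTOLOGY]
    if words[0] == "when":
        return ("time-question", "answer with a timetable entry")
    if words[0] == "where" or any(ONTOLOGY.get(c) == "place" for c in concepts):
        return ("location-question", "answer with a place name")
    return ("general-question", "answer with a text passage")

print(categorise("When does the bus leave?"))
```

In the framework described above, the chosen category would additionally shape the query sent to the extraction module, not only the answer format.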

The central component in the MALIN dialogue system framework is the Dialogue Manager (DM), which maintains the dialogue history and handles the interaction with the user, i.e. controls the flow of the dialogue by deciding how to respond to user utterances. In many information-providing dialogue systems, the Dialogue Manager has also been responsible for retrieving requested information from the background systems. As a consequence, dialogue, task and domain knowledge and reasoning have often been integrated. This integration has a number of drawbacks; for example, it makes porting the system to a new domain hard and time consuming. A separation of domain knowledge from dialogue and task knowledge and reasoning has therefore been proposed.

In the MALIN dialogue system framework, a separate module called the Domain Knowledge Manager (DKM) has been introduced. The DKM is responsible for domain reasoning and for retrieving information from various domain knowledge sources. The DKM cooperates with the DM to answer requests for information posed by the user. User utterances are transformed into information requests by the DM, possibly involving clarification subdialogues with the user. The fully specified request is then sent to the DKM, which consults one or several information or domain knowledge sources in order to retrieve the requested information and produce an answer. This requires that the DKM knows where and how different types of information should be retrieved, a task that becomes difficult when the domain is open and the information unstructured.

The introduction of a separate module for domain knowledge management makes the MALIN framework suitable for developing dialogue systems in unstructured and open domains. The DKM can act as an intermediary between the two subareas of information processing and multimodal interaction. However, the DM and DKM were developed for closed and structured domains; how they can be extended to work on partially structured documents is an important research issue.

For dialogue management, this involves adapting the dialogue and task models used by the Dialogue Manager. It also includes issues of how to utilise the domain ontology.

The ontology must also be utilised by the DKM, for example to reason about requests and about information retrieved from the documents. Since the information is only partially structured, the mechanisms for integrating retrieved and extracted information must be modified. The DKM also has to be made more flexible in order to handle new information being introduced over time.

Last updated: 2012-05-07