Multimodal Interaction for Information Appliances




Processing multi-modal and multi-lingual information

Information applicable for processing and access in the e-home can come from various sources, in different languages and in multiple formats. Some of the data is structured (databases, XML and SGML documents), but the majority of information is stored in unstructured documents. (Bielawski and Boyle (1997) claim that over 80 per cent of an organization's information is to be found in documents, that is, as unstructured data.) Furthermore, the problem is not the availability of the information; rather, it is locating the right information for a specific need at the right time. Multi-modal communication needs access to structured information, and since most information is represented in unstructured formats (documents), processes for identifying different types of information at different levels of granularity need to be developed further. Below we list research areas that are of interest for building structured information sources that can serve the e-home with information in multi-modal dialogue.
  • Information retrieval (IR). Information retrieval focuses on identifying the correct information source, for example the relevant documents. Here various techniques from language engineering, such as stemming, lemmatization and shallow syntactic parsing, help to provide more accurate retrieval results.
  • Information extraction (IE). IE goes one step further than IR: it is not enough to find the relevant documents. The process first singles out passages, or extracts from documents, that contain the desired pieces of information, and second, "transforms them into information that is more readily digested and analyzed" (Cowie & Lehnert 1996, pp. 80-81). The goal of IE is to find and link relevant information while ignoring extraneous and irrelevant information. IE is performed in several steps: i) identifying and marking up instances of objects, events and relations in documents, ii) extracting the available information based on the mark-up, and iii) presenting the requested information in a suitable format for the user. In NLP terms, IE requires a number of separate techniques that are used together, e.g., POS-tagging, functional and sense disambiguation, and pronoun resolution. In practice, many IE approaches use agents or spiders that concentrate on a particular task, such as identifying entities in the documents that can be classified as places, persons, organizations, times, positions, etc. For the user, IE provides added value because the techniques involved serve as an extra filter and help the user to focus only on the relevant information hidden in parts of the documents.
  • Text mining. Text mining, also known as text data mining (Hearst 1999), is described as the process of extracting interesting and non-trivial patterns or knowledge from unstructured text documents. It is often viewed as an extension of data mining or knowledge discovery from databases (Simoudis 1996). Text mining is a more multidisciplinary field than IR and IE, as it involves IR, IE, text analysis, clustering, categorization, visualization, database technology, machine learning and data mining (Tan 1999). One factor that distinguishes text mining from IE is that IE is user-driven, in the sense that the user explicitly states what he or she is looking for, whereas text mining can be seen as system-driven when the aim is to find links and relationships between entities in the document base. In many practical cases, however, the borderline between text mining and information extraction is rather vague. Within the e-home framework, text mining can in the future play an important role for the user by acting as an intelligent personal assistant. A personal miner would be able to learn a particular user's profile and preferences, perform text mining automatically, and present information to the user without explicit requests (Tan 1999).
  • Document summarization. Accurately compressing and summarizing large documents into concise descriptions of their content is another way to handle the increasing information flow. Document summarization can be defined as "a reductive transformation of source text to summary text through content reduction by selection and/or generalization on what is important in the source" (Sparck Jones 1999). There are many basic similarities between the information extraction paradigm and the document summarization field, as they use similar NLP techniques. As Sparck Jones points out, the division among summarization approaches lies between systems that reduce content by selection or extraction on the one hand, and systems that interpret the source document and shorten it by generalizing the content on the other. Most systems today adopt the extraction approach, i.e., the text is shortened by selecting the most important passages and compiling these into one text. One example of a commercial extraction approach to summarization is the AutoSummarize function present in later versions of Microsoft Word. Very few systems today can accurately interpret a document and produce a shorter, abstracted version of it, but progress within IE and text mining is likely to promote such systems in the future, as these fields share a great deal of basic components.
  • Document classification. Document classification is a simpler area than the ones mentioned above, and often serves as a subcomponent for IE, text mining and document summarization. It involves superficial analysis of documents in order to determine their type, genre and language.
  • Multi-lingual document processing. On the Internet and in many document archives, documents are available in multiple languages. By making information retrieval and extraction as well as text mining and summarization operate on multi-lingual sources, these techniques will give the user access to information stored in several languages. The multi-lingual perspective requires linguistic resources, such as multi-lingual lexicons and term banks, as well as multi-lingual processing systems like machine translation systems, to be able to arrive at language-independent intermediate representations. In recent decades, NLP has seen a number of new corpus-based approaches that focus on extracting multi-lingual resources from both mono-lingual and multi-lingual documents, for example by extracting technical terminology and by using word alignment programs to create bilingual lexicons automatically (cf. Melamed 1998, Merkel 1999, Ahrenberg, Andersson & Merkel 1998).
  • Multi-modal document processing. Documents are often regarded as "text only", but in recent years they have increasingly come to be considered "containers of information or knowledge", irrespective of modality, be it text, graphics, video, sound, etc. This widening concept of the document makes it necessary to process multi-modal documents from a unified perspective, which requires integrating techniques such as image analysis and speech recognition with general text processing techniques.
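The three IE steps listed above (mark-up, extraction, presentation) can be illustrated with a minimal Python sketch. It is only an illustration under strong assumptions: the gazetteer `PLACES`, the regular expressions and the function names are hypothetical, and simple pattern matching stands in for the POS-tagging, disambiguation and pronoun resolution a real IE system would use.

```python
import re

# Hypothetical gazetteer and patterns (assumptions for illustration only).
PLACES = {"Montreal", "Beijing", "Berlin"}
TIME_PATTERN = re.compile(r"\b(?:[01]?\d|2[0-3]):[0-5]\d\b")
WORD_PATTERN = re.compile(r"\b[A-Z][a-z]+\b")


def mark_up(text):
    """Step i: mark entity instances as (type, start, end) spans."""
    spans = [("time", m.start(), m.end()) for m in TIME_PATTERN.finditer(text)]
    spans += [
        ("place", m.start(), m.end())
        for m in WORD_PATTERN.finditer(text)
        if m.group() in PLACES
    ]
    # Order spans by their position in the text.
    return sorted(spans, key=lambda span: span[1])


def extract(text, spans):
    """Step ii: pull out the marked-up strings, grouped by entity type."""
    record = {}
    for kind, start, end in spans:
        record.setdefault(kind, []).append(text[start:end])
    return record


def present(record):
    """Step iii: format the extracted record for the user."""
    return "; ".join(
        f"{kind}: {', '.join(values)}" for kind, values in sorted(record.items())
    )


text = "The train to Berlin leaves at 14:05 and the one to Montreal at 9:30."
record = extract(text, mark_up(text))
print(present(record))  # place: Berlin, Montreal; time: 14:05, 9:30
```

Keeping the mark-up step separate from extraction mirrors the description above: the spans could be produced by any tagger, while the extraction and presentation steps stay unchanged.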
Taken together, these research areas will contribute to making information more readily available to various applications and information appliances in the e-home.
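As a concrete illustration of the extraction approach to summarization described in the list above, the following Python sketch selects the highest-scoring sentences and compiles them in their original order. The scoring scheme (average word frequency), the stop-word list and all function names are assumptions made for this example; it is not the method of any system cited here.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"the", "a", "an", "of", "in", "and", "is", "to", "it", "on"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def content_words(sentence):
    """Lower-cased alphabetic words with stop words removed."""
    return [w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOP_WORDS]


def summarize(text, max_sentences=2):
    """Keep the sentences whose words are most frequent in the whole text."""
    sentences = split_sentences(text)
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(
        range(len(sentences)), key=lambda i: score(sentences[i]), reverse=True
    )
    chosen = sorted(ranked[:max_sentences])  # restore original order
    return " ".join(sentences[i] for i in chosen)


text = (
    "Text mining finds patterns in text. The weather was nice. "
    "Mining text documents reveals patterns."
)
print(summarize(text))
```

In this toy input the off-topic middle sentence is dropped because its words occur only once in the text. An interpretation-based summarizer, by contrast, would have to generalize over the content rather than merely select from it.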


Lars Ahrenberg, Mikael Andersson and Magnus Merkel. A Simple Hybrid Aligner for Generating Lexical Correspondences in Parallel Texts. In Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics and 17th International Conference on Computational Linguistics, COLING-ACL'98, Montreal, pp. 29-35, 1998.

Jim Cowie and Wendy Lehnert. Information Extraction. In Communications of the ACM, 39(1), January 1996.

Marti Hearst. Untangling text data mining. In Proceedings of ACL '99: The 37th annual meeting of the Association for Computational Linguistics, University of Maryland, 1999. Also available at

I. Dan Melamed. Empirical Methods for MT Lexicon Construction. In Machine Translation and the Information Soup. D. Farwell, L. Gerber and E. Hovy (eds.), Berlin, Springer Verlag, pp. 18-30, 1998.

Magnus Merkel. Understanding and enhancing translation by parallel text processing. Dissertation No. 607. Department of Computer and Information Science, Linköping University, 1999.

E. Simoudis. Reality check for data mining. In IEEE Expert, 11(5), 1996.

Karen Sparck Jones. Automatic summarizing: factors and directions. In Advances in Automatic Text Summarization. I. Mani and M. T. Maybury (eds.), MIT Press, London, 1999.

Ah-Hwee Tan. Text Mining: The state of the art and the challenges. In Proceedings, PAKDD'99 Workshop on Knowledge discovery from Advanced Databases (KDAD'99), Beijing, pp. 71-76, April 1999.

Last updated: 2012-05-07