The declarative knowledge bases of the system, which should be changed to suit the needs of a given application, thus comprise not only the dictionary and the domain concepts, but also the grammar and the dialog objects, i.e. the possible moves (speech acts) and exchanges.
An aim of our work has been to represent all knowledge in the same structure and in the same representation language. This makes it possible to develop the linguistic knowledge and the domain knowledge simultaneously in the same environment, and, in principle, to integrate syntactic and semantic processing. The processing modules that we have implemented so far, however, differ in the representation languages that they assume.
The central processing module of the system is the dialog manager, DM, which receives user inputs, controls the data-flow of the system and maintains the discourse representation. The discourse representation consists of three dynamic structures. The first one, the score-board, keeps information about salient objects and properties which are needed by the instantiator and generator modules. The score-board is basically an interface to the second dynamic structure, a dialog tree which represents the entire dialog as it proceeds in the interaction. The nodes of the dialog tree are instances of dialog objects, i.e. various types of moves and segments. They carry information about properties such as speaker, hearer, topic and focus, and are associated with a local plan. The plan is structured in terms of actions and is combined with similar plans of other nodes to form the third structure, the action plan stack where the actions to be performed by the DM are stored.
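The three dynamic structures maintained by the DM can be sketched as follows. This is a minimal illustration, not the actual LINLIN implementation; all class, field and method names are invented for exposition.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class DialogueObject:
    """A node of the dialogue tree: an instance of a move or segment type.

    Field names are hypothetical; the actual system's dialog objects differ.
    """
    obj_type: str                    # e.g. "IR-segment", "question", "answer"
    speaker: str                     # "user" or "system"
    topic: Optional[str] = None
    focus: Optional[str] = None
    plan: List[str] = field(default_factory=list)  # local plan: action names
    children: List["DialogueObject"] = field(default_factory=list)

class DialogueManager:
    """Holds the three dynamic discourse structures described above."""

    def __init__(self):
        self.dialogue_tree = DialogueObject("dialogue", speaker="system")
        self.score_board = {}   # salient objects/properties, read by the
                                # instantiator and generator modules
        self.action_stack = []  # actions to be performed by the DM

    def open_segment(self, parent, obj_type, speaker):
        """Attach a new node to the tree and push its local plan."""
        node = DialogueObject(obj_type, speaker)
        parent.children.append(node)
        # reversed so the plan's first action ends up on top of the stack
        self.action_stack.extend(reversed(node.plan))
        return node
```

The score-board here is simply a mapping that mirrors salient information from the current tree node, which is how it can serve as an interface between the tree and the other modules.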
The interaction is interpreted directly from the information conveyed in the speech acts; no reasoning about the user's intentions or goals is performed. Speech act information is assembled into Initiative-Response units, which form the basis for interpreting the segment structure. A simple context-free grammar can model the interaction, with rules selected on the basis of properties of the objects describing the information provided by the system. Referring expressions are handled by copying information from the previous segment to the current segment, which is in turn updated with information from the background system.
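The context-free view of the interaction can be illustrated with a small recognizer for Initiative-Response units, where a clarification subdialogue appears as an embedded IR unit. The rules below are an illustrative fragment, not the system's actual grammar.

```python
# Illustrative dialogue grammar: a dialogue is a sequence of IR units;
# an IR unit is an initiative and a response, possibly with an embedded
# IR unit (a clarification subdialogue) in between.
GRAMMAR = {
    "Dialogue": [["IR"], ["IR", "Dialogue"]],
    "IR":       [["Initiative", "Response"],
                 ["Initiative", "IR", "Response"]],
}

def recognize(symbol, moves, pos=0):
    """Return the set of positions reachable after deriving `symbol`
    from moves[pos:]."""
    if symbol in ("Initiative", "Response"):      # terminal move types
        return {pos + 1} if pos < len(moves) and moves[pos] == symbol else set()
    results = set()
    for production in GRAMMAR[symbol]:
        positions = {pos}
        for rhs in production:
            positions = {q for p in positions
                         for q in recognize(rhs, moves, p)}
        results |= positions
    return results

# A nested unit: the system's initiative is countered by a clarification.
moves = ["Initiative", "Initiative", "Response", "Response"]
print(len(moves) in recognize("Dialogue", moves))  # True: well-formed
```

A move sequence is a well-formed dialogue exactly when the whole sequence can be derived from the start symbol, which is what the final membership test checks.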
The DM is thus characterized by its distributed control. The actions of the action plan stack are distributed over the nodes of the dialog tree that are still open. This means that if, say, the parser fails while a certain node is current, that node creates an instance of a clarification request segment, which controls the dialog during the clarification. This segment consists of two parts: one for prompting the user with a clarification request, and another for interpreting the user's reply. Finally, the user's response is integrated into the dialog tree. The distributed design has the advantage that we can use quite simple, local plans. Detailed descriptions of the dialog manager can be found in Ahrenberg, Jönsson and Dahlbäck (1990), Dahlbäck and Jönsson (1992) and Jönsson (1991, 1993a, 1993b).
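The recovery strategy can be sketched as follows: on a parse failure, the current node pushes a two-part clarification plan onto the action stack, so that the prompt is issued before anything else and the interpretation action waits for the reply. The action names and the stand-in parser are invented for illustration; they are not taken from the LINLIN implementation.

```python
def parse(utterance):
    """Stand-in parser: fails on input it cannot analyze (here: empty)."""
    if not utterance.strip():
        raise ValueError("parse failure")
    return {"content": utterance}

def handle_input(utterance, action_stack):
    """Integrate a parsed utterance, or open a clarification segment."""
    try:
        result = parse(utterance)
        action_stack.append(("integrate", result))
    except ValueError:
        # The two-part clarification plan: pushed in reverse order, so
        # the prompt action sits on top of the stack and runs first,
        # while the interpret action waits for the user's reply.
        action_stack.append(("interpret-clarification", None))
        action_stack.append(("prompt", "Sorry, I did not understand that."))
```

Because the plan is local to the failing node, the rest of the tree is unaffected and normal processing resumes once the clarification segment is closed.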
The principles for dialogue management utilized in the Dialogue Manager are applied to written natural language interaction for simple service systems. We are currently investigating its applicability for multi-modal interaction. There are indications that the principles also apply to multi-modal communication for simple service systems (Jönsson, 1995; Stein & Thiel, 1993).
We have previously studied a number of different real or simulated background systems, to provide an empirical basis for the development of the LINLIN system described above. This work is described in a number of publications, e.g. Dahlbäck and Jönsson (1989, 1992), Dahlbäck (1991a, b) and Dahlbäck, Jönsson and Ahrenberg (1993).
This work, as well as similar studies by others, indicates that dialogues with computers in written natural language differ from dialogues between people. It is still an open question, however, to what extent these differences are due to assumed and real differences between people and computers as dialog partners, or to the qualities of the communication channel. In an on-going project we have collected a corpus of 60 dialogues to study these questions. Three different scenarios were used, two of which concerned querying a data base for information, but on different domains. The third scenario involved both ordering and data base querying. For each scenario, 10 subjects were told that they were interacting with a computer system directly, and 10 were told that they were interacting via terminal with a person having such a system on his desk. The analysis of this corpus is continuing, but the results obtained thus far indicate that there are few or no differences between the dialogues with people and those with computers. Consequently, the characteristics of so-called `computerese', i.e. the sub-language used when interacting with a computer, seem to stem more from the characteristics of the communication channel and the task situation than from the assumed characteristics of the communication partner.
There are two general classes of theories on dialogue management in the natural language community. One is the plan-based approach. Here the linguistic structure is used to identify the intentional state in terms of the user's goals and intentions. These are then modelled in plans describing the actions which may possibly lead to their fulfillment.
The other approach to dialogue management is to use only the information in the linguistic structure to model the dialogue expectations, i.e. utterances are interpreted based on their functional relation to the surrounding interaction. The idea is that these constraints on what can be uttered allow us to write a dialogue grammar to manage the dialogue.
The plan-based approach is not only a model for dialogue in natural language interfaces but also aims to account for discourse in general. The dialogue grammar approach, however, is more limited (though some researchers claim that this method could also serve as a general model of discourse, both within computational approaches (e.g. Reichman, 1985) and in other areas of discourse analysis (e.g. Stubbs, 1983)).
Several theories of discourse that are relevant for NLP make central use of some notion of a discourse segment. A problem with all of them, however, is that they do not provide a definition of a segment which is both general and precise enough for computer applications. In these circumstances we found it necessary in our dialog system project to adopt a sublanguage approach to discourse representation and processing, using simulation data as the primary source of data for development of a model.
A basic finding of the studies was that almost all input from users (and output from the systems) could be classified as either initiatives or responses, and that initiatives typically introduce a single goal in the form of a single question or request. Nestings could occur, however, so that an initiative from the system could be countered by an initiative from the user, e.g. requesting some clarification from the system. Still, the overall structure of the dialogue can be given a simple tree structure in terms of segments defined by initial initiatives and closing responses. Moreover, this segment structure correlated strongly with the range of anaphoric references (Dahlbäck, 1991a, 1992), and it seemed possible to keep track of the focused information in each segment by means of a small list of attributes that hold items likely to be referenced by a pronoun or left implicit in a following utterance (Ahrenberg, Jönsson and Dahlbäck, 1990; Jönsson, 1993a). These results can be summarized by saying that a grammar-based approach to discourse representation seems sufficient for many important application areas, so that the complexity associated with the more general plan-based approaches can be avoided (Jönsson, 1991, 1993a).
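The segment-local focus tracking described above can be sketched in a few lines: each segment carries a small attribute list, initialized by copying the previous segment's list and then overwriting it with the new information. The attribute names and example values below are invented for illustration.

```python
def new_segment(previous, update):
    """Open a segment by copying the previous segment's focus attributes
    and overwriting them with the newly introduced information."""
    focus = dict(previous)   # copy referents from the previous segment
    focus.update(update)     # then apply the new segment's contribution
    return focus

# Hypothetical flight-information exchange:
seg1 = {"object": "flight SK405", "property": "departure time"}
seg2 = new_segment(seg1, {"property": "arrival time"})
# A pronoun ("it") or an implicit argument in the follow-up question
# still resolves against the copied object attribute.
print(seg2["object"])   # flight SK405
```

This copy-then-update scheme is what allows referring expressions to be resolved without any reasoning about the user's goals: the candidate referents are simply whatever the attribute list currently holds.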
One problem with comparing the two approaches to dialogue management is that they have been developed from different empirical bases. To overcome this, we are currently engaged in a project whose aim is to compare the two approaches empirically, by analysing the same set of dialogues using both models. We will collect a corpus of dialogues from human-computer interaction, both written and spoken, and analyze the dialogues both with a coding scheme for our dialogue grammar model and with a scheme for a plan- or intention-based model, similar to the one used by Grosz and Hirschberg in their empirical work on discourse structure (Hirschberg and Grosz, 1992; Grosz and Hirschberg, 1992). The dialogues will come both from our own corpora and from other researchers in Sweden and abroad. We are interested in issues such as coding reliability and applicability of the different approaches, as well as in the usefulness of the assigned structures for anaphora resolution and answer generation. This work is still in progress and will continue until the summer of 1996. Parts of the work will be presented at the 1995 AAAI Spring Symposium on Empirical Methods in Discourse Interpretation and Generation (Ahrenberg, Dahlbäck and Jönsson, 1995).