Cognitive Autonomous Systems Laboratory

RDE, OPAS, and Robotic World Architecture


The following account of the structure of the RDE and its subsystems is more precise than the one given in the "RDE Overview" webnote.

The RDE contains three kinds of functionality: the Autonomous Operator's Assistant (AOA), the Robotic Agent together with its Robotic World, and the Software Infrastructure.

Most of these occur in several generations and variants in the group's work, but some aspects of their structure persist across those variations.

The Autonomous Operator's Assistant

Major versions of the Autonomous Operator's Assistant consist of two main subsystems, a Speech and Graphics User Interface (SGUI) and a Dialog Performance System (also called Dialog Manager). The SGUI is in charge of speech input, speech output, presentation of visual information including video, and interpretation of the user's gestures when pointing at images or video on the screen. It might of course be generalized into a presentation system with additional modalities, but we have chosen to restrict ourselves to the SGUI concept for the purpose of this project.

The task of the Dialog Performance System is to receive and act on two input data streams, namely phrases from the user in textual form (including "phrases" that represent the user's pointing gestures), and messages from the robotic agent that confirm actions or report observations. These input streams come from the SGUI and the robotic agent, respectively. At a minimum the Dialog Performance System shall respond reactively to these inputs by textual and visual output to the user, via the SGUI, and by sending commands to the robotic agent, in such a way as to accomplish both the desired robot behavior and a coherent dialog. Beyond this minimum, the Dialog Performance System may be extended with further capabilities, for example learning and autonomous action.
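
As an illustration, the following Python sketch shows what such a reactive loop might look like. The queue names, message fields, and handler behavior are assumptions made for the example, not a description of any actual Dialog Performance System.

    import queue

    sgui_in = queue.Queue()    # phrases from the user, via the SGUI
    robot_in = queue.Queue()   # confirmations and observations from the agent
    sgui_out = queue.Queue()   # textual and visual output back to the SGUI
    robot_out = queue.Queue()  # commands sent to the robotic agent

    def handle_user_phrase(phrase):
        # React to a user phrase, including textual "phrases" that
        # represent pointing gestures (encoding assumed for the example).
        if phrase.startswith("(point-at"):
            sgui_out.put("Acknowledged: target marked on screen.")
        else:
            robot_out.put({"command": "interpret", "text": phrase})

    def handle_robot_message(msg):
        # React to a confirmation or observation from the robotic agent.
        if msg.get("type") == "completion":
            sgui_out.put("Action " + msg["action"] + " completed.")
        elif msg.get("type") == "observation":
            sgui_out.put("Observed: " + msg["content"])

    def dialog_loop():
        # Serve both input streams so that neither starves the other.
        while True:
            for source, handler in ((sgui_in, handle_user_phrase),
                                    (robot_in, handle_robot_message)):
                try:
                    handler(source.get(timeout=0.1))
                except queue.Empty:
                    pass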

The present base-level Dialog Performance System is called DOSAR. The SGUI and DOSAR are quite different in character, and the connection between them is fairly narrow. Please refer to their respective menu items (in the left-side menu) for descriptions of each of them.

A new Dialog Performance System called CEDERIC is being developed by Karolina Eliasson; this work is described under the menu item 'Thesis Projects for Systems' on the main CASL webpage.

The Robotic Agent and Robotic World

A Robotic Agent is a process that receives commands and queries pertaining to the flight of e.g. a UAV, and that returns messages containing observations derived from sensor data, as well as progress and completion reports on requested actions. The mapping from input stream to output stream depends on the state of the physical or simulated world where the robotic agent is located, and on the behavior of other robotic agents in that world. The Robotic World in our architecture can be either a single robotic agent with its actual or simulated environment, or a structure containing several robotic agents.
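
The message traffic just described might be encoded along the following lines; the type and field names in this sketch are assumptions for the example, not the actual agent interface.

    from dataclasses import dataclass, field

    @dataclass
    class Command:
        action: str               # e.g. "fly-to"
        parameters: dict = field(default_factory=dict)

    @dataclass
    class Observation:
        sensor: str               # sensor from which the data was derived
        content: str              # symbolic description of what was observed
        timestamp: float

    @dataclass
    class ActionReport:
        action: str               # the requested action being reported on
        status: str               # "in-progress" or "completed"
        timestamp: float

    # Example: a command to the agent and the report it might return.
    cmd = Command("fly-to", {"x": 120.0, "y": 45.0, "altitude": 60.0})
    report = ActionReport("fly-to", "completed", 1116230400.0)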

The input and output streams of the robotic agent are normally connected to the dialog performance subsystem of an AOA. Some robotic agents may also produce a video stream capturing what is 'seen' by the video camera mounted on the UAV. This video stream may be directed to a video server, which in turn is controlled by the dialog performance subsystem of the AOA that controls the agent.

The Software Infrastructure

The software infrastructure, which is the third major part of the Robotic Dialog Environment, consists of the following parts:

The Development Infrastructure, an auxiliary system that is designed to support the development and use of the AOA and Robotic World parts.

The Communication Framework, which defines the modes and means of communication between subsystems. Several approaches to communication are being tried and compared. We first used a framework based on passing KQML-style messages between major subsystems. This has partly been replaced by the Hazard framework, developed by Peter Andersson in his thesis project.
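
For illustration, a KQML-style message of the kind first used could be built as in the sketch below. The performative and field layout follow the general KQML convention; the subsystem names and the message content are assumed for the example.

    def kqml_message(performative, sender, receiver, content):
        # Serialize a KQML-style message as an s-expression string.
        return ("(" + performative +
                " :sender " + sender +
                " :receiver " + receiver +
                " :content " + content + ")")

    msg = kqml_message("tell", "robotic-agent", "dialog-performance-system",
                       '"(completed (fly-to tower-3))"')
    # msg is now:
    # (tell :sender robotic-agent :receiver dialog-performance-system
    #   :content "(completed (fly-to tower-3))")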

The Video Server, which handles both actual video streams and markup of the video stream with information about its recording position and its contents as functions of time. It receives a video stream combined with a stream of markup information from a robotic UAV agent, assigns timestamps to each frame, and stores the video for future retrieval. On request it can provide the SGUI with current and playback video as well as markup information for specific frames.
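
The storage and retrieval just described might be organized as in the following sketch; the class and method names are assumptions for the example rather than the Video Server's actual interface.

    import time
    from bisect import bisect_right

    class FrameStore:
        # Keeps incoming frames together with their markup, indexed by
        # the timestamp assigned to each frame on arrival.
        def __init__(self):
            self.timestamps = []   # sorted arrival times, one per frame
            self.frames = []       # raw frame data
            self.markup = []       # markup information for each frame

        def store(self, frame_bytes, markup_info):
            # Assign a timestamp to an incoming frame and keep it.
            self.timestamps.append(time.time())
            self.frames.append(frame_bytes)
            self.markup.append(markup_info)

        def frame_at(self, t):
            # Return the frame and markup current at time t (playback),
            # or None if no frame had been stored by then.
            i = bisect_right(self.timestamps, t) - 1
            if i < 0:
                return None
            return self.frames[i], self.markup[i]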

Please refer to the respective menu items for further details.


Posted on 2005-05-16 as part of the CAISOR website.