Teleo-Reactive Programs and the Triple-Tower Architecture

 

Nils J. Nilsson

Robotics Laboratory

Department of Computer Science

Stanford University

Stanford, CA 94305

 

nilsson@cs.stanford.edu

 

August 14, 2001

 

 

I.  Agent Architectures[1]

 

Can anything in general be said about intelligent agent architectures?  Just as there are millions of species of animals, occupying millions of different niches, I expect that there will be many species of artificial agents---each a specialist for one of a countless number of tasks.  The exact forms of their architectures will depend on their tasks and their environments. For example, some will work in time-stressed situations in which reactions to unpredictable and changing environmental states must be fast and unequivocal.  Others will have the time and the knowledge to predict the effects of future courses of action so that more rational choices can be made.  Even though there will probably never be a single, all-purpose agent architecture, there is one that I think will play a prominent role in many future systems.  This architecture is an adaptation of the popular three-level ones that have been prominent in robotics research. 

 

A.  Three-Level Architectures

 

One of the first integrated intelligent agent systems was a collection of computer programs and hardware known as “Shakey the Robot” (Nilsson, 1984). Shakey's design can be viewed as an early example of what has come to be called a three-level architecture.  The levels correspond to different paths from sensory signals to motor commands. The lowest-level actions used a short and fast path from sensory signals to effectors. Important “reflexes” were handled by this pathway---such as “stop” when touch sensors detected a close object ahead.  Servo control of motors for achieving set-point targets for shaft angles and so on was also handled by these low-level mechanisms.

 

The intermediate level combined the low-level actions into more complex behaviors---ones whose realization depended on the situation (as sensed and modeled) at the time of execution.  The high-level path involved a system that could generate plans consisting of a sequence of intermediate-level programs.

 

The three-level architecture has been used in a variety of robot systems. As a typical example, see (Connell, 1992).

 

B.  The Triple-Tower Architecture

 

In the three-level architecture, the higher levels use more abstract (or “coarser”) perceptual predicates and more complex actions than do the lower ones.  Whereas reflex actions are typically evoked by primitive sensory signals, the coordination of higher-level actions requires more elaborate perceptual processing.  A generalization of the three-level architecture has been proposed by Albus and colleagues (Albus, 1991; Albus, McCain, & Lumia, 1989).  They envision hierarchies or “towers” of perceptual, modeling, and action processing.

 

My version of this triple-tower architecture is illustrated in the diagram below:

[Diagram: the triple-tower architecture.  A perception tower and an action tower flank a central model tower; sensory signals enter at the bottom of the perception tower, and primitive actions at the bottom of the action tower affect the environment.]

The operation of such a system would proceed as follows:   Aspects of the environment that are relevant to the agent’s roles are sensed and converted to primitive predicates and values.  These are stored at the lowest level of the model tower.  Their presence there may immediately evoke primitive actions at the bottom of the action tower.  These actions, in turn, affect the environment, and some of these effects may be sensed---creating a loop in which the environment itself might play an important computational role. 
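
To make this loop concrete, here is a minimal sketch in Python.  Everything in it is an illustrative assumption rather than a description of any implementation discussed here: the model is a set of ground atoms represented as tuples, and the sensing, rule, and action-selection functions are supplied by the caller.

    def agent_step(env, sense, perceptual_rules, select_action):
        """One pass of the sense-model-act loop (illustrative sketch only)."""
        model = sense(env)                    # primitive percepts enter the model tower
        for rule in perceptual_rules:         # the perceptual tower adds abstractions
            model = model | rule(model)
        action = select_action(model)         # the action tower reacts to the model
        if action is not None:
            action(env)                       # effects return through the environment
        return model

Calling agent_step repeatedly yields exactly the loop described above: act, change the environment, sense the change, and react again.  A single pass over the rules is shown here; a sketch in Section II chains the rules to a fixed point so that abstractions of abstractions also appear.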

 

The perceptual tower consists of rules that convert predicates stored in the model tower into more abstract predicates which are then deposited at higher levels in the model tower.  These processes can continue until even the highest levels of the model tower are populated.

 

The action tower consists of a loose hierarchy of action routines that are triggered by the contents of the model tower. The lowest level action routines are simple reflexes---evoked by predicates corresponding to primitive percepts.  More complex actions are evoked by more abstract predicates appropriate for those actions. High level actions “call” other actions until the process bottoms out at the primitive actions that actually affect the environment.

 

We also allow for the possibility that the actions themselves might affect the model tower directly (in addition to the loop through the environment) by writing additional and/or altered content. With the ability both to read from and write in memory, the triple-tower structure is a perfectly general computational architecture.

 

In order to be responsive to ongoing environmental changes, we propose that some sort of truth-maintenance system (TMS) be made a part of the perceptual and model towers.  Such a system should continuously delete predicates and values from the model tower that are no longer derivable (through the perceptual rules) from the then-present components of the model tower.  We describe a specific example of such a triple-tower system in the next section.
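
A simple way to obtain this behavior, offered as an assumption about how such a TMS could work rather than as a description of a particular implementation, is to discard all derived assertions on every cycle and recompute the model tower to a fixed point from the current primitive percepts.  An assertion then persists only while the perceptual rules can still derive it:

    def refresh_model(percepts, rules):
        """Recompute the model tower from primitive percepts.

        Each rule maps the current model (a set of ground-atom tuples) to
        a set of newly derivable atoms.  Chaining to a fixed point
        populates the higher levels of the model tower; since nothing is
        carried over from the previous cycle, assertions that are no
        longer derivable simply vanish, giving the truth-maintenance
        behavior described above (at the cost of recomputation per cycle).
        """
        model = set(percepts)
        changed = True
        while changed:
            changed = False
            for rule in rules:
                new = rule(model) - model
                if new:
                    model |= new
                    changed = True
        return model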

 

II.  A Triple-Tower T-R Agent Builds a Steeple

 

As an illustrative example, I describe a triple-tower system for building a steeple of blocks on a table.  Its action routines are teleo-reactive (T-R) programs (Nilsson, 1994).  (See also the T-R web page at: http://www.robotics.stanford.edu/users/nilsson/trweb/tr.html. )

 

We assume the agent works in an environment of blocks and a table as shown in the sketch below.

[Sketch: block C sits on block A; blocks A and B rest on the table, Ta; the hand holds block D.]

In this example, there are four blocks (A, B, C, and D), a table (Ta), and a hand that can move blocks from place to place. In the sketch, the hand is holding block D.

 

At the lowest level of the action tower are the action schemas putdown(x,y) and pickup(x).  The schema putdown(x,y) can be applied if the hand is holding block x and if y is either the table or a block with no other block on it.  The result (if successful) is that block x is on y (either directly on the table or on top of block y).  The schema pickup(x) can be applied if no block is on block x and the hand is not already holding anything. The result is that the hand will be holding block x.  We assume that these actions are “ballistic.”  They continue executing until they terminate (either successfully or in some type of failure).
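
A hypothetical Python rendering of these two schemas follows, using a dictionary env = {"on": {...}, "holding": ...} as a stand-in for the real environment (each block name maps to whatever it rests on).  The precondition tests mirror the ones just described, and a return value of False models a ballistic action that terminates in failure:

    def is_clear(env, y):
        """True if y is the table or a block with no block on it."""
        return y == "Ta" or y not in env["on"].values()

    def pickup(env, x):
        """Grasp block x if nothing is on it and the hand is empty."""
        if env["holding"] is None and is_clear(env, x) and x in env["on"]:
            del env["on"][x]          # x is no longer resting on anything
            env["holding"] = x
            return True
        return False

    def putdown(env, x, y):
        """Place the held block x on y (the table or a clear block)."""
        if env["holding"] == x and y != x and is_clear(env, y):
            env["on"][x] = y
            env["holding"] = None
            return True
        return False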

 

At the lowest level of the perceptual tower are primitive sensors that create all of the instances of the predicates On(x,y) and Holding(x) that are satisfied by the environment. These are immediately placed in the model tower.  (These predicates have their obvious intended interpretations.)  If applied to the environment of blocks and table shown above, the sensors would create and place in the model tower the following assertions:  On(C,A), On(A,Ta), On(B,Ta), and Holding(D).  These assertions remain in the model tower only so long as direct sensory perception of the environment sustains them.  (For example, if block C is moved, the assertion On(C,A) would be immediately deleted.)
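
Under the same assumed environment representation, this primitive sensing step is just a translation into ground atoms.  Because the atoms are rebuilt from the environment on every cycle, they persist exactly as long as the environment sustains them:

    def sense(env):
        """Create every satisfied instance of On(x,y) and Holding(x)."""
        atoms = {("On", x, y) for x, y in env["on"].items()}
        if env["holding"] is not None:
            atoms.add(("Holding", env["holding"]))
        return atoms

    # For the sketch above, env = {"on": {"C": "A", "A": "Ta", "B": "Ta"},
    # "holding": "D"} yields On(C,A), On(A,Ta), On(B,Ta), and Holding(D).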

 

The goal of our block-stacking agent is to make a steeple of blocks, ordered from top to bottom as specified by a user.  The user gives the desired order, from top to bottom, as a list, such as (A, B, C).  The complete system for this example is shown in the sketch below.

[Sketch: the complete steeple-building system.  The perceptual tower holds the rules for deriving abstract predicates, the model tower holds the current assertions, and the action tower holds the T-R programs, with makesteeple at the top.]


The perceptual tower is composed of a number of rules that are invoked in the forward direction to create new predicates---more abstract than the primitive ones created directly by the sensors.  These rules are triggered by predicates that are already in the model, and their results are inserted back into the model.  For example, the rule

 

¬(∃x)On(x,y) ∧ ¬Holding(y) ⊃ Clear(y)

 

can be used to assert Clear(C) in the model tower because (by reference to the model) there is no assertion of the form On(x,C) and there is no assertion Holding(C).  (We assume “negation as failure.”)  Similarly, we can assert Clear(B).
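
Read off the formula directly, with negation as failure implemented as the absence of a matching atom, the rule might look as follows in the tuple representation assumed earlier (a sketch, not the system's actual rule language):

    def clear_rule(model):
        """Derive Clear(y) for every y with no On(_,y) and no Holding(y)."""
        objects = {term for (_, *args) in model for term in args}
        derived = set()
        for y in objects:
            supports_something = any(a[0] == "On" and a[2] == y for a in model)
            if not supports_something and ("Holding", y) not in model:
                derived.add(("Clear", y))
        return derived

    # With On(C,A), On(A,Ta), On(B,Ta), and Holding(D) in the model, this
    # derives exactly Clear(C) and Clear(B), matching the text.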

 

The rule

 

On(x,Ta) ⊃ Ordered(list(x))

 

can be used to assert Ordered((A)) and Ordered((B)) in the model because both On(A,Ta) and On(B,Ta) are in the model already.  (The Lisp construct “list(x)” creates a list whose single element is the argument x.  Note our use of other Lisp notation throughout the example.)  Using the various rules in the perceptual tower, more abstract predicates come to populate the model tower.  The model tower is used to evaluate predicates as needed by the T-R programs in the action tower.  Keep in mind, however, that all of these model assertions are subject to a truth-maintenance system that continuously updates them as the (sensed) environment changes.
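
The base rule translates directly.  The example as a whole also needs a rule for extending an ordered pile upward; the paper does not spell that rule out, so its form below (On(x,y) together with Ordered((y ...)) yields Ordered((x y ...))) is my assumption, chosen to be consistent with how Ordered is used.  Piles are Python tuples so they can be stored in the model set:

    def ordered_rules(model):
        """Derive Ordered piles from On assertions."""
        derived = set()
        for a in model:                   # base rule: On(x,Ta) => Ordered((x))
            if a[0] == "On" and a[2] == "Ta":
                derived.add(("Ordered", (a[1],)))
        for a in model:                   # assumed rule: grow a pile upward
            if a[0] == "Ordered":
                pile = a[1]
                for b in model:
                    if b[0] == "On" and b[2] == pile[0]:
                        derived.add(("Ordered", (b[1],) + pile))
        return derived

    # Chained to a fixed point (e.g., by refresh_model above), the example
    # model yields Ordered((A)), Ordered((B)), and then Ordered((C,A)).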

 

A user can specify a steeple to be built by calling the top-level T-R program, makesteeple(x), with x instantiated to whatever steeple s/he desires.  For example, s/he might call makesteeple((A,B,C)).  Calling this program invokes other T-R programs in the action tower.  Their invocation ultimately results in the execution of primitive actions whose primitive effects are sensed and recorded in the model tower---leading to the retraction and assertion of other higher-level predicates and the evocation of other actions.  The reader is invited to trace through a successful execution.  Note that the evocation of T-R programs is not strictly hierarchical.  Lower-level programs can call those at a higher level.  Note also that the top-level program, makesteeple, is recursive.  (We assume that the reader is familiar with the operation and the semantics of T-R programs.  Called programs are allowed to continue only so long as the relevant condition in the program that called them remains satisfied in the model tower---throughout the entire stack of T-R programs invoked at any time.)
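
For readers who want those semantics spelled out operationally, here is a schematic interpreter: a discrete approximation of the continuous evaluation defined in (Nilsson, 1994), and a sketch rather than the applet's actual code.  A T-R program is an ordered list of (condition, action) pairs; on every cycle the model is refreshed and the action of the first satisfied condition is executed, with a nil action signaling that the goal holds.  The sense and refresh_model helpers are the ones assumed in the earlier sketches:

    def run_tr(program, env, perceptual_rules, max_cycles=1000):
        """Run a T-R program until its goal rule fires or we give up."""
        for _ in range(max_cycles):
            model = refresh_model(sense(env), perceptual_rules)
            for condition, action in program:
                if condition(model):
                    if action is None:       # nil action: the goal is achieved
                        return True
                    action(env)              # act, then re-sense and re-evaluate
                    break
        return False

    # For makesteeple((A,B,C)), the goal rule's condition could be written
    # lambda m: ("Ordered", ("A", "B", "C")) in m.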

 

The architecture just described is capable of building any single steeple without the need for search.  The complexity is polynomial in the size of the steeple.  [Compare with a similar scheme for steeple building proposed by Chapman (1989). The “blocks world” is NP-hard only for building multiple steeples (Gupta & Nau, 1992).]

 

A Java applet has been programmed that illustrates steeple-building by a triple-tower agent using T-R programs. It is available on the web at:

 

http://www.robotics.stanford.edu/users/nilsson/trweb/TRTower/TRTower.html. 

 

(Note: Running the applet requires the Java 2 Runtime Environment (JRE), version 1.3 or above. Visiting the website above should result in your being asked whether you want this JRE downloaded to your computer. You can check the version number of your JRE by typing “java -version” at a shell prompt. JRE v1.3 can be downloaded from Sun; it is available for the Microsoft Windows, Linux (x86), and Solaris (SPARC/x86) platforms.)

 

 

 

III.  The Vision

 

Our example shows that it is relatively straightforward to design a triple-tower T-R system for certain kinds of tasks.  But flexible and useful autonomous agents will need to be able to learn about their environments and will also need to learn how to perform tasks in them.  Can triple-tower T-R systems be learned?  We would need to learn abstractions of sensed data, the rules for creating these abstractions, and hierarchies of action programs.  These learning tasks are difficult because they are interdependent.  The most useful abstractions will depend on the tasks to be learned and on the procedures to perform them. The procedures, in turn, will depend on the abstractions that are available.

 

There is some existing work that may be relevant to this challenge.  Pfleger has investigated on-line techniques for detecting frequently occurring, hierarchically related sub-strings within data streams (Pfleger, 2001).  Drescher has proposed methods for building concepts based on Piagetian cognitive development (Drescher, 1991). Stone has developed a layered learning architecture for robotic soccer players (Stone, 2000). Dietterich proposes methods for hierarchical reinforcement learning (Dietterich, 1998).  Benson has used inductive logic programming (ILP) methods to learn first-order expressions for action-evoking conditions (Benson, 1996).

 

I think the triple-tower architecture provides an interesting setting for an attack on the problem of integrating perceptual, model, and action learning.

 

 

REFERENCES

 

 

Albus 1991

            Albus, J. S., “Outline for a Theory of Intelligence,” IEEE Transactions on Systems, Man, and Cybernetics, 21(3):473-509, May/June 1991.

 

Albus, McCain, & Lumia 1989

            Albus, J. S., McCain, H. G., and Lumia, R., “NASA/NBS Standard Reference Model for Telerobot Control System Architecture (NASREM),” NIST Technical Note 1235, 1989 Edition, National Institute of Standards and Technology, Gaithersburg, MD, April 1989 (supersedes NBS Technical Note 1235, July 1987).

 

Benson 1996

            Benson, S., Learning Action Models for Reactive Autonomous Agents, PhD Thesis, Department of Computer Science, Stanford University, 1996.

 

Chapman 1989

            Chapman, D., “Penguins Can Make Cake,” AI Magazine, 10(4):45-50, 1989.

 

Connell 1992

            Connell, J.,  “SSS: A Hybrid Architecture Applied to Robot Navigation,” in Proc 1992 IEEE International Conf. on Robotics and Automation, pp. 2719-2724, 1992.

 

Dietterich 1998

            Dietterich, T. G., “Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition,” in Proceedings of the Fifteenth International Conference on Machine Learning (ICML '98), San Francisco, CA: Morgan Kaufmann, 1998.  (Also to appear in the Journal of Artificial Intelligence Research.)

 

Drescher 1991

            Drescher, G., Made-Up Minds: A Constructivist Approach to Artificial Intelligence, Cambridge, MA: MIT Press, 1991.

 

Gupta & Nau 1992

            Gupta, N., and Nau, D., “On the Complexity of Blocks-World Planning,” Artificial Intelligence, 56(2/3):223-254, 1992.

 

Nilsson 1984

            Nilsson, N. J.,  Shakey the Robot, Technical Note 325, SRI International, Menlo Park, CA, 1984.

 

Nilsson 1994

            Nilsson, N. J.,  “Teleo-Reactive Programs for Agent Control,” Journal of Artificial Intelligence Research, 1, pp. 139-158, January 1994.          

 

Pfleger 2001

            Pfleger, K., Data-driven, Bottom-up Chunking: Learning Hierarchical Compositional Structure, PhD Dissertation (in progress), Department of Computer Science, Stanford University, Stanford, CA 94305, 2001.

 

Stone 2000

            Stone, P., Layered Learning in Multi-Agent Systems:  A Winning Approach to Robotic Soccer, Cambridge, MA: MIT Press, 2000.

 

 

 



[1] This section is adapted from Chapter 25 of my book, Artificial Intelligence: A New Synthesis, San Francisco: Morgan Kaufmann, 1998.