Teleo-Reactive Programs and the Triple-Tower Architecture
Nils J. Nilsson
Robotics Laboratory
Department of Computer Science
Stanford University
Stanford, CA 94305
August 14, 2001
I. Agent Architectures[1]
Can anything in general be said about
intelligent agent architectures? Just
as there are millions of species of animals, occupying millions of different
niches, I expect that there will be many species of artificial agents---each a
specialist for one of a countless number of tasks. The exact forms of their architectures will depend on their tasks
and their environments. For example, some will work in time-stressed situations
in which reactions to unpredictable and changing environmental states must be
fast and unequivocal. Others will have
the time and the knowledge to predict the effects of future courses of action
so that more rational choices can be made.
Even though there will probably never be a single, all-purpose agent
architecture, there is one that I think will play a prominent role in many
future systems. This architecture is an
adaptation of the popular three-level ones that have been prominent in robotics
research.
A. Three-Level Architectures
One of the first integrated intelligent agent
systems was a collection of computer programs and hardware known as “Shakey the
Robot” (Nilsson, 1984). Shakey's design can be viewed as an early example of
what has come to be called a three-level
architecture. The levels correspond
to different paths from sensory signals to motor commands. The lowest level
actions used a short and fast path from sensory signals to effectors. Important
“reflexes” were handled by this pathway---such as “stop” when touch sensors
detected a close object ahead. Servo
control of motors for achieving set-point targets for shaft angles and so on
was also handled by these low-level mechanisms.
The intermediate level combined the low level
actions into more complex behaviors---ones whose realization depended on the
situation (as sensed and modeled) at the time of execution. The high-level path involved a system that
could generate plans consisting of a sequence of intermediate level programs.
The three-level architecture has been used in a
variety of robot systems. As a typical example, see (Connell, 1992).
B. The Triple-Tower Architecture
In the three-level architecture, the higher
levels use more abstract (or more “coarse”) perceptual predicates and more
complex actions than do the lower ones.
Whereas reflex actions are typically evoked by primitive sensory
signals, the coordination of higher level actions requires more elaborate
perceptual processing. A generalization
of the three-level architecture has been proposed by Albus and colleagues
(Albus, 1991; Albus, McCain, & Lumia, 1989). They envision hierarchies
or “towers” of perceptual, modeling, and action processing.
My version of this triple-tower architecture is illustrated in the diagram below:
The operation of such a system would proceed as
follows: Aspects of the environment
that are relevant to the agent’s roles are sensed and converted to primitive
predicates and values. These are stored
at the lowest level of the model tower.
Their presence there may immediately evoke primitive actions at the
bottom of the action tower. These
actions, in turn, affect the environment, and some of these effects may be
sensed---creating a loop in which the
environment itself might play an important computational role.
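A minimal sketch of this sense-model-act loop, written here in Python purely for illustration (the parameters sense, derive, and act are hypothetical placeholders, not components of any published implementation), might look like this:

# Generic triple-tower control loop: sense the environment, rebuild the
# model tower from the current percepts, and let the model evoke actions.
def run_agent(sense, derive, act, steps=100):
    """sense() returns the current set of primitive assertions;
    derive(percepts) returns everything currently derivable from them;
    act(model) performs whatever actions the model tower evokes."""
    for _ in range(steps):
        percepts = sense()        # bottom of the perception tower
        model = derive(percepts)  # model tower: only currently derivable facts
        act(model)                # action tower reads the model, affects the world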
The perceptual tower consists of rules that
convert predicates stored in the model tower into more abstract predicates
which are then deposited at higher levels in the model tower. These processes can continue until even the
highest levels of the model tower are populated.
The action tower consists of a loose hierarchy
of action routines that are triggered by the contents of the model tower. The
lowest level action routines are simple reflexes---evoked by predicates
corresponding to primitive percepts.
More complex actions are evoked by more abstract predicates appropriate
for those actions. High level actions “call” other actions until the process
bottoms out at the primitive actions that actually affect the environment.
We also allow for the possibility that the
actions themselves might affect the model tower directly (in addition to the
loop through the environment) by writing additional and/or altered content.
With the ability both to read from and write in memory, the triple-tower
structure is a perfectly general computational architecture.
In order to be responsive to ongoing
environmental changes, we propose that some sort of truth-maintenance system
(TMS) be made a part of the perceptual and model towers. Such a system should continuously delete
predicates and values from the model tower that are no longer derivable
(through the perceptual rules) from the then-present components of the model
tower. We describe a specific example
of such a triple-tower system in the next section.
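Before turning to that example, here is one way such continuous truth maintenance might be sketched, under the assumption (mine, for illustration) that each perceptual rule is represented as a function from the current model to the set of assertions it derives. Rebuilding the model from the primitive percepts on every cycle means that anything no longer derivable simply disappears:

# Recompute the model tower from the current primitive percepts by running
# the perceptual rules forward to a fixed point.  Stale assertions are never
# carried over, which plays the role of the truth-maintenance system.
def rebuild_model(percepts, rules):
    model = set(percepts)
    changed = True
    while changed:
        changed = False
        for rule in rules:
            new_facts = rule(model) - model
            if new_facts:
                model |= new_facts
                changed = True
    return model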
II. A Triple-Tower T-R Agent Builds a Steeple
As an illustrative example, I describe a
triple-tower system for building a steeple of blocks on a table. Its action routines are teleo-reactive (T-R)
programs (Nilsson, 1994). (See also the
T-R web page at: http://www.robotics.stanford.edu/users/nilsson/trweb/tr.html.)
We assume the agent works in an environment of
blocks and a table as shown in the sketch below.
In this example, there are four blocks (A, B, C,
and D), a table (Ta), and a hand that can move blocks from place to place. In
the sketch, the hand is holding block D.
At the lowest level of the action tower are the
action schemas putdown(x,y) and pickup(x). The schema putdown(x,y) can be applied if
the hand is holding block x and if y is either the table or a block with no
other block on it. The result (if
successful) is that block x is on y (either directly on the table or on top of
block y). The schema pickup(x) can be
applied if no block is on block x and the hand is not already holding anything.
The result is that the hand will be holding block x. We assume that these actions are “ballistic.” They continue executing until they terminate
(either successfully or in some type of failure).
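The following Python sketch illustrates the preconditions and results of these two schemas, using a set of assertion tuples such as ("On", "C", "A") and ("Holding", "D") to stand in for the sensed world state. It is only an illustration of the behavior just described, not the author's implementation; in the actual architecture the actions would affect the environment and sensing would update the model.

# Action schemas sketched over a world state represented as assertion tuples.
def can_pickup(state, x):
    nothing_on_x = not any(a[0] == "On" and a[2] == x for a in state)
    hand_empty = not any(a[0] == "Holding" for a in state)
    return nothing_on_x and hand_empty

def can_putdown(state, x, y):
    holding_x = ("Holding", x) in state
    y_available = y == "Ta" or not any(a[0] == "On" and a[2] == y for a in state)
    return holding_x and y_available

def pickup(state, x):
    # Ballistic: runs to completion; afterwards the hand holds block x.
    if not can_pickup(state, x):
        return state
    return {a for a in state if not (a[0] == "On" and a[1] == x)} | {("Holding", x)}

def putdown(state, x, y):
    # Ballistic: afterwards block x is on y (a clear block or the table Ta).
    if not can_putdown(state, x, y):
        return state
    return (state - {("Holding", x)}) | {("On", x, y)}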
At the lowest level of the perceptual tower are
primitive sensors that create all of the instances of the predicates On(x,y)
and Holding(x) that are satisfied by the environment. These are immediately
placed in the model tower. (These
predicates have their obvious intended interpretations.) If applied to the environment of blocks and
table shown above, the sensors would create and place in the model tower the
following assertions: On(C,A), On(A,Ta), On(B,Ta), and Holding(D). These
assertions remain in the model tower only so long as direct sensory perception
of the environment sustains them. (For
example, if block C is moved, the
assertion On(C,A) would be immediately deleted.)
The goal of our block-stacking agent is to make
a steeple of blocks with all of the blocks in the steeple ordered from top to
bottom as specified by a user. The user
specifies the order, from top to bottom, by a list, such as (A, B, C). The complete system for this example is
shown in the sketch below.
The perceptual tower is composed of a number of
rules that are invoked in the forward direction to create new predicates---more
abstract than the primitive ones which are created directly by the
sensors. These rules are invoked by
predicates that are already in the model.
The results are inserted in the model.
For example, the rule
¬(∃x)On(x,y) ∧ ¬Holding(y) ⊃ Clear(y)
can be used to assert Clear(C) in the model
tower because (by reference to the model) there is no assertion of the form
On(x,C) and there is no assertion Holding(C).
(We assume “negation as failure.”)
Similarly, we can assert Clear(B).
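As a concrete illustration (mine, using the same tuple representation as before), the rule can be evaluated by negation as failure simply by checking what is absent from the model:

# The model-tower contents for the scene sketched earlier.
model = {("On", "C", "A"), ("On", "A", "Ta"), ("On", "B", "Ta"), ("Holding", "D")}

def clear(y, model):
    # ¬(∃x)On(x,y) ∧ ¬Holding(y) ⊃ Clear(y), with absence treated as falsity.
    nothing_on_y = not any(a[0] == "On" and a[2] == y for a in model)
    not_held = ("Holding", y) not in model
    return nothing_on_y and not_held

print(clear("C", model), clear("B", model), clear("A", model))
# prints: True True False  (Clear(A) is not derivable because C is on A)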
The rule
On(x,Ta) ⊃ Ordered(list(x))
can be used to assert Ordered((A)) and
Ordered((B)) in the model because both On(A,Ta) and On(B,Ta) are in the model
already. (The Lisp construct “list(x)”
creates a list whose single element is the argument x. Note our use of other Lisp notation
throughout the example.) Using the
various rules in the perceptual tower, more abstract predicates come to
populate the model tower. The model
tower is used to evaluate predicates as needed by the T-R programs in the
action tower. Keep in mind, however,
that all of these model assertions are subject to a truth-maintenance system
that continuously updates them as the (sensed) environment changes.
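To illustrate (again with the tuple representation, and with the caveat that only the base rule appears in the text; the recursive rule below, which lets a block resting on top of an already ordered stack extend that stack, is my own guess at the kind of rule the full perceptual tower would contain), such rules can be run in the forward direction over the model like this:

# Forward version of On(x,Ta) ⊃ Ordered(list(x)), plus a guessed recursive rule.
def ordered_rules(model):
    derived = set()
    for a in model:
        if a[0] == "On" and a[2] == "Ta":
            derived.add(("Ordered", (a[1],)))                 # Ordered((x))
    for a in model:
        if a[0] == "On" and a[2] != "Ta":
            for b in model:
                if b[0] == "Ordered" and b[1][0] == a[2]:
                    derived.add(("Ordered", (a[1],) + b[1]))  # extend the stack
    return derived

Run to a fixed point (for example with the rebuild_model sketch given earlier), these rules would add Ordered((A)), Ordered((B)), and Ordered((C,A)) to the model for the scene shown above.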
A user can specify a steeple to be built by
calling the top-level T-R program, makesteeple(x), with x instantiated to
whatever steeple s/he desires. For
example, s/he might call makesteeple((A,B,C)). Calling this program invokes other T-R programs in the action
tower. Their invocation ultimately
results in the execution of primitive actions whose primitive effects are
sensed and recorded in the model tower---leading to the retraction and
assertion of other higher level predicates and the evocation of other
actions. The reader is invited to trace
through a successful execution. Note
that the evocation of T-R programs is not strictly hierarchical. Lower level programs can call those at a
higher level. Note also that the top
level program, makesteeple, is recursive. (We assume that the reader is
familiar with the operation and the semantics of T-R programs. Called programs are allowed to continue only
so long as the relevant condition in the program that called them remains in
the model tower---throughout the entire stack of T-R programs invoked at any
time.)
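For readers who would like something concrete to trace, here is a much-simplified sketch of such a T-R program: an ordered list of (condition, action) pairs scanned from the top, with the first satisfied condition selecting the action. The particular rules for makesteeple below are my own reconstruction of the recursive program described in the text (they ignore, for brevity, the cases in which the target block is buried or the hand is holding some other block); they are not a transcription of the author's program.

# T-R rule selection: the first rule whose condition holds in the model wins.
def tr_step(rules, model):
    for condition, action in rules:
        if condition(model):
            return action
    return None

def makesteeple_rules(steeple):
    top, rest = steeple[0], tuple(steeple[1:])
    return [
        # Goal achieved: the whole steeple is already ordered -> do nothing.
        (lambda m: ("Ordered", tuple(steeple)) in m, "nil"),
        # Holding the top block while the rest of the steeple stands -> put it down.
        (lambda m: ("Holding", top) in m and (not rest or ("Ordered", rest) in m),
         ("putdown", top, rest[0] if rest else "Ta")),
        # Rest of the steeple stands and the top block is clear -> pick it up.
        (lambda m: ("Clear", top) in m and (not rest or ("Ordered", rest) in m),
         ("pickup", top)),
        # Otherwise: first build the rest of the steeple (the recursive call).
        (lambda m: True, ("makesteeple", rest)),
    ]

A call such as tr_step(makesteeple_rules(("A", "B", "C")), model) returns the action the agent should currently be executing; re-evaluating it as the truth-maintained model changes is what gives the program its teleo-reactive character.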
The architecture just described is capable of
building any single steeple without the need for search. The complexity is polynomial in the size of
the steeple. [Compare with a similar scheme
for steeple building proposed by Chapman (1989). The “blocks world” is NP-hard
only for building multiple steeples (Gupta & Nau, 1992).]
A Java applet has been programmed that
illustrates steeple-building by a triple-tower agent using T-R programs. It is
available on the web at:
http://www.robotics.stanford.edu/users/nilsson/trweb/TRTower/TRTower.html.
(Note: Running the applet requires the Java 2 Runtime Environment (JRE), version 1.3 or above. Visiting the website above should result in your being asked whether you want this JRE downloaded to your computer. You can check the version number of your JRE by typing java -version at the shell prompt. JRE v1.3 can be downloaded from Sun; it is available for the following platforms: Microsoft Windows, Linux (x86), and Solaris (SPARC/x86).)
III. The Vision
Our example shows that it is relatively
straightforward to design a triple-tower T-R system for certain kinds of
tasks. But flexible and useful
autonomous agents will need to be able to learn about their environments and
will also need to be able to learn how to perform tasks in them. Can triple-tower T-R systems be
learned? We would need to learn abstractions of sensed data, the rules for creating these abstractions, and
hierarchies of action programs. These
learning tasks are difficult because they are interdependent. The most useful abstractions will depend on
the tasks to be learned and on the procedures to perform them. The procedures,
in turn, will depend on the abstractions that are available.
There is some existing work that may be relevant
to this challenge. Pfleger has
investigated on-line techniques for detecting frequently occurring,
hierarchically related sub-strings within data streams (Pfleger, 2001). Drescher
has proposed methods for building concepts based on Piagetian cognitive
development (Drescher, 1991). Stone has developed a layered learning
architecture for robotic soccer players (Stone, 2000). Dietterich proposes
methods for hierarchical reinforcement learning (Dietterich, 1998). Benson has used inductive logic programming (ILP) methods to learn first-order expressions for action-evoking conditions (Benson, 1996).
I think the triple-tower architecture provides
an interesting setting for an attack on the problem of integrating perceptual,
model, and action learning.
REFERENCES
Albus 1991
Albus, J. S., “Outline for a Theory of Intelligence,” IEEE Transactions on Systems, Man, and Cybernetics, 21(3):473-509, May/June 1991.
Albus, McCain, & Lumia 1989
Albus, J. S., McCain, H. G., and Lumia, R., “NASA/NBS Standard Reference Model for Telerobot Control System Architecture (NASREM),” NIST Technical Note 1235, 1989 Edition, National Institute of Standards and Technology, Gaithersburg, MD, April 1989 (supersedes NBS Technical Note 1235, July 1987).
Benson 1996
Benson, S., Learning Action Models for Reactive Autonomous Agents, PhD Thesis, Department of Computer Science, Stanford University, 1996.
Chapman 1989
Chapman, D., “Penguins Can Make Cake,” AI Magazine, 10(4):45-50, 1989.
Connell 1992
Connell, J., “SSS: A Hybrid Architecture Applied to Robot Navigation,” in Proc 1992 IEEE International Conf. on Robotics and Automation, pp. 2719-2724, 1992.
Dietterich 1998
Dietterich, T. G., “Hierarchical Reinforcement Learning with the MAXQ Value Function Decomposition,” in Proceedings of the 15th International Conference on Machine Learning (ICML'98), San Francisco, CA: Morgan Kaufmann, 1998. (Also to appear in the Journal of Artificial Intelligence Research.)
Drescher 1991
Drescher, G., Made-Up Minds: A Constructivist Approach to Artificial Intelligence, Cambridge, MA: MIT Press, 1991.
Gupta & Nau 1992
Gupta, N., and Nau, D., “On the Complexity of Blocks-World Planning,” Artificial Intelligence, 56(2/3):223-254, 1992.
Nilsson 1984
Nilsson, N. J., Shakey the Robot, Technical Note 325, SRI International, Menlo Park, CA, 1984.
Nilsson 1994
Nilsson, N. J., “Teleo-Reactive Programs for Agent Control,” Journal of Artificial Intelligence Research, 1, pp. 139-158, January 1994.
Pfleger 2001
Pfleger, K., Data-driven, Bottom-up Chunking: Learning Hierarchical Compositional Structure, PhD Dissertation (in progress), Department of Computer Science, Stanford University, Stanford, CA 94305, 2001.
Stone 2000
Stone, P., Layered Learning in Multi-Agent Systems: A Winning Approach to Robotic Soccer, Cambridge, MA: MIT Press, 2000.
[1] This section is adapted from Chapter 25 of my book, Artificial Intelligence: A New Synthesis, San Francisco: Morgan Kaufmann, 1998.