Moderated by Erik Sandewall.

Chitta Baral and Son Cao Tran

Relating Theories of Actions and Reactive Control

The article mentioned above has been submitted to the Electronic Transactions on Artificial Intelligence, and the present page contains the review discussion. Click here for more explanations and for the webpage of theauthors: Chitta Baral and Son Cao Tran.
Overview of interactions

N:o Question Answer(s) Continued discussion

1 3.10  Erik Sandewall
23.11  The authors

2 18.5.1999  Anonymous Referee 1

3 18.5.1999  Anonymous Referee 2
12.6.1999  The Authors

Q1. Erik Sandewall (3.10):

Dear Chitta and Son Cao,
Your article addresses a very interesting topic for our group. I have some specific questions about the contents of the article, and also some comments about how it relates to other current approaches to the same or similar problems. I'll write them as two separate contributions. With respect to the article itself, I have the following questions.
1. For maintenance control modules, which are treated in section 3.3, you write:
But in the presence of exogenous actions the robot with no control over those actions can only strive to make the maintenance goal true if they are made false by some exogenous action. In other words a maintenance control module can not guarantee that the maintenance goal will never be false...
This seems right if the maintenance goal is characterized by one single state in the state space. However, many applications do require controllers to keep the controlled system in an acceptable state at all times, which is possible if there is a number of such acceptable states, and the state transitions due to exogenous "actions" sometimes only move within the acceptable set. In such cases the controller strives to keep the system in those states among the acceptable ones that are "at a safe distance" from the unacceptable ones. More precisely, even if the system is in an acceptable state, the controller may perform actions as a precaution against future exogenous "actions", thereby shifting the system to a state where exogenous actions can not be harmful.
Possibly one can reduce this case to the one you consider by identifying the maintenance goal with a subset of the acceptable states, namely those acceptable states where that are at a safe distance from a transition to the unacceptable ones. Is this the way you intend maintenance goals to be defined, and if so, do you claim that the reduction is always valid? (Concurrency and the use of combined achievement and maintenance modules may introduce complications, I believe).
2. A second question concerns to what extent this work relates to current, logic-based theories of actions and change. In section 9.1 you write:
... in this paper we do not use any specific action theory... rather we use an abstract "entailment relation". Because of this, our formulation need not change with more sophisticated action theories.
However, as I read the paper I wonder whether it depends on any of the concepts or results of current action theories at all? (This is not a criticism, of course, I just want to understand the character of your definitions and results). For example, definition 6.1 on page 28 begins
Let ... A be an action theory with a transition function Phi
and then the rest of the definition is expressed only in terms of Phi . Could it be the case, then, that everything in this article could as well be expressed only in terms of states, state spaces, and transition functions? In this case, action languages and other logicist devices ought to be seen simply as a convenient notation for specifying transition functions, but otherwise they'd not play any role in this particular development.
3. Here is a standing question that I now ask regularly to authors in our area. Given that your paper proclaims a number of theorems, as do many of our publications, (a) do your results rely on any theorem previously published in this literature, (b) do you foresee any of your theorems later being used in a forthcoming publication by yourself or someone else? If the answers to both questions are "no", what do you think is the purpose of including theorems in our research articles?
4. There seems to be a close connection between your Closure concept and the most commonly studied cases of ramification. In both cases, one first specifies the effects of an action "in itself", and then one adds the possibility of additional changes due to phenomena that are characterized separately from that action. Is this a connection that you have also considered?
The relation to ramification is particularly transparent by comparison with my KR-96 paper, [c-kr-96-99]. That article uses two types of state transitions: one result function that maps an action and a state to a set of possible result states, and a binary successor relation that characterizes further transitions. The "closure" of an action, in that case, is the set of all states that can be reached via one member of the set of action result states, followed by an arbitrary number of steps using the successor relation, and that do not in turn have any successor. For ramification, the successor relation characterizes spontaneous and "caused" transitions in the world; in your work they would instead characterize the results of exogenous "actions". These two cases seem to be very close to each other.
The KR article is in fact formulated entirely in terms of states and state transitions, and analyses ramification methods without introducing any logic language at all. Its way of looking at actions is therefore quite close to what you use in your paper. (The main results in my article are assessments that characterize the range of sound applicability for a number of previously published approaches to ramification). Relating back to my question 2, this adds to the impression that ramification and exogenous events are related phenomena.
5. I put quotes around "actions" because I can not get used to the idea of calling natural events `actions'. Although I understand the background of this terminology, which comes from the situation calculus, I do think it makes communication with neighboring disciplines a lot more difficult than it ought to be.
Best regards, Erik

References:

c-kr-96-99 Erik Sandewall.
Assessment of ramification methods that use static domain constraints. [entry]
Proc. International Conf on Knowledge Representation and Reasoning, 1996, pp. 99-110.

A1. Chitta Baral and Son Cao Tran (23.11):

Dear Erik:
We really appreciate your comments. Following is a point by point response to your comments. In addition, we plan to revise our paper to take into account some of your suggestions and our responses to it.
Best regards
Chitta and Son
1. We agree that our current formulation in terms of maintaining goals is weaker than the formulation where the control never allows the agent to get into an unacceptable state, regardless of exogenous actions. The later approach is used in (among other approaches) policy determination algorithms for Markov Decision Process based approaches [DKKN95,KBSD97]. (As we point out in the paper, the algorithms in [DKKN95,KBSD97] consider the control module as a whole and do not have the sufficiency condition for correctness of individual goals.)
We feel that the later approach is too strong and often there may not be any control module satisfying the stronger criteria.
Nevertheless, studying a middle ground between the two is important.
Following your suggestions, one middle ground can be based on using a maximum number (say m ) of exogenous actions that are considered (or allowed) between two successive agent actions.
Let G be the set of goal states. We can define core(G, m) as a subset of G , from where application of any m exogenous actions will result in a state in G , and there exists a sequence of exogenous actions of length m+1 that will take us to a state outside G .
We can then use this parameter m in defining satisfaction of maintenance goals, by modifying Defn 3.6 in the paper, to have Closure(S, M, A) subseteq G and for all states s in Closure(S, M, A) , each step of executing M takes us to a state in Core(G, m) .
We do not forsee any problem with concurrency with this approach. But the definition of achievement and maintainance will no longer be uniform, and as a result the definition for mixed control modules will get a little complicated.
We will try to pursue this in greater detail (or at least mention this) in the revised version of our paper.
2. As you mention, our paper can be formulated in terms of states, state spaces and transition functions without talking about action theory.
We felt that by doing that, the connection between an action specification and the states and transition function it defines in a succinct and elaboration tolerant (often) way is pushed into the background, and we wanted to stress that states and transition functions are not often directly given but are rather specified by an action description. For that reason, we bring in action theories.
We will point this out in our revised version.
3.
(a) Our results rely on earlier theorems about existence of least fixpoints obtained through iterating a monotonic function.
(b) We have two kind of theorems in our paper: correctness based on sufficiency condition (Theorem 4.1 and 7.1) and correctness of automatic construction algorithms (Proposition 4.1 and 7.3). The later propositions are of course important, as any algorithms should have a result about its correctness. The former theorems are important because they are used in developing the algorithm, and are crucial in proving the correctness of the algorithms.
4. We appreciate your observation. We had not considered this connection.
One difference with respect to your KR 96 approach is that in your approach the first step of your ``closure'' is by applying an action and the subsequent steps are due to a successor relation, while in our approach the subsequent steps are due to exogenous actions.
Regardless of this difference, we will point out the connection in our revised version.
5. Calling natural events as actions may sound odd some time. The reason we chose to call everything actions are several.
(i) The exogenous actions are not just natural actions, but are actions that are beyond the control of the agents. They could be the action of another agent in a multi-agent world. Thus they are a super set of what I would normally think of as ``natural actions''.
(ii) By just saying actions, we avoid talking separately about a theory of events. Otherwise every time we talk about effect of actions we would have to talk about effect of events, and their will be a lot of duplication.

Q2. Anonymous Referee 1 (18.5.1999):

1. Are the results of the article, as specified in the summary, of significant interest for researchers in our area?
Yes.
2. Does the full text substantiate the claims made in the summary?
Yes.
3. Is the text intelligible and concise?
Yes. (Additional comments concerning minor corrections are being given to the authors).
4. Do you know of any previous publication of the same result?
No.
5. Are there any other considerations that may affect the decision?
No.

Q3. Anonymous Referee 2 (18.5.1999):

This paper concerns an important topic, namely the connection between action theories and control. The paper is correct and well-written and I think that the paper should be accepted. However, I have one objection to make.
If we instead of "action theory" use the term "open-loop specification" and instead of "reactive control" use "control law" the results of the paper concern the topic of verification and synthesis of control laws w.r.t. to some specification. In this sense the results are not at all original, since these research problems have generated an abundance of work in Control Theory (more precisely, in work on "Discrete Event Dynamic Systems", DEDS). The authors appear to be completely unaware of this work and on a number of points they re-invent the wheel (e.g. the criterion for "correctness of maintenance control modules" corresponds exactly to a particular flavour of stability criteria, see [3]). There is not one single reference to the DEDS (or Hybrid Systems) literature in the bibliography.
In other words, I observe that a large body of work in a similar problem area is neglected, in particular since it is is based on similar definitions and concepts (that is, verification and synthesis of control laws from specifications). It would have been an even better paper if a comparison to this work had been included.
On the other hand, the usage of these definitions and concepts is done in an original way by the authors. The main issues addressed by this paper have not (to the best of my knowledge) been studied by the DEDS or Hybrid Systems communities, for example modeling and control with sensing actions and conditional plans.
It is important to incorporate the achievements in Reasoning about Action and Change with control, but I think that it would be unfortunate if achievements in Control Theory are neglected while doing so.
Examples from the literature of control law synthesis and verification for discrete and hybrid systems can be found in [1, 2, 5], and various definitions of stability in [3, 4].
REFERENCES
[1] Heymann, Lin and Meyer, Controller Synthesis for a class of hybrid systems subject to configuration-based safety constraints, LNCS 1201, 1997.
[2] Kumar and Garg, Modeling and Control of Logical Discrete Event Systems, Kluwer Academic Publishers, 1995.
[3] Ozveren, Willsky and Antsaklis, Stability and Stabilizability of Discrete Event Dynamic Systems, Journal of the ACM, 1991
[4] Passino and Burgess, Stability analysis of discrete event systems, John Wiley, 1998.
[5] Zhang and Macworth, Synthesis of hybrid constraint-based controllers, LNCS 999, 1995.
A3. The Authors (12.6.1999):

The comments of the anonymous reviewers were very useful. We would like to especially thank the second reviewer for pointing us towards the discrete event dynamic system (DEDS) literature and the similarity between our notion of maintainance and the notion of stability there.
We have now modified our paper by discussing this similarity and adding a few references to the literature in DEDS. The changes follow below.
Chitta and Son
1. Added to the first paragraph of Section 3.3
(Our notion of maintainance is weaker than the notion of stability in discrete event dynamic systems \cite{Ozve91,Ram87a,Ram87b}. We discuss this difference later in Section~\ref{other}.)
2. Modified Section 3.4 The modified first three paragraphs are:
In the previous section we described our formulation of maintaining a goal, and correctness of control modules with respect to our definition.
But in some cases our definition can be considered weak. For example, in one of the stronger formulations the agent is not allowed to get to an unacceptable state, regardless of what exogenous actions happen. This is used in policy determination algorithms for MDP (Markov decision process) based approaches \cite{dean96,kab97}. (We compare our work to these works in additional detail in Section~\ref{sec-suff1}.) Another formulation that is stronger than our notion, and used in the discrete event dynamic system literature \cite{Ozve91,Ram87a,Ram87b} is the notion of `stability' where an acceptable state must be reached within a finite number of transitions and then be visited infinitely often. The difference here is that it requires that an acceptable state be reached in a finite number of transitions, regardless of the presence of an window of non-interference. Thus modules that satisfy the maintainability criteria may not guarantee stability.
We feel that the particular stronger formulations mentioned above may be too strong and often there may not be any control module satisfying such a criterion. Moreover, in domains where the robots or agents actions are deterministic (such as in case softbots, and database updates), our approach of not combining the agent's action with the environments action, and use of the maintainance criteria seem to be more appropriate. Nevertheless, we now briefly discuss some possible in between formulations.
3. Modified Section 10.2. The modified section is as follows:
\subsection{Reactive planning, Program synthesis and Discrete event dynamic systems}
Recently there has been some proposals \cite{godef91,kab97,dean96} about automatic construction of control rules inspired by research in program verification and synthesis and operations research. In \cite{kab97}, an algorithm to generate control rules for goals given in a temporal logic is given.
Two major contributions of this work are that it considers deadlines, and it allows general temporal goals that can specify cyclic behaviors. In \cite{dean96}, transitions between states are represented as Markov processes, and goals are specified using reward functions, and policy iteration algorithms from Operations Research are used to construct control rules.
Our approach differs from the approach in \cite{kab97,dean96}, the approaches mentioned in the program verification and synthesis literature (for example, \cite{emerson90,emerson82,pneuli89}), and the approaches in the discrete event dynamic systems literature (for example, \cite{Ozve91,Ram87a,Ram87b}), in that we separate agent actions from exogenous actions and do not combine them and our notion of maintainance is weaker than analogous notions there.
Also we insist on developing (sufficiency) conditions for the correctness of {\em individual} control rules. The formulations in \cite{kab97,dean96} and the ones mentioned in \cite{emerson90} only deal with the correctness of a control module as a whole. It is important to consider correctness of individual control rules, because often we {\em learn} (or we are told) a particular control rule in isolation, and we need to satisfy ourselves that it is correct by itself, regardless of the rest of the control rules.
Also our methodology allows us to upgrade a control module by simply adding additional correct rules for new states that need to be taken care of. In case of \cite{kab97,dean96}, simply adding new rules may not be always enough to upgrade a module and extra care is needed to avoid getting into the kind of cycles present in the module Goto\_elevator\_3 from Section~\ref{sec-suf1}.
Finally we consider sensing actions which are not considered in \cite{kab97,dean96,Ozve91}, and we use AI methodologies such as `planning' which is not considered in the program reasoning approaches described in \cite{emerson90,emerson82,pneuli89}.
4. New references added
\bibitem[Ozv91]{Ozve91}
C.~Ozveren, A.~Willsky, and P. Antsaklis.
\newblock Stability and stabilizability of discrete event dynamic systems.
\newblock {\em Journal of the ACM}, pages 730--752, vol 38, no 3, July 1991.
\bibitem[RW87a]{Ram87a}
P.~Ramadge and W. Wonham.
\newblock Supervisory control of a class of discrete event process.
\newblock {\em SIAM J. Cont. Optimization 25}, 1, pages 206-207, Jan 1987.

\bibitem[RW87b]{Ram87b}
P.~Ramadge and W. Wonham.
\newblock Modular feedback logic for discrete event systems.
\newblock {\em SIAM J. Cont. Optimization 25}, 5, pages 206-207, Sept 1987.
5. Acknowledgment modified
We are grateful to Erik Sandewall and Marcus Bjareland for useful discussions related to the subject of this paper. We are also grateful to the anonymous reviewer who pointed us to the literature in discrete event dynamic systems and the similarity between our notion of maintainance and the notion of stability there. This research was partially supported by the grants NSF CAREER IRI-9501577, NASA NCC5-97 and NSF DUE-9750858.

Background: Review Protocol Pages and the ETAI
This Review Protocol Page (RPP) is a part of the webpage structure for the Electronic Transactions on Artificial Intelligence, or ETAI. The ETAI is an electronic journal that uses the Internet medium not merely for distributing the articles, but also for a novel, two-stage review procedure. The first review phase is open and allows the peer community to ask questions to the author and to create a discussion about the contribution. The second phase - called refereeing in the ETAI - is like conventional journal refereeing except that the major part of the required feedback is supposed to have occurred already in the first, review phase.
The referees make a recommendation whether the article is to be accepted or declined, as usual. The article and the discussion remain on-line regardless of whether the article was accepted or not. Additional questions and discussion after the acceptance decision are welcomed.
The Review Protocol Page is used as a working structure for the entire reviewing process. During the first (review) phase it accumulates the successive debate contributions. If the referees make specific comments about the article in the refereeing phase, then those comments are posted on the RPP as well, but without indicating the identity of the referee. (In many cases the referees may return simply an " accept" or " decline" recommendation, namely if sufficient feedback has been obtained already in the review phase).