********************************************************************
 ELECTRONIC NEWSLETTER ON REASONING ABOUT ACTIONS AND CHANGE

 Issue 98083          Editor: Erik Sandewall          25.11.1998

 Back issues available at http://www.ida.liu.se/ext/etai/rac/
********************************************************************

********* TODAY *********

Today we send out David Poole's answer to Hector Geffner, in the discussion about David's ETAI-submitted article. This discussion is now closed since David has indicated that he wishes to revise the paper, after which it will be sent to the referees. We will reopen the discussion for possible additional comments after the referees have returned their verdict.

We can also announce a contribution today by Hector Levesque, Fiora Pirri, and Ray Reiter to the collection of reference articles for approaches to actions and change. The collection now has very good coverage, and we also hope, of course, to have an article about the event calculus. Additional contributions are of course welcome as well.

Remember, however, that the purpose of these reference articles is to provide the basic definitions for a well-established approach, so that it shall not be necessary to repeat those definitions each time in later articles. The purpose is *not* to present new results. On the contrary, the reference articles should only contain material that is sufficiently stable that it can be expected to remain relevant for a while.

Contributions to the review discussion about these reference articles are also welcome, and we suggest that the articles ought to be discussed from the point of view of their intended purpose. That is, the major question is whether the proposed text characterizes sufficiently well the current best practice in the approach being presented, and whether it is clear, understandable, and on the right level of detail.

A second use of these reference articles may be to provide a more solid basis for the perennial comparative discussion about advantages and limitations of current approaches. During earlier periods of activity, this discussion was hampered by the fuzziness about exactly what was included in each approach. If this discussion should be reignited after the appearance of the reference articles, then the new contributions will be included as continuations of the existing panel discussion on ontologies of actions and change.

********* ETAI PUBLICATIONS *********

--- RECEIVED RESEARCH ARTICLES ---

========================================================
| AUTHOR: Hector Levesque, Fiora Pirri, and Ray Reiter
| TITLE: Foundations for the Situation Calculus
| PAPER: http://www.ida.liu.se/ext/epa/cis/1998/018/tcover.html
|        [provisional]
| REVIEW: http://www.ida.liu.se/ext/etai/ra/rac/017/
========================================================

ABSTRACT

This article gives the logical foundations for the situations-as-histories variant of the situation calculus, focusing on the following items:

- The language of the situation calculus.
- Foundational axioms for the domain of situations.
- Axioms for an underlying domain theory.
- The syntax and semantics of the logic programming language GOLOG.
- Axioms for knowledge and sensing actions.
- Essential metatheoretic results about the situation calculus.

--- DISCUSSION ABOUT RECEIVED ARTICLES ---

The following debate contributions (questions, answers, or comments) have been received for articles that have been submitted to the ETAI and which are presently the subject of discussion.
To see the full context, for example, to see the question that a given answer refers to, or to see the article itself or its summary, please use the web-page version of this Newsletter.

========================================================
| AUTHOR: David Poole
| TITLE: Decision Theory, the Situation Calculus and Conditional
|        Plans
| PAPER: http://www.ida.liu.se/ext/epa/cis/1998/008/paper.ps
| REVIEW: http://www.ida.liu.se/ext/etai/received/actions/008/aip.html
========================================================

--------------------------------------------------------
| FROM: David Poole
| TO: Hector Geffner
--------------------------------------------------------

> I've enjoyed reading your paper "Decision Theory, the Situation
> Calculus and Conditional Plans". I find it hard to disagree on
> your general point that a general model of action must combine
> elements from logic and probability/utility theory, so my questions
> and comments are addressed more to the question of how this
> combination is achieved in your proposal. I'll try to comment
> first on the more general issues.
>
> 1. On page 24, you say "The representation in this paper can be seen
> as a representation for POMDPs".
>
> If so, I wonder, why not present the representation in that way
> from the very beginning? Wouldn't that make things much clearer?

One of the problems with writing a paper that tries to bridge different areas is to explain it for all readers (or to keep them all equally unhappy). I suppose I have succeeded when POMDP researchers say "this is just a representation for POMDPs", when situation calculus researchers claim "this is just the situation calculus with some probability added", and when those studying Bayesian networks claim that "this is just a rule-based representation for dynamic Bayesian networks with actions and observables". Describing it explicitly in terms of POMDPs may have made it easier for POMDP researchers, but not necessarily for everyone else.

> Namely, a POMDP is described by (see [1]):
>
>    A - state space
>    B - actions
>    C - transition probabilities
>    D - cost function
>    E - observations
>    F - observation probabilities
>
> Then the constructions in your language could be seen as a
> convenient/concise/modular way for specifying these components.

I thought I did do this. (At least I was trying to do this, and I knew I was finished when I had defined all of the components of a POMDP.) The actions and observables of Definition 2.6 are the same as B and E. I specify C, D & F using rules. In particular, I represent the cost function in terms of rules that imply the $utility$ predicate (Section 2.6), and observation probabilities in terms of rules that imply the $sense$ predicate (Section 2.7).

> Actually, once the mapping from the language to these components
> is determined, no further semantics would be needed (i.e., the
> probabilities of trajectories, the expected utility of policies,
> etc., are all defined as usual in POMDPs).
> This is actually the way we do things in [2].

That is essentially what I did. There are, however, some problems with just relying on the definition of a POMDP, having to do with what counts as a state. In the formal work, the notion of state is taken as primitive; for a well-defined semantics we need to be clearer. A state could mean either:

(a) everything that is true at a stage. This would include the actual time (if it is relevant), the values of sensors, the accumulated reward, etc.
(b) a sufficient statistic about the history that renders the future independent of the past given the state (which, by definition, is Markovian). This typically doesn't include the time, the output of the sensors, or the accumulated reward.

I have taken approach (a). Most POMDP researchers take approach (b) because their algorithms reason in terms of the state space, and they therefore want to keep the state space as small as possible. I don't want to reason at the level of the state space, but instead at the level of the propositions.

> 2. I can see some reasons why not to present your representation
> as a "front end" to POMDPs.

In general I don't want to do this, because I don't think that reasoning in terms of the state space is the right thing to do. What we can learn from AI is that we want to reason in terms of propositions or random variables, not in terms of the state space. One of the themes of my work is to find compact representations and to exploit the compactness for computational gains. Thus, I don't want to present this as a "front end" to POMDPs. If you mean following the semantic intuitions of POMDPs, then I think I do. But POMDPs are just one of the formalisms I am building on. They have nothing to say about useful independence assumptions or about concise descriptions of actions (state transition functions), which is what this paper is mainly about.

> - you want to be more general; e.g., be able to deal with cost
> functions that depend on the system history. If this is the case,
> I'd suggest introducing "generalized" POMDPs where (cumulative)
> cost functions do not have to be additive (btw, in such POMDPs
> belief states are not necessarily sufficient statistics ...)

(unless you include the accumulated reward in the state.)

> - you want to accommodate multiple agents. Yet this is not done
> in this paper, but even in that case, multi-agent POMDPs
> could be defined as well, and have been defined in the completely
> observable setting (e.g., see Littman).

This was done in my AIJ paper, "The Independent Choice Logic for Modelling Multiple Agents Under Uncertainty". I actually don't think that the ICL_SC formulation is as good for this as the formalism in that paper.

> - you are not interested in determining an optimal or near-optimal
> solution of the POMDP but are interested in analyzing the expected
> cost of a policy supplied by a user in the form of a contingent
> plan. Again, this is no problem from the point of view of POMDPs,
> as long as the contingent plan determines a (stationary or
> non-stationary) *function* of belief states into actions (see [4]).

First let me point out that POMDPs are not defined in terms of belief states. It is a theorem about POMDPs (Astrom 1965) that an optimal policy can be represented as a function from belief states (probability distributions over ordinary states) into actions. However, "optimal" here doesn't take computation time into account. It is pretty obvious that this isn't true for bounded optimal agents (where computation time and space are taken into account), which is where I would like to take this. I *am* interested in determining an optimal or near-optimal solution, but one that takes computation time and space into account. I want to be able to define the expected utility of any strategy the agent may be carrying out, whether it maintains a belief state or not.
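To make that last point concrete, here is a minimal sketch (in Python, purely illustrative; it is not code from the paper, and the toy two-state POMDP and the names trans, obs_prob, reward, and my_plan are assumptions made up for the example) of how the expected utility of a strategy can be defined when the strategy is given only as a function from observation sequences to actions:

# Hypothetical sketch: expected utility of a conditional plan, where the plan
# is nothing more than a function from observation histories to actions.
# The evaluator enumerates states, transitions and observations exhaustively;
# the plan itself never sees the state.

def trans(s, a):
    """P(s' | s, a): pushing a closed door opens it with probability 0.8."""
    if a == "push" and s == "closed":
        return {"open": 0.8, "closed": 0.2}
    return {s: 1.0}

def obs_prob(s):
    """P(o | s): a noisy sensor for whether the door is open."""
    return ({"sees_open": 0.9, "sees_closed": 0.1} if s == "open"
            else {"sees_open": 0.1, "sees_closed": 0.9})

def reward(s, a):
    """Immediate reward: an open door is worth 10, each push costs 1."""
    return (10.0 if s == "open" else 0.0) - (1.0 if a == "push" else 0.0)

def expected_utility(plan, state_dist, history=(), horizon=3):
    """Expected utility of `plan`, a function from observation tuples to actions."""
    if horizon == 0:
        return 0.0
    act = plan(history)
    eu = 0.0
    for s, p_s in state_dist.items():
        eu += p_s * reward(s, act)
        for s2, p_t in trans(s, act).items():
            for o, p_o in obs_prob(s2).items():
                eu += p_s * p_t * p_o * expected_utility(
                    plan, {s2: 1.0}, history + (o,), horizon - 1)
    return eu

# A tiny conditional plan: keep pushing until the door has been seen to be open.
def my_plan(history):
    return "wait" if (history and history[-1] == "sees_open") else "push"

print(expected_utility(my_plan, {"closed": 1.0}))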
No matter what program the agent is following, it can be described in terms of a conditional plan that describes what the agent will do based on the alternative sequences of observations (i.e., its function from observation sequences into actions). Note that this doesn't refer to anything inside the agent. The agent could be reasoning with a probability distribution over states, remembering a few previous observations, just reacting to its current observations, or actually following a conditional plan. The important thing is that what it does is based on what it observes.

What I am advocating is good hacks; the best agent won't do exact probabilistic reasoning. We need good strategies, perhaps like your RTDP-BEL. However, it is not obvious (to me anyway) that the bounded optimal solution will necessarily be an approximation to the (unbounded) optimal solution. The best real (bounded time and bounded space) agent may do something quite different from approximating belief states. Unless we have some representation that lets us model the expected utility of an agent following its function from observation and action history into actions (i.e., its transduction), we won't be able to specify when one agent is better than another. The conditional plans (policy trees) let us model this, without any commitment to the internal representation of the agent.

> Indeed, you require *more* than this when you demand (page 18)
> that the tests in the conditions of the plan be *directly observable*
> at the time when the conditions need to be evaluated.
> This is not required in POMDPs [4] or in possible world accounts [3],
> where the test may be known indirectly through other observations.

NO! I mean that the conditions are those things that are observable: whatever evidence (the "other observations") the POMDP or possible-worlds accounts take into account. This is a standard technique in Bayesian networks/influence diagrams; we create a variable that represents the actual value sensed. I am quite happy to have this as a (state) variable. You don't want it as a part of your state because it isn't necessary to make the system Markovian. This is fine, but I can model these "other observations" using the "sense" predicate. Is the sense predicate part of the state? I don't know; that is one reason why I didn't blindly accept the notion of state.

> 3. A final comment about a non-trivial problem that results from
> the decision of removing all probabilities from state transitions,
> transferring them into suitable priors in "choice space".
>
> Consider the effect of an action "turn_off" on a single boolean
> fluent "on" with transition probabilities:
>
>    P( on=false | on=true;  a=turn_off) = .5
>    P( on=false | on=false; a=turn_off) = 1
>
> Let's say that initially (on = true) and then we perform a
> sequence of N consecutive "turn_off"s. Then the probability
> of (on = true) is .5^N, which clearly depends on N.
>
> I don't see how this behavior could be captured by priors
> over choice space. This seems to be a strong limitation.
> It looks as if probabilistic actions cannot be handled
> after all. Is this right?

Here is how I would represent this:

Facts:
   on(do(turn_off,S)) <- on(S) & off_failed(S).
   on(init).

C_0 contains {off_failed(S), off_succeeded(S)} for all situations S
P_0(off_failed(S)) = 0.5 for all situations S

[Note that the completion of the facts for on gives your two conditional probabilities.]
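A minimal sketch of this semantics (in hypothetical Python, not part of the representation itself; the function name prob_on_after is made up for the example): each situation S carries an independent choice between off_failed(S) and off_succeeded(S) with P_0(off_failed(S)) = 0.5, and the single rule for on is evaluated in every total choice, i.e., in every possible world.

# Hypothetical sketch of the ICL reading of the facts above: enumerate every
# total choice (one off_failed/off_succeeded pick per situation), evaluate the
# completion of the on rule in that world, and sum the weights of the worlds
# in which on still holds.

from itertools import product

P_OFF_FAILED = 0.5

def prob_on_after(n):
    """P(on holds after n consecutive turn_off actions), starting from on(init)."""
    total = 0.0
    for choice in product([True, False], repeat=n):    # one atomic choice per step
        weight = 1.0
        on = True                                      # on(init)
        for off_failed in choice:
            weight *= P_OFF_FAILED if off_failed else 1.0 - P_OFF_FAILED
            on = on and off_failed                     # on persists only if turn_off failed
        if on:
            total += weight
    return total

print(prob_on_after(3))    # 0.125 = 0.5**3, matching the derivation below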
We can then derive

   P(on(do(turn_off,do(turn_off,do(turn_off,init))))) = 0.5^3

just as you want (and similarly for any N). We need an assumption for each time step. The minimal explanation for on(do(turn_off,do(turn_off,do(turn_off,init)))) is

   {off_failed(init), off_failed(do(turn_off,init)), off_failed(do(turn_off,do(turn_off,init)))}.

In all worlds where this explanation is true, the light is on in the state after three turn_offs, and the probabilities of these worlds sum to 0.5^3.

Any probabilistic actions that you can represent in a dynamic Bayesian network, I can represent in the ICL_SC. Moreover, I can represent them at least as succinctly as the dynamic Bayesian network (up to a constant factor), and sometimes exponentially (in the number of state variables) more succinctly (when there is context-specific independence). The mapping is local and modular.

Thanks for taking the time to read and comment on the paper. I hope my comments help make the paper, and what I am trying to do, clearer. Let me stress that this paper has nothing to say about computation. Whereas your work was motivated by being a front end to an actual POMDP algorithm, the motivation here relies on being able to deliver POMDP algorithms that can exploit the conditional and contextual independencies of the representation. Unfortunately, I can't show you such algorithms now. But we are working on it.

David

********************************************************************
This Newsletter is issued whenever there is new news, and is sent by automatic E-mail and without charge to a list of subscribers. To obtain or change a subscription, please send mail to the editor, erisa@ida.liu.se. Contributions are welcomed to the same address.

Instructions for contributors and other additional information are found at:
   http://www.ida.liu.se/ext/etai/actions/njl/
********************************************************************