Presumably, you need one of these predicates for each action type, e.g.  putdown_succeeds ,  goto_succeeds , etc. Therefore, why not introduce a generic predicate  succeeds(action, S) , in which case you can write  succeeds(pickup(key), S) ,  succeeds(goto(loc), S) , etc.? Similarly, you can have a relation  fails(action, S) .

In this case  succeeds  would not only be a function of the action and the situation, but also a function of the condition being achieved (in this case that the key is being carried), and a context (in this case that the robot and the key are at the same position). But this is all that the rules say. The rules should be seen as axioms of conditional effects (see below).
Now,  C0  contains  union(succeeds(pickup(key), S), fails(pickup(key), S)) , where the union is over all ground instances of these atoms.
I am not sure what you mean by the union, but  C0  would contain infinitely many 2-element sets. (See the answer to the next point.)
Incidentally, your specification of what sets are in  C0 , e.g. bottom p. 13, is an abuse of notation, since it looks like an object level sentence, when in fact it is a metalevel statement for which universal quantification over situations is not defined.
Yes. I mean for each situation term  S  the set  {pickup_succeeds(S), pickup_fails(S)}  is in  C0 . That is, I am quantifying over terms in the language.

Here  C0  is a set of 2-element sets of ground terms. In this case  C0  is infinite but countable. I could have made it a set of open formulae, in which case it could be finite, but this would have complicated the presentation for no obvious gain.

(b) In axiomatizing  at  on p. 20, you do not include a  goto_succeeds  atom in the clause body. Does the atom  would_not_fall_down_stairs(S)  play this role?

Yes.
2. In the conditional plan of p. 21, don't you want  at(key, r101)  instead of  at_key ?
No.  at_key  is the sensor defined in Example 2.11 that (noisily) senses whether the robot is at the same position as the key. The robot can only condition on its sensed values. The robot doesn't know if the key is at room r101; it only knows what its sensors tell it. [You could also imagine that a robot could condition on internal state variables, and on what can be computed from internal state variables, which is hinted at in Section 2.10.]

3. Consider the axiom of Example 2.9 and your comment "Note that this implies that putting down the key always succeeds". Suppose not. Then is it the case that the axiom should be modified by replacing the atom  A \= putdown(key)  by something like
    ¬(A = putdown(key) ^ putdown_succeeds(key, S))   
If so, perhaps this should be pointed out. (See my call below for a guide for the perplexed.)

This is sort of correct, but it doesn't get the probabilities right. I would expect the probability of keeping the key to be independent of the other two cases. There are three free parameters to be assigned probabilities, so we need three choice alternatives (when we only have binary alternatives). See the guide for the perplexed below; I do this example in more detail there.

4. In the axiom of Example 2.9, you include a choice  keeps_carrying(key, S) , which you introduce because "there may be a probability that the agent will drop the key". You then axiomatize this with  P0(keeps_carrying(key, S)) = .95 . But I can't think of any realistic way to axiomatize this probability when only the parameter  S  is available.

Basically, you want to describe the probability that the robot doesn't lose the key during the interval between the occurrence of the last action in  S , when  carrying(key, S)  was true, and the occurrence of the next action  A . In other words, you want to describe the probability that the robot is still carrying the key at situation  do(A, S)  given it is carrying it in situation  S . But that depends, at the very least, on WHEN this next action  A  occurs. Now on one account of adding time to the sitcalc (Reiter, Proc KR'96), an action can take an additional temporal argument denoting that action's occurrence time, for example,  openDoor(t) . Then one expects that  P0(keeps_carrying(key, S))  given that  openDoor(10)  is the next action should be greater than this probability given that  openDoor(1000)  is the next action. So as far as I can tell, the choice  keeps_carrying(key, S)  should take another argument, namely  A , or perhaps simply  time(A) , where  time(A)  is the occurrence time of the next action  A .

Yes. That is right. If the agent could drop the key at any time, we need to model (or learn) how the agent may drop the key as a function of time. Presumably this would be modelled as a Poisson process (assuming it is equally probable that the agent can drop the key at any time). As outlined on page 1, I haven't dealt here explicitly with time.
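To make the time dependence concrete (this is only a sketch; the extra argument to  keeps_carrying  and the  time  function are not part of the language in the paper): under a Poisson-process assumption with a constant drop rate  lambda , if the last action in  S  occurred at time  time(S)  and the next action  A  occurs at time  time(A) , one would want something like

    P0(keeps_carrying(key, S, A)) = exp(-lambda * (time(A) - time(S)))   

so the probability of keeping the key decays with the length of the interval; a fixed number such as 0.95 then corresponds to one particular interval length.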

The axiomatization I gave assumes that the agent can hold on to the key tightly enough so whether it drops the key isn't a function of time ;^}

The need for action preconditions

On p. 8, you claim "None of our representations assume that actions have preconditions; all actions can be attempted at any time." I'm skeptical of this claim. Consider your example domain, where you adopt the axiom  P0(pickup_succeeds(S)) = .88 . Just prior to this, you say " P0(pickup_succeeds(S))  reflects how likely it is that the agent succeeds in carrying the key given that it was at the same position as the key and attempted to pick it up". So you can't seriously write  P0(pickup_succeeds(S)) = .88 , because here  S  is any situation, including one in which the agent is 1000 kms from the key. I take it that what you intended to write was something like
    at(key, Pos, S) ^ at(robot, Pos, S) -> P0(pickup_succeeds(S)) = .88 (*)    
If this is the case, then it is exactly here where you are using action preconditions, namely, in characterizing those situations in which an action has a high probability of succeeding. It also appears that you are appealing to action preconditions in the axiom of Example 2.8 (namely,  at(robot, Pos, S) ^ at(key, Pos, S) ). Probably, these atoms should be omitted because their effects are realized in the above axiom (*).
There are two separate issues here, first the meaning of preconditions, and secondly how the atomic choices act in contexts where they aren't applicable.

There are three things that could be meant by a precondition of an action:

(1) the conditions under which the action is possible (can be carried out at all);
(2) the conditions under which the action, if carried out, has a particular effect;
(3) the conditions under which the action should be done.

It is only the second, often called conditional effects, that I represent. The first is what I meant by "preconditions", which I don't have. (These correspond to your Poss predicate, if I am not mistaken.) This shouldn't really be seen as a feature, but as a necessary evil. It means that I have to axiomatize the effect of an action under all conditions (including the conditions under which you may say the action is not possible). I do this because, in general, an agent may carry out an action even if it doesn't know that the action is "possible" (in the normal sense). The third, when the action should be done, comes from the utility: I want to build sensible agents that maximize utility, but I also want to be able to evaluate how good arbitrary agents are, not just the good ones.

The second issue is a bit more subtle. Let's assume that the choice alternatives are all binary (non-binary alternatives are useful when a relation is functional). In this case, there are the same number of alternatives as there are free parameters (conditional probabilities that can be independently assigned) in the probability model. For example, when converting a Bayesian network to an ICL theory, there are the same number of alternatives as there are numbers to be assigned in the Bayesian network. You typically have, for each effect  e, rules of the form:
 e  <-  context1  ^  ac1 
...
 e  <-  contextk  ^  ack 
where the  contexti 's are disjoint and covering. In this case we can assign the probability of the atomic choice  aci  as the conditional probability:
 P0(aci) = P(e | contexti) 
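For instance (my own illustration, with invented atom names): a Boolean effect  e  with two Boolean parents  a  and  b  has four numbers to be assigned in its conditional probability table, so there are four contexts, four rules and four alternatives:

 e  <-  a  ^  b  ^  ac1 
 e  <-  a  ^  ~b  ^  ac2 
 e  <-  ~a  ^  b  ^  ac3 
 e  <-  ~a  ^  ~b  ^  ac4 

where  ~  denotes negation as failure, the four contexts are disjoint and covering, and the alternatives are  {ac1, nac1} , ...,  {ac4, nac4}  with  P0(ac1) = P(e | a, b) ,  P0(ac2) = P(e | a, ~b) , and so on. The atomic choices are in different alternatives, so they are probabilistically independent, matching the four free parameters of the conditional probability table.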

Your question on conditions for  P0  could be recast as: What happens to the  aci in the context when the corresponding rule is not applicable? Basically it gets marginalized out. We don't have to worry about it. The abductive characterization of the ICL shows that we only need to worry about the atomic choices that are needed to prove a goal (in our case the utility and the observations).

This should be seen in the context of an overriding theme of this work. We are not trying to see how much we can put into a formalism, but how little we can get away with. We don't need conditions on the probabilities of the atomic choices, so we don't have them.

Finally, you often talk about actions being "attempted", as in the above quotation. I found this particularly confusing since nothing in your ontology deals with action "attempts". Actions simply realize their effects with certain probabilities. You do not, for example, have an action  try_pickup , as do other authors (e.g. James Allen) who do try to axiomatize action  attempts . In this connection, I had a hard time making sense of footnote 9, since it motivates the axiom of Example 2.9 in terms of talk about "trying" to pick up the key. In fact, even after reading this material several times, I couldn't figure out footnote 9 until, much later, I began to think about how your approach compares with Bacchus, Halpern and Levesque. See my comments below on BHL.

Perhaps the term "attempted" is misleading. What I mean is that the robot sends the motor control for the action, irrespectively of whether the robot can do it (in this sense it is attempting the action). By the  pickup(key)  action I mean a particular motor control (the one that would achieve carrying the key if it is at the same position as the key and the pickup succeeded). This action can be carried out in any context; it just doesn't achieve carrying the key in these contexts.

Section 3.3

I must confess that I couldn't make much sense of the discussion in Section 3.3, especially your arguments that the sitcalc is inherently unsuitable for modeling multiple, possibly reactive agents. It's true that Golog lacks the right control structures for doing this, but your comments on p. 26 and p. 27 (p. 26 "In order for a situation calculus program to ... and does the appropriate concurrent actions." p. 27 "When there are multiple agents ... or complex actions.") suggest to me that you are not aware of more recent work by De Giacomo, Lesperance and Levesque on Congolog that addresses precisely these issues (Proc. IJCAI 97). Congolog provides reactive rules (interrupts) and interleaving concurrency in a semantically clean way, all formulated within the sitcalc. I believe that Congolog addresses all your criticisms of the sitcalc, except, of course, it does not (yet) incorporate utilities.
"It's true that Golog lacks the right control structures for doing this..." I am quite happy with the agent having simple control structures. I am concerned about modelling asynchronous external actions or events (by other agents or by nature).

It is not clear to me that Congolog addresses this, in the sense that all of the concurrency in the language is internal to the agent. It doesn't model exogenous actions (they are generated externally). This would seem to mean that external events are restricted to occur at the situations defined by the primitive events. [I am looking forward to being told I am wrong.]

Moreover, with respect to the mix of the situation calculus and time, it is clear now that we can do what we want when we have both the situation calculus and time. But when you have statements saying when an event occurred, you don't need situations. Thus parsimony would suggest that we do away with situations. [I would expect that this should form a different thread.]

Comparison with Bacchus, Halpern and Levesque

So far as I know, there have only been two proposals for augmenting the sitcalc with probabilities, yours and BHL. As I indicate below, these two approaches are considerably different, and therefore I believe it is important to elaborate on these differences more than you do (top p. 25) especially the substantial ontological differences.
Yes, I do need to discuss this further (that is the joy of ETAI discussion of papers; we can retrospectively expand those parts that are of most interest).

First let me say that I am not claiming that mine is better than theirs, nor that it is different just for the sake of being different. We start from very different perspectives. In their introduction, they explicitly contrast their work with the work starting from Bayesian networks. In particular, they claim "...they [Bayesian networks] have difficulties in dealing with features like disjunction ...". I take a Bayesian perspective that disjunction is not a feature we want! What defines a Bayesian is that probability is a measure of belief, that any proposition can have a probability (both of which BHL would agree with) and that all uncertainty should be measured by probability. Thus you need to keep in mind that the different design choices could stem from this different perspective.

For BHL, there are no  action_succeeds  and  action_fails  fluents. Instead, an action whose outcome depends on nature is represented by the nondeterministic choice of simpler, deterministic actions corresponding to each different outcome. For example, the nondeterministic action  flipaCoin  is represented as the complex action
    flipaCoin = flipHeads|flipTails   
where  |  is GOLOG's operator for nondeterministically choosing between the two actions  flipHeads  and  flipTails . For your ongoing example,  pickup(key)  would be represented by
    pickup(key) = pickup_succeeds(key)|pickup_fails(key)   
When an agent executes the action  pickup(key)  in situation  S , nature steps in and selects which of the two actions  pickup_succeeds(key)  and  pickup_fails(key)  actually occurs, and it does so with frequencies determined by an axiom of the form:
    P0(pickup_succeeds(key), S) = p <-> (suitable conditions on S and p)   
In other words, your atomic choice at the bottom of p. 13, and your axiom for  P0  at the top of p. 14, are represented by BHL in the way I have indicated above.
Except that BHL don't model probabilistic actions (in their IJCAI-95 paper, which is the only one I had seen), although I believe they could. All of their probabilities are within the agent, so such axioms would have to be part of the formula for updating belief. [I just found a new paper, Reasoning about Noisy Sensors and Effectors in the Situation Calculus, 1998, at Fahiem Bacchus's web site that does this. I'll call this BHL98.]

One other important point to note is that the probabilistic part of the ICL is a restricted form of Bacchus's and Halpern's logics of probability as belief. The translation is that each atom that isn't an atomic choice is defined by Clark's completion of the rules defining that atom. We need statements that the elements of an alternative are exclusive and covering and that the atomic choices that aren't in the same alternative are probabilistically independent. Thus the translation doesn't require "suitable conditions on S and p".

A general framework for probability and action is intuitively straightforward (see Halpern and Tuttle (1993) for a general semantic framework). You can think about forward simulating the system. Nondeterministic/stochastic actions split the worlds and impose a probability over the resulting worlds. In the ICL we handle splitting the worlds using one simple mechanism: independent choice alternatives. I treat actions by the agent and other stochastic mechanisms (e.g., the ramifications of actions) in exactly the same way. BHL-98 split the worlds by nondeterministic actions (as you outline). They would then need a different mechanism for stochastic ramifications.

These are very different ontological and representational commitments than yours, and I think they deserve a deeper analysis than your discussion provides. Here are a few important differences these commitments lead to:

1. Primitive actions for BHL are entirely different than yours. BHL primitive actions would be things like  pickup_succeeds(key) ,  flipHeads , etc., i.e. they consist of the deterministic actions that nature can perform, together with those deterministic actions (if any) that the agent can perform. For you, primitive actions are typically nondeterministic, and correspond to those actions that the agent can perform (but with nondeterministic outcomes chosen by nature), e.g.  pickup(key) ,  flipaCoin .

Yes.

2. Therefore, for BHL, situation terms describe the "actual" history selected by nature in response to the agent-performed actions, e.g.
    do(pickup_fails(key), do(flipHeads, S0)).   
For you, situations denote the history of agent-performed actions, e.g.
    do(pickup(key), do(flipaCoin, S0)).   
The BHL axioms all refer to the former situations. In your ontology, only the latter are allowed.

Yes. I tried to be explicit about both of these points.
3. In BHL, all primitive actions ( pickup_succeeds(key) ,  flipHeads ) are deterministic and successor state axioms (the causal laws + solution to the frame problem) are formulated only wrt these. For you, the causal axioms are formulated wrt the (nondeterministic) agent's actions ( pickup(key) ,  flipaCoin ); hence the need for your fluents  pickup_succeeds(S)  and  pickup_fails(S) . These function as random variables that determine the effects of the nondeterministic action  pickup(key) . One consequence of this difference is that for BHL, successor state axioms are perfectly straightforward, whereas for you, because of these random variables, the causal laws become, at least conceptually, somewhat opaque (e.g. footnote 9).
Except I would think that my successor state axioms are perfectly straightforward. I get by with just Clark's completion, in much the same way as other logic programming representations of action. Frame and ramification axioms are treated the same (in fact I don't even think of them as different, and the paper doesn't distinguish them). I get by with Clark's completion because it is defined with respect to each world, where I just have a simple acyclic logic program. Maybe the guide to the perplexed below will help.
Moreover, BHL use "classical" action precondition axioms, in contrast to you (see above about action preconditions).
So what happens if the agent doesn't know if the precondition holds, but still wants to do the action? This is important because, with noisy sensors, agents know the actual truth values of virtually nothing outside of their internal state, and most preconditions of actions are properties of the world, not properties of the agent's beliefs.
4. For BHL, your nondeterministic agent actions like  pickup(key)  are not part of the language of the sitcalc. Rather, they are nondeterministic GOLOG programs. Since you also introduce a notion of a program (Sections 2.8, 2.10), there are some natural comparisons to be made. The obvious (and perhaps only) important difference is in the nature of the primitive program actions. For BHL programs, the primitives are the deterministic actions described above (e.g.  pickup_succeeds(key) ,  flipHeads ), whereas for you, they are the agent actions like  pickup(key) .
One major difference is that in BHL, it is the agent who does probabilistic reasoning. In my framework, the agent doesn't (have to) do probabilistic reasoning, it only has to do the right thing (to borrow the title from Russell and Wefald's book). The role of the utility is to compare agents. This is important when we consider what Good calls type 2 rationality and Russell calls bounded rationality (see the work by I.J. Good, Eric Horvitz, and Stuart Russell for example), where we must take into account the time of the computation done by the agent in order to compute utility. I would expect that optimal agents wouldn't do (exact) probabilistic reasoning at all, because it is too hard! That is why we want a language that lets us model agents and their environment, and defines the expected utility of an agent in an environment. One of the reasons I didn't want to just add "time" into the framework is that I want a way to take into account the computation time of the agent (the thinking time as well as the acting time), and that makes it much more complicated (I wanted to get the foundations debugged first).

Suggestion: A Guide for the Perplexed

Your paper uses a single, ongoing example to illustrate the axiomatization style required by your approach. Personally, I couldn't abstract, from the example, a set of guidelines by which I could invent my own axiomatization for a different example of my choosing. I think it would considerably enhance the paper if you provided such guidelines. For example, for each action, tell the reader what axioms need to be specified. What choice of fluents needs to be made? (For example, there seem to be at least two categories of fluents - "normal" fluents like  carrying(X, S) , and atomic choice fluents. How does one decide on the latter, when given the primitive actions and "normal" fluents of an application domain?) Similarly, for each fluent, what axioms need to be specified? Also, what needs to be axiomatized about the initial situation? Finally, what do all these axioms look like syntactically?
Thanks for the suggestion. The following is a paraphrase of how we tell people to represent knowledge in Bayesian networks, translated into the language of the ICL.

First I'll do the propositional (ground) case. We totally order the propositions. The idea is that we will define each proposition in terms of its predecessors in the total ordering. For each proposition  e, a parent context is a conjunction of literals made up of the predecessors of  e, such that the other predecessors are conditionally independent of  e given the context. We find a mutually exclusive and covering set  {context1, ..., contextk}  of parent contexts. (The atoms appearing in one of the parent contexts are the parents of  e in the corresponding Bayesian network.) If  e is always true when  contexti is true, we write the rule:
 e  <-  contexti 
If  P(e | contexti) isn't 0 or 1, we create a rule:
 e  <-  contexti  ^  aci 
and create an alternative  {aci,naci} (where these are both atoms that don't appear anywhere else), with the probability:
 P0(aci) = P(e | contexti) 
If the context is empty ( e doesn't depend on any of its predecessors) we don't need to create a rule; we can just make  e an atomic choice.
When the probability given the context is 0, we don't write any rules.
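As a small illustration of this recipe (again my own example, with made-up atoms and numbers): order the propositions as  power ,  light_on . Since  power  has no predecessors, it can just be an atomic choice: the alternative  {power, no_power}  is in  C0  with, say,  P0(power) = 0.9 . For  light_on  the parent contexts are  power  and  ~power . If  P(light_on | ~power) = 0  we write no rule for that context; if  P(light_on | power) = 0.99  we write

 light_on  <-  power  ^  works 

with the alternative  {works, blown}  and  P0(works) = 0.99 . Had  P(light_on | power)  been 1, we would simply have written  light_on <- power  with no atomic choice.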

The case for the ICL with the situation calculus is similar. Intuitively, we make the total ordering of the propositions respect the temporal ordering of situations. We write how a fluent at one situation depends on fluents (lower in the ordering) at that situation and on fluents at previous situations. Note that this rule automatically handles ramifications as well as frame axioms. The total ordering of the fluents guarantees the acyclicity of the rule base and that we don't have circular definitions. The initial situation is handled as any other, but the predecessors in the total ordering are only initial values of fluents (and perhaps atoms that don't depend on the situation).

The only thing peculiar about this is that we often have fluents that depend on values at the current as well as the previous situation. This is important when there are correlated effects of an action; we can't just define each fluent in terms of fluents at the previous situation. But this also means that we don't treat frame axioms and ramification axioms differently.

Let's look at how we would do Examples 2.8 & 2.9, with putting down the key possibly failing. Let's consider when carrying would be true. Suppose carrying doesn't depend on anything at the same time, so we can ask on which contexts  carrying(O,do(A,S)) depends. It would seem there are three such contexts:

(1) the action  A  was  pickup(O) , and the robot and  O  were at the same position in  S ;
(2) the robot was carrying  O  in  S , and  A  was neither  pickup(O)  nor  putdown(O) ;
(3) the action  A  was  putdown(O) , and the robot was carrying  O  in  S .

In no other cases would the robot be carrying  O.

The robot would be carrying  O in the first context if the pickup succeeds (hence the name of the atomic choice in Example 2.8). This results in the rule:

carrying(O,do(pickup(O),S)) <-
   at(robot,Pos,S) &
   at(O,Pos,S) &
   pickup_succeeds(O,S).

The robot would be carrying  O in the second context if it keeps carrying the key (hence the name of the atomic choice in Example 2.9). This results in the rule:

carrying(O,do(A,S)) <-
   carrying(O,S) &
   A \= putdown(O) &
   A \= pickup(O) &
   keeps_carrying(O,S).

The robot would be carrying  O in the third context if the putdown failed. Thus, to account for the chance that the putdown failed, we would write the extra clause:

carrying(O,do(putdown(O),S)) <-
   carrying(O,S) &
   putdown_fails(O,S).
where  putdown_fails(O,S) is an atomic choice, with  P0(putdown_fails(O,S)) the conditional probability that the robot is still carrying  O after it has carried out the  putdown(O) action.

Note that putdown may succeed in doing other things; it just fails to stop the robot from carrying  O. So maybe  putdown_fails should be something more like  putdown_fails_to_stop_the_robot_from_carrying.
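To spell out how these three rules act together (this is just Clark's completion of the rules above;  Pos  is existentially quantified in the first disjunct): in each world,

carrying(O,do(A,S)) <->
   (A = pickup(O) & at(robot,Pos,S) & at(O,Pos,S) & pickup_succeeds(O,S))
   or (carrying(O,S) & A \= putdown(O) & A \= pickup(O) & keeps_carrying(O,S))
   or (A = putdown(O) & carrying(O,S) & putdown_fails(O,S))

Within a world the atomic choices have fixed truth values, so this is a deterministic definition of  carrying  at the next situation, playing the role of a successor state axiom; the probabilities enter only through  P0  on the atomic choices.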

Footnote 9 is there to try to explain what the rules mean if the rule bodies aren't disjoint. We can interpret what these rules mean, but it usually isn't what you want (for the cases where both rules are applicable). When we write these rules, having non-disjoint rules is usually a bug (the implementation available from my web page warns when it finds non-disjoint rules.)

In connection with this last question, notice that on p. 20, the clause with head  at(X, P, S)  resembles neither an effect axiom (as in Example 2.8) nor a frame axiom (as in Example 2.9).
It is a ramification of the robot moving. The above methodology handles these ramifications in the same way it handles frame axioms (and any other axioms).
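Roughly, such a ramification clause has the form (I am only sketching its shape here, not quoting p. 20 exactly):

at(X,P,S) <-
   carrying(X,S) &
   at(robot,P,S).

i.e., whatever the robot is carrying is wherever the robot is. It defines a fluent in terms of other fluents at the same situation, which is why it looks like neither an effect axiom nor a frame axiom; the recipe above handles it like any other rule, as long as the total ordering puts the position of carried objects after  carrying  and the robot's position.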

Thanks for your comments and questions. I hope this made the paper clearer. I would be happy to answer any other questions you may have (or engage in debate about the usefulness of this).

David


Q3. Hector Geffner (19.11):

David,

I've enjoyed reading your paper "Decision Theory, the Situation Calculus and Conditional Plans". I find it hard to disagree on your general point that a general model of action must combine elements from logic and probability/utility theory, so my questions and comments are addressed more to the question of how this combination is achieved in your proposal. I'll try to comment first on the more general issues.

1. In page 24, you say "The representation in this paper can be seen as a representation for POMDPs".

If so, I wonder, why not present the representation in that way from the very beginning? Wouldn't that make things much clearer?

Namely, a POMDP is described by (see [1]):

A - state space
B - actions
C - transition probabilities
D - cost function
E - observations
F - observation probabilities

Then the constructions in your language could be seen as a convenient/concise/modular way for specifying these components.

Actually, once the mapping from the language to these components is determined, no further semantics would be needed (i.e., the probabilities of trajectories, the expected utility of policies, etc., are all defined as usual in POMDPs). This is actually the way we do things in [2].

2. I can see some reasons why not to present your representation as a "front end" to POMDPs.

3. A final comment about a non-trivial problem that results from the decision of removing all probabilities from state transitions transferring them into suitable priors in "choice space".

Consider the effect of an action "turn_off" on a single boolean fluent "on" with transition probabilities:
    P(on = false  |  on = true ; a = turn_off) = .5   
    P(on = false  |  on = false ; a = turn_off) = 1   

Let's say that initially  on = true  and then we perform a sequence of  N  consecutive  turn_off 's. Then the probability of  on = false  is  0.5^N , which clearly depends on  N .

I don't see how this behavior could be captured by priors over choice space. This seems to be a strong limitation. It looks as if probabilistic actions cannot be handled after all. Is this right?

References

[1] L. Kaelbling, M. Littman, and A. Cassandra. Planning and Acting in Partially Observable Stochastic Domains, AIJ 1998

[2] H. Geffner, B. Bonet. High level planning with POMDPs. Proc. 1998 AAAI Fall Symp on Cognitive Robotics (www.ldc.usb.ve/~hector)

[3] H. Levesque. What's planning in the presence of sensing? AAAI96

[4] H. Geffner and J. Wainer. A model of action, knowledge and control, Proc. ECAI 98 (www.ldc.usb.ve/~hector)

A3. David Poole (24.11):

  I've enjoyed reading your paper "Decision Theory, the Situation Calculus and Conditional Plans". I find it hard to disagree on your general point that a general model of action must combine elements from logic and probability/utility theory, so my questions and comments are addressed more to the question of how this combination is achieved in your proposal. I'll try to comment first on the more general issues. 1. In page 24, you say "The representation in this paper can be seen as a representation for POMDPs". If so, I wonder, why not present the representation in that way from the very beginning? Wouldn't that make things much clearer?

One of the problems with writing a paper that is trying to bridge different areas is to try to explain it for all readers (or to keep them all equally unhappy). I suppose I have succeeded when POMDP researchers say "this is just a representation for POMDPs" and when situation calculus researchers claim "this is just the situation calculus with some probability added" or those studying Bayesian networks claim that "this is just a rule-based representation for dynamic Bayesian networks with actions and observables". Describing it explicitly in terms of POMDPs may have made it easier for POMDP researchers, but not necessarily for everyone else.

  Namely, a POMDP is described by (see [1]):

A - state space
B - actions
C - transition probabilities
D - cost function
E - observations
F - observation probabilities

Then the constructions in your language could be seen as a convenient/concise/modular way for specifying these components.

I thought I did do this. (At least I was trying to do this, and I knew I was finished when I had defined all of the components of a POMDP.) The actions and observables of Definition 2.6 are the same as B and E. I specify C, D & F using rules. In particular I represent the cost function in terms of rules that imply the  utility  predicate (Section 2.6), and observation probabilities in terms of rules that imply the  sense  predicate (Section 2.7).

  Actually, once the mapping from the language to these components is determined, no further semantics would be needed (i.e., the probabilities of trajectories, the expected utility of policies, etc., are all defined as usual in POMDPs). This is actually the way we do things in [2].

That is essentially what I did. There are some problems with just relying on the definition of a POMDP. This has to do with what is a state. In the formal work, the notion of state is taken as primitive. However, for a well-defined semantics we need to be clearer. A state could either mean:

(a) everything that is true at a stage. This would include the actual time (if it is relevant), the values of sensors, the accumulated reward, etc.

(b) a sufficient statistic about the history to render the future independent of the past given the state (which, by definition is Markovian). This typically doesn't include the time, the output of the sensors, or the accumulated reward.

I have taken the approach of (a). Most POMDP researchers take the approach of (b) because their algorithms reason in terms of the state space. They therefore want to keep the state space as small as possible. I don't want to reason at the level of the state space, but instead at the level of the propositions.

  2. I can see some reasons why not to present your representation as a "front end" to POMDPs.

In general I don't want to do this, because I don't think that reasoning in terms of the state space is the right thing to do. What we can learn from AI is that we want to reason in terms of propositions or random variables, not in terms of the state space. One of the themes of my work is to find compact representations and to exploit the compactness for computational gains. Thus, I don't want to present this as a "front end" to POMDPs. If you mean following the semantic intuitions of POMDPS, then I think I do. But POMDPs are just one of the formalisms I am building on. They have nothing to say about useful independence assumptions or about concise descriptions of actions (state transition functions), which is what this paper is mainly about.

  you want to be more general; e.g., be able to deal with cost functions that depend on the system history. If this is the case, I'd suggest to introduce "generalized" POMDPS where (cumulative) cost functions do not have to be additive (btw, in such POMDPS belief states are not necessarily sufficient statistics ..)

(unless you include the accumulated reward in the state.)

  you want to accommodate multiple agents. Yet this is not done in this paper, but even in that case, multi-agent POMDPs could be defined as well, and have been defined in the completely observable setting (e.g., see Littman).

This was done in my AIJ paper, "The Independent Choice Logic for Modelling Multiple Agents Under Uncertainty". I actually don't think that the ICL_SC formulation is as good for this as the formalism in that paper.

  you are not interested in determining an optimal or near-optimal solution of the POMDP but are interested in analyzing the expected cost of a policy supplied by a user in the form of a contingent plan. Again, this is no problem from the point of view of POMDPs, as long as the contingent plan determines a (stationary or non-stationary) function of belief states into actions (see [4]).

First let me point out that POMDPs are not defined in terms of belief states. It is a theorem about POMDPs (Astrom 1965) that says that an optimal policy can be represented in terms of a function from belief states (probability distributions over ordinary states) into actions. However, "optimal" here doesn't take into account computation time. It is pretty obvious that this isn't true for bounded optimal agents (where the computation time and the space are taken into account), which is where I would like to take this.

I am interested in determining an optimal or near-optimal solution (but one that takes computation time and space onto account). I want to be able to define the expected utility of any strategy the agent may be carrying out, whether it maintains a belief state or not. No matter what program the agent is following, it can be described in terms of a conditional plan that describes what the agent will do based on the alternate sequences of observations (i.e., its function from observation sequences into actions). Note that this doesn't refer to anything inside the agent. The agent could be reasoning with a probability distribution over states, remembering a few previous observations, just reacting to its current observations, or actually following a conditional plan. The important thing is what it does is based on what it observes.

What I am advocating is good hacks; the best agent won't do exact probabilistic reasoning. We need good strategies, perhaps like your RTDP-BEL. However, it is not obvious (to me anyway) that the bounded optimal solution will necessarily be an approximation to the (unbounded) optimal solution. The best real (bounded time and bounded space) agent may do something quite different than approximating belief states. Unless we have some representation that lets us model the expected utility of an agent following its function from observation and action history into actions (i.e., its transduction), we won't be able to specify when one agent is better than another. The conditional plans (policy trees) let us model this, without any commitment to the internal representation of the agent.

  Indeed, you require more than this when you demand (page 18), that the tests in conditions of the plan, be directly observable at the time when the conditions need to be evaluated. This is not required in POMDPs [4] or in possible world accounts [3], where the test may be known indirectly through other observations.

NO! I mean that the conditions are those things that are observable. I mean whatever evidence (the "other observations") that the POMDPs or possible worlds accounts take into account. This is a standard technique in Bayesian networks/ influence diagrams; we create a variable that represents the actual value sensed. I am quite happy to have this as a (state) variable. You don't want it as a part of your state because it isn't necessary to make the system Markovian. This is fine, but I can model these "other observations" using the "sense" predicate. Is the sense predicate part of the state? I don't know; that is one reason why I didn't blindly accept the notion of state.

  3. A final comment about a non-trivial problem that results from the decision of removing all probabilities from state transitions transferring them into suitable priors in "choice space".

Consider the effect of an action "turn_off" on a single boolean fluent "on" with transition probabilities:
    P(on = false  |  on = true ; a = turn_off) = .5   
    P(on = false  |  on = false ; a = turn_off) = 1   

Let's say that initially  on = true  and then we perform a sequence of  N  consecutive  turn_off 's. Then the probability of  on = false  is  0.5^N , which clearly depends on  N .

I don't see how this behavior could be captured by priors over choice space. This seems to be a strong limitation. It looks as if probabilistic actions cannot be handled after all. Is this right?

Here is how I would represent this: Facts:

 on(do(turn_off, S)) <- on(S) & off_failed(S).  

 on(init).  

 C0  contains   {off_failed(S), off_succeeded(S)}   for all situations  S .

 P0(off_failed(S)) = 0.5  for all situations  S 

[Note that the completion of the facts for  on  gives your two conditional probabilities.]
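To spell this out: for a situation of the form  do(turn_off, S) , the completion gives

    on(do(turn_off, S)) <-> on(S) & off_failed(S)   

so if  on(S)  is true, the probability that  on  persists is  P0(off_failed(S)) = 0.5 , i.e.  P(on = false | on = true; a = turn_off) = 0.5 ; and if  on(S)  is false, then  on(do(turn_off, S))  is false with certainty, i.e.  P(on = false | on = false; a = turn_off) = 1 . These are exactly your two transition probabilities.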

We can then derive
    P(on(do(turn_off, do(turn_off, do(turn_off, init))))) = 0.5^3   
just as you want (similarly for any  N ). We need an assumption for each time step. The minimal explanation for
    on(do(turn_off, do(turn_off, do(turn_off, init))))   
is the set of the following:
    off_failed(init)   
    off_failed(do(turn_off, init))   
    off_failed(do(turn_off, do(turn_off, init)))   
In all worlds with this explanation true, the light is on in the state after three  turn_off s. The probabilities of these worlds sum to  0.5^3 .

Any probabilistic actions that you can represent in a dynamic Bayesian network, I can represent in the ICL_SC. Moreover I can represent it at least as succinctly as the dynamic Bayesian network (up to a constant factor) and sometimes exponentially (in the number of state variables) more succinctly (when there is context-specific independence). The mapping is local and modular.

Thanks for taking the time to read and comment on the paper. I hope my comments help make the paper and what I am trying to do clearer. Let me stress that this paper has nothing to say about computation. Whereas your work was motivated by being a front end to an actual POMDP algorithm, the motivation here relies on being able to deliver POMDP algorithms that can exploit the conditional and contextual independencies of the representation. Unfortunately, I can't show you such algorithms now. But we are working on it.

David

C3-1. Hector Geffner (26.11):

David,

thanks for your answers; they helped me a lot. Just a brief follow up on the status of your proposed framework: a new model? a new language? a new algorithm? all of them? ...

Best regards. -hector

  1. In page 24, you say "The representation in this paper can be seen as a representation for POMDPs".

If so, I wonder, why not present the representation in that way from the very beginning? Wouldn't that make things much clearer?

  One of the problems with writing a paper that is trying to bridge different areas is to try to explain it for all readers (or to keep them all equally unhappy). I suppose I have succeeded when POMDP researchers say "this is just a representation for POMDPs" and when situation calculus researchers claim "this is just the situation calculus with some some probability added" or those studying Bayesian networks claim that "this is just a rule-based representation for dynamic Bayesian networks with actions and observables". Describing it explicitly in terms of POMDPs may have made it easier for POMDP researchers, but not necessarily for everyone else.

I think I don't agree with this. I believe that POMDPs are a general and natural model for sequential decision problems that involve sensing. They are not for POMDPs researchers only; in the same sense, that logic is not only for logicians. And they are simple too (unlike some of the POMDP algorithms that are indeed complex).

I think that anyone dealing with sequential decision problems that involve sensing should know about POMDPs, whether or not they appeal to POMDP algorithms for solving them, and whether or not they deal with probabilities. I know this sounds dogmatic ... but it is the truth!!! :-)

No, really, POMDPs are very useful for understanding and identifying the different dimensions of a decision problem: transitions, costs, information; and at the same time they have little to do with probabilities (namely, even if all probabilities become zero or one, POMDPs are still very meaningful and don't reduce to anything else, in the way that 0-1 probabilities reduce, for instance, to logic).

On a more general note, I think there are three essential aspects to the work in planning and control in AI:

Models are about the mathematics of the task: what is a problem? what is a solution? what is an optimal solution? Languages are for describing these models in a convenient way. Algorithms are for computing the solutions.

I think these three ingredients are always present in approaches to planning and control in AI, and I believe it is useful to make them explicit; even when the algorithms may take advantage of the particular language in which the model is represented (as in Strips planning).

I understand that you expect your language to be useful not only for specifying POMDPs in a convenient way, but also for solving them conveniently. That would be great. Yet even in that case, people using different POMDP algorithms could in principle benefit from your language for setting up their POMDP models. This could also be important, and indeed, in the recent AAAI Symposium on POMDPs, the need for good languages for building POMDPs for particular applications and for exchanging benchmarks was emphasized. Maybe your language, as well as other action languages suitably extended, could fill that need. As you know, we have also been doing work in that direction.

Best regards.

- hector

C3-2. Chitta Baral (22.12):

Hi Hector and David:

I have read your exchanges in the ENRAC about the importance of POMDPs in decision problems involving sensing. I have the following two questions to Hector.

(i) I think POMDPs are very good for this particular type of problem. However, I am not sure they scale up in the following sense: if we elaborate sensing as obtaining knowledge, then we may want to consider obtaining knowledge of the form  (K K K f) , where  K  is the modal operator `knows'. I am not sure if a POMDP-based approach can scale up to this.

On the other hand it seems that the approach in Bacchus, Levesque and Halpern (IJCAI 95) can scale up to this, and they can be shown equivalent to a POMDP-based approach in the special case where we only deal with one level of knowledge (e.g., S5 guarantees that).

(ii) I don't understand why (perhaps you can help me here), in a POMDP formulation (for example, in Hector's AAAI fall symposium paper this year), we have the action  a  mentioned in  Pa(o | s) . I feel it should not matter how  s  was reached, and we should just have  P(o | s) . Otherwise observations are in some sense non-Markovian, as they depend not only on the state of the world, but also on how that state was reached. (This is not that big a deal though.)

Best regards
Chitta


Q1. Anonymous Referee 1 (15.10.2001):


Q2. Anonymous Referee 2 (15.10.2001):