Moderated by Erik Sandewall. |
David PooleDecision Theory, the Situation Calculus and Conditional Plans |
The
article
mentioned above has been submitted to the Electronic
Transactions on Artificial Intelligence, and the present page
contains the review discussion. Click here for
more
explanations and for the webpage of theauthor, David Poole.
Overview of interactions
Q1. Uffe Kjærulff (30.8):
The focus of the paper seems to be on representational issues, which is fine. But, being a Bayesian-network person, I'm would find it interesting to see how computations of expected utilities of plans are performed in the ICL framework. This issue is only touched upon very briefly in the paper, mentioning that it involves enumerating all possible cases. In my understanding that amounts to constructing the corresponding decision tree (which may be huge), and then computing the probability and utility for each leaf of the tree. So, my question boils down to: Is there an efficient computational scheme for ICL? Uffe Kjærulff Dept. of Computer Science, Aalborg University, Denmark
A1. David Poole (14.9):
The focus of the paper seems to be on representational issues, which is fine. But, being a Bayesian-network person, I would find it interesting to see how computations of expected utilities of plans are performed in the ICL framework. This issue is only touched upon very briefly in the paper, mentioning that it involves enumerating all possible cases. In my understanding that amounts to constructing the corresponding decision tree (which may be huge), and then computing the probability and utility for each leaf of the tree. Thanks for the question! First, ICL theories can be seen as representations of Bayesian networks; there is a local translation between groundings of the ICL theories and Bayesian networks. ICL theories (with only choices by nature) can be seen as first-order rule-based representations of Bayesian networks. Thus any Bayesian network algorithms can, in principle, be used for the ICL. There are two main computational advantages of the ICL approach. The first is that they can exploit logic-based reasoning techniques for probabilistic reasoning (although it turns out that once the link is realized, it isn't restricted to ICL theories). Secondly they give us an opportunity to exploit, not only the graphical independence of a Bayesian network, but also the contextual independence that i natural in the rule based representations. I will expand briefly on each of these here. The ICL (with only choices by nature) can be seen as a way to write first-order Bayesian networks as rules. There is a local translation between ICL theories and Bayesian networks. Note that a Bayesian network has nothing to say about how a node depends on its parents. The ICL lets us write the conditional probabilities as a set of rules. (This is often a more compact representation than conditional probability tables when there are contextual independencies). To translate an ICL theory into a Bayesian network: ground the theory (substituting ground terms for the free variables); the ground atoms are the nodes of the Bayesian network; the atoms in the bodes of the rules that imply a become the parents of a. There is a lot more structure that can be expressed by the rules than in the network (in particular the contextual independencies). Note that the acyclicity of the logic program ensures that the resulting network is acyclic and (even when the resulting network is infinite) there are only finitely many ancestors for any node (which means you only need a finite subset of the graph to answer any conditional probability statement). To translate a Bayesian network to an ICL theory, you write as rules how the parents of a influence a. The atomic choices are introduced as extra conditions on the rules to let us interpret the rules logically. For more details see (Poole 1993). The first way that the ICL can be used computationally is adapting the logic-based algorithms for probabilistic reasoning. In particular, the algorithms for consistency-based diagnosis (such as the focusing algorithms of de Kleer), based on the notions of conflicts, can be adapted to compute posterior probabilities in the ICL (and so for Bayesian networks). This works well when there are skewed probability distributions. This has been explored in (Poole 1996). Another approach is to exploit the structure that can be expressed in a Bayesian network as well as the rule structure of the ICL. This is ongoing work with Nevin Zhang. I'll first sketch how the efficient structured Bayesian network algorithms (such as clique-tree propagation and more query-oriented algorithms such as D'Ambrosio's SPI, Zhang and Poole's VE or Dechter's bucket elimination) can be seen as ways to exploit the factorization of the joint probability distribution. The basic operation is to sum out a (non-observed, non-query) variable. Algorithms such as clique-tree propagation are efficient because they can compile this operation into an secondary structure (the clique tree or join tree) that works by message passing in the join tree so that the posterior distribution for all variables can be obtained in twice the time it takes to compute the posterior on one. The basic operation of the message passing from one clique to another is to sum out the variable that appears in one clique that doesn't appear in the other. It turns out that you can carry out a similar operation on the rule base directly. We have been working on a rule-based variation of the query-oriented algorithms I call probabilistic partial evaluation (Poole 1997). Eliminating a variable turns out to be like resolution. When the rules give no structure beyond that of the Bayesian network (i.e., there is no contextual independence, and there is a rule for each assignment of variables to a parent), the algorithm reduces to the standard query-oriented Bayesian network algorithms. But when there is contextual independence (as there is in many ICL theories) the algorithm lets us exploit that. This also seems to be very promising for approximation, where different approximations can be used in different contexts (Poole 1998). Aside: One main difference between clique tree propagation and the query-oriented algorithms is that clique tree propagation acts on marginal distributions (each clique represents the marginal distribution of the variables in the clique), whereas the query-oriented algorithms act on conditional probabilities. Probabilistic partial evaluation relies on manipulating conditional probabilities. The contextual independencies get lost quickly when you reason with marginal distributions. Finally, there is also a promise of structured dynamic programming using the rules. The most comment methods for solving sequential decision problems is to do dynamic programming. Traditional dynamic programming explicitly reasons over state space.s and thus suffers from what Bellman calls the "curse of dimensionality". We should be able to do much better by reasoning in terms of the variables (or propositions) that define the state space and exploiting the rule-structure in dynamic programming. In the ICL, dynamic programming looks much like regression planning. We can either do this in POMDPs where we regress values through conditional plans (Boutilier and Poole 1996) or where we are building plans conditional on history (as in influence diagrams), and we can use the rule structure to determine states that all have the same utility (Poole 1995). Thus in answer to your question, this is an active area of research by myself and others who are looking at exploiting contextual independence for probabilistic inference and sequential decision making. One of the reasons I am optimistic that this will work is that it promises nice methods of approximation, where we can approximate differently in different contexts. We will have to see if we can deliver on this promise.
ReferencesThe following are references to some of my papers that give more details:
Q2. Ray Reiter (11.9):
David, I enjoyed reading your paper, and learned a lot from it. I agree with you that axiomatic action theories need an account of probability/decision theory, although I do not share your reservations about being able to do so in the sitcalc (see below). My comments and questions concern only the probabilistic (as opposed to the decision theoretic) components of your paper.
Technical issues1. Examples 2.8 and 2.9.
The predicate
(a) It is not parameterized by the thing -- in this case key -- being picked
up. It should be
(b) In axiomatizing at on p. 20, you do not include a
2. In the conditional plan of p. 21, don't you want
3. Consider the axiom of Example 2.9 and your comment "Note that this
implies that putting down the key always succeeds".
Suppose not. Then is it the
case that the axiom should be modified by replacing the atom
4. In the axiom of Example 2.9, you include a choice
Basically, you want to describe the probability that the robot
doesn't lose the key during the interval between the occurrence of the last
action in
The need for action preconditions
On p. 8, you claim "None of our representations assume that actions have
preconditions; all actions can be attempted at any time." I'm sceptical of
this claim. Consider your example domain, where you adopt the axioms
Finally, you often talk about actions being
"attempted",
as in the above quotation. I found this particularly confusing
since nothing in your ontology deals with action "attempts".
Actions simply
realize their effects with certain probabilities. You do not, for example,
have an action
Section 3.3I must confess that I couldn't make much sense of the discussion in Section 3.3, especially your arguments that the sitcalc is inherently unsuitable for modeling multiple, possibly reactive agents. It's true that Golog lacks the right control structures for doing this, but your comments on p. 26 and p. 27 (p. 26 "In order for a situation calculus program to ... and does the appropriate concurrent actions." p. 27 "When there are multiple agents ... or complex actions.") suggest to me that you are not aware of more recent work by De Giacomo, Lesperance and Levesque on Congolog that addresses precisely these issues (Proc. IJCAI 97). Congolog provides reactive rules (interrupts) and interleaving concurrency in a semantically clean way, all formulated within the sitcalc. I believe that Congolog addresses all your criticisms of the sitcalc, except, of course, it does not (yet) incorporate utilities.
Comparison with Bacchus, Halpern and LevesqueSo far as I know, there have only been two proposals for augmenting the sitcalc with probabilities, yours and BHL. As I indicate below, these two approaches are considerably different, and therefore I believe it is important to elaborate on these differences more than you do (top p. 25) especially the substantial ontological differences.
For BHL, there is no
1. Primitive actions for BHL are entirely different than yours. BHL primitive
actions would be things like 2. Therefore, for BHL, situation terms describe the "actual" history selected by nature in response to the agent performed actions, e.g.
3. In BHL, all primitive actions (
4. For BHL, your nondeterministic agent actions like
Suggestion: A Guide for the Perplexed
Your paper uses a single, ongoing example to
illustrate the axiomatization style
required by your approach. Personally, I couldn't abstract, from the example,
a set of guidelines by which I could invent my own axiomatization for a
different example of my choosing. I think it would considerably enhance the
paper if you provided such guidelines. For example, for each action, tell the
reader what axioms need to be specified. What choice of fluents needs to be
made? (For example, there seem to be at least two categories of fluents --
"normal"
fluents like
A2. David Poole (18.9):
Ray,
Technical issues1. Examples 2.8 and 2.9.What it says now is that the probability that picking up something (in this case a key) actually achieves the agent carrying the object doesn't depend on what is being picked up (nor does it depend on the position, as long as the robot and key are at the same position). You are right in that this probably isn't what you want. [This would really only make a difference if the Presumably, you need one of these predicates for each action type, e.g.In this case Now,I am not sure what you mean by the union, but Incidentally, your specifications of what sets are inYes. I mean for each situation term
Here (b) In axiomatizing at on p. 20, you do not include aYes. 2. In the conditional plan of p. 21, don't you wantNo. This is sort of correct, but it doesn't get the probabilities right. I would expect that the probability of keeping the key to be independent of of the other two cases. There are three free parameters to be assigned probabilities, so we need three choice alternatives (when we only have binary alternatives). See the guide for the perplexed below; I do this example in more detail. Yes. That is right. If the agent could drop the key at any time, we need to model (or learn) how the agent may drop the key as a function of time. Presumably this would be modelled as a Poisson process (assuming it is equally probable that the agent can drop the key at any time). As outlined on page 1, I haven't dealt here explicitly with time. The axiomatization I gave assumes that the agent can hold on to the key tightly enough so whether it drops the key isn't a function of time ;^}
The need for action preconditions
On p. 8, you claim "None of our representations assume that actions have preconditions; all actions can be attempted at any time." I'm skeptical of this claim. Consider your example domain, where you adopt the axiomsThere are two separate issues here, first the meaning of preconditions, and secondly how the atomic choices act in contexts where they aren't applicable. There are three things that could be meant by a precondition of an action:
The second issue is a bit more subtle. Let's assume that the choice
alternatives are all binary (non-binary alternatives are useful when a
relation is functional). In this case, there are the same number of
alternatives as there are free parameters (conditional probabilities
that can be independently assigned) in the probability model. For
example, when converting a Bayesian network to an ICL theory, there
are the same number of alternatives as there are numbers to be
assigned in the Bayesian network. You typically have, for each effect
Your question on conditions for This should be seen in the context of an overriding theme of this work. We are not trying to see how much we can put into a formalism, but how little we can get away with. We don't need conditions on the probabilities of the atomic choices, so we don't have them. Perhaps the term "attempted" is misleading. What I mean is that the robot sends the motor control for the action, irrespectively of whether the robot can do it (in this sense it is attempting the action). By the
Section 3.3
I must confess that I couldn't make much sense of the discussion in Section 3.3, especially your arguments that the sitcalc is inherently unsuitable for modeling multiple, possibly reactive agents. It's true that Golog lacks the right control structures for doing this, but your comments on p. 26 and p. 27 (p. 26 "In order for a situation calculus program to ... and does the appropriate concurrent actions." p. 27 "When there are multiple agents ... or complex actions.") suggest to me that you are not aware of more recent work by De Giacomo, Lesperance and Levesque on Congolog that addresses precisely these issues (Proc. IJCAI 97). Congolog provides reactive rules (interrupts) and interleaving concurrency in a semantically clean way, all formulated within the sitcalc. I believe that Congolog addresses all your criticisms of the sitcalc, except, of course, it does not (yet) incorporate utilities."It's true that Golog lacks the right control structures for doing this..." I am quite happy with the agent having simple control structures. I am concerned about modelling asynchronous external actions or events (by other agents or by nature). It is not clear to me that Congolog addresses this, in the sense that all of the concurrency in the language is internal to the agent. It doesn't model exogenous actions (they generated externally). This would seem to mean that external events are restricted to occur at the situations defined by the primitive events. [I am looking forward to being told I am wrong.] Moreover, with respect to the mix of the situation calculus and time, it is clear now that we can do what we want when we have both the situation calculus and time. But when you have statements saying when an event occurred, you don't need situations. Thus parsimony would suggest that we do away with situations. [I would expect that this should form a different thread.]
Comparison with Bacchus, Halpern and Levesque
So far as I know, there have only been two proposals for augmenting the sitcalc with probabilities, yours and BHL. As I indicate below, these two approaches are considerably different, and therefore I believe it is important to elaborate on these differences more than you do (top p. 25) especially the substantial ontological differences.Yes, I do need to discuss this further (that is the joy of ETAI discussion of papers; we can retrospectively expand those parts that are of most interest). First let me say that I am not claiming that mine is better than theirs, nor that it is different just for the sake of being different. We start from very different perspectives. In their introduction, they explicitly contrast their work with the work starting from Bayesian networks. In particular, they claim "...they [Bayesian networks] have difficulties in dealing with features like disjunction ...". I take a Bayesian perspective that disjunction is not a feature we want! What defines a Bayesian is that probability is a measure of belief, that any proposition can have a probability (both of which BHL would agree with) and that all uncertainty should be measured by probability. Thus you need to keep in mind that the different design choices could stem from this different perspective.
For BHL, there is noExcept that BHL don't model probabilistic actions (in their IJCAI-95 paper, which is the only one I had seen) although I believe they could. All of their probabilities are within the agent, so such axioms would have to be part of the formula for updating belief. [I just found a new paper, Reasoning about Noisy Sensors and Effectors in the Situation Calculus, 1998, at Faheim Bacchus's web site that does this. I'll call this BHL98.] One other important point to note is that the probabilistic part of the ICL is a restricted form of Bacchus's and Halpern's logics of probability as belief. The translation is that each atom that isn't an atomic choice is defined by Clark's completion of the rules defining that atom. We need statements that the elements of an alternative are exclusive and covering and that the atomic choices that aren't in the same alternative are probabilistically independent. Thus the translation doesn't require "suitable conditions on S and p". A general framework for probability and action is intuitively straightforward (see Halpern and Tuttle (1993) for a general semantic framework). You can think about forward simulating the system. Nondeterministic/stochastic actions split the worlds and impose a probability over the resulting worlds. In the ICL we handle splitting the worlds using one simple mechanism: independent choice alternatives. I treat actions by the agent and other stochastic mechanisms (e.g., the ramifications of actions) in exactly the same way. BHL-98 split the worlds by nondeterministic actions (as you outline). They would then need a different mechanism for stochastic ramifications. These are very different ontological and representational commitments than yours, and I think they deserve a deeper analysis than your discussion provides. Here are a few important differences these commitments lead to:Yes. Yes. I tried to be explicit about both of these points. 3. In BHL, all primitive actions (Except I would think that my successor state axioms are perfectly straightforward. I get by with just Clark's completion, in much the same way as other logic programming representations of action. Frame and ramification axioms are treated the same (in fact I don't even think of them as different, and the paper doesn't distinguish them). I get by with Clark's completion because it is defined with respect to each world, where I just have a simple acyclic logic program. Maybe the guide to the perplexed below will help. Moreover, BHL use "classical" action precondition axioms, in contrast to you (see above about action preconditions).So what happens if the agent doesn't know if the precondition holds, but still wants to do the action? This is important because, with noisy sensors, agents know the actual truth values of virtually nothing outside of their internal state, and most preconditions of actions are properties of the world, not properties of the agent's beliefs. 4. For BHL, your nondeterministic agent actions likeOne major difference is that in BHL, it is the agent who does probabilistic reasoning. In my framework, the agent doesn't (have to) do probabilistic reasoning, it only has to do the right thing (to borrow the title from Russell and Wefald's book). The role of the utility is to compare agents. This is important when we consider what Good calls type 2 rationality and Russell calls bounded rationality (see the work by I.J. Good, Eric Horvitz, and Stuart Russell for example), where we must take into account the time of the computation done by the agent in order to compute utility. I would expect that optimal agents wouldn't do (exact) probabilistic reasoning at all, because it is too hard! That is why we want a language that lets us model agents and their environment, and defines the expected utility of an agent in an environment. One of the reasons I didn't want to just add "time" into the framework is that I want a way to take into account the computation time of the agent (the thinking time as well as the acting time), and that makes it much more complicated (I wanted to get the foundations debugged first).
Suggestion: A Guide for the Perplexed
Your paper uses a single, ongoing example to illustrate the axiomatization style required by your approach. Personally, I couldn't abstract, from the example, a set of guidelines by which I could invent my own axiomatization for a different example of my choosing. I think it would considerably enhance the paper if you provided such guidelines. For example, for each action, tell the reader what axioms need to be specified. What choice of fluents needs to be made? (For example, there seem to be at least two categories of fluents - "normal" fluents likeThanks for the suggestion. The following is a paraphrase of how we tell people to represent knowledge in Bayesian networks, translated into the language of the ICL.
First I'll do the propositional (ground) case. We totally order the
propositions. The idea is that we will define each proposition in
terms of its predecessors in the total ordering. For each proposition The case for the ICL with the situation calculus is similar. Intuitively, we make the total ordering of the propositions respect the temporal ordering of situations. We write how a fluent at one situation depends on fluents (lower in the ordering) at that situation and on fluents at previous situations. Note that this rule automatically handles ramifications as well as frame axioms. The total ordering of the fluents guarantees the acyclicity of the rule base and that we don't have circular definitions. The initial situation is handled as any other, but the predecessors in the total ordering are only initial values of fluents (and perhaps atoms that don't depend on the situation). The only thing peculiar about this is that we often have fluents that depend on values at the current as well as the previous situation. This is important when there are correlated effects of an action; we can't just define each fluent in terms of fluents at the previous situation. But this also means that we don't treat frame axioms and ramification axioms differently.
Let's look at how we would do Examples 2.8 & 2.9, with
putting-down the key possibly failing. Lets consider when carrying would be
true. Suppose carrying doesn't depend on anything at the same time, so
we can ask what are the contexts on which
The robot would be carrying carrying(O,do(pickup(O),S)) <- at(robot,Pos,S) & at(O,Pos,S) & pickup_succeeds(O,S). The robot would be carrying carrying(O,do(A,S)) <- carrying(O,S) & A \= putdown(O) & A \= pickup(O) & pickup_succeeds(O,S). The robot would be carrying carrying(O,do(putdown(O),S)) <- carrying(O,S) & putdown_fails(O,S).where
Note that putdown may succeed in doing other things, it just fails in
stopping the robot from carrying Footnote 9 is there to try to explain what the rules mean if the rule bodies aren't disjoint. We can interpret what these rules mean, but it usually isn't what you want (for the cases where both rules are applicable). When we write these rules, having non-disjoint rules is usually a bug (the implementation available from my web page warns when it finds non-disjoint rules.)
In connection with this last question, notice that on p. 20, the clause with headIt is a ramification of the robot moving. The above methodology handles these ramifications in the same way it handles frame axioms (and any other axioms). Thanks for your comments and questions. I hope this made the paper clearer. I would be happy to answer any other questions you may have (or engage in debate about the usefulness of this). David Q3. Hector Geffner (19.11):
David, I've enjoyed reading your paper "Decision Theory, the Situation Calculus and Conditional Plans". I find it hard to disagree on your general point that a general model of action must combine elements from logic and probability/utility theory, so my questions and comments are addressed more to the question of how this combination is achieved in your proposal. I'll try to comment first on the more general issues. 1. In page 24, you say "The representation in this paper can be seen as a representation for POMDPs". If so, I wonder, why not present the representation in that way from the very beginning? Wouldn't that make things much clearer? Namely, a POMDP is described by (see [1]):
A - state space Then the constructions in your language could be seen as a convenient/concise/modular way for specifying these components. Actually, once the mapping from the language to these components is determined, no further semantics would be needed (i.e., the probabilitities of trajectories, the expected utility of policies, etc., are all defined as usual in POMDPs). This is actually the way we do things in [2]. 2. I can see some reasons why not to present your representation as a "front end" to POMDPs.
Let's say that initially I don't see how this behavior could be captured by priors over choice space. This seems to be a strong limitation. It looks as if probabilistic actions cannot be handled after all. Is this right? References [1] L. Kaebling, M. Littman, and T. Cassandra. Planning and Acting in Partially Observable Stochastic Domains, AIJ 1998 [2] H. Geffner, B. Bonet. High level planning with POMDPs. Proc. 1998 AAAI Fall Symp on Cognitive Robotics (www.ldc.usb.ve/~hector) [3] H. Levesque. What's planning in the presence of sensing? AAAI96 [4] H. Geffner and J. Wainer. A model of action, knowledge and control, Proc. ECAI 98 (www.ldc.usb.ve/~hector)
A3. David Poole (24.11):
I thought I did do this. (At least I was trying to do this, and I knew
when I was finished when I had defined all of the components of a
POMDP). The actions and observables of Definition 2.6 are the same as
B and E. I specify C, D & F using rules. In particular I represent
the cost function in terms of rules that imply the
(a) everything that is true at a stage. This would include the actual time (if it is relevant), the values of sensors, the accumulated reward, etc. (b) a sufficient statistic about the history to render the future independent of the past given the state (which, by definition is Markovian). This typically doesn't include the time, the output of the sensors, or the accumulated reward. I have taken the approach of (a). Most POMDP researchers take the approach of (b) because their algorithms reason in terms of the state space. They therefore want to keep the state space as small as possible. I don't want to reason at the level of the state space, but instead at the level of the propositions.
I am interested in determining an optimal or near-optimal solution (but one that takes computation time and space onto account). I want to be able to define the expected utility of any strategy the agent may be carrying out, whether it maintains a belief state or not. No matter what program the agent is following, it can be described in terms of a conditional plan that describes what the agent will do based on the alternate sequences of observations (i.e., its function from observation sequences into actions). Note that this doesn't refer to anything inside the agent. The agent could be reasoning with a probability distribution over states, remembering a few previous observations, just reacting to its current observations, or actually following a conditional plan. The important thing is what it does is based on what it observes. What I am advocating is good hacks; the best agent won't do exact probabilistic reasoning. We need good strategies, perhaps like your RTDP-BEL. However, it is not obvious (to me anyway) that the bounded optimal solution will necessarily be an approximation to the (unbounded) optimal solution. The best real (bounded time and bounded space) agent may do something quite different than approximating belief states. Unless we have some representation that lets us model the expected utility of an agent following its function from observation and action history into actions (i.e., its transduction), we won't be able to specify when one agent is better than another. The conditional plans (policy trees) let us model this, without any commitment to the internal representation of the agent.
Let's say that initially I don't see how this behavior could be captured by priors over choice space. This seems to be a strong limitation. It looks as if probabilistic actions cannot be handled after all. Is this right? |
Here is how I would represent this: Facts:
[Note that the completion of the facts for
We can then derive
_{ } P(on(do(turn_off, do(turn_off, do(turn_off, init))))) = 0.5^{3} |
_{ } on(do(turn_off, do(turn_off, do(turn_off, init)))) |
_{ } off_failed(init) | ||
_{ } off_failed(do(turn_off, init)) | ||
_{ } off_failed(do(turn_off, do(turn_off, init))) |
Any probabilistic actions that you can represent in a dynamic Bayesian network, I can represent in the ICL_SC. Moreover I can represent it at least as succinctly as the dynamic Bayesian network (up to a constant factor) and sometimes exponentially (in the number of state variables) more succinctly (when there is context-specific independence). The mapping is local and modular.
Thanks for taking the time to read and comment on the paper. I hope my comments help makes the paper and what I am trying to do clearer. Let me stress that this paper has nothing to say about computation. Whereas your work was motivated by being a front end to an actual POMDP algorithm; the motivation here relies on being able to deliver on POMDP algorithms that can exploit the conditional and contextual independencies of the representation. Unfortunately, I can't show you such algorithms now. But we are working on it.
David
C3-1. Hector Geffner (26.11):
David,
thanks for your answers; they helped me a lot. Just a brief follow up on the status of your proposed framework: a new model? a new language? a new algorithm? all of them? ...
Best regards. -hector
1. In page 24, you say "The representation in this paper can be seen
as a representation for POMDPs".
If so, I wonder, why not present the representation in that way from the very beginning? Wouldn't that make things much clearer? |
One of the problems with writing a paper that is trying to bridge different areas is to try to explain it for all readers (or to keep them all equally unhappy). I suppose I have succeeded when POMDP researchers say "this is just a representation for POMDPs" and when situation calculus researchers claim "this is just the situation calculus with some some probability added" or those studying Bayesian networks claim that "this is just a rule-based representation for dynamic Bayesian networks with actions and observables". Describing it explicitly in terms of POMDPs may have made it easier for POMDP researchers, but not necessarily for everyone else. |
I think I don't agree with this. I believe that POMDPs are a general and natural model for sequential decision problems that involve sensing. They are not for POMDPs researchers only; in the same sense, that logic is not only for logicians. And they are simple too (unlike some of the POMDP algorithms that are indeed complex).
I think that anyone dealing with sequential decision problems that involve sensing should know about POMDPs, whether or not they appeal to POMDP algorithms for solving them, and whether or not they deal with probabilities. I know this sounds dogmatic ... but it is the truth!!! :-)
No, really, POMDPs are very useful for understanding and identifying the different dimensions of a decision problem: transitions, costs, information; and at the same time they have little to do with probabilities (namely, even if all probabilities become zero or one, POMDPs are still very meaningful and don't reduce to anything else as 0-1 probabilities that would reduce for instance to logic).
On a more general note, I think there are three essential aspects to the work in planning and control in AI:
Models are about the mathematics of the task: what is a problem? what is a solution? what is an optimal solution? Languages are for describing these models in a convenient way. Algorithms are for computing the solutions.
I think these three ingredients are always present in approaches to planning and control in AI, and I believe it is useful to make to them explicit; even when the algorithms may take advantage of the particular language in which the model is represented (as in Strips planning).
I understand that you expect your language to be useful not only for specifying POMDPs in a convenient way, but also for solving them conveniently. That would be great. Yet even in that case, people using different POMDP algorithms could in principle benefit from your language for setting up their POMDP models. This could also be important, and indeed, in the recent AAAI Symposium on POMDPs, the need for good languages for building POMDPs for particular applications and for exchanging benchmarks was emphasized. May be your language, as well as other action languages suitably extended, could fill up that need. As you know, we have also been doing work in that direction.
Best regards.
- hector
C3-2. Chitta Baral (22.12):
Hi Hector and David:
I have read your exchanges in the ENRAC about the importance of POMDPs in decision problems involving sensing. I have the following two questions to Hector.
(i) I think POMDPs are very good for this particular type of problems.
However, I am not sure they scale up in the following sense:
If we elaborate sensing as obtaining knowledge then
when we consider obtaining knowledge of the form
On the other hand it seems that the approach in Bachus, Levesque and Halpern (IJCAI 95) can scale up to this and they can be shown equivalent to a POMDP based approach in the special case where we only deal with one level of knowledge. (eg: S5 guarantees that.)
(ii) I don't understand why (perhaps you can help me
here), in a POMDP formulation (for example, in Hector's AAAI fall
symposium paper this year)
we have the action
Best regards
Chitta
Q4. Anonymous Referee 1 (12.6.1999):
1. Are the results of the article, as specified in the summary, of significant interest for researchers in our area?
The themes considered in the paper are of great interest.
2. Does the full text substantiate the claims made in the summary?
There are many claims, many lacking precision and clarity. The paper delivers on some of them (see below).
3. Is the text intelligible and concise?
The main ideas in the paper are intelligible, but the presentation is often confusing (see below).
4. Do you know of any previous publication of the same result?
No.
5. Are there any other considerations that may affect the decision?
No.
Based on my understanding, the best way to describe the paper is as proposing a way of representing actions, probabilities, utilities, sensors, and conditional plans in logic programs where:
Once a logic program is defined along these lines and certain conditions hold, the expected utility of conditional plans is well defined. There are also hints about how to compute this expected utities by building suitable "explanations", by this is not explained in detail.
This would be good enough. The main problem is that the author does not see and does not present the paper in this way.
He sees the logic program in which the truth of certain combinations of atoms is defined externally as an "independent choice logic", a logic that is "based on Bayesian decision theory". He also says that the paper proposes "a new model; not an alternative to POMDPs, Bayesian Networks, logic programs, and the standard situation calculus, but a way to combine them", etc.
I think this is quite confusing.
Personally, I'd also prefer less general discussion about decision theory, logic, uncertainty, etc; and more details on the procedure for evaluating the expected utilities of plans and its performance over some examples.
Q5. Anonymous Referee 2 (12.6.1999):
1. Are the results of the article, as specified in the summary, of significant interest for researchers in our area?
Yes, definitely.
2. Does the full text substantiate the claims made in the summary?
Yes, fully.
3. Is the text intelligible and concise?
Intelligible: yes. Concise: yes and no; see below.
4. Do you know of any previous publication of the same result?
No.
5. Are there any other considerations that may affect the decision?
None.
General remark. The article is lucid, well written, and easy to understand. It is relatively long (40 pages), but this is necessary for covering the topic it sets out for: to propose an integration of the author's "independent choice logic" with a variety of Toronto situation calculus, including background, definitions, examples, and comparisons with other approaches. Everyone may not agree with the approach, but it is a valuable contribution.
The referees make a recommendation whether the article is to be accepted or declined, as usual. The article and the discussion remain on-line regardless of whether the article was accepted or not. Additional questions and discussion after the acceptance decision are welcomed.
The Review Protocol Page is used as a working structure for the entire reviewing process. During the first (review) phase it accumulates the successive debate contributions. If the referees make specific comments about the article in the refereeing phase, then those comments are posted on the RPP as well, but without indicating the identity of the referee. (In many cases the referees may return simply an " accept" or " decline" recommendation, namely if sufficient feedback has been obtained already in the review phase).