David Poole:

Decision Theory, the Situation Calculus and Conditional Plans.

Summary of the article.


This paper shows how to use the situation calculus within a logic based on Bayesian decision theory. Within this logic we can axiomatize the dynamics of a domain (including stochastic frame axioms, and stochastic ramification axioms), how noisy sensors work, and the utility of final states. This paper defines the expected utility of conditional plans (containing sequential composition and conditioning on sensor values) given a representation of the domain.

The first part of this paper argues that many of the problems with representations of discrete actions have arisen because the primitive mechanisms in first order logic to handle uncertainty (namely disjunction and existential quantification) are inadequate to handle the choices (and their consequences) that an agent needs to make. For example, logics with no uncertainty, such as acyclic logic programs, seem to handle frame and ramification axioms unproblematically. We build on these acyclic logic programs, but handle uncertainty with more sophisticated tools. The paper argues that the appropriate representation for uncertainty for an artificial agent is Bayesian decision theory. In Bayesian decision theory all uncertainty is handled by probability and all preference by utility.

The representation that we use to combine logic and Bayesian decision theory is the Independent Choice Logic (ICL). The ICL grew (a) from work on abduction and default reasoning, where we want to abduce to causes and use assumptions to make predictions from these, (b) from building rule-structured conditional probability tables in Bayesian networks, and (c) from work on different agents choosing assumptions, modelled on game theory. The ICL is built from a choice space (a set of mutually exclusive alternatives, where an alternative is a set of ground atomic formulae, called atomic choices) and an acyclic logic program that gives the consequences of the atomic choices. The logic program cannot have any clauses that imply an atomic choice. The semantics is defined in terms of possible worlds; there is a possible world for every selection of one element from each alternative. A formula is true in a possible world if it is true in the (unique) stable model of the logic program together with the atomic choices selected. The ICL is naturally abductive: the explanations of g form a concise specification of the possible worlds in which g is true.
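As a minimal sketch of these definitions (with a toy choice space whose atoms are invented here, not taken from the paper), the possible worlds can be enumerated directly as selections of one atomic choice from each alternative, with an ordinary function standing in for the acyclic logic program:

```python
from itertools import product

# A hypothetical choice space: each alternative is a set of mutually
# exclusive atomic choices (all names are invented for illustration).
alternatives = [
    {"rain", "no_rain"},                 # nature chooses the weather
    {"sprinkler_on", "sprinkler_off"},   # nature chooses the sprinkler
]

# One possible world per selection of one element from each alternative.
worlds = [frozenset(selection) for selection in product(*alternatives)]

# A stand-in for the acyclic logic program: `wet` is a derived atom,
# a consequence of the atomic choices (clauses may not imply an
# atomic choice, so only non-choice atoms are defined by rules).
def wet(world):
    return "rain" in world or "sprinkler_on" in world

print(len(worlds))                   # 4 worlds (2 x 2 selections)
print(sum(wet(w) for w in worlds))   # wet is true in 3 of them
```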

In this paper all choices are made by nature. There is a probability distribution over the alternatives (a non-negative number is associated with each atomic choice, such that the values for the atomic choices in each alternative sum to one). When there are finitely many alternatives, the probability of a world is the product of the probabilities of the atomic choices selected by the world (when there are infinitely many alternatives we use a measure over sets of worlds in the standard manner, but where the alternatives are unconditionally independent). The probability of a formula is the sum of the probabilities of the worlds in which it is true. There is a close relationship between the ground (variable-free) ICL and Bayesian networks; the elements of the body of a rule correspond to the parents in the Bayesian network. The ICL gives rule-structured conditional probability tables.
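These two computations (world probabilities as products, formula probabilities as sums over worlds) can be sketched for a toy choice space; all names and numbers below are invented for illustration:

```python
from itertools import product

# Hypothetical distribution: within each alternative the probabilities
# of the atomic choices sum to one; alternatives are independent.
P = [
    {"rain": 0.2, "no_rain": 0.8},
    {"sprinkler_on": 0.5, "sprinkler_off": 0.5},
]

def world_prob(world):
    # Probability of a world: the product of the probabilities of the
    # atomic choices selected by the world.
    prob = 1.0
    for alternative in P:
        prob *= sum(p for choice, p in alternative.items() if choice in world)
    return prob

def formula_prob(holds):
    # Probability of a formula: the sum over the worlds in which it is true.
    worlds = [frozenset(sel) for sel in product(*(alt.keys() for alt in P))]
    return sum(world_prob(w) for w in worlds if holds(w))

def wet(world):
    return "rain" in world or "sprinkler_on" in world

print(round(formula_prob(wet), 3))   # 0.2 + 0.8 * 0.5 = 0.6
```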

The paper shows how the situation calculus can be embedded in the ICL, where each possible world specifies a complete history; it specifies what is true in each situation. The actions that make up the situation are the actions of the agents (that is, the motor controls sent by the agent). Thus the agent can know what situation it is in, even if it doesn't know what is true in any situation. The ICL specifies a probability distribution over fluents in each situation.
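A small sketch of this embedding (the action and fluent names are invented here): a situation is just the sequence of motor controls the agent has issued, so the agent always knows which situation it is in, while the truth of a fluent in that situation depends on nature's choices in the possible world:

```python
# A situation is the history of the agent's own actions; do(a, s)
# extends that history.
s0 = ()

def do(action, situation):
    return situation + (action,)

s2 = do("move_right", do("pickup", s0))

# The agent knows s2 (it issued those motor controls), but whether it
# is carrying anything depends on a choice by nature in the world:
# here, whether the pickup succeeded.
def carrying(situation, world):
    return "pickup" in situation and "pickup_succeeds" in world

print(s2)                                            # ('pickup', 'move_right')
print(carrying(s2, frozenset({"pickup_succeeds"})))  # True
print(carrying(s2, frozenset()))                     # False
```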

We also allow for (noisy) sensors, where the sensor value is some stochastic function of what is true in a situation. The agent can condition its actions on its sensor readings. The agent then adopts a conditional plan, where the conditions are sensor readings. A conditional plan selects a final situation in each possible world, where the situation selected depends on the sensor readings in that possible world.
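One way to see how a conditional plan selects a final situation in each world is a small interpreter over plans built from sequential composition and sensor conditioning; the plan encoding and all names below are invented for illustration, and the sensor here is noiseless for simplicity:

```python
# A conditional plan is one of (hypothetical encoding):
#   ("seq", action, plan)                  - do action, then continue
#   ("if", sensor, plan_true, plan_false)  - branch on a sensor reading
#   ("done",)                              - stop
# sense(sensor, situation, world) returns the reading determined by
# the world (for a noisy sensor, the world fixes the noise outcome too).

def final_situation(plan, situation, world, sense):
    if plan[0] == "done":
        return situation
    if plan[0] == "seq":
        _, action, rest = plan
        return final_situation(rest, situation + (action,), world, sense)
    _, sensor, on_true, on_false = plan
    branch = on_true if sense(sensor, situation, world) else on_false
    return final_situation(branch, situation, world, sense)

# Example plan: look, then enter if the door is seen, else turn around.
plan = ("seq", "look",
        ("if", "sees_door",
         ("seq", "enter", ("done",)),
         ("seq", "turn_around", ("done",))))

# A noiseless sensor for illustration: it reports the door fluent.
sense = lambda sensor, situation, world: "door_open" in world

print(final_situation(plan, (), {"door_open"}, sense))  # ('look', 'enter')
print(final_situation(plan, (), set(), sense))          # ('look', 'turn_around')
```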

Utility is axiomatized like any other fluent; we assume there is a unique utility for each situation in each possible world (the utility obtained if the agent finishes in that situation). This can be used to define an expected utility for each conditional plan. The expected utility of a conditional plan can be computed by abduction: we explain the final utility and the sensor readings in the plan. Each explanation has an associated probability and implies a final situation, and hence the utility of that situation. The expected utility is obtained directly from the explanations by computing the expectation over the utilities.
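Once each explanation is paired with its probability and the utility of the final situation it implies, the final step is an ordinary expectation; the numbers below are invented for illustration:

```python
# Each explanation carries a probability and implies a final situation,
# hence a utility; expected utility is the probability-weighted sum.
explanations = [
    (0.5, 10.0),   # P(explanation) = 0.5, utility of its final situation
    (0.3,  0.0),
    (0.2, -5.0),
]

expected_utility = sum(p * u for p, u in explanations)
print(expected_utility)   # 0.5*10 + 0.3*0 + 0.2*(-5) = 4.0
```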

The aim of this work is not to see how many different formalisms we can put together, but to see how few we can get away with. We use a very simple logic of acyclic logic programs, a simple way to handle uncertainty in terms of independent choices, and the standard situation calculus. This lets us define the expected utility of conditional plans. It is an open question as to how far we can build on this without compromising the simplicity of the formal foundations.