********************************************************************
ELECTRONIC NEWSLETTER ON REASONING ABOUT ACTIONS AND CHANGE

Issue 00004          Editor: Erik Sandewall          27.6.2000

Back issues available at http://www.etaij.org/rac/
********************************************************************

********* TODAY *********

Today we present questions and comments by Judea Pearl for the ETAI submitted article by Tom Costello and John McCarthy, *Useful Counterfactuals*, as well as the authors' answers.

********* ETAI PUBLICATIONS *********

--- DISCUSSION ABOUT RECEIVED ARTICLES ---

The following debate contributions (questions, answers, or comments) have been received for articles that have been submitted to the ETAI and which are presently the subject of discussion. To see the full context, for example, to see the question that a given answer refers to, or to see the article itself or its summary, please use the web-page version of this Newsletter.

========================================================
| AUTHOR: Tom Costello and John McCarthy
| TITLE: Useful Counterfactuals
| PAPER: http://www.ep.liu.se/ea/cis/1999/012/
| REVIEW: http://www.ida.liu.se/ext/etai/ra/rac/021/
========================================================

--------------------------------------------------------
| FROM: Judea Pearl
--------------------------------------------------------

The title of this paper, "useful counterfactuals", seems to suggest (1) that plain ordinary counterfactuals are useless, (2) that the paper will teach us how to discriminate useful from useless counterfactuals and (3) that the paper will teach us how useful counterfactuals can be interpreted and put into use. I will start by arguing that **all** (true) counterfactuals are useful. This implies that (2) is not needed. Finally, I will discuss whether the paper is specific enough to accomplish (3).
The word ``counterfactual'' connotes contradiction and/or metaphysical speculation but, in fact, counterfactuals are neither contradictory nor metaphysical. Counterfactuals carry as clear an empirical message as any scientific laws, and are fundamental to them. The essence of any scientific law lies in the claim that certain relationships among variables remain invariant when the values of those variables change relative to our immediate observations. Counterfactuals, likewise, tell us what remains invariant when the world undergoes change, so they are no different from any other scientific knowledge. Every scientific law can be expressed counterfactually; for example, Ohm's law can be stated: ``had the current in the resistor been I' instead of I, the voltage would have been V' (=I'R) instead of V.'' Thus, to say that counterfactuals are useful amounts to saying that scientific knowledge is useful. (In my IJCAI-99 paper I discussed how even a hind-sighted sentence such as "had I bet differently, I would have won a dollar", which refers to a non-repeatable circumstance, conveys useful knowledge. (Paper available on www.cs.ucla.edu/~judea/))

Costello and McCarthy list several usages of counterfactuals, including the conveyance of facts, learning, and prediction, yet when we examine those usages, we find that none is distinct to counterfactuals, and each may equally be served by what we normally call "domain knowledge". Indeed, none of the sentences on pages 2-3 would turn false if we replace the word "counterfactual" with the word "knowledge". The distinct puzzling questions about counterfactuals are (1) why humans resort to this subjunctive mode of expression in conveying simple chunks of knowledge, and (2) how we should interpret counterfactual sentences so as to extract the knowledge they convey.
The paper does not touch on question (1) and the answer it provides to question (2) is not presented in a form that is transportable beyond the example in which it is embedded.

The paper begins by emphasizing two notions, 1. Counterfactuals are evaluated with the aid of "approximate theories", and 2. The truth of counterfactuals depends on representation of situations in a "Cartesian space".

By "approximate theories" the authors mean: "not complete theories of the world". I agree with the ubiquity of such theories, but I fail to see what makes approximate theories particularly akin to counterfactuals. It seems to me that EVERY theory is approximate, or else we would call it a "model", i.e., a mathematical object that assigns a truth value to every sentence of interest. Moreover, I fail to see what is wrong with the traditional approach of first defining the truth of a sentence in a model (i.e., "complete theory") and then, if we do not have a complete specification of a model, saying that the sentence is true in a theory just in case it is true in every model of the theory. Do the authors suggest that this approach should be abandoned when it comes to counterfactuals? If so, why? And if not, I would have liked the authors to tell us what they consider to be an appropriate MODEL for counterfactual sentences. In other words, what kinds of things must we specify before we can assign a truth value to a counterfactual sentence q > p, where p and q are arbitrary propositions? I have not found the answer in this paper, and this made the reading very difficult.

The second notion emphasized in the paper is "Cartesian space", which is a space of points defined by coordinates, such that we can always change one coordinate while keeping the others unchanged. When we can do that to every point (X, Y), it makes sense to say, "if X were 3, we would be closer to the origin", because it is possible then to infer the final location from the initial one.
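The Cartesian reading just described can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the function names and the sample point (5, 4) are hypothetical and do not come from the paper under discussion.

```python
import math

def counterfactual_point(point, **changes):
    """Return a copy of `point` with the named coordinates overwritten.

    Every coordinate that is not named keeps its actual value.
    """
    unknown = set(changes) - set(point)
    if unknown:
        raise KeyError(f"not a coordinate of this space: {sorted(unknown)}")
    return {**point, **changes}

def distance_to_origin(point):
    """Euclidean distance of a point (dict of coordinates) from the origin."""
    return math.hypot(*point.values())

# Hypothetical actual situation (assumed values, for illustration only).
actual = {"X": 5.0, "Y": 4.0}

# "If X were 3": change X, keep Y as it actually is.
if_x_were_3 = counterfactual_point(actual, X=3.0)

# "... we would be closer to the origin."
closer = distance_to_origin(if_x_were_3) < distance_to_origin(actual)
print(if_x_were_3, closer)
```

The final location can be inferred from the initial one only because the coordinates are independent; if changing X forced a change in Y, the function above would no longer describe the counterfactual situation.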
This Cartesian metaphor corresponds to what philosophers called "ceteris paribus" (keeping everything else constant), first suggested by J. S. Mill (1843) as a key element in understanding counterfactuals. Given this basic intuition, I concur with the authors that some form of ceteris paribus must govern the interpretation of counterfactuals no matter what formalism we use. However, this is only the first step. The remaining steps are to decide WHERE the Cartesian space is to be found in any given story, how to represent the points in that abstract space, how to compute the coordinate change dictated by a given counterfactual q > p and, finally, how to compute the ramifications of this coordinate change on other propositions. I felt that the paper leaves these questions either unanswered or implicit in the formulation of the skiing example. If the latter is the case, I suggest that the authors cast the answers in the form of generic principles, to permit their transportation across domains.

I was unable to understand the skiing example; it is too involved and too skiing-domain-sensitive for me to master. I strongly recommend that the authors choose another example, one that leaves no ambiguities as to what theory resides in the mind of each speaker, and what the right answer is in each theory. In my IJCAI-99 paper I chose, for example, a firing squad scenario, where the entire theory can be communicated unequivocally to skiers and non-skiers alike, and where the truth value of every counterfactual sentence is obvious to all readers (e.g., that if rifleman-1 had not shot, the prisoner would still be dead). I would be glad to comment on the authors' proposed axiomatization, once it is cast in a more familiar domain, and if the authors demonstrate explicitly how counterfactual sentences can be evaluated IN GENERAL.
Examples of explicit demonstrations can be found in Charles Ortiz's AIJ-9 paper, using a domino-tiles example, and in my IJCAI-99 paper (using the firing squad scenario).

I can comment on section 8, entitled Bayesian Network, with which I am somewhat more familiar. First, the title may be confusing; Bayesian networks cannot support counterfactual reasoning, for reasons described in Balke and Pearl 1994a,b (see also Causality 2000, p.33-37). The authors probably meant to discuss probabilistic Structural Equation Models (SEM), of which Bayesian networks are an abstraction. An SEM is defined as a set of deterministic functions, while a Bayesian Network is defined as a set of conditional probability constraints.

Costello and McCarthy identify two differences and one commonality between their formulation and SEM. I will explain why I find these two differences to be illusionary and the commonality to be tangential. I will then discuss what I consider to be the essential difference between the two formulations.

Illusionary Difference 1.
-------------------------

Costello and McCarthy write:

> One major difference between our approach
> and structural equation models or Bayesian networks is that we
> consider arbitrary proposition, and consider these relative to a
> background approximate theory ....

The structural equation approach also considers arbitrary propositions relative to a background approximate theory. Contrary to Costello and McCarthy's interpretation, the structural equation approach is NOT committed to equational specification. True, the equations (or functions) are used in defining a complete causal model, but "arbitrary propositions" can be used to specify "approximate causal theories", much the same as a collection of clauses defines a "theory" in propositional calculus, while a truth assignment to all the elementary propositions defines a "model".
In fact, the most common causal theories used in SEM are in the form of graphs, which make no commitment whatsoever to the functional form of the equations; graphs merely restrict the set of arguments in each equation. Arbitrary domain constraints are not excluded from SEM; they are merely interpreted as constraints over the functional relationship between A and B.

To illustrate, suppose our approximate theory contains just one (causal) rule: "If A then B". Suppose further A and B are true. Question: Is the counterfactual "not-A > not-B" true? Answer: we don't know. We need to complete the rule "If A then B" into a function, to specify what happens to B when A is false. This completion, whether done explicitly, or by minimization, or by some other principle, turns our theory into a collection of functions, a collection which I called a "structural causal model".

Illusionary Difference 2.
-------------------------

The paper states: "The other major difference is that Bayesian networks focus on the probability distribution of certain variables, rather than on facts in general." I would like to believe that the authors did not mean it literally, because this kind of pseudo-difference was used by some AI-ers in the 1980's as an excuse for not reading the probabilistic literature. Those who venture to read that literature would discover quickly that probabilistic reasoning (in SEM) proceeds in two steps: First, reasoning about "facts in general" in a deterministic theory, and second, computing the probability of those "facts in general" when we have additional knowledge on how likely the background (or "frame") facts are. In my IJCAI-99 paper, for example, I first demonstrate how to compute the truth value of the deterministic counterfactual "if rifleman-1 had not shot, the prisoner would still be dead", and ONLY THEN I go to computing the probability of this sentence, assuming that rifleman-1 is somewhat likely to pull the trigger out of nervousness.
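The two-step sequence (deterministic evaluation first, probabilities second) can be sketched in Python as follows. This is an assumed toy version of the firing squad story, not the formulation given in the IJCAI-99 paper: the background facts `order` and `nervous`, the three mechanisms, and the prior probabilities 0.7 and 0.1 are all invented for illustration.

```python
from itertools import product

# Hypothetical priors over two independent background facts (assumed numbers).
P_ORDER = 0.7     # the captain gives the order to shoot
P_NERVOUS = 0.1   # rifleman-1 pulls the trigger out of nervousness

def solve(order, nervous, r1_forced=None):
    """Deterministic step: evaluate the mechanisms in one background world.

    If `r1_forced` is given, rifleman-1's mechanism is replaced by that
    constant; every other mechanism is left intact.
    """
    r1 = (order or nervous) if r1_forced is None else r1_forced
    r2 = order                      # rifleman-2 fires only on order
    dead = r1 or r2                 # either shot kills the prisoner
    return {"r1": r1, "r2": r2, "dead": dead}

def prob_dead_had_r1_not_shot():
    """Probabilistic step.

    Returns P(prisoner would be dead had rifleman-1 not shot | prisoner is dead)
    by weighting each background world that agrees with the observation.
    """
    evidence_mass = 0.0
    counterfactual_mass = 0.0
    for order, nervous in product([True, False], repeat=2):
        weight = ((P_ORDER if order else 1 - P_ORDER)
                  * (P_NERVOUS if nervous else 1 - P_NERVOUS))
        if not solve(order, nervous)["dead"]:
            continue                # this world contradicts the observation
        evidence_mass += weight
        if solve(order, nervous, r1_forced=False)["dead"]:
            counterfactual_mass += weight
    return counterfactual_mass / evidence_mass

# Deterministic verdict in the world where the order was given and nobody
# was nervous: rifleman-2 still fires, so the prisoner is still dead.
print(solve(order=True, nervous=False, r1_forced=False)["dead"])

# Probability of the same counterfactual once nervousness is possible.
print(round(prob_dead_had_r1_not_shot(), 4))
```

In this sketch the probability falls below 1 only because of the worlds in which rifleman-1 fired without an order; setting `P_NERVOUS` to 0 gives back the deterministic verdict.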
The same sequence is followed in Balke and Pearl (UAI-1995), in Galles and Pearl (1997, 1998), and in my new book Causality, chapter 7 (partly on www.cs.ucla.edu/~judea/). Thus, probabilities are options, not barriers to students of counterfactuals.

Tangential commonality
----------------------

Costello and McCarthy write that their approach "can be seen to be similar to modeling systems with structured equations.. or Bayesian networks..." In their Theorem 4, they prove that a counterfactual sentence "is true in a causal model M if and only if [it] is true in the Cartesian frame MF..."

What Theorem 4 states is that the evaluation of counterfactual sentences in a causal model M (according to the SEM formalism) involves a Cartesian-product-like assumption, and that one can identify the Cartesian space with the set of functions F. This is correct: structural models indeed assume ceteris paribus relative to the set of equations -- when we change one equation, the others remain intact.

The reason I consider this commonality to be tangential is that I (and most people I know) take for granted the invocation of ceteris paribus in counterfactual analysis. The interesting question, in my opinion, is not whether a ceteris paribus assumption is present in a given theory of counterfactuals -- such presence is inevitable -- but rather, what space we should apply ceteris paribus to, and how. Balke, Galles and Pearl make a specific commitment in this regard. They claim that ceteris paribus should be applied to the space of MECHANISMS (read: functions), and NOT to the space of propositions, nor to the space of variables, nor to some other space that one can dream up. Thus, if Costello and McCarthy buy this commitment, they can safely claim that their approach "can be seen to be similar to modeling systems with structured equations.. ".
But the mere existence of ceteris paribus (or Cartesian product space) someplace in a system does not make their approach similar to that system.

And this brings me to the main point of my comments: Have Costello and McCarthy made this commitment (i.e., to identify the Cartesian space with a set of mechanisms)? Incidentally, is anyone on this Newsletter prepared to make this commitment?

A word of caution to those who answer YES or WHY NOT: the commitment to mechanisms does not come cheap. First, it requires that we proclaim certain sentences as "mechanisms", and that we assign to those sentences a different status and a different syntactic representation than that assigned to other sentences (e.g., facts, observations, assumptions, implications). It also requires a one-to-one correspondence between mechanisms and variables (see my IJCAI-99 paper, Sections 4.4-4.5, on www.cs.ucla.edu/~judea/). I have not seen these two elements in the authors' analysis of the skiing example, but I may have overlooked them, given my ignorance of skiing instructions.

The reason I emphasize this commitment to mechanisms is that I do not believe counterfactual reasoning (or causal reasoning in general) is feasible without it. The puzzle with a counterfactual sentence, say q > p, is that it involves a relationship between two PROPOSITIONS, q and p, not between an action and a proposition, and yet we treat q as an ACTION. How? In order to change the actual world to satisfy q we need to translate q into some action and to decide which mechanisms are to be altered by that action. Every theory of counterfactuals ought to explicate how these decisions are made in the representational scheme employed. It is quite possible that Costello and McCarthy's theory embeds these decisions implicitly in their analysis of the skiing example (which I missed). If they did, I believe they should make them formal, general and explicit.
Here is an example of some general principles that people have proposed for counterfactual analysis. Balke, Galles and Pearl (IJCAI-99) identified 3 necessary steps in the evaluation of a counterfactual sentence: 1. Abduction, 2. Action and 3. Prediction. For example, to evaluate the sentence: "if rifleman-1 had not shot, the prisoner would still be dead", we must execute the following three steps:

1. abduction: use the fact that the prisoner IS dead to infer that the captain gave the order to shoot.
2. action: alter the mechanism which originally made rifleman-1 obedient to the Captain's order.
3. prediction: test whether the prisoner is alive in the new theory, created by steps 1 and 2.

If Costello and McCarthy consider these steps NOT necessary for the evaluation of counterfactuals, I would invite them to posit alternative generic steps which they do consider necessary. (In this case, I would also challenge them to evaluate the sentence above with their alternative steps -- we must be concrete.) If they do consider these steps to be necessary, then I would ask them to identify where in the Costello-McCarthy formalization we find traces of these steps, and how we should go about deciding (in step (2)) what changes to make to the theory so as to accommodate the counterfactual antecedent "had rifleman-1 not shot".

Have I left out another possibility? Yes, that Costello and McCarthy consider these three steps to be necessary but not sufficient. In this case, the readers of this paper would want to learn what additional principles they deem necessary, and in what kind of theories the need for the new principles will become urgent.

The difficulty I had in reading this paper stemmed from not knowing where the authors stand on these possibilities and, consequently, I could not see the principles that the paper advocates for the analysis of counterfactuals. I hope the authors will provide this information in a revised version.

Judea Pearl

PS.
Transcript and slides of my IJCAI-99 lecture are now available on http://www.cs.ucla.edu/~sunshine/pres/ijcai99.htm

--------------------------------------------------------
| FROM: The authors
--------------------------------------------------------

1. Judea Pearl considers all counterfactuals as useful and therefore has no use for our singling out useful counterfactuals. We thought our distinction was clear, but we have added some material to the article to make it clearer.

An example of a not demonstrably useful counterfactual is "If Caesar was in charge in Korea, he would have used catapults". It is difficult to see how the truth of this might teach us something. Pearl's example of the firing squad is probably (at least 0.8) useless in most Americans' daily lives. No use of this counterfactual was offered.

We wrote that a counterfactual is useful if believing it can affect behavior. Our example was "If another car had come over the hill when you passed, there would have been a head-on collision." If the driver believes it, he will be more conservative about passing in the future. While some theory is involved in accepting the counterfactual, it is basically about a single experience and an associated almost experience (i.e. the collision).

Contrast this counterfactual with Pearl's examples about the firing squad, e.g. "If A had not fired the prisoner would have died anyway." Pearl offers no suggestion about how believing it would help design better firing squads or would help the victim escape. Indeed the counterfactual is entirely derived from theory - no specific experience plays any role.

Useful counterfactuals like the car example have another property. They have non-counterfactual consequences, e.g. "Passing under the conditions of the example is unsafe." This is in contrast with David Lewis's theories, where counterfactuals have no non-tautological non-counterfactual consequences.
We have modified the article to emphasize this aspect of useful counterfactuals.

2. Pearl considers our skiing example exotic. We admit to more experience with skiing than with firing squads. We doubt Pearl's experience is otherwise. More to the point, the counterfactual sentences about skiing have many useful non-counterfactual consequences.

3. Very likely, we should have compared our useful common sense counterfactuals with those involved in stating scientific laws. However, so far as we know, the literature relating them does not describe drawing non-counterfactual conclusions from the counterfactuals themselves. It concerns the interpretation of counterfactuals rather than their use. Maybe the defenders of historical counterfactuals discuss non-counterfactual consequences.

4. Pearl asks why humans resort to counterfactuals. We think it is because of their useful non-counterfactual consequences.

5. Pearl suggests that the two differences we point out between Structured Equational Models and Cartesian Counterfactuals are illusionary. We agree that they are not major technical differences, but are mainly differences in subject matter and presentation. Cartesian Counterfactuals are applied to commonsense theories expressed in First Order Logic, in contrast to SEMs, which are applied to other domains and are expressed in equations.

Pearl also suggests that the theorem we give is tangential. We agree that it does not address the underlying question of where Cartesian Frames or Causal Mechanisms come from.

Tom Costello and John McCarthy

********************************************************************
This Newsletter is issued whenever there is new news, and is sent by automatic E-mail and without charge to a list of subscribers. To obtain or change a subscription, please send mail to the editor, erisa@ida.liu.se. Contributions are welcomed to the same address.
Instructions for contributors and other additional information are found at: http://www.etaij.org/rac/
********************************************************************