Moderated by Salem Benferhat, Henri Prade.
 

Thomas Lukasiewicz

Probabilistic Default Reasoning with Conditional Constraints

The article mentioned above has been submitted to the Electronic Transactions on Artificial Intelligence, and the present page contains the review discussion. See the author's webpage for more information: Thomas Lukasiewicz.

Overview of interactions

N:o  Question                 Answer(s)          Continued discussion
 1   17.5  Angelo Gilio       19.5  The author
 2   17.5  Salem Benferhat    19.5  The author
 3   13.6  Salem Benferhat    13.6  The author
 4   28.6  Jürg Kohlas        28.6  The author


Q1. Angelo Gilio (17.5):

The text of this question is presented in postscript due to its extensive formula content.

A1. Thomas Lukasiewicz (19.5):

Dear Angelo,

I would like to give some comments and replies to your remarks.

Let me first address issues 1-3. Well, you must distinguish between the concept of coherence and the way this concept is represented. More precisely, the concept of coherence does not depend on its representation by de Finetti's three-valued entities. That is, the concept of coherence and its notion of entailment can also be expressed in terms of probability functions over possible worlds.

In the same sense, the classical notions of satisfiability and entailment can be expressed in terms of de Finetti's three-valued entities. More precisely, the three-valued entities of a set of conditional probability statements give rise to a partition of the set of all possible worlds, and in particular to a more efficient representation of the problems of satisfiability and entailment (which is discussed and exploited in the following papers in the framework of probabilistic logic programming:

  http://www.kr.tuwien.ac.at/staff/lukasiew/ecsqaru99.ps.gz
  http://www.kr.tuwien.ac.at/staff/lukasiew/rr0001.ps.gz).
That is, in summary, the concept of coherence and the concept of classical model-theoretic satisfiability can both be described in terms of probability functions over possible worlds and in terms of de Finetti's three-valued entities.

That is, your proposal 3 just carries us to another (of course, more efficient) representation of possible worlds.

Your comments 1 and 2 somehow also say that you prefer the notions of coherence and the associated entailment relation to the classical notions of satisfiability and logical consequence. Well, I think that they are two alternative ways of formulating probabilistic reasoning, which both have their own "right of existence". That is, rather than preferring one to the other, we should precisely analyze their properties and their relationship to each other (and thus identify their different fields of application).

We actually did so in an extensive email-discussion "backstage". So, perhaps you may permit me to make some of our results public:

(1) The notion of coherence is generally stronger than classical satisfiability, due to additional positive probability statements.

(2) Entailment under coherence is generally weaker than classical entailment, since it replaces the positivity of the conditioning event by a weaker positivity requirement.

(3) For probability statements of the form (A|B)[1,1], the notions of coherence and of coherence-entailment seem to coincide with consistency and entailment, respectively, in system P.

Perhaps the difference between the coherence-based and the classical notions of tight logical consequence becomes clear through some examples:

The knowledge base

   KB={(fly|bird)[1,1],(mobile|fly)[1,1]}
implies
   (mobile|bird)[0,1]
using coherence-tight-entailment, but it implies
   (mobile|bird)[1,1]
using the classical notion of tight entailment.

   KB={(fly|bird)[1,1],(bird|penguin)[1,1],(not fly|penguin)[1,1]}
implies
   (not fly|penguin)[1,1]
using coherence-tight-entailment, but it implies
   (not fly|penguin)[1,0] 
using the classical notion of tight entailment.

So, in summary, the coherence-based formalism and the classical model-theoretic formalism express simply two alternative ways of performing probabilistic reasoning.
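
To make the classical side of this comparison concrete, here is a minimal sketch (my own Python, not code from the paper or the reports above) of how classically tight bounds for a conditional query can be computed by linear programming over possible worlds; the normalization of the premise probability to 1 is a standard Charnes-Cooper-style trick, and all helper names are my own:

  # Minimal sketch: classically tight bounds for Pr(conclusion | premise) over
  # all probability functions on possible worlds that satisfy a set of
  # conditional constraints (B|A)[lo,up] and give the premise positive probability.
  from itertools import product
  from scipy.optimize import linprog

  ATOMS = ["bird", "fly", "mobile"]
  WORLDS = [dict(zip(ATOMS, bits))
            for bits in product([False, True], repeat=len(ATOMS))]

  def indicator(formula):
      # One 0/1 entry per possible world: does the world satisfy the formula?
      return [1.0 if formula(w) else 0.0 for w in WORLDS]

  def tight_bounds(kb, premise, conclusion):
      # kb: list of constraints (B, A, lo, up), read as lo <= Pr(B|A) <= up,
      # trivially satisfied when Pr(A) = 0.
      a_ub, b_ub = [], []
      for B, A, lo, up in kb:
          ia = indicator(A)
          iab = indicator(lambda w: A(w) and B(w))
          a_ub.append([lo * a - ab for a, ab in zip(ia, iab)])  # lo*Pr(A) <= Pr(A&B)
          a_ub.append([ab - up * a for a, ab in zip(ia, iab)])  # Pr(A&B) <= up*Pr(A)
          b_ub += [0.0, 0.0]
      a_eq, b_eq = [indicator(premise)], [1.0]        # scaled premise probability = 1
      obj = indicator(lambda w: premise(w) and conclusion(w))
      result = []
      for sign in (+1, -1):                           # +1: minimize, -1: maximize
          res = linprog([sign * c for c in obj], A_ub=a_ub, b_ub=b_ub,
                        A_eq=a_eq, b_eq=b_eq, bounds=(0, None), method="highs")
          result.append(sign * res.fun if res.success else None)
      return result

  kb = [(lambda w: w["fly"], lambda w: w["bird"], 1.0, 1.0),      # (fly|bird)[1,1]
        (lambda w: w["mobile"], lambda w: w["fly"], 1.0, 1.0)]    # (mobile|fly)[1,1]
  print(tight_bounds(kb, lambda w: w["bird"], lambda w: w["mobile"]))

On the first knowledge base above, this prints the classically tight interval [1.0, 1.0] for (mobile|bird); the coherence-based answer [0,1] is of course not what this program computes.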

Let me next consider your comment 4. Well, first note that the positivity of the premise can be seen as a choice that makes the notion of entailment stronger. I mean that I could also simply drop this positivity requirement. As a result, I would then get the same interval as before if the positivity already follows from the assessment, and the interval [0,1] otherwise. That is, I would simply weaken the notion of entailment.

Finally, note that the derived [1,0]-intervals carry important information, namely that the considered premise always has probability 0. So, they should not be given up in classical probabilistic reasoning. By the way, the classical notion of entailment is defined for all sets of conditional probability statements KB, while the coherence-based entailment is defined only for coherent KB. So, formally, you should actually also use [1,0]-intervals, namely to identify incoherent KB.

Let me finally address your last comment 5: As already expressed by my comments above, the notion of coherence is also closely related to default reasoning. The only difference is that the inconsistencies in my approach (for computing the preference relations on defaults) are generated through strong statements of the form Pr(E) = 1, while the inconsistencies related to coherence are generated through weaker statements of the form Pr(E) > 0. For this reason, my notions of probabilistic default entailment are stronger than the classical (conditioning-based) entailment, while the notion of coherence-based entailment is weaker than classical (conditioning-based) entailment.

Best wishes,

Thomas


Q2. Salem Benferhat (17.5):

Dear Thomas,

I have 2 brief questions/comments concerning your paper.

1. In the introduction, you explain that a way to resolve the conflict in the rules {G1, G2, G3} is simply to remove the rule G2. I can understand this approach of dealing with inconsistencies in classical knowledge bases, since beliefs are either true or false and there are not a lot of choices. However, one may think of other ways due to the presence of probabilities on the beliefs. For instance, we may think of one of these two options:

  • enlarge the bounds associated with default information, or
  • use another form of combination operator.
Namely, we could associate with each piece of probabilistic information a set of probability distributions (its models). Then, given a probabilistic default theory D, we define its models as the intersection of the models of all default rules in D. However, if this intersection is empty, we take for instance the union.

2. The extensions provided in your paper are semantically defined, while some classical systems, like System Z, have efficient syntactic inference. Have you looked at the syntactic counterparts of your extended systems? For instance, can you construct the z-partition of D syntactically? By the way, what is the relationship between a given z-partition of a probabilistic default theory D and the one recovered by System Z (obtained by looking only at the default rules and ignoring the weights)?

Best regards
Salem

A2. Thomas Lukasiewicz (19.5):

Dear Salem,

Here are my replies to your questions/comments:

  1. In the introduction, you explain that a way to resolve the conflict in the rules {G1, G2, G3} is simply to remove the rule G2. I can understand this approach of dealing with inconsistencies in classical knowledge bases, since beliefs are either true or false and there are not a lot of choices. However, one may think of other ways due to the presence of probabilities on the beliefs. For instance, we may think of one of these two options: (a) enlarge the bounds associated with default information, or (b) use another form of combination operator. Namely, we could associate with each piece of probabilistic information a set of probability distributions (its models). Then, given a probabilistic default theory D, we define its models as the intersection of the models of all default rules in D. However, if this intersection is empty, we take for instance the union.

The formalisms were originally designed to solve inconsistency problems in probabilistic logic programming, roughly speaking, inconsistencies that result from incompatibilities between generic and concrete knowledge. Similar problems arise when you try to combine statistical knowledge with degrees of belief. That is, the formalisms presented in the paper are actually tailored to solve this kind of problem.

I agree with you that other techniques to resolve inconsistencies are possible. So, for example, if you do not like the principle of preferring more specific information, then you could simply relax the bounds equally. E.g.,

  ({(bird|penguin)[1,1]}, {(fly|bird)[0.95,1], (fly|penguin)[0,0.05]})
would then imply that penguins fly with a probability in [0.5,0.5] (instead of [0,0.05]). Or, you may want some preference, but you also want to satisfy the nonpreferred conditional constraints "as much as possible". In this case,
  ({(bird|penguin)[1,1]}, {(fly|bird)[0.95,1], (fly|penguin)[0,0.05]})
would imply that penguins fly with a probability in [0.05,0.05].
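
To see where these numbers come from (my own reconstruction, assuming that the evidence "penguin" is conditioned on with probability 1 and that both bounds are relaxed symmetrically by the same amount epsilon):

  0.95 - \varepsilon \;\le\; \Pr(\mathit{fly}) \;\le\; 0.05 + \varepsilon,
  \qquad \text{since } \Pr(\mathit{penguin}) = 1 \text{ forces } \Pr(\mathit{bird}) = 1 .

These constraints are jointly satisfiable exactly when epsilon >= 0.45, and at the smallest such epsilon the only admissible value is Pr(fly) = 0.5, which gives the interval [0.5,0.5]. In the second variant, (fly|penguin)[0,0.05] is kept intact and Pr(fly) is pushed as close to the violated bound 0.95 as that constraint allows, namely to 0.05, which gives [0.05,0.05].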

But, of course, all these principles (including your proposal of introducing new combination operators) strongly depend on the application you want to model.

My proposal is directed towards handling the interplay between statistical knowledge and degrees of belief, which is considered an important problem in the literature. It gives a new view on how to tackle this problem.

  2. The extensions provided in your paper are semantically defined, while some classical systems, like System Z, have efficient syntactic inference. Have you looked at the syntactic counterparts of your extended systems? For instance, can you construct the z-partition of D syntactically?

The z-partition can be constructed by a quadratic number of probabilistic satisfiability tests. Unfortunately, moving from the classical to the probabilistic framework, we lose tractability in the Horn case. See the long version of the paper for algorithms and complexity issues:

  ftp://ftp.kr.tuwien.ac.at/pub/tr/rr0002.ps.gz
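
For readers who want the shape of that construction, here is a schematic sketch (my own Python, not the algorithm from the report above) of the System-Z-style partitioning loop; it assumes a black-box test consistent(strict, defaults, d) that checks, by a probabilistic satisfiability test, whether the default d is tolerated by the remaining defaults under the strict knowledge:

  # Schematic sketch; `consistent(strict, defaults, d)` is assumed to be a
  # black-box probabilistic satisfiability ("tolerance") test.
  def z_partition(strict, defaults, consistent):
      # Returns the z-partition as a list of layers of defaults,
      # or None if no admissible ordering exists.
      remaining = set(defaults)
      partition = []
      while remaining:
          # Defaults tolerated by all remaining defaults form the next layer.
          layer = {d for d in remaining if consistent(strict, remaining, d)}
          if not layer:
              return None  # no default is tolerated: the theory has no z-partition
          partition.append(layer)
          remaining -= layer
      return partition

Each pass calls the satisfiability test once per remaining default, which is where the quadratic number of tests mentioned above comes from.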

  By the way, what is the relationship between a given z-partition of a probabilistic default theory D and the one recovered by System Z (obtained by looking only at the default rules and ignoring the weights)?

The z-partition of a default theory with only [1,1]-intervals coincides with the partition of the corresponding classical default theory in System Z.

But, in general, you cannot simply "ignore" the intervals, since an interval can "express" the two classical extremes [0,0] and [1,1].

Best regards,

Thomas


Q3. Salem Benferhat (13.6):

Thank you for your replies. A brief comment on the last reply:

  But, in general, you cannot simply "ignore" the intervals, since an interval can "express" the two classical extremes [0,0] and [1,1].

I understand, but in fact my question was: what is the relationship between the z-partitions of a set of defaults that you get if you have [1,1]-intervals (hence the classical System Z partition) or if you have [a,b]-intervals with a and b different from 0 and 1? Do you get a refinement of the System Z partition? Or are they incomparable, namely, are there some a and b where using classical System Z you get d1 > d2, but with the interval [a,b] you have d2 > d1, where d1 and d2 are two defaults?

Best regards

Salem

A3. Thomas Lukasiewicz (13.6):

  I understand, but in fact my question was: what is the relationship between the z-partitions of a set of defaults that you get if you have [1,1]-intervals (hence the classical System Z partition) or if you have [a,b]-intervals with a and b different from 0 and 1? Do you get a refinement of the System Z partition? Or are they incomparable, namely, are there some a and b where using classical System Z you get d1 > d2, but with the interval [a,b] you have d2 > d1, where d1 and d2 are two defaults?

Generally, they should be incomparable. But I do not think that the preference order between two defaults is reversed: Changing the intervals may generate or remove inconsistencies. These inconsistencies are then resolved by using the antecedents of the defaults. So, I think that changing the intervals just means adding strict preference relationships or removing them, but not reversing them.

If we consider the special case in which [1,1]- and [0,0]-intervals are relaxed to intervals of the form [a,1] and [0,b], then we should generally move to a coarser partition in System Z (since some inconsistencies may be removed). For example, the "penguin knowledge base"

  {(fly|bird)[1,1],(bird|penguin)[1,1],(fly|penguin)[0,0]}
has the z-partition
  ({(fly|bird)[1,1]},
   {(bird|penguin)[1,1],(fly|penguin)[0,0]}),
while the knowledge base with relaxed bounds
  {(fly|bird)[0.5,1],(bird|penguin)[1,1],(fly|penguin)[0,0.5]}
has the z-partition
  ({(fly|bird)[0.5,1],(bird|penguin)[1,1],(fly|penguin)[0,0.5]}).
Best wishes,

Thomas


Q4. Jürg Kohlas (28.6):


Probabilistic reasoning, as understood in this paper, is concerned more with reasoning *about* probabilities than with reasoning *with* probabilities. This may be a trivial remark, but for me it is essential to understand the paper. Reasoning about probabilities concerns in particular probabilistic modeling, and in this context one may have differing opinions about the correct model of a given situation. In this sense, if I take the motivating examples in the paper, I have different views on some issues than the author.

In example 1, knowing nothing about penguins (except that they are birds), for me the preferred conclusion is that 0 ≤ p(legs|penguin) ≤ 1. The assumption that penguins are a "nonexceptional" subclass of birds seems to amount to assuming that the event "penguin" is stochastically independent of the event "has legs": then, and only then, do we obtain from probability calculus that

P(legs|penguin) = p(legs)
(everything of course conditioned on "birds"; I omit this condition to simplify notation). Do we really want to make such a strong assumption?

We can dispute whether or not such an assumption is justified, desirable, correct, or whatever (I can understand if somebody wants to make this assumption, although I would like to be more conservative). This is outside probability theory. That is what I mean when I say above that the paper is concerned with reasoning about probabilities, not with probabilities.

I add that Nilsson's probabilistic logic gives no answer to this question, since no sample space is defined and the event "penguin" is even less defined.

Similar remarks apply to most of the other examples (examples 2-6). An exception is example 2. If I know p(fly|penguin) ∈ [0,0.05] and I learn that the bird is a penguin, then probability calculus indeed gives me the probability interval [0,0.05] that it flies. This is reasoning with probabilities, and any other result would be in contradiction to probability theory. There is no freedom here.

Given the considerations above about different possible views on probability modeling, the question I pose myself is the following: if we want to develop a framework for reasoning *about* probabilities, should we then not be a lot more explicit in the representation of the basic ingredients of a probability framework, such as the sample space, events, etc.? It seems to me that these elements are too implicit, too hidden, in the formalism developed in this paper.

Prof. Juerg Kohlas
Institute for Informatics
University of Fribourg
CH 1700 Fribourg
(Switzerland)
e-Mail juerg.kohlas@unifr.ch

A4. Thomas Lukasiewicz (28.6):

Dear Juerg,

Here are my answers/comments to your questions/comments:

 
  Probabilistic reasoning, as understood in this paper, is concerned more with reasoning *about* probabilities than with reasoning *with* probabilities. This may be a trivial remark, but for me it is essential to understand the paper. Reasoning about probabilities concerns in particular probabilistic modeling, and in this context one may have differing opinions about the correct model of a given situation. In this sense, if I take the motivating examples in the paper, I have different views on some issues than the author.

  In example 1, knowing nothing about penguins (except that they are birds), for me the preferred conclusion is that 0 ≤ p(legs|penguin) ≤ 1. The assumption that penguins are a "nonexceptional" subclass of birds seems to amount to assuming that the event "penguin" is stochastically independent of the event "has legs": then, and only then, do we obtain from probability calculus that

  P(legs|penguin) = p(legs)

  (everything of course conditioned on "birds"; I omit this condition to simplify notation). Do we really want to make such a strong assumption?

Yes! Indeed, there is significant work on reference class reasoning (especially by Reichenbach, Kyburg, and Pollock), which argues that such strong assumptions are desirable. The following is a citation from Reichenbach's work:

  "If we are asked to find the probability holding for an individual future event, we must first incorporate the case in a suitable reference class. An individual thing or event may be incorporated in many reference classes. ...We then proceed by considering the narrowest reference class for which suitable statistics can be compiled."

...applied to the concrete example described above, this means that we can consider the class of birds, which is the only class for which we know something about the property of having legs, as a reference class for penguins. So, we can use the knowledge that at least 95% of all birds have legs to conclude that penguins, too, have legs with a probability of at least 0.95.

Moreover, note that the magpie example (Example 4) is also taken from the field of reference class reasoning. It shows that my new notions of probabilistic default reasoning model correctly what stands behind Kyburg's "strength rule".

So, my notions of probabilistic default reasoning naturally incorporate ideas that stand behind reference class reasoning.

 
  We can dispute whether or not such an assumption is justified, desirable, correct, or whatever (I can understand if somebody wants to make this assumption, although I would like to be more conservative). This is outside probability theory. That is what I mean when I say above that the paper is concerned with reasoning about probabilities, not with probabilities.

  I add that Nilsson's probabilistic logic gives no answer to this question, since no sample space is defined and the event "penguin" is even less defined.

The probabilistic background of the paper is described in "Preliminaries" (Section 2, pages 2-3). More precisely, I use a possible worlds semantics. That is, probabilistic interpretations are probability functions on the set of all truth assignments [called possible worlds] to the basic events. So, any event E in my approach corresponds to a set S of possible worlds. The probability of E is then defined as the sum of the probabilities of all s in S.
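
In symbols (my own compact rendering, not a quotation of Section 2): for an event E corresponding to the set S of possible worlds,

  \Pr(E) \;=\; \sum_{w \in S} \Pr(w),
  \qquad
  \Pr(B \mid A) \;=\; \frac{\Pr(A \wedge B)}{\Pr(A)} \quad \text{whenever } \Pr(A) > 0 .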

So, the notions of sample space and events are defined in the spirit of Nilsson's work (actually, Nilsson's approach is slightly different, as he uses the sentences of a knowledge base KB, and not the basic events, to generate the set of possible worlds. So, Nilsson's set of possible worlds is actually coarser than mine. However, this does not make any conceptual difference).

The main reason why classical probability theory does not give any answer on how to perform default reasoning about conditional constraints is that we now have to deal with two different kinds of knowledge: generic knowledge and some concrete evidence. That is, the main question is how to combine these two kinds of knowledge. We can do it by (1) classical conditioning, (2) simply unifying generic and concrete knowledge, or (3) using the new notions for probabilistic default reasoning defined in the paper.

The points (1) and (2) guide us to the notions of 0- and 1-entailment, which stay entirely in the classical framework of probability theory. Note, however, that the examples in the paper show that 0- and 1-entailment do not have nice properties from a probabilistic default reasoning perspective.

 
  Similar remarks apply to most of the other examples (examples 2-6). An exception is example 2. If I know p(fly|penguin) ∈ [0,0.05] and I learn that the bird is a penguin, then probability calculus indeed gives me the probability interval [0,0.05] that it flies. This is reasoning with probabilities, and any other result would be in contradiction to probability theory. There is no freedom here.

...yes, this result in Example 2 can be obtained by classical conditioning (that is, 0-entailment) only.

  Given the considerations above about different possible views on probability modeling, the question I pose myself is the following: if we want to develop a framework for reasoning *about* probabilities, should we then not be a lot more explicit in the representation of the basic ingredients of a probability framework, such as the sample space, events, etc.? It seems to me that these elements are too implicit, too hidden, in the formalism developed in this paper.

The probabilistic background of the paper is well defined (see my comments above). I will try to add some more comments and explanations in the "Preliminaries" section to make this point more explicit.

Best regards,

Thomas


 

Background: Review Protocol Pages and the ETAI

This Review Protocol Page (RPP) is a part of the webpage structure for the Electronic Transactions on Artificial Intelligence, or ETAI. The ETAI is an electronic journal that uses the Internet medium not merely for distributing the articles, but also for a novel, two-stage review procedure. The first review phase is open and allows the peer community to ask questions to the author and to create a discussion about the contribution. The second phase - called refereeing in the ETAI - is like conventional journal refereeing except that the major part of the required feedback is supposed to have occurred already in the first, review phase.

The referees make a recommendation whether the article is to be accepted or declined, as usual. The article and the discussion remain on-line regardless of whether the article was accepted or not. Additional questions and discussion after the acceptance decision are welcomed.

The Review Protocol Page is used as a working structure for the entire reviewing process. During the first (review) phase it accumulates the successive debate contributions. If the referees make specific comments about the article in the refereeing phase, then those comments are posted on the RPP as well, but without indicating the identity of the referee. (In many cases the referees may return simply an "accept" or "decline" recommendation, namely if sufficient feedback has been obtained already in the review phase.)