Linköping University Electronic Press: Electronic Articles in Computer and Information Science

Title: Simple Pre- and Post-Pruning Techniques for Large Conceptual Clustering Structures
Authors: Guy Mineau, Akshay Bissoon, and Robert Godin
Series: Linköping Electronic Articles in Computer and Information Science
ISSN 1401-9841
Issue: Vol. 5 (2000): nr 002
URL: http://www.ep.liu.se/ea/cis/2000/002/

Abstract: In (Godin et al., 1995a), we proposed an incremental conceptual clustering algorithm, derived from lattice theory (Godin et al., 1995b), that is fast to compute (Mineau & Godin, 1995). This algorithm is especially useful for large data or knowledge bases, since it makes classification structures available to large-scale applications such as those found in industrial settings.
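The published algorithm of Godin et al. is not reproduced here. As a naive illustration of the incremental idea only, the sketch below keeps the set of class descriptions (concept intents) closed under intersection: when a new object arrives, the only descriptions that can appear are the object's own description and its intersections with the descriptions already present, so earlier work is reused rather than recomputed. The function names and the attribute-set encoding are assumptions; the sketch omits the links between classes and the bottom concept, and does not reflect the cost of the published algorithm.

```python
from typing import Dict, FrozenSet, Iterable, List, Set

Intent = FrozenSet[str]


def add_object(intents: Set[Intent], description: Iterable[str]) -> Set[Intent]:
    """Update an intersection-closed family of class descriptions with one
    new object (hypothetical helper). Only the object's own description and
    its intersections with existing descriptions can be new."""
    a = frozenset(description)
    return intents | {a} | {i & a for i in intents}


def concepts(objects: List[Set[str]]) -> Dict[Intent, List[int]]:
    """Process objects one at a time, then pair each class description
    (intent) with the indices of the objects it covers (extent)."""
    intents: Set[Intent] = set()
    for desc in objects:
        intents = add_object(intents, desc)
    return {i: [k for k, d in enumerate(objects) if i <= d] for i in intents}


objects = [{"red", "round"}, {"red", "square"}, {"blue", "round"}]
# Print classes from most general (largest extent) to most specific.
for intent, ext in sorted(concepts(objects).items(),
                          key=lambda kv: (-len(kv[1]), sorted(kv[0]))):
    print(sorted(intent), ext)
```

With these three objects the sketch yields six classes; the most general is the empty description, which covers all three objects, and `{"red"}` covers objects 0 and 1.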

However, to be applicable to large data sets, the analysis component of the algorithm had to be simplified: the thorough comparison of objects normally needed to fully justify the formation of classes had to be cut down. Naturally, less analysis yields classes that carry less semantics, or that should not have been formed in the first place. Consequently, some classes are useless with respect to the information needs of the applications that will later interact with the data. Pruning techniques are therefore needed to eliminate these classes and simplify the classification structure.

However, since these classification structures are huge, the pruning techniques themselves must be simple enough to be applied to them in reasonable time. This paper presents three such techniques: the first is based on constraints defined over the generalization language; the other two are based on discrimination metrics applied either to the links between classes or to the classes themselves. Because the first technique is applied before the classification structure is built, it is called a pre-pruning technique, while the other two are called post-pruning techniques.
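The abstract does not specify the paper's constraints or metrics, so the following is only a hypothetical sketch of where each kind of pruning acts. Pre-pruning is modeled as a predicate `allowed` over class descriptions (here an assumed constraint that rejects empty descriptions), which filters candidates before they are inserted. This assumes that any description satisfying the constraint also satisfies it when made more specific, so discarding a candidate never loses an admissible class later. Post-pruning uses an invented link metric, 1 - |extent(child)| / |extent(parent)|, and drops a class when every link from its immediate parents scores below a threshold; the class-based variant is not shown. All names are hypothetical.

```python
from typing import Callable, FrozenSet, List, Set

Intent = FrozenSet[str]


def closed_intents(objects: List[Set[str]],
                   allowed: Callable[[Intent], bool]) -> Set[Intent]:
    """Pre-pruning (hypothetical): build class descriptions incrementally,
    never inserting a candidate rejected by the constraint `allowed`."""
    intents: Set[Intent] = set()
    for desc in objects:
        a = frozenset(desc)
        # Candidates from this object: its own description and its
        # intersections with every description already present.
        candidates = {a} | {i & a for i in intents}
        intents |= {c for c in candidates if allowed(c)}
    return intents


def extent(intent: Intent, objects: List[Set[str]]) -> List[int]:
    """Indices of the objects whose description contains `intent`."""
    return [k for k, d in enumerate(objects) if intent <= d]


def immediate_parents(c: Intent, intents: Set[Intent]) -> List[Intent]:
    """More general classes directly above `c` (no class in between)."""
    above = [p for p in intents if p < c]
    return [p for p in above if not any(p < q < c for q in above)]


def post_prune(intents: Set[Intent], objects: List[Set[str]],
               threshold: float) -> Set[Intent]:
    """Post-pruning on links (hypothetical metric): drop a class when every
    link from an immediate parent separates out too few of its objects."""
    size = {c: len(extent(c, objects)) for c in intents}
    kept: Set[Intent] = set()
    for c in intents:
        parents = immediate_parents(c, intents)
        if not parents:
            kept.add(c)  # classes with no parent are always kept
            continue
        # 0 means the child covers exactly the same objects as its parent.
        scores = [1 - size[c] / size[p] for p in parents]
        if max(scores) >= threshold:
            kept.add(c)
    return kept


objects = [{"red", "round"}, {"red", "square"},
           {"blue", "round"}, {"red", "round", "small"}]
# Assumed constraint on the generalization language: no empty descriptions.
classes = closed_intents(objects, lambda c: len(c) >= 1)
pruned = post_prune(classes, objects, threshold=0.4)
print(sorted(sorted(c) for c in classes - pruned))  # [['red', 'round']]
```

Here `{"red", "round"}` is removed because it keeps two of the three objects of each parent (score about 0.33). The sketch scores links once on the unpruned structure and does not reconnect or rescore the remaining classes, a simplification a real implementation would need to revisit.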

Keywords:

First posting: 2000-03-08
In ETAI area "Concept Based Knowledge Representation"
Original publication: 2000-12-05
Revised publication: 2001-03-20

This article was first posted on the Internet as specified under "First posting", and appeared on the E-press server on the date specified under "Original publication".



Editor-in-chief: editor@ep.liu.se
Webmaster: webmaster@ep.liu.se