
Class weka.classifiers.MetaCost

java.lang.Object
   |
   +----weka.classifiers.Classifier
           |
           +----weka.classifiers.MetaCost

public class MetaCost
extends Classifier
implements OptionHandler
This metaclassifier makes its base classifier cost-sensitive using the method specified in

Pedro Domingos (1999). MetaCost: A general method for making classifiers cost-sensitive, Proceedings of the Fifth International Conference on Knowledge Discovery and Data Mining, pp. 155-164. Also available online at http://www.cs.washington.edu/homes/pedrod/kdd99.ps.gz.

This classifier should produce results similar to those of the base learner wrapped in Bagging, which is in turn wrapped in a CostSensitiveClassifier that predicts the class with minimum expected cost. The difference is that MetaCost produces a single cost-sensitive model built by the base learner, which gives fast classification and interpretable output (if the base learner itself is interpretable). This implementation uses all bagging iterations when reclassifying the training data; the MetaCost paper reports a marginal improvement when only those iterations containing each training instance are used to reclassify that instance.
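The "minimum expected cost" rule picks, for each instance, the class whose expected misclassification cost under the estimated class probabilities is lowest. The sketch below is a self-contained illustration, not WEKA's code: the class name `ExpectedCost` is hypothetical, and the convention that `cost[actual][predicted]` holds the cost of predicting `predicted` when the true class is `actual` is an assumption of this sketch.

```java
import java.util.Arrays;

/** Hypothetical helper illustrating minimum-expected-cost classification. */
final class ExpectedCost {
    private ExpectedCost() {}

    /** Expected cost of predicting each class. Assumed convention: cost[actual][predicted]. */
    static double[] expectedCosts(double[] probs, double[][] cost) {
        int k = probs.length;
        double[] result = new double[k];
        for (int predicted = 0; predicted < k; predicted++) {
            double sum = 0.0;
            for (int actual = 0; actual < k; actual++) {
                // Probability that the true class is `actual`, times the cost of that mistake.
                sum += probs[actual] * cost[actual][predicted];
            }
            result[predicted] = sum;
        }
        return result;
    }

    /** Index of the class with the lowest expected cost (the first one on ties). */
    static int minExpectedCostClass(double[] probs, double[][] cost) {
        double[] costs = expectedCosts(probs, cost);
        int best = 0;
        for (int c = 1; c < costs.length; c++) {
            if (costs[c] < costs[best]) {
                best = c;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] probs = {0.7, 0.3};
        // Hypothetical costs: missing a true class-1 instance costs 5, a false alarm costs 1.
        double[][] cost = {{0, 1}, {5, 0}};
        System.out.println(Arrays.toString(expectedCosts(probs, cost)));
        System.out.println(minExpectedCostClass(probs, cost));
    }
}
```

With probabilities {0.7, 0.3} and this cost matrix, predicting class 0 has expected cost 0.3 × 5 = 1.5 and predicting class 1 has expected cost 0.7 × 1 = 0.7, so class 1 is chosen even though class 0 is more probable.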

Valid options are:

-W classname
Specify the full class name of a classifier (required).

-C cost file
File name of a cost matrix to use. If this is not supplied, a cost matrix will be loaded on demand. The name of the on-demand file is the relation name of the training data plus ".cost", and the directory containing the on-demand file is specified with the -D option.

-D directory
Name of a directory to search for cost files when loading costs on demand (default current directory).

-I num
Set the number of bagging iterations (default 10).

-S seed
Random number seed used when resampling the training data to form the bags (default 1).

-P num
Size of each bag, as a percentage of the training set size (default 100).

Options after -- are passed to the designated classifier.
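Everything after `--` belongs to the base classifier. The sketch below shows one way such an option array could be split; it is an illustration under assumptions, not the parsing code WEKA uses, and `OptionSplit`, its methods, the classifier name `some.BaseClassifier` and the base-classifier option `-X` are all hypothetical.

```java
import java.util.Arrays;

/** Hypothetical helper: splits an option array at the first "--". */
final class OptionSplit {
    private OptionSplit() {}

    private static int indexOfDoubleDash(String[] options) {
        for (int i = 0; i < options.length; i++) {
            if ("--".equals(options[i])) {
                return i;
            }
        }
        return -1;
    }

    /** Options for the meta classifier itself (everything before "--"). */
    static String[] beforeDoubleDash(String[] options) {
        int i = indexOfDoubleDash(options);
        return i < 0 ? options.clone() : Arrays.copyOfRange(options, 0, i);
    }

    /** Options forwarded to the base classifier (everything after "--"). */
    static String[] afterDoubleDash(String[] options) {
        int i = indexOfDoubleDash(options);
        return i < 0 ? new String[0] : Arrays.copyOfRange(options, i + 1, options.length);
    }

    public static void main(String[] args) {
        String[] options = {
            "-W", "some.BaseClassifier", // hypothetical base classifier
            "-C", "costs.cost", "-I", "10", "-P", "100",
            "--", "-X", "1"              // hypothetical base-classifier option
        };
        System.out.println(Arrays.toString(beforeDoubleDash(options)));
        System.out.println(Arrays.toString(afterDoubleDash(options)));
    }
}
```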

Author:
Len Trigg (len@intelligenesis.net)

Variable Index

 o MATRIX_ON_DEMAND
 o MATRIX_SUPPLIED
 o TAGS_MATRIX_SOURCE

Constructor Index

 o MetaCost()

Method Index

 o buildClassifier(Instances)
Builds the model of the base learner.
 o classifyInstance(Instance)
Classifies a given test instance.
 o getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.
 o getClassifier()
Gets the base classifier used.
 o getCostMatrix()
Gets the misclassification cost matrix.
 o getCostMatrixSource()
Gets the source location method of the cost matrix.
 o getNumIterations()
Gets the number of bagging iterations.
 o getOnDemandDirectory()
Returns the directory that will be searched for cost files when loading on demand.
 o getOptions()
Gets the current settings of the Classifier.
 o getSeed()
Get seed for resampling.
 o listOptions()
Returns an enumeration describing the available options.
 o main(String[])
Main method for testing this class.
 o setBagSizePercent(int)
Sets the size of each bag, as a percentage of the training set size.
 o setClassifier(Classifier)
Sets the base classifier.
 o setCostMatrix(CostMatrix)
Sets the misclassification cost matrix.
 o setCostMatrixSource(SelectedTag)
Sets the source location of the cost matrix.
 o setNumIterations(int)
Sets the number of bagging iterations.
 o setOnDemandDirectory(File)
Sets the directory that will be searched for cost files when loading on demand.
 o setOptions(String[])
Parses a given list of options.
 o setSeed(int)
Set seed for resampling.
 o toString()
Outputs a representation of this classifier.

Variables

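 o MATRIX_ON_DEMAND
 public static final int MATRIX_ON_DEMAND
Cost matrix source: load the cost matrix on demand from the on-demand directory.
 o MATRIX_SUPPLIED
 public static final int MATRIX_SUPPLIED
Cost matrix source: use the cost matrix supplied explicitly (with -C or setCostMatrix).
 o TAGS_MATRIX_SOURCE
 public static final Tag TAGS_MATRIX_SOURCE[]
The tags for the cost matrix source values, used with getCostMatrixSource and setCostMatrixSource.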

Constructors

 o MetaCost
 public MetaCost()

Methods

 o listOptions
 public Enumeration listOptions()
Returns an enumeration describing the available options.

Returns:
an enumeration of all the available options
 o setOptions
 public void setOptions(String options[]) throws Exception
Parses a given list of options. Valid options are the same as those listed in the class description (-W, -C, -D, -I, -S and -P); options after -- are passed to the designated classifier.

Parameters:
options - the list of options as an array of strings
Throws: Exception
if an option is not supported
 o getOptions
 public String[] getOptions()
Gets the current settings of the Classifier.

Returns:
an array of strings suitable for passing to setOptions
 o getCostMatrixSource
 public SelectedTag getCostMatrixSource()
Gets the source location method of the cost matrix. Will be one of MATRIX_ON_DEMAND or MATRIX_SUPPLIED.

Returns:
the cost matrix source.
 o setCostMatrixSource
 public void setCostMatrixSource(SelectedTag newMethod)
Sets the source location of the cost matrix. Values other than MATRIX_ON_DEMAND or MATRIX_SUPPLIED will be ignored.

Parameters:
newMethod - the cost matrix location method.
 o getOnDemandDirectory
 public File getOnDemandDirectory()
Returns the directory that will be searched for cost files when loading on demand.

Returns:
The cost file search directory.
 o setOnDemandDirectory
 public void setOnDemandDirectory(File newDir)
Sets the directory that will be searched for cost files when loading on demand.

Parameters:
newDir - The cost file search directory.
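When no cost file is given with -C, the on-demand file name is the relation name of the training data plus ".cost", looked up in this directory. A minimal sketch of that path construction follows; the helper `onDemandCostFile` is hypothetical, and the relation name is passed as a plain string here rather than taken from the training `Instances`.

```java
import java.io.File;

/** Hypothetical helper showing how an on-demand cost file path could be formed. */
final class OnDemandCostFile {
    private OnDemandCostFile() {}

    /** Builds <directory>/<relationName>.cost; whether the file exists is not checked here. */
    static File onDemandCostFile(File directory, String relationName) {
        return new File(directory, relationName + ".cost");
    }

    public static void main(String[] args) {
        // Documented default search directory: the current directory.
        File dir = new File(System.getProperty("user.dir"));
        File costFile = onDemandCostFile(dir, "iris"); // "iris" is an example relation name
        System.out.println(costFile.getPath() + " exists? " + costFile.exists());
    }
}
```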
 o setClassifier
 public void setClassifier(Classifier classifier)
Sets the base classifier.

Parameters:
classifier - the base classifier, with all options set.
 o getClassifier
 public Classifier getClassifier()
Gets the base classifier used.

Returns:
the classifier
 o getBagSizePercent
 public int getBagSizePercent()
Gets the size of each bag, as a percentage of the training set size.

Returns:
the bag size, as a percentage.
 o setBagSizePercent
 public void setBagSizePercent(int newBagSizePercent)
Sets the size of each bag, as a percentage of the training set size.

Parameters:
newBagSizePercent - the bag size, as a percentage.
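Each bag holds (training set size × percentage / 100) instances, and the -S seed makes the sequence of bags reproducible. The sketch below assumes bags are drawn with replacement (bootstrap sampling) using `java.util.Random`; `BagSampler` is hypothetical, and WEKA's Bagging may draw its samples differently.

```java
import java.util.Arrays;
import java.util.Random;

/** Hypothetical sketch of bootstrap bag sampling (not WEKA's code). */
final class BagSampler {
    private BagSampler() {}

    /** Number of instances per bag for a percentage of the training set size. */
    static int bagSize(int numInstances, int bagSizePercent) {
        return numInstances * bagSizePercent / 100;
    }

    /** Indices of one bag, drawn uniformly with replacement. */
    static int[] drawBag(int numInstances, int bagSizePercent, Random random) {
        int[] bag = new int[bagSize(numInstances, bagSizePercent)];
        for (int i = 0; i < bag.length; i++) {
            bag[i] = random.nextInt(numInstances);
        }
        return bag;
    }

    public static void main(String[] args) {
        Random random = new Random(1); // same seed => same sequence of bags
        for (int iteration = 0; iteration < 3; iteration++) {
            System.out.println(Arrays.toString(drawBag(10, 50, random)));
        }
    }
}
```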
 o setNumIterations
 public void setNumIterations(int numIterations)
Sets the number of bagging iterations.

Parameters:
numIterations - the number of bagging iterations
 o getNumIterations
 public int getNumIterations()
Gets the number of bagging iterations.

Returns:
the number of bagging iterations
 o getCostMatrix
 public CostMatrix getCostMatrix()
Gets the misclassification cost matrix.

Returns:
the cost matrix
 o setCostMatrix
 public void setCostMatrix(CostMatrix newCostMatrix)
Sets the misclassification cost matrix.

Parameters:
newCostMatrix - the cost matrix
 o setSeed
 public void setSeed(int seed)
Set seed for resampling.

Parameters:
seed - the seed for resampling
 o getSeed
 public int getSeed()
Get seed for resampling.

Returns:
the seed for resampling
 o buildClassifier
 public void buildClassifier(Instances data) throws Exception
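Builds the model: bags the base learner, relabels each training instance with its minimum-expected-cost class according to the bagged probability estimates, and trains the base learner on the relabeled data.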

Parameters:
data - the training data
Throws: Exception
if the classifier could not be built successfully
Overrides:
buildClassifier in class Classifier
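The training procedure can be pictured as three steps: build bagged models, relabel every training instance with its minimum-expected-cost class using probability estimates averaged over all bags (as the class description notes, this implementation uses all iterations), then train the base learner once on the relabeled data. The sketch below is a self-contained illustration, not WEKA's code; the interfaces `Model` and `Learner` are hypothetical stand-ins for WEKA's classifier and data classes, and `cost[actual][predicted]` is an assumed convention.

```java
import java.util.Random;

/** Hypothetical, self-contained sketch of MetaCost training (not WEKA's code). */
final class MetaCostSketch {
    private MetaCostSketch() {}

    /** Hypothetical stand-in for a trained model that outputs class probabilities. */
    interface Model {
        double[] distribution(double[] x);
    }

    /** Hypothetical stand-in for the base learner. */
    interface Learner {
        Model train(double[][] xs, int[] ys);
    }

    /** Steps 1 and 2: bag, average estimates over all bags, relabel by minimum expected cost. */
    static int[] relabel(double[][] xs, int[] ys, int numClasses, double[][] cost,
                         Learner learner, int numIterations, int bagSizePercent, long seed) {
        Random random = new Random(seed);
        int n = xs.length;
        int bagSize = n * bagSizePercent / 100;
        double[][] avgProbs = new double[n][numClasses];
        for (int it = 0; it < numIterations; it++) {
            double[][] bagX = new double[bagSize][];
            int[] bagY = new int[bagSize];
            for (int b = 0; b < bagSize; b++) {
                int pick = random.nextInt(n); // sample with replacement
                bagX[b] = xs[pick];
                bagY[b] = ys[pick];
            }
            Model model = learner.train(bagX, bagY);
            // Every bag's model contributes to every training instance's estimate.
            for (int i = 0; i < n; i++) {
                double[] p = model.distribution(xs[i]);
                for (int c = 0; c < numClasses; c++) {
                    avgProbs[i][c] += p[c] / numIterations;
                }
            }
        }
        int[] newLabels = new int[n];
        for (int i = 0; i < n; i++) {
            double bestCost = Double.POSITIVE_INFINITY;
            for (int predicted = 0; predicted < numClasses; predicted++) {
                double expected = 0.0;
                for (int actual = 0; actual < numClasses; actual++) {
                    expected += avgProbs[i][actual] * cost[actual][predicted];
                }
                if (expected < bestCost) {
                    bestCost = expected;
                    newLabels[i] = predicted;
                }
            }
        }
        return newLabels;
    }

    /** Step 3: train the base learner once on the relabeled training data. */
    static Model build(double[][] xs, int[] ys, int numClasses, double[][] cost,
                       Learner learner, int numIterations, int bagSizePercent, long seed) {
        int[] relabeled = relabel(xs, ys, numClasses, cost, learner,
                                  numIterations, bagSizePercent, seed);
        return learner.train(xs, relabeled);
    }
}
```

The final result is one model from the base learner rather than an ensemble, which is why classification stays fast and the output stays as interpretable as the base learner's.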
 o classifyInstance
 public double classifyInstance(Instance instance) throws Exception
Classifies a given test instance.

Parameters:
instance - the instance to be classified
Throws: Exception
if instance could not be classified successfully
Overrides:
classifyInstance in class Classifier
 o toString
 public String toString()
Outputs a representation of this classifier.

Overrides:
toString in class Object
 o main
 public static void main(String argv[])
Main method for testing this class.

Parameters:
argv - should contain the following arguments: -t training file [-T test file] [-c class index]
