Title: Using the Real Dimension of the Data
Authors: Christian Zirkelbach
Series: Linköping Electronic Articles in Computer and Information Science, ISSN 1401-9841
Issue: Vol. 5 (2000): nr 004
URL: http://www.ep.liu.se/ea/cis/2000/004/
Abstract:
This paper presents a method for extracting the real
dimension of a large data set in a high-dimensional data cube
and indicates its use for visual data mining. A similarity
measure structures a data set in a general but weak sense.
If the elements are part of a high-dimensional host space
(primary space), for instance a data warehouse cube, the resulting
structure does not necessarily reflect the real dimension of the
embedded (secondary) space. We show that a metric-structured set
has, in general, a fractal dimension. This means that the data set is a
finite subset of a fractal secondary space of lower dimension.
Mapping the set into this lower-dimensional secondary space loses no information with respect to the semantics defined by the measure, yet it helps reduce storage and computing effort. Additionally, the secondary space itself reveals much about the set's structure and can facilitate data mining.
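The abstract does not describe the extraction method itself. As a hedged illustration of what "a metric-structured set has a fractal dimension" can mean in practice, the sketch below uses the Grassberger-Procaccia correlation dimension, a standard estimator that is not necessarily the author's method. It looks only at pairwise distances (the metric structure), counts the fraction C(r) of pairs closer than r, and takes the slope of log C(r) between two radii. The 10-dimensional host space, the line and plane test sets, the radii, and all function names are assumptions chosen for illustration.

```python
import math
import random

HOST_DIM = 10  # hypothetical dimension of the primary (host) space


def pairwise_distances(points):
    """Distances between all distinct pairs: the only structure the estimator uses."""
    return [math.dist(p, q) for i, p in enumerate(points) for q in points[i + 1:]]


def correlation_sum(distances, r):
    """C(r): fraction of point pairs closer than r."""
    return sum(1 for d in distances if d < r) / len(distances)


def correlation_dimension(points, r_small, r_large):
    """Slope of log C(r) between two radii (Grassberger-Procaccia style estimate)."""
    distances = pairwise_distances(points)
    c_small = correlation_sum(distances, r_small)
    c_large = correlation_sum(distances, r_large)
    if c_small == 0:
        raise ValueError("no pairs closer than r_small; use more points or a larger radius")
    return math.log(c_large / c_small) / math.log(r_large / r_small)


def embed_line(t):
    """Isometrically place a 1-D parameter on the diagonal of the host space."""
    return tuple([t / math.sqrt(HOST_DIM)] * HOST_DIM)


def embed_plane(u, v):
    """Isometrically place a 2-D parameter pair on a plane in the host space."""
    s = math.sqrt(HOST_DIM / 2)
    return tuple((u if k % 2 == 0 else v) / s for k in range(HOST_DIM))


random.seed(0)
line = [embed_line(random.random()) for _ in range(600)]
plane = [embed_plane(random.random(), random.random()) for _ in range(600)]

# Both sets live in a 10-D host space, but their estimated dimensions stay near 1 and 2.
line_dim = correlation_dimension(line, 0.02, 0.1)
plane_dim = correlation_dimension(plane, 0.02, 0.1)
print(f"host dimension {HOST_DIM}: line ~ {line_dim:.2f}, plane ~ {plane_dim:.2f}")
```

Because the estimator sees only distances, the result would be the same for any distance-preserving embedding. Non-integer values arise for genuinely fractal sets, and with finite samples the estimate depends on the chosen radii.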
The main problem with the secondary space is that it is unknown,
and if it is not a linear sub-space of
First posting: 2000-03-08, in ETAI area "Concept Based Knowledge Representation"
Intended publication: 2000-12-05
Editor-in-chief: editor@ep.liu.se. Webmaster: webmaster@ep.liu.se