
Welcome to the KitEGA web site.

During the last decade, an enormous amount of biological data has been generated, and techniques and tools to analyze these data are being developed. Many of these tools use some form of grouping: they organize the data according to a certain aspect or a combination of aspects.

Grouping of data entries in one or more data sources is an operation underlying many different data management tasks. Grouping can be used to structure and visualize search results, which is especially important when large data sources are studied. It may lead to the discovery of new knowledge or allow users to locate the information of interest faster. The identification of similar data entries and their grouping are also core operations for data cleaning and data integration.

A number of aspects influence the quality of the grouping results: the quality of the data sources, the selection of the grouping attributes, and the algorithms implementing the grouping procedure. Many methods exist, but it is often not clear which methods perform best for which grouping tasks. Studying, evaluating, and comparing the aspects that influence the quality of the grouping results would give us valuable insight into how grouping procedures can best be used. It would also lead to recommendations on how to improve the current procedures and develop new ones. To perform such studies and evaluations, we need environments that allow us to compare and evaluate different grouping procedures. KitEGA is such an environment.
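
To make the idea of comparing grouping procedures concrete, here is a minimal sketch in Python. It is not KitEGA's interface or method. The toy entries, the attribute names (`organism`, `keyword`), the reference grouping, and the choice of purity as the quality measure are all assumptions made for illustration. The sketch groups the same entries by two different attributes and scores each grouping against the reference.

```python
from collections import defaultdict


def group_by(entries, key):
    """Group entry ids by the value that key(entry) returns."""
    groups = defaultdict(list)
    for entry in entries:
        groups[key(entry)].append(entry["id"])
    return dict(groups)


def purity(groups, reference):
    """Fraction of entries that belong to the majority reference class of their group."""
    total = 0
    agree = 0
    for members in groups.values():
        counts = defaultdict(int)
        for member in members:
            counts[reference[member]] += 1
        agree += max(counts.values())
        total += len(members)
    return agree / total if total else 0.0


# Hypothetical toy data: four entries with two candidate grouping attributes.
entries = [
    {"id": "e1", "organism": "human", "keyword": "kinase"},
    {"id": "e2", "organism": "human", "keyword": "kinase"},
    {"id": "e3", "organism": "mouse", "keyword": "kinase"},
    {"id": "e4", "organism": "mouse", "keyword": "transporter"},
]

# Hypothetical reference grouping, e.g. a manually curated classification.
reference = {"e1": "A", "e2": "A", "e3": "A", "e4": "B"}

# Two grouping procedures that differ only in the selected attribute.
by_organism = group_by(entries, lambda e: e["organism"])
by_keyword = group_by(entries, lambda e: e["keyword"])

print("organism:", purity(by_organism, reference))
print("keyword:", purity(by_keyword, reference))
```

With this toy data, grouping by `keyword` matches the reference better than grouping by `organism`, which shows how the choice of grouping attribute alone can change the measured quality.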