Abstract - PhD thesis, Vaida Jakoniene

Integration of Biological Data

Data integration is an important procedure underlying many research
tasks in the life sciences, because multiple data sources often have
to be accessed to collect the relevant data. The data sources vary in
content, data format, and access methods, which often greatly
complicates the data retrieval process. As a result, retrieving data
requires a great deal of effort and expertise on the part of the
user. To alleviate these difficulties, various information
integration systems have been proposed. However, a number of issues
remain unresolved, and new integration solutions are needed.

The work presented in this thesis considers data integration at
three different levels. 1) Integration of biological data sources
deals with integrating multiple data sources from an information
integration system point of view. We study properties of biological
data sources and existing integration systems. Based on the study,
we formulate requirements for systems integrating biological data
sources. Then, we define a query language that supports queries
commonly used by biologists. Also, we propose a high-level
architecture for an information integration system that meets a
selected set of requirements and that supports the specified query
language. 2) Integration of ontologies deals with finding
overlapping information between ontologies. We develop and evaluate
algorithms that use life science literature and take the structure
of the ontologies into account. 3) Grouping of biological data
entries deals with organizing data entries into groups based on the
computation of similarity values between the data entries. We
propose a method that covers the main steps and components involved
in similarity-based grouping procedures. The applicability of the
method is illustrated by a number of test cases. Further, we develop
an environment that supports comparison and evaluation of different
grouping strategies.
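
As a rough illustration of what similarity-based grouping of data
entries can look like, the sketch below groups entries whose pairwise
similarity reaches a threshold. It is not the method or the
environment developed in the thesis: the Jaccard similarity, the
threshold rule, the function names, and the sample entries are all
assumptions chosen for illustration.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity of two term sets (an assumed similarity function)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def group_entries(entries: dict, threshold: float) -> list:
    """Group entry ids whose pairwise similarity reaches `threshold`.

    Groups are the connected components of the graph that links any two
    entries with similarity >= threshold (single-linkage grouping).
    """
    # Union-find structure: every entry starts in its own group.
    parent = {key: key for key in entries}

    def find(x):
        # Follow parent links to the group representative, compressing paths.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Merge the groups of every sufficiently similar pair of entries.
    for a, b in combinations(entries, 2):
        if jaccard(entries[a], entries[b]) >= threshold:
            parent[find(a)] = find(b)

    # Collect the entries under their group representative.
    groups = {}
    for key in entries:
        groups.setdefault(find(key), set()).add(key)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical entries, each described by a set of annotation terms.
entries = {
    "e1": {"kinase", "atp", "binding"},
    "e2": {"kinase", "atp", "transferase"},
    "e3": {"membrane", "transport"},
}
print(group_entries(entries, threshold=0.5))
```

Changing the similarity function or the threshold changes the
resulting groups, which is the kind of variation that comparing
grouping strategies has to account for.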

The work is supported by the implementation of: 1) a prototype for a
system integrating biological data sources, called BioTRIFU, 2)
algorithms for ontology alignment, and 3) an environment for
evaluating strategies for similarity-based grouping of biological
data, called KitEGA.




Last modified in August 2006 by Anne Moe