Vaida Jakoniene

A Study in Integrating Multiple Biological Data Sources

Life scientists often have to retrieve data from multiple biological
data sources to solve their research problems. Although many data
sources are available, they vary in content, data format, and access
methods, which often complicates data retrieval considerably.
The user must decide which data sources to access and in which
order, how to retrieve the data, and how to combine the results. In
short, retrieving data requires a great deal of effort and expertise
on the part of the user.

Information integration systems aim to alleviate these problems by
providing a uniform (or even integrated) interface to biological
data sources. The information integration systems currently
available for biological data sources use traditional integration
approaches. However, biological data and data sources have unique
properties that introduce new challenges and require the development
of new solutions and approaches.

This thesis is part of the BioTrifu project, which explores
approaches to integrating multiple biological data sources. First,
the thesis describes properties of biological data sources and
existing systems that enable integrated access to them. Based on the
study, requirements for systems integrating biological data sources
are formulated and the challenges involved in developing such
systems are discussed. Then, the thesis presents a query language
and a high-level architecture for the BioTrifu system that meet
these requirements. An approach is then developed for generating a
query plan when alternative data sources and alternative ways of
integrating the data are available. Finally, the design and
implementation of a
prototype for the BioTrifu system are presented.

This work has been supported by CUGS (the national graduate
school in computer science) and by the EU Network of Excellence
REWERSE (Sixth Framework Programme project 506779).

