
Project description:
Hybrid XML storage mappings for efficient data management on the Web.

The Web is rapidly evolving into a ubiquitous computing platform for a new generation of information systems. Cheaper and more accessible computers have enabled timely and efficient access to a variety of data in different domains, and increasingly, these data are represented and exchanged in XML. XML is a flexible exchange format that can represent many classes of data, including unstructured text, well-structured records such as those in relational databases, and semi-structured and hierarchical data. However, supporting the flexibility that makes XML attractive to different applications is challenging. As XML data becomes central to applications, there is a growing need for efficient and reliable XML data management tools and techniques.

Two main approaches have been used for managing XML data: native databases designed specifically for XML, and mapping strategies that shred XML documents into tables stored in traditional relational databases. More recently, hybrid approaches have been proposed that combine the native and mapping-based approaches. Hybrid implementations are currently provided by the major relational database vendors (Oracle, IBM and Microsoft). It is well known that the choice of storage greatly impacts querying efficiency.
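To make the distinction concrete, the following is a minimal sketch of a hybrid mapping, not the project's own implementation. It assumes a hypothetical sample document, a hypothetical table layout (`protein` with a `notes_xml` column), and SQLite as a stand-in for a relational database. The regular parts of each record (`id`, `name`) are shredded into ordinary columns, while the irregular `notes` subtree is kept as serialized XML.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical sample data: regular fields (id, name) plus an
# irregular <notes> subtree that varies from record to record.
DOC = """
<proteins>
  <protein id="p1"><name>Insulin</name><notes>Binds <ref db="x">R1</ref> weakly.</notes></protein>
  <protein id="p2"><name>Actin</name><notes><p>Forms filaments.</p></notes></protein>
</proteins>
""".strip()


def store_hybrid(xml_text, conn):
    """Shred regular fields into columns; keep the irregular subtree as XML text."""
    conn.execute(
        "CREATE TABLE protein (id TEXT PRIMARY KEY, name TEXT, notes_xml TEXT)"
    )
    root = ET.fromstring(xml_text)
    for p in root.findall("protein"):
        notes = p.find("notes")
        # The subtree is serialized unchanged, mimicking a native XML column.
        notes_xml = ET.tostring(notes, encoding="unicode") if notes is not None else None
        conn.execute(
            "INSERT INTO protein VALUES (?, ?, ?)",
            (p.get("id"), p.findtext("name"), notes_xml),
        )
    conn.commit()


conn = sqlite3.connect(":memory:")
store_hybrid(DOC, conn)

# Shredded part: answered with plain SQL.
names = [row[0] for row in conn.execute("SELECT name FROM protein ORDER BY id")]

# Native part: fetched as XML and navigated in the application.
notes_p1 = conn.execute(
    "SELECT notes_xml FROM protein WHERE id = 'p1'"
).fetchone()[0]
refs = [r.text for r in ET.fromstring(notes_p1).iter("ref")]
```

In this sketch, queries on `name` use the relational engine directly, whereas anything inside `notes` must be parsed after retrieval. Commercial hybrid systems offer XML-aware column types and query support instead of plain text, which this simplification leaves out.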

One important issue for the future is to design novel mechanisms that help users define efficient hybrid mappings. Our previous research has identified a number of features that may affect how efficient storage can be achieved: the complexity and regularity of the XML structure; how the data is queried, i.e. the access patterns for different entities in the data set; the frequency of references to other sources; and the use of ontology information and application-specific methods for comparison.

To learn more about this, there is a need for tools that, based on an XML schema or the content of an XML file, can suggest a hybrid data model and translate the schema into it. The purpose is to provide an environment where the efficiency of alternative translations can easily be evaluated and compared. The goal of this project is to explore and develop such tools. Based on the experience gained in building and using the tool, we aim to build knowledge about which features and properties of the data set, query load, or schema definition are important for the design of efficient storage for XML.
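As an illustration of what such a suggestion step might look like, the sketch below inspects the content of an XML file and proposes, per element type, whether to shred it or keep it as native XML. The rule used here (an element type is shredded only if every instance has the same sequence of child tags and no mixed content) is a deliberately naive assumption chosen for the example, and the function name `suggest_mapping` and the sample document are hypothetical. It considers structural regularity only and ignores query load, references and ontology information.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def suggest_mapping(xml_text):
    """Return {tag: 'shred' | 'native'} using a naive regularity test."""
    root = ET.fromstring(xml_text)
    shapes = defaultdict(set)  # tag -> distinct child-tag sequences seen
    mixed = set()              # tags whose text is interleaved with child elements
    for elem in root.iter():
        children = list(elem)
        shapes[elem.tag].add(tuple(c.tag for c in children))
        has_text = bool((elem.text or "").strip()) or any(
            (c.tail or "").strip() for c in children
        )
        if children and has_text:
            mixed.add(elem.tag)
    return {
        tag: "shred" if len(seqs) == 1 and tag not in mixed else "native"
        for tag, seqs in shapes.items()
    }


# Hypothetical sample: <book> is regular, <review> is irregular and mixed.
SAMPLE = """
<library>
  <book><title>A</title><year>2001</year></book>
  <book><title>B</title><year>2005</year></book>
  <review>Good <em>and</em> short.</review>
  <review><p>Long.</p></review>
</library>
""".strip()

suggestion = suggest_mapping(SAMPLE)
```

With this sample, `book` is proposed for shredding and `review` for native storage. A realistic tool would also weigh the other features discussed above and would let the user compare several candidate mappings.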

More information (in Swedish) is available here.

Page responsible: Lena Strömbäck
Last updated: 2009-06-25