TDDD43 Advanced Data Models and Databases

Data sets and schemas for theme 3 laboration

General tips

Annotation/shredding procedure

Open the XML schema in HShreX
Apply annotations
Create relational schema in database
Open MSSQL Server Management Studio and run the generated schema script
Load XML document in HShreX
Open, generate, run

Namespaces

Some of the data sets define namespaces. For these you have to state the namespace in the query by binding a prefix to it. For more information on XQuery capabilities in MSSQL, see the lab material links.
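
As a rough illustration of binding a prefix, the sketch below shows two forms that MSSQL accepts: a WITH XMLNAMESPACES clause and a namespace declaration in the XQuery prolog. The table dbo.model_doc and its xml column content are hypothetical (what you actually have depends on your annotations). The namespace URI is the SBML Level 2 one, used here only as an assumption. Copy the URI from the xmlns attribute of your own data file.

```sql
-- Hypothetical table dbo.model_doc with an xml column named content.
-- The URI is an assumption: use the one declared in your data file.

-- Form 1: bind the prefix s for the whole statement.
WITH XMLNAMESPACES ('http://www.sbml.org/sbml/level2' AS s)
SELECT d.content.query('//s:species')
FROM dbo.model_doc AS d;

-- Form 2: declare the prefix in the XQuery prolog instead.
SELECT d.content.query('
    declare namespace s="http://www.sbml.org/sbml/level2";
    //s:species
')
FROM dbo.model_doc AS d;
```

Without a bound prefix, a path such as //species matches nothing when the elements belong to a namespace, which is a common reason for unexpectedly empty results.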

Timing and query plans

To get timing information from MSSQL, add the line

SET STATISTICS TIME ON

at the beginning of your script. Remember to rerun each query several times to even out fluctuations in speed. (What causes them?)

MSDN help document for SET STATISTICS TIME

To show the query plan, add the line

SET STATISTICS PROFILE ON

at the beginning of your script.

MSDN help document for SET STATISTICS PROFILE
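
A minimal sketch of how the two settings might be used in a script is given here. The table dbo.species is hypothetical, so substitute a query against your own mapping. The timing run and the plan run are kept separate on the assumption that producing the plan output could itself influence the measured times.

```sql
-- Timing run. dbo.species is a hypothetical table name.
SET STATISTICS TIME ON;
GO
SELECT COUNT(*) FROM dbo.species;
-- In Management Studio and sqlcmd, GO followed by a count repeats the batch.
GO 5
SET STATISTICS TIME OFF;
GO

-- Plan run: the executed plan comes back as an extra result set.
SET STATISTICS PROFILE ON;
GO
SELECT COUNT(*) FROM dbo.species;
GO
SET STATISTICS PROFILE OFF;
GO
```

In Management Studio the timing figures appear in the Messages tab, not in the result grid.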

Permission to run this command was added for you on Thu, Oct 1st. Try again if you had problems with this before.

Data sets

SBML (Reactome)

A simplified example of the SBML format was given in the first lectures of the course. The schema file, sbmlc.xsd, declares the ShreX namespace but does not yet contain any annotations that use it. You have to add them yourself according to the mappings you want to try out. This dataset comes from a model of reactions within a human. For this dataset you have to turn off validation of XML documents (see page 9 in the HShreX how-to PDF file).

One interesting query to try is graph traversal (refer to question e in Lab 2), beginning with some species and going forward, but try some other queries as well. In SBML, the list of reactants for a reaction is connected to a list of products. Two reactions are connected if a speciesReference in the listOfProducts of the first reaction is equal to a speciesReference in the listOfReactants of the second reaction. For a fixed number of steps you can write an SQL query with joins. For a varying number of steps you need a recursive solution.
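
The two approaches can be sketched as follows. The tables dbo.reactant and dbo.product (one row per speciesReference, with columns reaction_id and species), the start value 'species_A', and the cap of 5 steps are all hypothetical. Your table and column names depend on the annotations you chose. The recursive part uses a common table expression, which is one possible recursive solution in MSSQL and not necessarily the one intended by the lab.

```sql
-- Hypothetical tables, one row per speciesReference:
--   dbo.reactant(reaction_id, species)
--   dbo.product (reaction_id, species)
DECLARE @start varchar(255);
SET @start = 'species_A';   -- hypothetical start species

-- Fixed number of steps (here two reactions): one pair of joins per step.
SELECT DISTINCT p2.species
FROM dbo.reactant AS r1
JOIN dbo.product  AS p1 ON p1.reaction_id = r1.reaction_id
JOIN dbo.reactant AS r2 ON r2.species     = p1.species
JOIN dbo.product  AS p2 ON p2.reaction_id = r2.reaction_id
WHERE r1.species = @start;

-- Varying number of steps: recursive common table expression.
WITH reachable (species, steps) AS (
    -- Anchor member: the start species, reached in zero steps.
    SELECT @start, 0
    UNION ALL
    -- Recursive member: follow one reaction from reactant to product.
    -- The CAST keeps the column type identical to the anchor member.
    SELECT CAST(p.species AS varchar(255)), rc.steps + 1
    FROM reachable AS rc
    JOIN dbo.reactant AS r ON r.species = rc.species
    JOIN dbo.product  AS p ON p.reaction_id = r.reaction_id
    WHERE rc.steps < 5      -- depth cap, also stops cycles from looping
)
SELECT species, MIN(steps) AS first_reached
FROM reachable
GROUP BY species
ORDER BY first_reached;
```

The depth cap matters because reaction networks can contain cycles. Without it the recursion would only stop when MSSQL hits its recursion limit. The recursive query follows every path, so intermediate results can grow quickly on the larger data sets.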

Schema and data file

SBML (Biomodels)

This dataset uses the same data format and schemas as the previous one. The dataset is, however, richer and contains more information. For this dataset it is also possible to try queries on graph traversal. Other options are queries that find more information about the reactants and products in a reaction.

Note that this dataset contains a large number of files. Use the HShreX option for directory load of data files.

Schema and data files

Michigan Benchmark

This is a schema and data file from an XML benchmark that has been modified to work with HShreX and reduced in size. The data is purely artificial, but all elements carry attributes (e.g. aUnique1, aSixteen and aString) whose values you can use to formulate queries that select a number of nodes. Example queries for the original schema are presented on the benchmark's website. In the schema and data provided here, element names have been shortened and the number of levels has been reduced to 14.
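
As a small sketch of selecting nodes by attribute value, assume a mapping that produces a hypothetical table dbo.node with one row per element and the attributes stored as columns. The table name, the numeric column types and the literal values are assumptions, not part of the provided schema.

```sql
-- Hypothetical table dbo.node: one row per element, attributes as columns.

-- Select the group of nodes that share one aSixteen value.
SELECT n.aUnique1, n.aString
FROM dbo.node AS n
WHERE n.aSixteen = 3;

-- Select by aUnique1, which presumably identifies at most one node.
SELECT n.aString
FROM dbo.node AS n
WHERE n.aUnique1 = 42;
```

Varying the attribute used in the WHERE clause is one way to compare how different mappings behave for queries that return many or few nodes.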

Note that some annotations have already been added to this schema and you may have to remove existing shrex:tablename annotations if you add a shrex:maptoxml annotation to the same element.

For this dataset you have to turn off validation of XML documents (see page 9 in the HShreX how-to PDF file).