Semantic Technologies in Practice - Exercises for Part II
Below you find the exercises for the seminars.
Session 1 - Re-engineering and publishing RDF data
In this session we will do two (or more, if you have time) small exercises to experience some tools for re-engineering legacy data into RDF, and for querying the resulting RDF datasets. Most of the tools you are going to use are built by researchers, hence they can be a bit "buggy", so be patient. The transformations may also need a bit of "handiwork", but that is one of the points of this exercise: to see the low-level bits and pieces up close!
Note that these tools have not been chosen because they are necessarily the best tools to use, rather they have been chosen because they are 1) quick to set up and get going with, and 2) display typical functionalities that you would find in such a tool. Also, the workflow is not an optimal one, but illustrates some of the steps that data might go through before it actually enters your application. If you need a tool for a real task in your research or a software project, check out the W3C tool pages that list some of the current options and their functionalities: http://www.w3.org/2001/sw/wiki/Tools
When you are done, export/save and hand in (by e-mail) the resulting RDF datasets from the first two exercises, together with your SPARQL query that retrieves data from both RDF graphs using owl:sameAs. The datasets will not be assessed for "correctness"; they simply act as a receipt that you have tried all the tools and were able to produce some results. The deadline for handing in your solutions is 1/11 2012, but if you finish during the session, hand them in immediately.
For some of the exercises you will use the FOAF vocabulary. If you are not familiar with it, take a look here. You can also download it and load it in an ontology editor of your choice.
Task 1 - D2R server
The task is to use the D2R server to set up a virtual RDF dataset that mirrors the content of a database, and use the D2R graphical interface to ask SPARQL queries against the graph. Note that you will all be working on the same database, hence updates will be visible to all.
Install D2R server on your laptop, and then follow steps 6-11 of the D2R tutorial that you can find here. The database already set up is the same as the one suggested by the tutorial, so if you are curious about its contents, take a look at the file in the tutorial - or wait until you start asking queries against it.
The credentials for the database to use are the following:
- host: mysql315.loopia.se
- user name(s): d2rX@o45820 (where X is a number between 1 and 5)
- password: (given at the course, or by e-mail if you do the exercise on your own)
- database name: ontology_se_db_2
Export a dump of the RDF graph from the D2R server, by using the following console command: dump-rdf -f RDF/XML -b "a URI of your choice" -o "name of outfile".rdf -m mapping-iswc.n3
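As a quick sanity check before (or after) exporting the dump, you can paste a generic query like the following into the D2R server's SPARQL interface. It makes no assumptions about the mapping, so it works against any endpoint:

```sparql
# List ten arbitrary triples to get a feel for the data.
SELECT * WHERE { ?s ?p ?o } LIMIT 10
```

The exact class and property names in the results depend on the mapping file the D2R server uses, so browse a few results before writing more specific queries.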
Submit the resulting RDF graph as the result of this exercise, but you will also continue to use it in the next task.
Task 2 - Google Refine + Jena Fuseki
The task is to use Google Refine to transform an Excel sheet into an RDF graph, and to load it into a simple SPARQL server that exposes the dataset over HTTP for posing SPARQL queries against it. In addition, you will link your two datasets (from tasks 1 and 2) through one or more owl:sameAs statements, and pose a query that retrieves data from both sets.
If you are not familiar with Google Refine, you may want to start by watching the three tutorial videos, but for this task you are probably not going to need any of the data transformations. What you will need to use is the RDF export facility of the plugin you installed, hence, familiarize yourself with its functionality (in particular you may want to look at the explanation for the RDF export here).
Then start Google Refine and import this Excel sheet. Your task is to re-engineer the data in the Excel sheet using either existing vocabularies suggested by Google Refine, e.g. FOAF, or any vocabulary you would like to refer to (or one defined locally). Include all the data in the Excel sheet, and export it as an RDF graph.
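If you use FOAF, the exported graph might look roughly like the following sketch in Turtle. The base URI and the row values are hypothetical - your actual output depends on the Excel sheet's contents and the RDF skeleton you configure in the plugin:

```turtle
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <http://example.org/people/> .    # hypothetical base URI

# One resource per spreadsheet row; values below are made-up examples.
ex:alice a foaf:Person ;
    foaf:name "Alice Example" ;
    foaf:mbox <mailto:alice@example.org> .
```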
Next, download and install the Jena Fuseki SPARQL server software from this location (pick one of the first ones named *distribution*). Perform the steps under the heading "Getting started with Fuseki" including to start the server, as described on this page.
While you are downloading the software and installing it, also edit the two RDF datasets that you have. In order to be able to use them together, you should add some owl:sameAs statements to them (adding them to one of the datasets is enough). You can do this by loading the datasets into a tool and adding the triples, or by simply opening the file(s) in a text editor and editing them manually. If you are unsure about the syntax, the file exported from D2R already contains a few sameAs statements, though not referring to your dataset.
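In Turtle, an owl:sameAs link is a single triple. Both URIs below are placeholders - substitute the identifiers of the matching resource in your two datasets (the second one would be whatever URI the D2R server minted for that resource):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# Placeholder URIs: replace with the actual resource identifiers
# from your Excel-derived dataset and your D2R export, respectively.
<http://example.org/people/alice> owl:sameAs <http://localhost:2020/resource/persons/1> .
```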
Once Fuseki is up and running, go to the graphical interface at http://localhost:3030/ and, through the Control Panel, select the default dataset and proceed to the graphical query interface. Using the upload facility at the bottom of the page, load the data you re-engineered and exported from the Excel sheet into the server's default graph. Then do the same with the dataset exported from the D2R server. If you are unsure what you have loaded, write a couple of SPARQL queries against your dataset in the SPARQL query interface and check the results.
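To check what actually ended up in the store, a query like the following gives a quick overview without assuming anything about your vocabularies (Fuseki supports the SPARQL 1.1 aggregates used here):

```sparql
# How many instances of each class are loaded?
SELECT ?type (COUNT(?s) AS ?count)
WHERE { ?s a ?type }
GROUP BY ?type
```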
Finally, write one SPARQL query that makes use of your owl:sameAs statement(s) to retrieve data about an element (the one that exists in both datasets and that you have linked) from both datasets. After checking that it runs correctly, save the query in a text file.
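Note that a plain SPARQL store without reasoning does not apply owl:sameAs semantics automatically, so your query has to follow the link explicitly. A sketch of the shape such a query might take is shown below; foaf:name is only an example property - use whatever properties your two datasets actually contain:

```sparql
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# ?x comes from one dataset, ?y from the other;
# the owl:sameAs triple you added joins them.
SELECT ?name ?p ?o
WHERE {
  ?x owl:sameAs ?y .
  ?x foaf:name ?name .   # example property from the first dataset
  ?y ?p ?o .             # everything known about the same resource in the second
}
```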
Submit the reengineered data from the Excel sheet, and the text file with the query, as the result of this exercise.
Extra task (optional) - Rule based reengineering with Semion
If you have time left in the session, also try out the Semion reengineering tool. Note that this exercise is optional, and you don't have to hand in any result from it.
Extra task (optional) - D2R server with SPARQL update
If you have time left in the session, also try out the D2R server plugins for SPARQL UPDATE. You will find the install instructions and tutorial here.
Session 2 - Using LD to enrich data from social streams
Exercises are at the end of the lecture slides that you can find here.
Page responsible: Eva Blomqvist
Last updated: 2012-09-20