Hands-On Sessions: RDF and Linked Data

Overview

The hands-on sessions on RDF and Linked Data are based on the Nobel Prize Laureates from 1901-2016 dataset (or, for short, the Nobel Prize dataset). You may take a look at the specification of this dataset or download an RDF dump of the dataset (14.8 MB).

Theme 1: SPARQL

In this hands-on, you will practice writing SPARQL queries over the Nobel Prize dataset that answer the following questions. To this end, we have set up a SPARQL editor that is connected to the dataset and thus allows you to test your queries.

Question 1: List all Nobel Prize categories (hint: you are looking for everything of type nobel:Category).
Question 2: List the names of all persons who won the Nobel Prize in Chemistry (hint: the full name of a Nobel Prize laureate can be accessed using the foaf:name property).
Question 3: Who won the Nobel Prize in Chemistry in 1911?
Question 4: List the names of all persons who won the Nobel Prize in Chemistry, ordered by the year of the award.
Question 5: List all female laureates (hint: the gender of a laureate can be accessed by using the foaf:gender property).
Question 6: For all Nobel Prize laureates from 2000, list their name, date of birth, and date of death if the person has passed away (hint: use an OPTIONAL clause in your query).
Question 7: List all Nobel Prize laureates from Sweden or Finland (hint: use a UNION clause in your query).
Question 8: What is the birth year of persons who won the Nobel Prize in Chemistry? (hint: to obtain the birth year as a query variable you may use a BIND clause with the year function).
Question 9: What is the average age of persons who won the Nobel Prize in Chemistry? (note that this question is about the age of the laureates at the time they won)
Question 10: For every Nobel Prize category, what is the average age of persons who won the Nobel Prize in the category?

Theme 2: RDF Triple Stores

In this hands-on, you will set up a triple store on your computer and load the Nobel Prize dataset. The triple store that we use is Blazegraph. To this end, please download Blazegraph (you need the file that is called blazegraph.jar).

Now, you can start up Blazegraph by executing the following command at the command line.

java -server -Djetty.port=9999 -jar blazegraph.jar

If Blazegraph has been started successfully, it will report a local Web address where it provides a workspace. This address should be http://127.0.0.1:9999/blazegraph. After opening this address in a browser window, you should see the Blazegraph welcome page.

To upload a dataset, click UPDATE in the menu, which brings you to the upload page. There, click Choose File to select the dump file of the dataset on your computer (you have to download the aforementioned RDF dump of the Nobel Prize dataset beforehand), and then click the Update button below.
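
The upload can also be scripted instead of going through the Web form. The following Python sketch assumes that Blazegraph accepts RDF data sent via HTTP POST to its SPARQL endpoint (here taken to be http://127.0.0.1:9999/blazegraph/sparql), with a Content-Type header that names the RDF format. Both the endpoint behavior and the content type that fits your dump file are assumptions that you should check against the Blazegraph documentation.

```python
import urllib.request

# Assumed SPARQL endpoint of the local Blazegraph instance.
ENDPOINT = "http://127.0.0.1:9999/blazegraph/sparql"


def build_upload_request(data: bytes, content_type: str,
                         endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request whose body is RDF data."""
    return urllib.request.Request(
        endpoint,
        data=data,
        headers={"Content-Type": content_type},
        method="POST",
    )


def upload_file(path: str, content_type: str) -> str:
    """Send the file at `path` to the endpoint and return the response body."""
    with open(path, "rb") as f:
        request = build_upload_request(f.read(), content_type)
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

With Blazegraph running, a call such as upload_file("dump.rdf", "application/rdf+xml") would send an RDF/XML file; the file name here is only a placeholder for the dump you downloaded, and a Turtle file would need "text/turtle" instead.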

After the upload has finished, click QUERY in the menu. This brings you to the query page, where you can write and execute SPARQL queries. Try some of the queries you wrote in the previous hands-on session!
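
Besides the query page, the same queries can be sent to the SPARQL endpoint of your Blazegraph server from a program. The following Python sketch uses the standard SPARQL protocol (an HTTP GET with a query parameter) and asks for JSON results. The endpoint URL and the namespace IRI behind the nobel: prefix are assumptions; check the latter against the dataset specification.

```python
import json
import urllib.parse
import urllib.request

# Assumed SPARQL endpoint of the local Blazegraph instance.
ENDPOINT = "http://127.0.0.1:9999/blazegraph/sparql"

# Example query in the spirit of Question 1 of the first hands-on.
# The namespace IRI of the nobel: prefix is an assumption.
QUERY = """
PREFIX nobel: <http://data.nobelprize.org/terms/>
SELECT ?category WHERE {
  ?category a nobel:Category .
}
"""


def build_query_request(query: str,
                        endpoint: str = ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a SPARQL protocol GET request."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    return urllib.request.Request(
        url, headers={"Accept": "application/sparql-results+json"})


def run_select(query: str, endpoint: str = ENDPOINT) -> list:
    """Run a SELECT query; return one dict (variable -> value) per row."""
    with urllib.request.urlopen(build_query_request(query, endpoint)) as response:
        results = json.load(response)
    return [{name: cell["value"] for name, cell in row.items()}
            for row in results["results"]["bindings"]]
```

With the server running and the dataset loaded, run_select(QUERY) should return one dictionary per solution of the query.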

As a final (and advanced) task in this hands-on, answer the following questions with SPARQL queries that employ the full-text search feature of Blazegraph.

Question 1: List the laureate awards (given by their label) for which the description of the contribution contains the word "human".
Question 2: List the laureate awards (given by their label) for which the description of the contribution contains the word "human" together with the word "behavior" (i.e., both words must be there).
Question 3: For the laureate awards for which the description of the contribution contains the word "theory", list the ones that are the top-3 hits in the search (hint: you need to use ORDER BY together with LIMIT).

Hint: Make sure that full-text search is enabled for the namespace into which you have loaded the dataset. Click NAMESPACES in the menu. On the namespace page, you can check which namespace you are using and see the properties of the namespace. If the parameter "com.bigdata.rdf.store.AbstractTripleStore.textIndex" is true, full-text search should be enabled. Otherwise, you need to click "Rebuild Full Text Index".
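
To give an idea of the general shape of such queries, the following Python sketch assembles a generic full-text search query as a string. It assumes the bds: vocabulary of Blazegraph (bds:search, bds:matchAllTerms, bds:relevance) in the namespace http://www.bigdata.com/rdf/search#; verify these names against the Blazegraph documentation. The triple pattern that connects the matched literal to a resource is deliberately generic, so adapting it to the questions above is left to you.

```python
from typing import Optional, Sequence


def build_fulltext_query(words: Sequence[str], match_all: bool = False,
                         limit: Optional[int] = None) -> str:
    """Assemble a generic full-text search query (bds: names are assumed)."""
    phrase = " ".join(words)
    if '"' in phrase or "\\" in phrase:
        # Keep the sketch simple: no escaping of special characters.
        raise ValueError("search words must not contain quotes or backslashes")
    lines = [
        "PREFIX bds: <http://www.bigdata.com/rdf/search#>",
        "SELECT ?subject ?literal ?score WHERE {",
        f'  ?literal bds:search "{phrase}" .',
    ]
    if match_all:
        # Require every search word to occur in the matched literal.
        lines.append('  ?literal bds:matchAllTerms "true" .')
    lines += [
        "  ?literal bds:relevance ?score .",
        "  ?subject ?property ?literal .",
        "}",
        "ORDER BY DESC(?score)",
    ]
    if limit is not None:
        lines.append(f"LIMIT {limit}")
    return "\n".join(lines)
```

For example, print(build_fulltext_query(["alpha", "beta"], match_all=True, limit=3)) prints a query that you can paste into the query page and then refine.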

Theme 3: Linked Data Publishing

In this hands-on, you will use Pubby to set up a Linked Data server for the Nobel Prize dataset on your computer. Note that Pubby retrieves the data that it returns in response to a URI look-up request by querying a SPARQL endpoint using SPARQL DESCRIBE queries. The Blazegraph server that you have set up during the second hands-on session provides a SPARQL endpoint for the Nobel Prize dataset that you have loaded. We will use this SPARQL endpoint as the back-end for Pubby. Hence, your Blazegraph server must be running for the following tasks to work correctly.

You also need a Java servlet container such as Jetty or Tomcat because Pubby runs as a servlet in such a container. If you do not have a Java servlet container on your computer that you can use, Jetty is a lightweight option. Download the latest version of Jetty and unzip it (note that this version of Jetty requires Java 8; if you do not have it, download a version of Jetty for Java 7).

Next, download Pubby (use the latest version) and unzip it. Now copy the 'webapp' directory of Pubby into the 'webapps' directory of Jetty/Tomcat. Rename the resulting 'webapp' directory (inside the 'webapps' directory) to 'mydataset'.

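Modify the configuration file of Pubby. The file is called 'config.ttl' and you can find it in 'webapps/mydataset/WEB-INF/'. Use the following configuration (and adjust it if needed). The snippet relies on the conf: prefix, so the file must still contain a matching @prefix declaration.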

<> a conf:Configuration;
    # Project name for display in page titles
    conf:projectName "My Test";
    # Homepage with description of the project for the link in the page header
    conf:projectHomepage <http://localhost:8081/mydataset/>;
    # The Pubby root, where the webapp is running inside the servlet container.
    conf:webBase <http://localhost:8081/mydataset/>;
    # If labels and descriptions are available in multiple languages,
    # prefer this one.
    conf:defaultLanguage "en";

    conf:dataset [
        # SPARQL endpoint URL of the dataset
        conf:sparqlEndpoint <http://127.0.0.1:9999/blazegraph/sparql>;
        # Common URI prefix of all resource URIs in the SPARQL dataset
        conf:datasetBase <http://data.nobelprize.org/resource/>;
        # Will be appended to the conf:webBase to form the public
        # resource URIs; if not present, defaults to ""
        conf:webResourcePrefix "resource/";
        # Fixes an issue with the server running behind an Apache proxy;
        # can be ignored otherwise
        conf:fixUnescapedCharacters "(),'!$&*+;=@";
    ] .
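
To see how conf:webBase, conf:webResourcePrefix, and conf:datasetBase fit together, the following Python sketch models the URI rewriting that this configuration implies. It is a simplified illustration based on the comments in the configuration, not Pubby's actual code; the function names are hypothetical.

```python
# Values copied from the configuration above.
WEB_BASE = "http://localhost:8081/mydataset/"
WEB_RESOURCE_PREFIX = "resource/"
DATASET_BASE = "http://data.nobelprize.org/resource/"


def to_dataset_uri(public_uri: str) -> str:
    """Map a URI served by the local server to the URI used in the dataset."""
    prefix = WEB_BASE + WEB_RESOURCE_PREFIX
    if not public_uri.startswith(prefix):
        raise ValueError("not a resource URI of this server: " + public_uri)
    return DATASET_BASE + public_uri[len(prefix):]


def to_public_uri(dataset_uri: str) -> str:
    """Map a URI used in the dataset to the URI served by the local server."""
    if not dataset_uri.startswith(DATASET_BASE):
        raise ValueError("not covered by conf:datasetBase: " + dataset_uri)
    return WEB_BASE + WEB_RESOURCE_PREFIX + dataset_uri[len(DATASET_BASE):]


def describe_query(dataset_uri: str) -> str:
    """Build the kind of DESCRIBE query that is sent to the SPARQL endpoint."""
    return f"DESCRIBE <{dataset_uri}>"
```

In this model, a lookup of http://localhost:8081/mydataset/resource/laureate/458 corresponds to a DESCRIBE query about http://data.nobelprize.org/resource/laureate/458. If you change the port or the directory name, conf:webBase has to change accordingly.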

Now, start the servlet container. For instance, for Jetty, execute the following command (pay attention to the port: it must be free and must match the one used in conf:webBase):

java -jar start.jar jetty.port=8081

If everything went well, you can now perform URI lookups. For instance, access a URI such as http://localhost:8081/mydataset/resource/laureate/458 in your browser.

In addition to using your browser (which typically only requests HTML representations of the data), you can perform the URI lookups by using the curl command, which allows you to request representations in other formats. For example, to obtain Turtle or RDF/XML representations you may use curl commands such as the following.

curl -L -H "Accept: text/turtle" http://localhost:8081/mydataset/resource/laureate/458
curl -L -H "Accept: application/rdf+xml" http://localhost:8081/mydataset/resource/laureate/458

As an extra exercise, you may remove the -L parameter in the curl commands to observe what happens if curl does not automatically follow redirects in HTTP responses. Try also with an Accept header that requests text/html.
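
The same experiment can be done from a program. The following Python sketch performs a URI lookup with a chosen Accept header and, like curl without -L, does not follow redirects, so that you can inspect the status code and the Location header of the first response. The helper names are hypothetical, and the URI in the usage note assumes the setup described above.

```python
import urllib.error
import urllib.request


class NoRedirect(urllib.request.HTTPRedirectHandler):
    """Redirect handler that refuses to follow any redirect."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Returning None makes urllib raise an HTTPError for the
        # redirect response instead of requesting the new URL.
        return None


def build_lookup_request(uri: str, accept: str) -> urllib.request.Request:
    """Build (but do not send) a GET request with the given Accept header."""
    return urllib.request.Request(uri, headers={"Accept": accept})


def lookup_without_redirect(uri: str, accept: str):
    """Return (status code, Location header or None) of the first response."""
    opener = urllib.request.build_opener(NoRedirect)
    try:
        with opener.open(build_lookup_request(uri, accept)) as response:
            return response.status, response.headers.get("Location")
    except urllib.error.HTTPError as error:
        # Redirects (and other non-2xx responses) end up here.
        return error.code, error.headers.get("Location")
```

With Jetty and Blazegraph running, compare the results of lookup_without_redirect("http://localhost:8081/mydataset/resource/laureate/458", "text/turtle") and of the same call with "text/html".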

Page responsible: Olaf Hartig
Last updated: 2018-03-09

IDA - Department of Computer and Information Science
