Hands-On Sessions: RDF and SPARQL

Overview

The hands-on sessions on RDF and SPARQL are based on the Nobel Prize Laureates from 1901-2016 dataset (for short, the Nobel Prize dataset). You may take a look at the specification of this dataset.



Theme 1: RDF

In this hands-on, you will work with a small RDF graph that consists of a few triples of the Nobel Prize dataset. A Turtle serialization of this small RDF graph is given below (alternatively, you can also download an actual Turtle file with exactly the same content).


@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://data.nobelprize.org/resource/laureate/962> a <http://data.nobelprize.org/terms/Laureate>,
    foaf:Person;
  foaf:name "Donna Strickland";
  foaf:familyName "Strickland";
  foaf:givenName "Donna";
  rdfs:label "Donna Strickland";
  <http://dbpedia.org/ontology/affiliation> <http://data.nobelprize.org/resource/university/University_of_Waterloo>;
  foaf:gender "female";
  <http://dbpedia.org/property/dateOfBirth> "1959-05-27"^^xsd:date;
  <http://dbpedia.org/ontology/birthPlace> <http://data.nobelprize.org/resource/country/Canada>,
    <http://data.nobelprize.org/resource/city/Guelph>;
  <http://data.nobelprize.org/terms/nobelPrize> <http://data.nobelprize.org/resource/nobelprize/Physics/2018>;
  owl:sameAs <http://www.wikidata.org/entity/Q56855591>;
  foaf:birthday "1959-05-27"^^xsd:date .

<http://data.nobelprize.org/resource/country/Canada> a <http://dbpedia.org/ontology/Country>;
  rdfs:label "Canada"@en, "Kanada"@sv, "Canada"@no;
  owl:sameAs <http://www.wikidata.org/entity/Q16> .

Before starting to work on the tasks of this hands-on, explore the data a bit. Here are a few things that you can try:

  • Visualize the graph using RDF Grapher. To this end, copy the complete snippet of Turtle into the text box of RDF Grapher and click on the 'Visualize' button.
  • Look at a different visualization by using the Zazuko Sketch service. Notice that you can move around the boxes in the visualization and that you can hover over the arrows with your mouse, which highlights the corresponding triple.
  • Convert the Turtle representation of this data into the RDF/XML format and into the JSON-LD format by using the Zazuko Converter service. To this end, copy the complete snippet of Turtle into the 'Input' text box of the converter and make sure to select 'text/turtle' in the 'Input format' drop-down field in the top left. Then, click on 'JSON-LD' and on 'RDF/XML' in the 'Output' panel. Observe how the triples of the RDF graph are represented in these two serialization formats.
  • Another popular serialization format for RDF data is N-Triples. Since the Zazuko Converter does not support N-Triples as an output format, you may use the EasyRDF Converter instead. Paste the complete snippet of Turtle into the 'Input Data' text box of that converter and select 'N-Triples' in the 'Output Format' drop-down field. Observe that, in the N-Triples format, every triple of the RDF graph is written on a separate line and that this format does not use any prefix-based shortening of the URIs in the data.
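
As a rough illustration of the one-triple-per-line property, the following Python sketch counts the triples in an N-Triples document by counting its non-empty, non-comment lines. The three sample lines are the N-Triples form of the rdfs:label statement about Canada from the Turtle snippet above. The function name count_triples is only an illustrative choice, and the sketch assumes well-formed input rather than validating the syntax.

```python
# Sketch only: counts statements in N-Triples text without validating the syntax.
SAMPLE_NT = """\
<http://data.nobelprize.org/resource/country/Canada> <http://www.w3.org/2000/01/rdf-schema#label> "Canada"@en .
<http://data.nobelprize.org/resource/country/Canada> <http://www.w3.org/2000/01/rdf-schema#label> "Kanada"@sv .
<http://data.nobelprize.org/resource/country/Canada> <http://www.w3.org/2000/01/rdf-schema#label> "Canada"@no .
"""


def count_triples(ntriples_text):
    """Count lines that hold a triple (skips blank lines and '#' comment lines)."""
    count = 0
    for line in ntriples_text.splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            count += 1
    return count


print(count_triples(SAMPLE_NT))  # prints 3
```

Notice how the single Turtle statement with three comma-separated objects turns into three separate lines, each repeating the full subject and predicate URIs.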

Now you should be ready to work on the following tasks.

  • Task 1: Answer the following question. How many triples does this small example RDF graph consist of? (Hint: remember that the N-Triples format represents the data with one triple per line.)
  • Task 2: Observe that the data describes, among other things, the Nobel Prize laureate Donna Strickland, who is identified in the data by the URI http://data.nobelprize.org/resource/laureate/962. The data mentions two classes/types that she belongs to. Write down the complete/absolute URIs of both of these classes/types (i.e., your answer should not contain the URIs written in the compact form with prefixes).
  • Task 3: Observe that the RDF graph also contains some data about the country of Canada, including the country name given in three different languages. Extend the data with another triple that gives the country name in yet another language. For instance, the German name is "Kanada" and the language code for German is de (however, feel free to use any other language). To add the triple you may edit the Turtle representation within the editor of the Zazuko Sketch service, which gives you error messages if you have syntax errors in your Turtle code and, additionally, you can immediately see the graph visualization adapt to your changes.
  • Task 4: Observe that the data also mentions the University of Waterloo, with which Donna Strickland is affiliated. This university is located in the city of Waterloo, but this information is not captured by the given RDF data. Change that! Extend the data with two more triples. One of them should say that the University of Waterloo is located in Waterloo. To this end, you need to introduce a new URI with which you identify the city of Waterloo in the data (feel free to create whatever URI you want) and, for the predicate of that new triple, use the URI http://example.org/locatedIn. The second new triple should provide the name of Waterloo using the rdfs:label predicate (i.e., similar to how the name of Canada is provided, but no need for a language tag in this case).
  • Task 5: Extend the data further to describe that the city of Waterloo is located in Ontario and that Ontario is a part of Canada. This should be done by adding three more triples: one that captures the relationship between Waterloo and Ontario (use again the URI http://example.org/locatedIn for that triple), one that captures the relationship between Ontario and Canada (use the URI http://example.org/isPartOf as the predicate of that triple), and one that captures the name of Ontario using the rdfs:label predicate again. In contrast to the previous task, however, do not introduce a URI to identify Ontario but, instead, represent Ontario as a blank node.
  • Task 6: In the previous two tasks you had to use the new predicate URIs http://example.org/locatedIn and http://example.org/isPartOf. Assuming you have written these URIs as absolute URIs in your Turtle representation of the data, change that now by writing all occurrences of these URIs in a compact form using the prefix name ex for the URI prefix http://example.org/ (e.g., write the compact form ex:locatedIn instead of the absolute form http://example.org/locatedIn). Note that, to be able to do that, you have to declare the new prefix name ex at the beginning of the Turtle file.
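
The relationship between the compact form and the absolute form of a URI, which several of the tasks above rely on, can be sketched in a few lines of Python. The prefix table below copies the declarations from the Turtle snippet and adds the ex prefix mentioned in Task 6. The function name expand is hypothetical, and the sketch deliberately ignores the escaping rules that Turtle permits in local names.

```python
# Prefix declarations from the Turtle snippet, plus the ex prefix from Task 6.
PREFIXES = {
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/",
}


def expand(prefixed_name, prefixes):
    """Expand a compact name such as 'foaf:name' into an absolute URI.

    Simplification: the local part is appended as-is to the URI prefix.
    """
    prefix, local = prefixed_name.split(":", 1)
    if prefix not in prefixes:
        raise KeyError("undeclared prefix name: " + prefix)
    return prefixes[prefix] + local


print(expand("foaf:name", PREFIXES))     # http://xmlns.com/foaf/0.1/name
print(expand("ex:locatedIn", PREFIXES))  # http://example.org/locatedIn
```

The KeyError for an undeclared prefix name mirrors what a Turtle parser does: a compact form is only valid if its prefix name has been declared earlier in the file.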


Theme 2: SPARQL

In this hands-on, you will practice working with the SPARQL query language by writing queries over the (complete) Nobel Prize dataset. To this end, you can go to https://data.nobelprize.org/sparql, where you will find a SPARQL editor that is connected to a SPARQL endpoint for the dataset. A nice feature of this editor is that it provides auto-completion of the possible properties and classes, and it even automatically adds prefix declarations for common URI prefix names as soon as you use these prefix names in the query. Now, the exercise is to capture the following questions as SPARQL queries. For some of the questions, you may need to look at the specification of the dataset to see which URIs the dataset uses to identify the classes of things described in the dataset and the properties that these things may have.

  • Question 1: Recall the data about Donna Strickland above. Return the URI that identifies the particular Nobel Prize that she won. From the data above, you can see that this URI is http://data.nobelprize.org/resource/nobelprize/Physics/2018. However, you should query for this URI via a SPARQL query that contains a single triple pattern with the URI for Donna Strickland in the subject position and the predicate http://data.nobelprize.org/terms/nobelPrize (or simply nobel:nobelPrize if you want to use the compact form).
  • Question 2: Extend the query from the previous question to retrieve the URI of the category of Donna Strickland's Nobel Prize. Hint: The category of any particular Nobel Prize is captured via the nobel:category property of the Nobel Prize, as documented in the corresponding part of the dataset specification.
  • Question 3: List the names of all Nobel Prize laureates who won a Nobel Prize in the category Chemistry. Hints: The URI that identifies this category is nobel:Chemistry and the full name of a Nobel Prize laureate can be accessed using the foaf:name property (as you can also observe in the example data about Donna Strickland above).
  • Question 4: Who won the Nobel Prize in Chemistry in 1911?
  • Question 5: List the names of all persons who won the Nobel Prize in Chemistry, ordered by the year of the award.
  • Question 6: List all female laureates (hint: the gender of a laureate can be accessed by using the foaf:gender property).
  • Question 7: For all laureates who won a Nobel Prize in 2000, list their name, date of birth, and date of death if the person has passed away (hint: use an OPTIONAL clause in your query).
  • Question 8: List all Nobel Prize laureates from Sweden or Finland (hint: use a UNION clause in your query).
  • Question 9: What is the birth year of persons who won the Nobel Prize in Chemistry? (hint: to obtain the birth year as a query variable you may use a BIND clause with the year function).
  • Question 10: What is the average age of persons who won the Nobel Prize in Chemistry? (note that this question is about the age of the laureates at the time they won)
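
If you prefer to run queries from a script rather than in the web editor, the following Python sketch shows one possible way to do so using only the standard library. It relies on two assumptions that are not stated above: that the URL of the editor also accepts SPARQL protocol requests (a GET request with a query parameter), and that the endpoint can return results in the SPARQL JSON results format. The query is purely illustrative and is not one of the questions. The function names and the sample result with its placeholder values are made up in order to show how a variable left unbound by an OPTIONAL clause is simply missing from a result row.

```python
import json
import urllib.parse
import urllib.request

# Assumption: the editor URL from the text also serves as the SPARQL protocol endpoint.
ENDPOINT = "https://data.nobelprize.org/sparql"

# Illustrative query (not one of the exercise questions): the labels of Canada.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?label WHERE {
  <http://data.nobelprize.org/resource/country/Canada> rdfs:label ?label .
}
"""


def build_request(endpoint, query):
    """Build a GET request that carries the query in the 'query' parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {"Accept": "application/sparql-results+json"}
    return urllib.request.Request(url, headers=headers)


def run_query(endpoint, query):
    """Send the query over the network and return the parsed JSON result."""
    with urllib.request.urlopen(build_request(endpoint, query)) as response:
        return json.load(response)


def rows(results):
    """Flatten a SPARQL JSON result into tuples; unbound variables become None."""
    variables = results["head"]["vars"]
    return [
        tuple(b[v]["value"] if v in b else None for v in variables)
        for b in results["results"]["bindings"]
    ]


# Made-up result with placeholder values: the second row has no binding for ?death.
SAMPLE = {
    "head": {"vars": ["name", "death"]},
    "results": {"bindings": [
        {"name": {"type": "literal", "value": "Person A"},
         "death": {"type": "literal", "value": "1990-01-01"}},
        {"name": {"type": "literal", "value": "Person B"}},
    ]},
}

print(rows(SAMPLE))  # [('Person A', '1990-01-01'), ('Person B', None)]
```

Calling rows(run_query(ENDPOINT, QUERY)) would then send the query over the network, provided the assumptions about the endpoint hold.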



Page responsible: Olaf Hartig
Last updated: 2023-03-01