
Semantic Technologies in Practice - Part 1


Part 1 of the course is a self-study block to be taken between mid-August and mid-September 2012, by those students not already familiar with the Semantic Web and Ontology Engineering in OWL. The deadline for handing in the exercises is September 12. Only the solutions to two of the exercises have to be handed in. Submit your answers as a zip file (include your last name in the file name) containing the OWL files you have produced (including any additional OWL files that your ontologies import) by e-mail to eva.blomqvist--at--liu.se

Each section below consists of some reading material, complemented by video lectures, and some exercises. Note that some parts of the material are suggested as optional reading if you are interested, while other parts are strongly recommended. For questions, you are welcome to send me an e-mail and/or book an appointment. Some exercises are only included for you to check that you have understood the material (solutions are given). Only two exercises, the modeling tasks, have to be handed in, as described above.

1. Introduction to the Semantic Web

Start by reading the original article from the Scientific American that introduced the Semantic Web back in 2001. Note that this was the original vision of the Semantic Web, but it is certainly not what the Semantic Web looks like today (we'll get back to that later). Nevertheless, this is the vision that is still discussed and debated, and it also gives a good overview of the idea of adding semantics to links etc.

Next, view two videos, part 1 and part 8, from the Semantic Web tutorial that was held at the International Semantic Web Conference in 2008, with their associated slides. These are introductory talks by Jim Hendler (one of the authors of the original article you just read). Although they are introductions, the talks are directed at people who already know a bit about the Semantic Web, so don't worry if you don't recognize all the acronyms and languages (if you want, you can go back to these talks later, once we have gone deeper into the languages and details of Semantic Web engineering). Optionally, if you are interested in Semantic Web applications, what they are and how to build them, also watch the video of the talk by Mathieu D'Aquin (part 9), from the same conference tutorial.

A big part of the Semantic Web today is actually manifested by the so-called Web of Data, also known as Linked Data. You already heard a bit about that in the second talk by Hendler. A less technical introduction to the Web of Data is the famous TED talk by Tim Berners-Lee on Linked Data; if you felt that the above videos were difficult, Tim Berners-Lee might be able to give you a better feeling for what the Web of Data (as part of the Semantic Web) really is. Also see his follow-up talk, where he presents application examples. For an alternative, and more technical, explanation of Linked Data, watch the first half hour or so of this talk by Tom Heath, who explains what Linked Data is, why we need it, and how it is published. This is from the same conference in 2008 as above. Then read this paper by Bizer, Heath and Berners-Lee, published in 2009. It contains several nice application examples, although the real explosion of Linked Data applications actually came after the paper was published, i.e. in the past 2-3 years.

Finally, if you want to get a bit of a more recent perspective (4 years is a long time in this business, and all the above videos are from 2008) you can also watch the keynote talk by Jim Hendler from ESWC2011, with the interesting title "Why the Semantic Web will never work" - and just to reassure you, no, this is not his final conclusion in the end ;) I am sure that if you have heard of the Semantic Web you have also heard someone saying exactly this: "it will never work", or "but the Semantic Web doesn't exist, right?" This talk is more of a discussion and an overview of where we are at the moment, rather than an introduction, so you can also save this for last.

2. Semantic Web Languages and Logics

As you have seen in one of the video lectures, the Semantic Web is based on a number of languages that can be illustrated as a technology stack (sometimes called the Semantic Web layer cake). At the bottom are the original Web technologies and protocols, such as URIs and XML. These are used as the basis also for the Semantic Web; hence, from a technological and language perspective the Semantic Web is an extension rather than a replacement of the original Web. There are then three main languages that are used in Semantic Web applications today:

  • RDF - The Resource Description Framework for representing data, and RDF Schema which is a simple modeling language for ontologies.
  • OWL - The Web Ontology Language for providing more expressive (than RDFS) data models describing RDF data.
  • SPARQL - A query language for querying data expressed in RDF (and OWL).
Other important languages include, for instance, RDFa for embedding RDF data into normal HTML pages, and vocabularies such as SKOS (for bridging the gap between terminologies and ontologies). Rule languages and vocabularies for the higher levels of the layer cake are also being developed; however, we will not treat any of these in this introduction.

Introduction and Overview

RDF is a simple graph data model, where each expression in the language consists of a so-called "triple". A triple consists of a subject, a predicate and an object, where all of those are Web resources identified by a URI (except the object, which can also be a literal). When connecting such triples together we get a graph representing a set of facts - an RDF graph. Such RDF graphs, which in turn may have links between them, are what makes up the Linked Data cloud (i.e. each bubble in the picture you saw earlier is one dataset - one or more RDF graphs). The use of URIs gives us the opportunity to reference individual data elements on the Web.
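To make this concrete, here is a tiny made-up RDF graph in the Turtle syntax, consisting of three triples; the example.org URIs are of course just placeholders:

```turtle
@prefix ex:   <http://example.org/> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .

# subject  predicate   object
ex:tim     foaf:name   "Tim Berners-Lee" .  # the object is a literal
ex:tim     foaf:knows  ex:jim .             # the object is another resource
ex:jim     foaf:name   "Jim Hendler" .
```

Since the triples share the resources ex:tim and ex:jim, together they form a small graph, and because the subjects, predicates and (non-literal) objects are URIs, any other dataset on the Web can refer to exactly the same resources.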

As Tim Berners-Lee said in his TED talk, raw data is important, but we also need ways to handle that data, manipulate it and describe it. In order to manipulate data we have the SPARQL query language, which in syntax looks dangerously similar to SQL, but which operates on a completely different data model: graphs instead of tables. Finally, in order to describe metadata, i.e. to define the things that exist in our RDF data, their relations etc., we have RDF Schema and the Web Ontology Language (OWL). On the Semantic Web, any OWL or RDFS file is usually called an ontology. Some people might find this use of the term quite sloppy: they would prefer "ontology" to be reserved for logical models describing the nature of the real world, and claim that model+data should be called a knowledge base rather than an ontology, not to mention a file that only contains data (whether or not it has the file extension .owl). However, in practice this is how the term ontology is often used on the Semantic Web, which is good to know.
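As a small illustration of what "describing the data" means, the following RDF Schema sketch (again with made-up example URIs) separates the schema from the data; given the fact that Fido is a dog, an RDFS reasoner can then infer that Fido is also an animal:

```turtle
@prefix ex:   <http://example.org/> .
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ex:Dog    rdf:type        rdfs:Class .
ex:Animal rdf:type        rdfs:Class .
ex:Dog    rdfs:subClassOf ex:Animal .  # schema: every dog is an animal

ex:fido   rdf:type        ex:Dog .     # data: Fido is a dog
# an RDFS reasoner infers:  ex:fido rdf:type ex:Animal .
```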

In order to understand, and start to use, these languages, first take a look at the OWL introduction by Sean Bechhofer, which is Part 2 of the Semantic Web tutorial from 2008. Please note that in 2009 a new version of the OWL recommendation was published, called OWL 2. For obvious reasons, OWL 2 is not mentioned in the lecture from 2008; however, except for the different OWL profiles (in the lecture: OWL Full, OWL DL and OWL Lite), the information in the tutorial is still valid, since OWL 2 mostly adds things to OWL rather than replacing them. In the lecture Sean also mentions RDF(S) and motivates why we in some cases need more expressivity and, hence, want to move on to OWL instead.

Exercises:

Based on the talk by Sean that you've seen, try to answer the following:

  1. Draw a small RDF graph with data about yourself.
  2. Why is it difficult to do automated reasoning on RDFS, although it is less expressive than OWL-DL and OWL 2?
  3. What is the open world assumption? In what way does this assumption make an OWL/RDF knowledge base different from a database containing the same information?
  4. What is the unique name assumption? Can you imagine one effect that the lack of UNA may have on reasoning with OWL/RDF knowledge bases - answer through an example.
  5. Assume I have an instance of a class Person called Peter, and an object property called hasTrunk with domain set to the class Elephant. What happens if I add a fact with Peter as subject and hasTrunk as the predicate, and then apply an OWL-DL reasoner to this model?
Answers can be found here.

Description Logics - Brief introduction

OWL is based on Description Logics. If you want to learn OWL just as a modeling language, much as you would learn a programming language without actually understanding what happens internally when you write a certain instruction, this is possible. However, in most cases it is useful to have a general idea of the underlying mechanisms that give the language its semantics. To get this background, read the following paper introducing Description Logics (DL) from the Description Logic Handbook published in 2002.

The paper is mainly directed at researchers and logicians, hence, the language can sometimes be a bit formal. If you find the paper difficult, I give you some detailed reading instructions below, to give you an idea of what to focus on (if you are very familiar with logics and/or very interested then you can of course read the whole thing from start to finish):

  • 2.1 can be read superficially, just for getting an overview.
  • Read the introduction to section 2.2, the introduction to 2.2.1, 2.2.1.1 and 2.2.1.2 carefully, so that you understand the way different DLs are built up from adding constructors.
  • 2.2.1.3 read only superficially as information.
  • Read section 2.2.2 carefully until the end of 2.2.2.2.
  • Skip sections 2.2.2.3 and 2.2.2.4.
  • Read 2.2.2.5 superficially, only to understand the notion of specialization.
  • Read 2.2.3 carefully.
  • Read section 2.2.4, but you do not have to understand the details of the different reductions, and you can skip section 2.2.4.2 altogether.
  • Skip section 2.2.5.
  • Read the introduction to section 2.3, but you can skip the rest (from 2.3.1 and onwards) if you are not particularly interested in how a reasoner works.
  • Read section 2.4, but skip 2.4.3.
From reading this chapter you should have gained a basic idea of the following notions:
  • Concepts, roles and individuals (and their relations to predicates and constants in FOL).
  • A-Box and T-Box
  • Satisfiability and consistency
  • Constructors and different DL languages, including role constructors and number restrictions
  • Inclusion and equality
  • Defined concepts and primitive and atomic concepts
  • Subsumption, equivalence, and disjointness
  • Concept and role assertions
  • Unique name assumption
  • Closed and open world semantics

Exercises:

  1. Translate the following Description Logic knowledge base into First-order logic: Exercise 1
  2. Assume we have a DL knowledge base expressed in OWL containing statements saying that "Bob is John's father" and "everybody has only one father". We now add a third statement saying that "Sam is John's father"...
    • What would happen if we try to reason with this knowledge base under a closed world assumption, e.g., as in a database setting?
    • What would happen if we try to reason with this knowledge base under an open world assumption, as is the normal assumption for OWL?
Answers can be found here.

Modeling in RDF/OWL - Getting started

Although a bit outdated, this paper contains a good introduction to modeling with OWL and the pitfalls to avoid, through a simple pizza example. As a summary of what you can do, remind yourself of the tables in Sean's slides (around minutes 29:54-33:53 of Sean's talk - Part 2 of the Semantic Web tutorial you watched before). Here is also another good summary of what you can actually do with OWL, with a lot of examples and a list of common modeling errors.

Now, all you need is a tool to help you to implement your model. There are several different concrete syntaxes for representing OWL, one of which is the RDF/XML serialization of OWL that is often used when storing OWL ontologies online. To view this syntax, for instance, download this file and save it on your computer, then open it in a normal text editor. What you see is a lot of XML-tags, with special namespaces for the OWL and RDF languages. For this course you don't have to learn how to read this syntax, but it is good to know that this is just one of the possible syntaxes in which OWL ontologies can be stored and subsequently retrieved and processed by other tools, such as reasoners.
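Just to give you a feeling for what such a file looks like, here is a minimal hand-made example of the RDF/XML serialization, stating only that a class Dog is a subclass of a class Animal; the example.org URIs are placeholders:

```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:owl="http://www.w3.org/2002/07/owl#">
  <!-- an OWL class, identified by its URI -->
  <owl:Class rdf:about="http://example.org/Dog">
    <rdfs:subClassOf rdf:resource="http://example.org/Animal"/>
  </owl:Class>
</rdf:RDF>
```

The same statements can be written much more compactly in other OWL syntaxes, which is one reason to let a tool handle the serialization rather than editing the XML by hand.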

However, you don't want to have to write this by hand! Just like you use an IDE for programming, you want to use a tool to help you model your ontologies as well. There are several tools available, like Protégé, the NeOn Toolkit, and TopBraid Composer. While the latter is a commercial tool (it has a free version, but without reasoning capabilities, and a 30-day trial version of the standard edition), the other two are free to use. If you want to use Protégé for this course you can download it here (choose one of the 4.x versions), and then check out the "Getting Started" and the "Pizzas in 10 minutes" tutorials. If you prefer TopBraid Composer, perhaps because you are used to working in an Eclipse environment, then you can find it here and find a "getting started" document here. One reason to prefer TopBraid Composer is also that it has a built-in SPARQL engine, which you will need for practicing SPARQL later on; therefore I recommend that you start by downloading and trying out TopBraid Composer Standard Edition (if you don't like it, or when the trial period is over, you can move to Protégé). If you are not familiar with Eclipse and have never used an ontology editor before, I have prepared a small video (screen capture) of using TBC that you can watch here (unfortunately the resolution in YouTube is not optimal, apologies for this).

Now you are ready to have a look at your first ontology. Load this ontology into the editor you have chosen, and then try to solve the exercises below.

Exercises:

Answer the following questions about the ontology you just loaded. The idea is to get familiar with the tool, and with how the OWL constructs look in the graphical interface, as well as with the OWL semantics. Use a standard DL reasoner, such as Pellet or HermiT, for the reasoning parts (the tools also ship with other reasoners that focus only on some specific inferences, while these are general, covering all of the OWL DL semantics).
  1. What is the type (kind of language element) of the following things:
    • rdfs:comment
    • cat
    • Tibbs
    • has_father
  2. Are there any datatype properties defined in people.owl?
  3. What is the base namespace of people.owl?
  4. How many named classes does the ontology contain? (Hint: you don't have to count "by hand")
  5. Does people.owl import any other ontology?
  6. The class "animal_lover" has an equivalentClass-restriction. Find it in the model, and express it in natural language.
  7. When using the property "has_father" between two individuals, the reasoner will infer that another property also holds between them. Which one? Why?
  8. Is Fido a pet? Run the reasoner and check! If Fido is a pet, why is that (list the statements that cause this inference)?
  9. Why is there a problem with the mad_cow class, and what is the problem?
Answers can be found here.
  1. Next, it is time for the first hand-in exercise, and your first experience in modeling on your own. Below you find a set of sentences that you should try to express in OWL. Try to be as precise as possible, meaning that you should be as explicit as possible when defining concepts (using logical expressions whenever possible) and leave as little as possible up to the interpretation of class and property names. For instance, when creating a class "YoungWoman" we also want to know what the definition of that class is, i.e. what it means to be a young woman, expressed in terms of other model elements. Nevertheless, this should be a small ontology (max 10-12 classes), so try to find a good balance between defining things in detail and limiting yourself to what the sentences actually require you to express. As much as possible you should exploit the built-ins of OWL in order to allow reasoning with your ontology; e.g., since there is a construct in OWL for saying that an individual is the same as another one, you should of course exploit that, rather than defining your own relation for it. Model the following:
    • A doll is a kind of toy
    • Young women are defined as young persons that are also female
    • A young person cannot be both a young man and a young woman
    • Young persons are either young men or young women
    • Lenore and Emily are famous dolls
    • All young women play with some doll
    • Young women play only with famous dolls
    • Young men play with at least one toy
    • Clara is a young woman
    • Clara and Laura are different individuals
    • Lalu is the same person as Laura
    • To dress a toy is a special case of playing, where the toy is a doll
    • To be dressed by a young person is the inverse relation of a young person dressing a doll
    • To be ancestor of a person is a transitive relation that holds between persons
    Save your ontology file with the ending .owl in RDF/XML syntax, and once you have finished the subsequent exercises below, send it (in a zip-file) to me.

Querying with SPARQL

Read sections 1-3 and 10 (DESCRIBE is rarely used) of the SPARQL standard from 2008 (W3C recommendation) and read through the small tutorial here (note that this latter tutorial only talks about the SELECT part of SPARQL, which is also the one you will use in the exercises below). If you feel that you didn't really get the hang of it, a quite extensive SPARQL tutorial is also available here and a less extensive one, but with some good examples, here.
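Before you start on the exercises, here is a minimal SELECT query, just to fix the general shape in your mind; it assumes, purely as an example, that the data uses the FOAF vocabulary:

```sparql
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

# Return every resource that has a foaf:name, together with that name.
# The WHERE clause is a graph pattern: variables (starting with ?) are
# bound to all combinations of nodes in the RDF graph that match it.
SELECT ?person ?name
WHERE {
  ?person foaf:name ?name .
}
```

Note how the pattern in the WHERE clause is itself written like a triple, which is exactly what makes SPARQL a graph query language rather than a table query language.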

Exercises:

Note that I am currently not aware of any plugin for running SPARQL queries in Protégé 4. You should therefore use TopBraidComposer for this part, or potentially the NeOn Toolkit.

  1. Retrieve the people that read the Daily Mirror and have a pet, by submitting the SPARQL query below to the ontology you used previously (people.owl). A colon with an empty prefix refers to the default namespace of the currently loaded ontology in most tools.
    SELECT ?reader ?pet
    WHERE {
    ?reader :reads :Daily_Mirror .
    {?pet :is_pet_of ?reader} UNION {?reader :has_pet ?pet} .
    }
    Would I still need the UNION statement if I first ran a reasoner over the ontology? Why/why not?
  2. Now, write your own SPARQL query that retrieves all the instances of the duck-class.
Answers can be found here, as usual.

3. Ontology Engineering for the Semantic Web

In semantic applications in general, and also on the Semantic Web, ontologies can play an important role. There are a lot of ontologies out there today, but often you need a quite specific ontology for your specific application, which means you have to build it yourself. If ontologies have already been developed in your field, you may consider reusing them, or just referring to them (if you are interested, check out Part 6 of the Semantic Web tutorial, which talks about semantic interoperability and aligning different ontologies), but in many cases you also have to build something of your own.

There are a lot of methodologies that have been proposed for building ontologies during the past decade. Most of them are inspired by general Software Engineering methodologies, and discuss on a high level what activities you need to perform to construct your ontology. For instance, ontologies (just as software) may have requirements, may be implemented in some logical language, may be tested and deployed etc. However, there is usually still a lack of detailed guidelines for the actual construction process, i.e. how to go about the actual modeling. Some attempts towards more detailed methodologies have been made, e.g. with the eXtreme Design methodology, which tries to apply an XP-like perspective to ontology engineering in order to develop "rapid prototypes", and highly modular ontologies.

Another "tool" that has emerged is the notion of Ontology Design Patterns (ODPs). On different levels of abstraction there exist modeling best practices that can be encoded as patterns. The term Ontology Design Pattern has been used for many kinds of patterns, everything from "macros" frequently implemented in ontology engineering tools to domain-specific components that can be reused directly through importing them in your ontology (a bit like libraries are used by some programming languages). In the ODP Portal you can browse some of these pattern types if you are interested (note that this is a community portal without any guarantees on pattern quality or provenance - unfortunately, there is as of yet no authoritative book on ontology patterns). For the final exercise however, feel free to reuse any of the Content ODPs (under "catalogues" in the left hand menu you find "Content ODPs", which gives you a table listing all that are currently proposed) that you find in the portal, if you find them useful.

For this course you don't need to learn a specific methodology for Ontology Engineering, but it is still good to know a little bit about commonly used things such as Competency Questions and ODPs, and to get an introduction to some common methodologies. In this lecture (slides are available separately as well), which I prepared for you, I first briefly introduce the notion of ontology requirements. You will see that the next exercise uses these kinds of requirements to specify your task. Then I introduce, and in particular exemplify, what ODPs can be about, mainly through an example on different ways to model roles. I won't give you an exhaustive list (it doesn't exist) of ODPs and I don't discuss all types, but I think it can be a good perspective to keep in mind when you model later on: to think in terms of requirements and "modeling problems" rather than in terms of lists of things (such as "what are my domain terms, what are my domain relations, what are my ...", which is otherwise quite common). If you want to try out the XD methodology when you solve the exercise, you can pretend to represent one of the design pairs (although you have already got the requirements written down, so you can skip that part) and try to split the requirements into coherent sets (those that are "about the same thing" or touch the same modeling problem go together) and prepare one small ontology (module) for each such set. Then, as soon as you are ready with one piece, you integrate it with the previous pieces, and step by step your overall ontology grows. In case you want to try out this design methodology, you can find a more detailed description of it in this book chapter (link assumes you are on the LiU network), from the book "Ontology Engineering in a Networked World".

Exercises:

This is the second hand-in exercise. Your task is to create an ontology that fulfills the set of requirements below. In addition to the ontology itself, you should hand in a data file with a set of example facts that use (refer to) the ontology you have created. The easiest way to do this is to create a blank ontology, import the ontology you have created and then start adding the individuals and facts (triples about the individuals) there. Once you have that, test your ontology by selecting three of the Competency Questions, rewriting them as SPARQL queries and submitting them against your ontology (make sure you have data that should be retrieved among your test facts). Additionally, your test data set should enable at least one inference that shows the fulfillment of the reasoning requirement. Once you have tested this, write a short description of your tests in a text file, where you provide a) your three SPARQL queries with associated expected results (when run over your data file), and b) what facts will produce the inference you have tested (input), and what triples will be inferred (output). If you choose to follow the XD methodology, you will generate the tests and the test data incrementally as you go along, but you only have to present three SPARQL queries and one inference in the text document.

Below you are given first an imagined "context", i.e. the setting you should imagine for your work. Next, there is a "story", which exemplifies what kind of information the ontology should store and process. If split into smaller sub-stories, this could be the basis for the XD iterations of individual design pairs. However, you are already provided with the requirements as well, in the form of a set of CQs, a contextual statement, and one reasoning requirement. Please note that your model should focus on the requirements, and not what can generally be assumed from the story or context.

Context: An online music database wishes to semantically represent their data about musicians, albums, and performances, in order to be able to provide a more intuitive user interface and better search functions to their users, i.e. by querying the knowledge base instead of using keyword queries. Below is an example of what they typically would like to store, and at the bottom you find the competency questions developed as requirements for the ontology.

Story: music and bands. The current line-up of the "Red Hot Chili Peppers" is: Anthony Kiedis (vocals), Flea (bass, trumpet, keyboards, and vocals), John Frusciante (guitar), and Chad Smith (drums). The line-up has changed a few times over the years: Frusciante replaced Hillel Slovak in 1988, and when Jack Irons left the band he was briefly replaced by D.H. Peligro until the band found Chad Smith. In addition to playing guitar for the Red Hot Chili Peppers, Frusciante also contributed to the band "The Mars Volta" as a vocalist for some time. From September 2004, the Red Hot Chili Peppers recorded the album "Stadium Arcadium". The album contains 28 tracks and was released on May 5, 2006. It includes a track of the song "Hump de Bump", which was composed on January 26, 2004. The critic Brian Hiatt described the album as "the most ambitious work in his twenty-three-year career". On August 11, 2006, the band gave a live performance in Portland, Oregon (US), featuring songs from Stadium Arcadium.

Competency questions (CQs) and contextual statements of music and bands

  1. What instruments does a certain person play?
  2. What role does a certain person have in a certain band at a certain point in time?
  3. During what time period was a certain album recorded?
  4. When was a certain album released?
  5. What song is a specific track a recording of?
  6. What is the genre, e.g., rock or pop, of this track?
  7. When was a certain song composed?
  8. What does a certain critic say about a certain record?
  9. When did a certain performance take place?
  10. What songs were played in a certain performance?
  11. Where did a certain performance take place?
  12. In what region is a certain city located?
  13. In what country is a certain region located?
Contextual statement:
  • A track has one or more genres.
Reasoning requirement:
  • I would like to be able to identify (classify) bands into (multiple) genres based on the genres of their recorded tracks, e.g., a rock band has recorded some track in the genre "rock". In this system a band should be classified into a certain genre even if it has only one track of this genre.

If you want to reuse some Content ODPs as components in your ontology, here's a list of potentially interesting ones you might consider to reuse (note that some are specializations and combinations of each other):

Checklist, what to hand in (zip together and e-mail to me by September 12):

  • An ontology fulfilling all the requirements above, i.e. in the form of one or more .owl-files in RDF/XML syntax.
  • Another file containing test data so that I can pose at least three of the CQs as SPARQL queries over this data + the ontology (preferably imported into the data file).
  • A text document where you list 3 SPARQL queries, each corresponding to a CQ, and their expected results + for the reasoning requirement an explanation of what facts will cause the inference and what will be the result.



Page responsible: Eva Blomqvist
Last updated: 2012-08-21