Description of PhD Student Project

Topic: Integrating Heterogeneous Graph-Data Systems

In recent years we have been witnessing increasing industry interest in technologies for processing and managing graph data, where the term graph data refers to data that describes entities and the relationships between them in the form of a graph. Typical examples include data about transportation networks, protein interactions, biological food chains, social networks, bibliographic networks, topic maps, and knowledge bases. The diversity of emerging types of graph queries and graph analytics tasks calls for a broad variety of data management and data processing techniques. As a consequence, there exist various types of graph-data systems, each of which specializes in particular classes of use cases.

Given this inherent specialization and the diversity of graph-data systems, it is likely that only a combination of multiple systems can sufficiently address all the graph query and graph analytics use cases within an organization or enterprise. In other words, we foresee the emergence of many scenarios in which different (or even the same) collections of graph data within an organization or enterprise are managed in multiple separate graph-data systems. The same holds across organizations. In such scenarios it becomes necessary to integrate individual graph-data systems into a broader, organization-wide or even cross-organization system. A prerequisite for this is that graph-data systems of different types can interoperate with one another.

While there exist a few software tools that implement ad-hoc solutions usable for some aspects of this interoperability challenge, there is no systematic work to provide well-understood algorithms and data management techniques that can be employed in such scenarios. The overall goal of the project is to fill this gap. More specifically, the aim of the project is to develop and investigate approaches, including algorithms and techniques, to integrate graph-data systems as members of a federated system that can perform workloads of queries and analysis algorithms transparently over the federation members.
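
To illustrate what "transparently over the federation members" could mean, the following is a minimal sketch only, not an approach developed in the project. It assumes two toy in-memory stand-ins for heterogeneous graph-data systems (an adjacency-list store and a triple store) that are wrapped behind a common interface, so that a reachability query can be answered without the caller knowing which member holds which edges. All class and method names are hypothetical and chosen for illustration.

```python
from __future__ import annotations

from abc import ABC, abstractmethod


class FederationMember(ABC):
    """Hypothetical wrapper hiding one graph-data system behind a common interface."""

    @abstractmethod
    def neighbors(self, node: str) -> set[str]:
        """Return the nodes directly reachable from `node` in this member."""


class AdjacencyListMember(FederationMember):
    """Toy stand-in for a system that stores a graph as adjacency lists."""

    def __init__(self, adjacency: dict[str, list[str]]) -> None:
        self._adjacency = adjacency

    def neighbors(self, node: str) -> set[str]:
        return set(self._adjacency.get(node, []))


class TripleStoreMember(FederationMember):
    """Toy stand-in for a system that stores (subject, predicate, object) triples."""

    def __init__(self, triples: list[tuple[str, str, str]]) -> None:
        self._triples = triples

    def neighbors(self, node: str) -> set[str]:
        return {obj for subj, _pred, obj in self._triples if subj == node}


class Federation:
    """Answers queries over all members; callers never address a member directly."""

    def __init__(self, members: list[FederationMember]) -> None:
        self._members = list(members)

    def neighbors(self, node: str) -> set[str]:
        # Union of the answers of all members.
        result: set[str] = set()
        for member in self._members:
            result |= member.neighbors(node)
        return result

    def reachable(self, start: str) -> set[str]:
        # Graph traversal whose paths may cross member boundaries.
        seen = {start}
        frontier = [start]
        while frontier:
            node = frontier.pop()
            for nxt in self.neighbors(node):
                if nxt not in seen:
                    seen.add(nxt)
                    frontier.append(nxt)
        return seen
```

In this sketch, a path that starts in one member and continues in another is found because every traversal step asks all members. A real federation would additionally have to deal with differing data models and query languages, source selection, and query optimization, which is where the research questions of the project lie.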

Working on this project requires a background in both database technologies and Web technologies (acquired through courses or other relevant experience), including basic knowledge of graph databases and of techniques for building client-server Web applications. Other important qualifications are good programming and problem-analysis skills, and a desire to implement solutions in real-world systems and to perform comprehensive experiments.

Related work by our group: We have published first results that establish solid theoretical foundations related to the topic of the PhD project [1,2], and we plan to continue this line of research. The PhD project is meant to complement this work from a systems research perspective. Other related projects that we are currently working on are i) the development of a performance benchmark for GraphQL server implementations and ii) the EU-funded SPIRIT project, which aims to provide software tools for law-enforcement agencies to investigate connections between entities in various openly accessible online data sources. Regarding the latter, our main contribution is the development of a novel storage and processing architecture for graph data as a basis for the SPIRIT tools, and the PhD student will have the opportunity to be involved in the conceptual and technical development of this critical component.

Collaborations and community: The supervisor has a wide international network of scientific collaborators through which the PhD student may establish connections and collaborate on specific aspects of the PhD project. Additionally, the supervisor has contacts at several vendors of graph-data systems (such as Amazon's Neptune, Ontotext's GraphDB, Cambridge Semantics' AnzoGraph, Neo4j, and ArangoDB), which may be leveraged to identify opportunities to apply results of the PhD project in practice.

Official job announcement (including link to application form): https://liu.se/en/work-at-liu/vacancies?rmpage=job&rmjob=12147&rmlang=UK

[1] Olaf Hartig: Foundations to Query Labeled Property Graphs using SPARQL*. In Proceedings of the 1st International Workshop on Approaches for Making Data Interoperable (AMAR), Karlsruhe, Germany, September 2019.

[2] Olaf Hartig and Jan Hidders: Defining Schemas for Property Graphs by using the GraphQL Schema Definition Language. In Proceedings of the 2nd Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA) (GRADES-NDA'19), Amsterdam, Netherlands, June 2019.

Page responsible: Olaf Hartig
Last updated: 2019-10-07

IDA - Department of Computer and Information Science
