Examensarbeten och uppsatser / Final Theses
Framläggningar på IDA / Presentations at IDA
If nothing is stated about the presentation language then the presentation is in Swedish.
Due to current distance mode thesis presentations during spring of 2020 will take place online. See more information on the page for online presentations (also link in the menu). If password is required to access the online presentation, please contact the examiner (type in the examiner's name in the search bar in the top right, and choose "Sök IDA-anställda" in the menu).
- 2020-08-25 kl 13:00 i https://liu-se.zoom.us/j/61973717243
An Analysis of Notions of Differential Privacy for Edge-Labeled Graphs
Författare: Robin Christensen
Opponent: Aron Gosch
Handledare: Patrick Lambrix
Examinator: Olaf Hartig
Nivå: Avancerad (30hp)
The user data in social media platforms is an excellent source of information that is beneficial for both commercial and scientific purposes. However, recent times has seen the topic about privacy gaining attention which has led to higher demands. With accurate statistical research data being just as important as the privacy of the user data, the relevance of differential privacy has increased. The purpose of differential privacy is essentially to hide the participation of individuals in databases, making it possible for the individuals to deny information. Differential privacy allows user data to be accessible under certain privacy conditions at the cost of accuracy in query results, which is caused by noise. The noise is based on a constant ε and the (global) sensitivity of a query. The query sensitivity is said to be the greatest possible difference in query result between the queried database and a neighboring database. Where the neighboring database is commonly defined to differ by one record in a tabular database, there are multiple neighborhood notions for edge-labeled graphs. This thesis considers the notions of edge neighborhood, node neighborhood, QL-edge neighborhood and QL-outedges neighborhood. To study these notions, a framework was developed in Java to function as a query mechanism for a graph database. ArangoDB was used as a storage for graphs, which was generated by parsing data sets in the RDF format as well as through a graph synthesizer in the developed framework. Querying a database in the framework is done with Apache TinkerPop, and the Laplace distribution is used when generating noise for the query results. The framework was used to study the privacy and utility trade-off of different histogram queries on a number of data sets, while employing the different notions of neighborhood in edge-labeled graphs. The level of privacy is determined by the value on ε, and the utility is defined as a measurement based on the L1-distance between the true and noisy result. In the general case, the notions of edge neighborhood and QL-edge neighborhood are the better alternatives in terms of privacy and utility. Although, there are indications that node neighborhood and QL-outedges neighborhood are considerable options for larger graphs based on utility measurements.
Page responsible: Ola Leifler
Last updated: 2020-06-11