
Christian Smith

This page is under construction...

I'm a PhD student at the Natural Language Processing Laboratory of the Department of Computer and Information Science. My main supervisor is Arne Jönsson and my secondary supervisor is Henrik Danielsson. I started my PhD studies shortly after graduating with an MA in Cognitive Science in 2011.

Research interests

My research interest is vector space modelling, applied to text summarization. I'm also investigating the readability of automatically created summaries and how it can be improved.

Traversing space

The vector space model places the words of a large text collection in a space of N dimensions. Words that appear in similar contexts end up "close" to each other in the space, which can be measured, for example, by calculating the angle between their vectors. A sentence can be represented as the average of the vectors of the words appearing in it. The same applies to larger text units, e.g. documents. In this way, text units of varying sizes can be compared for "similarity", where a high similarity indicates that the units share contextual information. This is in some ways analogous to how humans are assumed to process texts.
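As a rough illustration of the idea, the sketch below measures closeness with the cosine of the angle between vectors and represents a text unit as the average of its word vectors. The three-dimensional vectors and the names WORD_VECTORS, cosine and text_vector are invented for this example; a real word space would be learned from a large text collection and have far more dimensions.

```python
import math

# Toy 3-dimensional word vectors. The values are made up for illustration;
# a real word space is learned from a large text collection.
WORD_VECTORS = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.1, 0.9, 0.3],
    "road": [0.0, 0.8, 0.4],
}
DIMENSIONS = 3


def cosine(u, v):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def text_vector(words):
    """Represent a text unit as the average of its known word vectors."""
    known = [WORD_VECTORS[w] for w in words if w in WORD_VECTORS]
    if not known:
        return [0.0] * DIMENSIONS
    return [sum(column) / len(known) for column in zip(*known)]


# Words from similar contexts are closer than words from different contexts.
print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["dog"]))
print(cosine(WORD_VECTORS["cat"], WORD_VECTORS["car"]))

# The same measure works for larger units such as sentences.
print(cosine(text_vector(["cat", "dog"]), text_vector(["car", "road"])))
```

The same two functions apply unchanged to documents, since a document is just a longer list of words.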

A summary can be created by looking at how sentences are related to each other in the space, thereby finding "important" sentences that are in line with what the document is about. These important sentences can then be extracted from the document to create a summary.
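One simple way to sketch extraction is to score each sentence by its similarity to the document as a whole and keep the highest-scoring ones. The example below uses plain word-count vectors and a centroid-style ranking as stand-ins; these choices, and the names bag_of_words, cosine and summarize, are assumptions made for illustration and are not a description of how any particular summarization system works.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Word-count vector for a text (a stand-in for a learned word space)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[word] for word, count in u.items() if word in v)
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def summarize(sentences, k=2):
    """Return the k sentences most similar to the whole document,
    kept in their original order."""
    vectors = [bag_of_words(s) for s in sentences]
    document = Counter()
    for vector in vectors:
        document.update(vector)
    scores = [cosine(vector, document) for vector in vectors]
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


sentences = [
    "The cat sat on the mat.",
    "The cat chased the mouse.",
    "Stocks fell sharply today.",
]
for sentence in summarize(sentences, k=2):
    print(sentence)
```

Keeping the selected sentences in document order is a small design choice that avoids some of the abruptness extraction can otherwise introduce.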


Creating flow

Automatically created summaries often read abruptly. Especially when sentences are extracted or cut from a text, much of the form and structure of the text can be lost. By finding ways of "correcting" summaries after their creation, coherence and thereby readability should be improved. This includes searching for suitable text units to "paste" into the summary to improve the "flow" of the text.

Projects

FriendlyReader (Swedish)
The creation of a complete service for making texts around the web more accessible. This includes summarization, syntactic rewriting into easier-to-read text, and accessible graphical user interfaces.

Webblätläst (Swedish)
A web search service that ranks search results according to readability.

Software

In order to investigate the nature of easy-to-read texts, summaries and word spaces, a suite of software tools has been developed. These include the CogSum summarization system, the CogFlux simplicity rewriter, and solutions for evaluating them. The tools will someday be open source.

XML Viewer v0.1
A prototype for converting PDFs to XML that can be summarized and viewed in a web browser.

The PDF-annotator button
Click the button below to upload a PDF document, wait a couple of seconds, and the most important information will be annotated for you.


Page responsible: Christian Smith
Last updated: 2014-11-10