VR Project: Multilingual extraction and term structuring
The goal of this project is to develop new methods and systems for term extraction and term structuring from multilingual document collections. The work of a term extraction and structuring system can be divided into the following four basic tasks:
- Term recognition
- Term alignment
- Generation of concepts (e.g. sets of synonymous terms)
- Recognition of semantic relations between concepts, in particular hyponymy and co-hyponymy.
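A minimal sketch of the term alignment task follows. It assumes that term recognition has already produced a list of candidate terms for each sentence of a sentence-aligned parallel corpus. Each source/target term pair is scored with the Dice coefficient over co-occurrence in aligned sentence pairs, Dice(s, t) = 2·n(s, t) / (n(s) + n(t)). This is a standard association measure chosen here for illustration only and is not necessarily the method used in this project. The function `dice_align`, its `threshold` parameter and the toy data are hypothetical.

```python
from collections import Counter
from itertools import product


def dice_align(pairs, threshold=0.5):
    """Pair each source term with its most strongly associated target term.

    pairs: one (source_terms, target_terms) tuple per aligned sentence pair.
    Returns {source_term: (target_term, dice_score)} for scores >= threshold.
    (Hypothetical illustration, not the project's own alignment method.)
    """
    src_freq, tgt_freq, joint = Counter(), Counter(), Counter()
    for src_terms, tgt_terms in pairs:
        # Count each term at most once per aligned sentence pair.
        src_set, tgt_set = set(src_terms), set(tgt_terms)
        src_freq.update(src_set)
        tgt_freq.update(tgt_set)
        joint.update(product(src_set, tgt_set))

    best = {}
    for (s, t), n_st in joint.items():
        # Dice(s, t) = 2 * n(s, t) / (n(s) + n(t))
        score = 2 * n_st / (src_freq[s] + tgt_freq[t])
        # Keep only the highest-scoring target per source term.
        if score >= threshold and score > best.get(s, (None, 0.0))[1]:
            best[s] = (t, score)
    return best


# Toy English-Swedish data (hypothetical example).
pairs = [
    (["brake", "gearbox"], ["broms", "växellåda"]),
    (["brake"], ["broms"]),
    (["gearbox"], ["växellåda"]),
]
print(dice_align(pairs)["brake"])  # ('broms', 1.0)
```

On the toy data, "brake" co-occurs with "broms" in both of its sentence pairs but with "växellåda" in only one, giving Dice scores of 1.0 and 0.5. Picking the best target per source term is a greedy choice that ignores competition between source terms; more careful alignment methods take such competition into account.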
Current methods, including our own, usually combine linguistic and statistical data in some way. In this project we investigate such combinations, but we also examine how they can be integrated with other methods, such as algebraic methods and semantic mirroring. While the creation of multilingual data is important in itself, the project will test the hypothesis that multilingual parallel data actually offer an advantage over monolingual term extraction and structuring. Several studies in related areas, such as word alignment and word sense disambiguation, support this hypothesis. We will therefore also evaluate the multilingual methods developed in the project on monolingual text collections and compare their performance with state-of-the-art methods for monolingual data.
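One simple way to combine linguistic and statistical data in term recognition is sketched below. It assumes the text has already been part-of-speech tagged with a simplified tagset (`ADJ`, `NOUN`, `VERB`, `DET`, and so on). A part-of-speech pattern, a sequence of adjectives and nouns ending in a noun, acts as the linguistic filter, and a minimum corpus frequency acts as the statistical filter. The function `extract_candidates`, the tagset and the `max_len` and `min_freq` parameters are hypothetical and do not describe the project's own method.

```python
from collections import Counter

# Tags allowed inside a candidate term (simplified, hypothetical tagset).
INNER_TAGS = {"ADJ", "NOUN"}


def extract_candidates(tagged_sentences, max_len=3, min_freq=2):
    """Return {term: frequency} for spans matching (ADJ|NOUN)* NOUN.

    tagged_sentences: list of sentences, each a list of (word, tag) tuples.
    """
    counts = Counter()
    for sentence in tagged_sentences:
        n = len(sentence)
        for start in range(n):
            for end in range(start + 1, min(start + max_len, n) + 1):
                span = sentence[start:end]
                # Linguistic filter: only ADJ/NOUN tokens, and the span ends in a NOUN.
                if all(tag in INNER_TAGS for _, tag in span) and span[-1][1] == "NOUN":
                    counts[" ".join(word.lower() for word, _ in span)] += 1
    # Statistical filter: keep candidates that occur at least min_freq times.
    return {term: c for term, c in counts.items() if c >= min_freq}


# Hypothetical pre-tagged input.
sentences = [
    [("The", "DET"), ("hydraulic", "ADJ"), ("pump", "NOUN"), ("failed", "VERB")],
    [("Check", "VERB"), ("the", "DET"), ("hydraulic", "ADJ"), ("pump", "NOUN")],
    [("The", "DET"), ("pump", "NOUN"), ("is", "VERB"), ("new", "ADJ")],
]
print(extract_candidates(sentences))  # {'hydraulic pump': 2, 'pump': 3}
```

Raw frequency is the crudest possible statistical filter; in practice it is often replaced by a termhood measure such as C-value, and the part-of-speech pattern has to be adapted to each language.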
NOTE: This page is subject to change. This version was last updated September 29, 2010.
Page responsible: Magnus Merkel