Goals
The goal of the project is to develop methods and tools for the extraction of classified translation data from parallel corpora. In this way we will extend the use of parallel corpora beyond standard applications such as bilingual concordancing and generation of lexical data by word alignment programs.
The usefulness of the tools will be demonstrated on a corpus of parallel texts. Specific studies relate to the formal characterization of translations and generation of data for computer-aided translation.
Results
- Definition of a standard XML DTD for source and target texts and link files.
- Part-of-speech tagging of the project corpus and dependency parsing using the Functional Dependency Grammar Parsers of Connexor Oy.
- Development, application and evaluation of an interactive tool, I*Link, for alignment at the word and phrase level.
- Development of different sets of alignment guidelines for different applications such as lexicography and machine translation.
- Improvement of existing automatic tools such as the Frasse tool for finding collocations, and the word alignment system LWA, by enabling use of syntactic and morphological analyses.
Current work
Funding agency
The Swedish Research Council, 2000-2002.
Participants
Lars Ahrenberg
Mikael Andersson
Magnus Merkel
Master's theses related to the project
Maria Holmqvist: Identifying translation shifts using a dependency parser and interactive word alignment.
Michael Petterstedt: Interaktiv länkning i bitexter - I*Link. (Interactive alignment of bitexts - I*Link).
Publications
- Lars Ahrenberg, Magnus Merkel, Michael Petterstedt: Interactive Word Alignment for Language Engineering. Accepted for publication as a project note at the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2003), April 12-17, 2003, Agro Hotel, Budapest, Hungary.
- Magnus Merkel, Michael Petterstedt, Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics. Accepted for publication in Proceedings of Corpus Linguistics 2003. UCREL Technical Paper No 16.
- Lars Ahrenberg, Magnus Merkel, Mikael Andersson: A System for Incremental and Interactive Word Linking. Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, 29-31 May 2002.
- Magnus Merkel: Comparing source and target texts in a translation corpus. Presented at the 13th Nordic Conference on Computational Linguistics, NoDaLiDa'01, Uppsala, Sweden.
- Magnus Merkel & Mikael Andersson: Combination of contextual features for word sense disambiguation: LIU-WSD. To be published in the Proceedings of the SENSEVAL-2 Workshop, Toulouse, 2001.
- Lars Ahrenberg, Mikael Andersson and Magnus Merkel: A knowledge-lite approach to word alignment. In J. Véronis (ed.) Parallel Text Processing: Alignment and Use of Parallel Corpora, pp. 97-116. Dordrecht, Kluwer, 2000.
- Lars Ahrenberg and Magnus Merkel: Correspondence measures for MT evaluation. In Proceedings of the LREC 2000 Workshop on Evaluation of Machine Translation, Athens, Greece 29th May, 2000, pp. 41-46.
- Magnus Merkel & Mikael Andersson: Knowledge-lite extraction of multi-word units with language filters and entropy thresholds. In Proceedings of RIAO'2000, Collège de France, Paris, France, April 12-14, 2000, Volume 1, pp. 737-746.
Latest update: 2003-07-15