NLPLAB Project Transmap


NLPLAB Project

TRANSMAP:
From parallel corpora to translation databases


Goals

The goal of the project is to develop methods and tools for the extraction of classified translation data from parallel corpora. In this way we will extend the use of parallel corpora beyond standard applications such as bilingual concordancing and generation of lexical data by word alignment programs.

The usefulness of the tools will be demonstrated on a corpus of parallel texts. Specific studies concern the formal characterization of translations and the generation of data for computer-aided translation.

Results

Definition of a standard XML DTD for source and target texts and link files.
Part-of-speech tagging of the project corpus and dependency parsing using the Functional Dependency Grammar Parsers of Connexor Oy.
Development, application and evaluation of an interactive tool, I*Link, for alignment at the word and phrase level.
Development of different sets of alignment guidelines for different applications such as lexicography and machine translation.
Improvement of existing automatic tools such as the Frasse tool for finding collocations, and the word alignment system LWA, by enabling use of syntactic and morphological analyses.

Current work

The final report

Funding agency

The Swedish Research Council 2000-2002.

Participants

Lars Ahrenberg
Mikael Andersson
Magnus Merkel

Master's theses related to the project

Maria Holmqvist: Identifying translation shifts using a dependency parser and interactive word alignment.

Michael Petterstedt: Interaktiv länkning i bitexter - I*Link. (Interactive alignment of bitexts - I*Link).

Publications

Lars Ahrenberg, Magnus Merkel, Michael Petterstedt: Interactive Word Alignment for Language Engineering. Accepted for publication as a project note at the 11th Conference of the European Chapter of the Association for Computational Linguistics (EACL-2003), Agro Hotel, Budapest, Hungary, April 12-17, 2003.
Magnus Merkel, Michael Petterstedt, Lars Ahrenberg: Interactive Word Alignment for Corpus Linguistics. Accepted for publication in Proceedings of Corpus Linguistics 2003. UCREL Technical Paper No 16.
Lars Ahrenberg, Magnus Merkel, Mikael Andersson: A System for Incremental and Interactive Word Linking. Third International Conference on Language Resources and Evaluation (LREC 2002), Las Palmas, 29-31 May 2002.
Magnus Merkel: Comparing source and target texts in a translation corpus. Presented at the 13th Nordic Conference on Computational Linguistics, NoDaLiDa'01, Uppsala, Sweden.
Magnus Merkel and Mikael Andersson: Combination of contextual features for word sense disambiguation: LIU-WSD. To be published in the Proceedings of the SENSEVAL-2 Workshop, Toulouse, 2001.
Lars Ahrenberg, Mikael Andersson and Magnus Merkel: A knowledge-lite approach to word alignment. In J. Véronis (ed.) Parallel Text Processing: Alignment and Use of Parallel Corpora, pp. 97-116. Dordrecht, Kluwer, 2000.
Lars Ahrenberg and Magnus Merkel: Correspondence measures for MT evaluation. In Proceedings of the LREC 2000 Workshop on Evaluation of Machine Translation, Athens, Greece, 29 May 2000, pp. 41-46.
Magnus Merkel and Mikael Andersson: Knowledge-lite extraction of multi-word units with language filters and entropy thresholds. In Proceedings of RIAO'2000, Collège de France, Paris, France, April 12-14, 2000, Volume 1, pp. 737-746.

Latest update: 2003-07-15

