One basic hypothesis of the project was that an interactive memory-based translation system, i.e. a system providing a terminology database and a database of previously translated units (sentences, phrases and perhaps paragraphs), coupled with a set of tools that derive internal and external recurrence profiles for a given text, would substantially improve the translation process, particularly its speed and certain aspects of quality such as terminological and stylistic consistency. An important aim of the project was to determine the advantages and drawbacks of memory-based systems and to suggest good designs for them. Another aim was to design and develop other translation tools that fit into a translator's or editor's workbench.
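The text does not specify how such a system searches its database of translated units. The following is a minimal sketch of a translation-memory lookup. It assumes an in-memory Python dictionary as the database and difflib's character-based similarity ratio as the fuzzy-match score. Both are illustrative choices, not the project's design, and the function name `lookup` and the example data are hypothetical.

```python
from difflib import SequenceMatcher


def lookup(segment, memory, threshold=0.7):
    """Return (score, source, target) triples for stored units similar to
    `segment`, best match first. `memory` is a hypothetical dict mapping
    source-language units to their stored translations."""
    hits = []
    for source, target in memory.items():
        # Case-insensitive character-level similarity in [0, 1];
        # 1.0 means an exact match.
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score >= threshold:
            hits.append((score, source, target))
    hits.sort(key=lambda hit: hit[0], reverse=True)
    return hits


memory = {
    "Press Enter to continue.": "Tryck på Enter för att fortsätta.",
    "The file is saved.": "Filen sparas.",
}

for score, source, target in lookup("Press Enter to continue", memory):
    print(f"{score:.2f}  {source} -> {target}")
```

The query lacks the final full stop, so the stored unit is returned as a fuzzy match with a score just below 1.0. The threshold controls the trade-off between recall and the editing effort that a poor match would cause.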
The translation support tools we investigated fell into four categories:
1. Diagnostic tools that characterize texts and text types in terms of parameters that have a direct bearing on the performance and usability of various computer-supported methods of translation; applying such tools to a representative sample of texts of a given text type yields a set of text profiles that reveal characteristics of the text type and can support decisions about what kind of computer support to use in the translation process;
2. Alignment tools that establish correspondences between source and target texts at various levels, such as chapters, divisions, headings, paragraphs, sentences, phrases and words;
3. Data acquisition tools that retrieve data from bilingual corpora for use in the actual translation process; and
4. Evaluation tools for evaluating translations and checking properties such as consistency of terminology, variation in phraseology and conformity with a given style guide.
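To illustrate the kind of terminology-consistency check listed under evaluation tools, the following minimal sketch counts how each source term is rendered across aligned sentence pairs and flags terms that are rendered in more than one way. The termbase format, the crude prefix matching of Swedish stems, all function names and the example data are assumptions. A realistic checker would need lemmatization and support for multi-word terms.

```python
import re
from collections import Counter


def tokens(text):
    # Lower-cased word tokens; \w also matches Swedish letters such as å, ä, ö.
    return re.findall(r"\w+", text.lower())


def check_term_consistency(pairs, termbase):
    """Count which target renderings of each source term occur in aligned
    sentence pairs. `termbase` maps a single-word source term to the
    candidate target stems to look for (a hypothetical format)."""
    usage = {term: Counter() for term in termbase}
    for source, target in pairs:
        source_tokens = tokens(source)
        target_tokens = tokens(target)
        for term, stems in termbase.items():
            if term not in source_tokens:
                continue
            for stem in stems:
                # Crude prefix match so that inflected forms ("filen") count.
                if any(tok.startswith(stem) for tok in target_tokens):
                    usage[term][stem] += 1
    # A term rendered in more than one way is flagged as inconsistent.
    inconsistent = sorted(term for term, counts in usage.items() if len(counts) > 1)
    return usage, inconsistent


pairs = [
    ("The file is saved.", "Filen sparas."),
    ("Open the file.", "Öppna arkivet."),
    ("Delete the file.", "Ta bort filen."),
]
termbase = {"file": ["fil", "arkiv"]}
usage, inconsistent = check_term_consistency(pairs, termbase)
print(usage["file"])   # Counter({'fil': 2, 'arkiv': 1})
print(inconsistent)    # ['file']
```

Prefix matching is deliberately simple. A stem such as "fil" would also match unrelated words like "filter", which is one reason real checkers rely on lemmatized text.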
Another task of the project was to study how the translation method affects the target text and to compare translations made with memory-based systems with manual translations. This required the development of methods and tools for evaluating translations. To test design alternatives, we had to consider the format and content of the databases as well as the matching algorithms used in database search. In this connection, we investigated different methods and tools for data acquisition, and the possibility of making database search sensitive to language-independent information encoded in the source document, e.g. semantic or functional properties of a paragraph encoded as descriptors in SGML (Standard Generalized Markup Language).
The analysis tools primarily support the identification of translation units in a body of text, covering not only strings but also linguistically more interesting units such as lemmas, terms, phrases, patterns and constructions. The analysis tools can be used diagnostically in several ways. For example, they can generate a profile of a text showing which parts of it are covered, and to what extent, by recurrent items at various levels of abstraction; the recurrent items can then be checked for counterparts in an existing translation memory. Both kinds of information are relevant for deciding what effort and resources the translation of the text will require.
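A profile of this kind can be sketched at the simplest level of abstraction, exact sentence strings. The hypothetical function below measures two shares of a text's words. The first is the share in repeated occurrences of a sentence (internal recurrence). The second is the share in sentences that already have a stored translation (external recurrence). The function also lists repeated sentences that still lack a stored translation. Word-based coverage and the dictionary-based memory are assumptions. Lemmas, terms and constructions would require linguistic analysis that is not attempted here.

```python
def word_count(sentences):
    return sum(len(s.split()) for s in sentences)


def recurrence_profile(sentences, memory):
    """Profile a text at the string level: how much of it repeats
    internally and how much already has a counterpart in a translation
    memory (`memory`: hypothetical dict from source to target sentences)."""
    total = word_count(sentences)
    seen, repeats = set(), []
    for s in sentences:
        # Only occurrences after the first count as internal recurrences:
        # they need not be translated again.
        if s in seen:
            repeats.append(s)
        else:
            seen.add(s)
    in_memory = [s for s in sentences if s in memory]
    return {
        "sentences": len(sentences),
        "distinct": len(seen),
        "internal_coverage": word_count(repeats) / total if total else 0.0,
        "external_coverage": word_count(in_memory) / total if total else 0.0,
        # Recurrent sentences with no stored translation yet.
        "repeated_not_in_memory": sorted(set(repeats) - set(memory)),
    }


text = [
    "Press Enter to continue.",
    "The file is saved.",
    "Press Enter to continue.",
    "Select a printer.",
]
memory = {"The file is saved.": "Filen sparas."}
profile = recurrence_profile(text, memory)
print(profile["internal_coverage"], profile["external_coverage"])
print(profile["repeated_not_in_memory"])
```

The two coverage figures can overlap, because a repeated sentence may also be present in the memory.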
A prerequisite for the exploration and use of bitexts is having such texts at one's disposal. In a Swedish context, English and Swedish are by far the two most common source and target languages to consider. As no English-Swedish translation corpus was available at the start of the project, an important part of the project was to create one and to align its texts at least at the paragraph level.
The text corpus consists of seven different English-Swedish bitexts, all of which are aligned at the sentence level. About 500 sentences from six of the bitexts have been marked up with part-of-speech information. Four of the texts are computer program manuals from two companies (1 and 2), and they differ chiefly in translation method: two of the manuals were translated manually and the other two with the aid of a memory-based translation tool. In addition, there are two novels and a set of automatically translated sentences from the ATIS (Air Travel Information System) domain.
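The text does not say how the bitexts were aligned. Purely as an illustration of sentence-level alignment, the sketch below implements a length-based aligner in the style of Gale and Church's method. This is a standard technique that pairs sentences by dynamic programming over character lengths, allowing 1:1, 1:0, 0:1, 2:1, 1:2 and 2:2 groupings ("beads"). The priors and variance are the values commonly quoted for that method and should be treated as assumptions. The function names and example sentences are hypothetical, and nothing here implies that the project used this aligner.

```python
import math

# Prior probabilities for bead types (source sentences, target sentences),
# as commonly quoted for Gale and Church's length-based aligner.
PRIORS = {(1, 1): 0.89, (1, 0): 0.0099, (0, 1): 0.0099,
          (2, 1): 0.089, (1, 2): 0.089, (2, 2): 0.011}
C = 1.0    # assumed expected target characters per source character
S2 = 6.8   # assumed variance of that ratio per character


def bead_cost(len_src, len_tgt, bead):
    """-log probability of a bead given the character lengths it covers."""
    mean = (len_src + len_tgt / C) / 2
    delta = (len_tgt - len_src * C) / math.sqrt(mean * S2) if mean else 0.0
    # Two-tailed probability of a length difference at least this large.
    p = math.erfc(abs(delta) / math.sqrt(2))
    return -math.log(PRIORS[bead]) - math.log(max(p, 1e-300))


def align(src, tgt):
    """Align two lists of sentences; return a list of (src_group, tgt_group)."""
    n, m = len(src), len(tgt)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            for di, dj in PRIORS:
                pi, pj = i - di, j - dj
                if pi < 0 or pj < 0 or cost[pi][pj] == math.inf:
                    continue
                ls = sum(len(s) for s in src[pi:i])
                lt = sum(len(t) for t in tgt[pj:j])
                c = cost[pi][pj] + bead_cost(ls, lt, (di, dj))
                if c < cost[i][j]:
                    cost[i][j], back[i][j] = c, (di, dj)
    # Trace the cheapest path back from the end of both texts.
    beads, i, j = [], n, m
    while i > 0 or j > 0:
        di, dj = back[i][j]
        beads.append((src[i - di:i], tgt[j - dj:j]))
        i, j = i - di, j - dj
    return beads[::-1]


src = ["Open the file.", "Save it.", "Close the window."]
tgt = ["Öppna filen och spara den.", "Stäng fönstret."]
for s, t in align(src, tgt):
    print(s, "<->", t)
```

On this invented pair, the two short English sentences are grouped with the single Swedish sentence as a 2:1 bead, and the last sentences are paired 1:1. Length-based alignment uses no lexical information, which makes it fast and language-independent. The trade-off is that it can go astray when sentence lengths diverge strongly.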
Updated 980417