I*Link - version 1.1

http://www.ida.liu.se/~nlplab/ILink

Contents

  1. What is I*Link?
  2. Requirements
  3. Versions
  4. Licenses and bundled libraries
  5. Obtaining I*Link
  6. I*Link projects
  7. Examples
  8. References

What is I*Link?

I*Link is an interactive tool for creating and storing associations between segments of a bitext: a source text with a translated target text, or other parallel version of the source text. It requires bitexts that are prealigned at sentence level and enables a user to register and inspect word and phrase associations.

Requirements

I*Link is implemented in Java and the current version requires that you have Java 8 installed. It has been tested successfully in Windows, Mac and Linux environments.

Versions

A first version of I*Link was jointly developed by Lars Ahrenberg, Mikael Andersson, Magnus Merkel and Michael Petterstedt in 2000-2001 with funding from The Swedish Research Council (Vetenskapsrådet) and The Swedish Agency for Innovation Systems (VINNOVA). It was implemented by Michael Petterstedt. The current version is a slightly modernized version of that code with exactly the same functionality. The modernization was carried out by Niklas Blomstrand within the Swe-Clarin project, funded by The Swedish Research Council.

Licenses and bundled libraries

I*Link is distributed according to the Creative Commons licence Attribution-NonCommercial-ShareAlike (CC BY-NC-SA).

I*Link uses both internal and external libraries. The internal libraries are shipped as part of I*Link and therefore share the same license. Each external library has its own license.

I*Link uses the JAXP v1.1 package (The Java API for XML Processing, by Sun Microsystems, Inc., http://java.sun.com/xml/jaxp/) for its capabilities in handling XML-encoded documents through DOM (Document Object Model) and the SAX (Simple API for XML) parser. The package contains different subpackages named javax.xml, org.w3c.dom, org.apache, org.jdom and org.xml. The software in the packages javax.xml.parsers and javax.xml.transform is covered by the JAXP Reference Implementation License. The software under the package hierarchies beginning with org.w3c.dom is covered by the W3C Software License. All of the remaining software in this distribution is covered by the Apache Software License.
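
As an illustration of the DOM side of JAXP, the following minimal sketch parses a small XML string and lists its sentence elements. It is not taken from the I*Link source: the class name DomSketch and the element and attribute names (text, s, id) are invented for this example and do not come from LIU-MONO.DTD. The sketch relies on the JAXP classes that ship with current Java runtimes rather than on the separate v1.1 package.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class DomSketch {

    // Parse an XML string and return "id: text" for every <s> element.
    // The element name "s" and the attribute "id" are assumptions for this sketch.
    static List<String> sentences(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        NodeList nodes = doc.getElementsByTagName("s");
        List<String> result = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            Element s = (Element) nodes.item(i);
            result.add(s.getAttribute("id") + ": " + s.getTextContent().trim());
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<text><s id=\"1\">first sentence</s>"
                   + "<s id=\"2\">second sentence</s></text>";
        for (String line : sentences(xml)) {
            System.out.println(line);
        }
    }
}
```

A SAX-based reader would process the same input as a stream of events instead of building the whole tree in memory, which may matter for large bitexts.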

The pattern recognition engine in I*Link is managed by a regular expression package called RegExp v1.1.1 (http://jakarta.apache.org/regexp/). The package is covered by Apache Software Foundation - GNU Public Licence.

The look and feel of I*Link has been improved by the Kunststoff package (http://incors.org/). The package is covered by the GNU Lesser General Public Licence (LGPL).

Obtaining I*Link

Download I*Link as a zipped archive from

http://www.ida.liu.se/~nlplab/ILink/ILink.zip

Linux users

Place the zip-file where you want it to be and unzip it. Then start the system by changing into the ilink directory and running the jar-file:

cd ilink
java -jar ILink.jar

The commands for using I*Link are explained in the Help menu.

Mac-OS users

Place the zip-file where you want it to be and unzip it. Then right-click ILink.jar and agree to Open it.

Windows users

Place the zip-file where you want it to be and unzip it. Then start the system by double-clicking Run iLink. Alternatively, open a command prompt window, go to the installation directory and enter

java -jar ILink.jar

I*Link projects

Work with I*Link is organized in terms of projects. A project is defined in a special project file that specifies

The resource files may include bilingual lexicons, or files that define corresponding patterns, e.g. for cognates. The different resources are used to propose associations in the bitext that the user can accept or reject. The different resources are ordered by the user for best performance.
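
The idea of consulting resources in a user-defined order can be sketched as follows. This is an illustration only, not I*Link's actual code: the Resource interface, the two resource classes, the prefix-based cognate heuristic and the sample entries are all hypothetical. The pattern resource uses java.util.regex from the Java standard library instead of the bundled RegExp package.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.regex.Pattern;

public class ProposalSketch {

    // Hypothetical interface: a resource may propose a target word for a source word.
    interface Resource {
        Optional<String> propose(String source, List<String> targetWords);
    }

    // Lexicon resource: proposes the listed translation if it occurs in the target segment.
    static class LexiconResource implements Resource {
        private final Map<String, String> entries = new HashMap<>();

        LexiconResource add(String source, String target) {
            entries.put(source, target);
            return this;
        }

        public Optional<String> propose(String source, List<String> targetWords) {
            String t = entries.get(source);
            return (t != null && targetWords.contains(t)) ? Optional.of(t) : Optional.empty();
        }
    }

    // Pattern resource: a crude cognate test that proposes the first target word
    // sharing the source word's first four letters (assumed heuristic).
    static class CognateResource implements Resource {
        public Optional<String> propose(String source, List<String> targetWords) {
            if (source.length() < 4) return Optional.empty();
            Pattern p = Pattern.compile(
                    "^" + Pattern.quote(source.substring(0, 4)) + ".*",
                    Pattern.CASE_INSENSITIVE);
            return targetWords.stream().filter(w -> p.matcher(w).matches()).findFirst();
        }
    }

    // Ask each resource in the given order; the first proposal wins.
    static Optional<String> firstProposal(List<Resource> ordered, String source,
                                          List<String> targetWords) {
        for (Resource r : ordered) {
            Optional<String> p = r.propose(source, targetWords);
            if (p.isPresent()) return p;
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Resource> ordered = new ArrayList<>();
        ordered.add(new LexiconResource().add("book", "bok"));
        ordered.add(new CognateResource());
        List<String> target = Arrays.asList("en", "bok", "om", "pilgrimen");
        System.out.println(firstProposal(ordered, "book", target));
        System.out.println(firstProposal(ordered, "pilgrim", target));
        System.out.println(firstProposal(ordered, "city", target));
    }
}
```

Placing the lexicon before the pattern resource means that an explicit lexicon entry takes precedence over a looser pattern match, which is one reason the ordering can affect the quality of the proposals.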

Different bilingual lexicons are used to guide I*Link when proposing the best corresponding links to the user. The user then decides which links are valid by verifying the proposed links and by specifying new links. This information is then stored by I*Link as a resource for generating better links later on, using a machine learning approach.
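
How verified links might feed back into later proposals can be sketched with a simple counting store. This is an assumption made for illustration and is not I*Link's actual learning method: the class FeedbackSketch, its methods and the sample word pairs are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FeedbackSketch {

    // Accepted-minus-rejected score per source word and target word.
    private final Map<String, Map<String, Integer>> scores = new HashMap<>();

    // Record the user's decision on a proposed or manually specified link.
    void record(String source, String target, boolean accepted) {
        scores.computeIfAbsent(source, k -> new HashMap<>())
              .merge(target, accepted ? 1 : -1, Integer::sum);
    }

    // Propose the target with the highest positive score, if there is one.
    Optional<String> propose(String source) {
        Map<String, Integer> byTarget = scores.get(source);
        if (byTarget == null) return Optional.empty();
        return byTarget.entrySet().stream()
                .filter(e -> e.getValue() > 0)
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        FeedbackSketch store = new FeedbackSketch();
        store.record("house", "hus", true);
        store.record("house", "hus", true);
        store.record("house", "hem", true);
        store.record("house", "hem", false);
        System.out.println(store.propose("house"));
        System.out.println(store.propose("door"));
    }
}
```

Each accepted link raises the score of a word pair and each rejected link lowers it, so pairs that the user has confirmed more often are proposed first in later segments.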

The output from I*Link consists of link files denoting the verified associations within the bitext as well as different kinds of lexicons. The output can be examined by using the simple built-in tools in I*Link or by being exported to external formats and viewers. The latter option, though, is not supported by any tools in I*Link.

Examples

The example resources bundled with I*Link are bitexts and lexicons that are free to use. The provided bitext sample comes from an extract of John Bunyan's novel "Pilgrim's Progress", published in 1678, together with a translation into Swedish. The source text was downloaded from a site which is no longer available (http://www.johnbunyan.org/text/bun-pilgrim.txt), but the text can be found at many other sites on the web.

The translation into Swedish, "Kristens resa", can be found electronically at Project Runeberg's site for Nordic literature:

http://www.lysator.liu.se/runeberg/kristens/

The translation was made by G.S. Löwenhielm and published in 1903.

The sample bitext consists of 151 sentence-aligned segments which have been analysed with the Machinese Syntax Analyzers for English and Swedish, by courtesy of Connexor Oy, Finland. The analysed texts are provided in XML format, consistent with the bundled DTD: LIU-MONO.DTD. More information about Connexor's tools is available at

http://www.connexor.com/

Lexicons

The bundled lexicons can be used when the source language is English and the target language is Swedish. Lexicons are of three different types: dynamic, static and pattern lexicons. The dynamic lexicons are normally defined by you for a specific project. However, the shipped dynamic lexicons are either default resources which are included when you create a new default project, or resources which belong to the example projects provided with I*Link. The static lexicons were constructed from empirical data from previous alignments of bitexts at the department.

The shipped static resources are:

The pattern lexicons are:

References

Lars Ahrenberg, Mikael Andersson & Magnus Merkel (2000). A knowledge-lite approach to word alignment. In J. Véronis (ed.), Parallel Text Processing: Alignment and Use of Parallel Corpora, pp. 97-116. Dordrecht, Kluwer.

Lars Ahrenberg, Magnus Merkel & Mikael Andersson (2002). A system for incremental and interactive word linking. In Proceedings from The Third International Conference on Language Resources and Evaluation (LREC-2002), pp. 485-490. Las Palmas.


Contact: Lars Ahrenberg (name dot name at liu dot se)