nlpFarm Open Source Project
The nlpFarm project is a collection of tools, libraries, and demo applications for various NLP-related tasks. nlpFarm is hosted at SourceForge. Everything on the farm is open source and developed in Java.
The Greenhouse contains NLP software from student projects carried out through various activities of NLPLAB. The projects are typically results from courses or thesis work, and they are downloadable and executable.
I*Link is an interactive graphical tool for aligning subsentential segments (words, phrases, clauses) in parallel corpora. I*Link can be used to align complete corpora interactively: the tool makes alignment proposals to the user and displays them in the graphical interface, both colour-coded and in table form. The user then accepts, rejects or revises the system proposal, and the tool automatically moves on to the next proposal. All resources that are built during an I*Link session are stored and can be reused in other I*Link alignment projects. The resources can also be used as input for the automatic word aligner I*Trix, so that I*Link serves as a training tool for automatic alignment. Input and output from I*Link come in XML format.
I*Pex is a graphical interface to a pattern extraction tool used for information extraction. The tool is built on the same four-level architecture as I*Link and I*Trix and uses the same XML format for input files. The four levels that can be used for pattern specification are wordform, base form, part-of-speech and syntactic function. By building so-called pattern boxes, the user can interactively build and test collections of patterns that assign semantic categories to entities in the text. Once semantic categories exist, they can be used for assigning higher-level descriptions such as event descriptions.
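To make the four-level idea concrete, here is a minimal Java sketch of matching a pattern against a sentence. It is not I*Pex code: the class name PatternSketch, the array layout for tokens, and the convention that null means "no constraint on this level" are all assumptions made for illustration.

```java
import java.util.List;

// Hypothetical illustration of four-level pattern matching; not the I*Pex API.
final class PatternSketch {
    // Assumed layout of a token array: one slot per level named in the text.
    static final int WORDFORM = 0, BASE_FORM = 1, POS = 2, FUNCTION = 3;

    // A constraint matches a token if every non-null level is equal.
    static boolean matches(String[] constraint, String[] token) {
        for (int level = 0; level < 4; level++) {
            if (constraint[level] != null && !constraint[level].equals(token[level])) {
                return false;
            }
        }
        return true;
    }

    // Returns the first start index where all constraints match consecutive tokens, or -1.
    static int findFirst(List<String[]> sentence, List<String[]> pattern) {
        for (int start = 0; start + pattern.size() <= sentence.size(); start++) {
            boolean ok = true;
            for (int i = 0; i < pattern.size() && ok; i++) {
                ok = matches(pattern.get(i), sentence.get(start + i));
            }
            if (ok) return start;
        }
        return -1;
    }

    public static void main(String[] args) {
        // Made-up tags; real tag sets depend on the tagger and parser used.
        List<String[]> sentence = List.of(
            new String[] {"Volvo", "Volvo", "PROP", "SUBJ"},
            new String[] {"bought", "buy", "V", "MAIN"},
            new String[] {"shares", "share", "N", "OBJ"});
        // "any form of the verb buy, followed by a noun in object position"
        List<String[]> pattern = List.of(
            new String[] {null, "buy", "V", null},
            new String[] {null, null, "N", "OBJ"});
        System.out.println(findFirst(sentence, pattern));
    }
}
```

Leaving a level unconstrained is what lets one pattern generalise over inflected forms (via the base form) or over whole word classes (via part-of-speech).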
I*Trix is the automatic companion to I*Link and can take advantage of the resources created with I*Link. I*Trix is a fully automatic word aligner built on a multi-resource architecture, including bilingual dictionaries, bilingual patterns for parts-of-speech and syntactic functions, as well as statistical resources. The basic idea is to combine information from different resources and find an optimal alignment for each sentence pair in a sentence-aligned parallel corpus. By switching between training (I*Link) and the automatic mode of I*Trix, performance can be enhanced step by step.
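The idea of combining evidence from several resources can be sketched as follows. This is an illustration only, not the I*Trix algorithm: it assumes each resource can be modelled as a function that scores a (source word, target word) pair, it simply sums the scores, and it picks the best target word for each source word greedily rather than searching for a globally optimal alignment. The class name ResourceCombinationSketch and the toy resources are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleBiFunction;

// Hypothetical sketch of multi-resource scoring; not the I*Trix implementation.
final class ResourceCombinationSketch {
    // For each source word, returns the index of the best-scoring target word,
    // or -1 when no resource gives any target a positive score.
    static int[] align(String[] source, String[] target,
                       List<ToDoubleBiFunction<String, String>> resources) {
        int[] links = new int[source.length];
        for (int s = 0; s < source.length; s++) {
            links[s] = -1;
            double best = 0.0;
            for (int t = 0; t < target.length; t++) {
                double score = 0.0;
                for (ToDoubleBiFunction<String, String> resource : resources) {
                    score += resource.applyAsDouble(source[s], target[t]);
                }
                if (score > best) {  // keep only strictly better candidates
                    best = score;
                    links[s] = t;
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Toy bilingual dictionary (assumed data).
        Map<String, String> dictionary = Map.of("ett", "a", "hus", "house");
        ToDoubleBiFunction<String, String> dictionaryResource =
            (s, t) -> t.equals(dictionary.get(s)) ? 1.0 : 0.0;
        // Toy second resource: identical strings get a small score.
        ToDoubleBiFunction<String, String> identityResource =
            (s, t) -> s.equals(t) ? 0.5 : 0.0;

        int[] links = align(new String[] {"ett", "stort", "hus"},
                            new String[] {"a", "big", "house"},
                            List.of(dictionaryResource, identityResource));
        for (int link : links) System.out.println(link);
    }
}
```

A word left unlinked (index -1) is the kind of case an interactive session could resolve, with the resulting link stored as a new resource entry for the next automatic run.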
MPATR Chart Parser
An environment for the specification and parsing of unification-based grammars in the PATR-II formalism. The lexical component is based on a minilexicon approach allowing for the specification of inflectional and regular derivational paradigms. The parser is a chart parser that can be run in different modes (top-down with or without bottom-up filtering, bottom-up with or without top-down filtering, depth-first or breadth-first). It is provided with a graphical interface to the chart, allowing easy inspection of intermediate and final results.
A recent extension allows the specification of partial grammars that may be combined for robust, partial parsing of text.
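Unification of feature structures is the core operation behind grammars of this kind. The following Java sketch shows it in a simplified form. It is not MPATR code: the class name UnifySketch is hypothetical, feature structures are modelled as nested maps with string atoms, and reentrancy (structure sharing), which the full PATR-II formalism supports, is deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of feature-structure unification without reentrancy.
final class UnifySketch {
    // Unifies two feature structures; returns null if their values clash.
    // A structure is either a String atom or a Map from feature names to structures.
    @SuppressWarnings("unchecked")
    static Object unify(Object a, Object b) {
        if (a instanceof Map && b instanceof Map) {
            Map<String, Object> result = new HashMap<>((Map<String, Object>) a);
            for (Map.Entry<String, Object> entry : ((Map<String, Object>) b).entrySet()) {
                Object existing = result.get(entry.getKey());
                Object merged = existing == null
                    ? entry.getValue()                    // feature only present in b
                    : unify(existing, entry.getValue());  // feature present in both
                if (merged == null) return null;          // clash somewhere below
                result.put(entry.getKey(), merged);
            }
            return result;
        }
        // Atoms unify only when equal; an atom never unifies with a map.
        return a.equals(b) ? a : null;
    }

    public static void main(String[] args) {
        Object nounPhrase = Map.of("cat", "NP", "agr", Map.of("num", "sg"));
        Object thirdSingular = Map.of("agr", Map.of("num", "sg", "per", "3"));
        Object plural = Map.of("agr", Map.of("num", "pl"));
        System.out.println(unify(nounPhrase, thirdSingular)); // merged structure
        System.out.println(unify(nounPhrase, plural));        // null: num clashes
    }
}
```

In a chart parser, an edge can be extended only when the unification required by the grammar rule succeeds, which is how agreement and similar constraints prune the chart.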
Frasse Phrase Extraction Tool
This is a statistics-based tool for the recognition of recurrent sentences and collocations in running text. Word-based, user-definable filters enable the recognition of multi-word forms of various types with reasonably high precision and recall.
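As a rough illustration of frequency-based extraction with a word-based filter, the Java sketch below counts word n-grams, discards those that start or end with a word from a user-supplied list, and keeps those that recur at least a given number of times. This is not how Frasse is implemented: the class name RecurrentNgramSketch, the edge-word filter and the frequency threshold are assumptions chosen for the example.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Hypothetical sketch of recurrent n-gram extraction; not the Frasse algorithm.
final class RecurrentNgramSketch {
    // Returns n-grams (n >= 1) occurring at least minFreq times, with their counts.
    static Map<String, Integer> extract(List<String> tokens, int n, int minFreq,
                                        Set<String> edgeStopWords) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int i = 0; i + n <= tokens.size(); i++) {
            List<String> gram = tokens.subList(i, i + n);
            // Word-based filter: skip n-grams that start or end with a stop word.
            if (edgeStopWords.contains(gram.get(0))
                    || edgeStopWords.contains(gram.get(n - 1))) {
                continue;
            }
            counts.merge(String.join(" ", gram), 1, Integer::sum);
        }
        counts.values().removeIf(count -> count < minFreq);  // keep recurrent ones only
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList(
            "the hard disk is full and the hard disk is slow".split(" "));
        System.out.println(extract(tokens, 2, 2, Set.of("the", "is", "and")));
    }
}
```

Raising the frequency threshold or tightening the filter list trades recall for precision, which is the kind of tuning a user-definable filter makes possible.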
Contact: Magnus Merkel
DAVE is a Windows-based tool comprising four modules for work with parallel texts: sentence alignment, bilingual concordance generation, phrase extraction, and analysis of the translations (or source expressions) for recurrent sentences on either side.
Contact: Magnus Merkel
LWA - Linköping Word Aligner
LWA is a word alignment tool that takes a bi-text aligned at sentence or paragraph level as input. LWA provides two kinds of results: (i) for each pair of aligned sentences, a partial word alignment, and (ii) a list of link types (i.e. proposed translation pairs). LWA combines a basic statistical approach with "lite" linguistic knowledge of various kinds. It can also align multi-word units. Due to its knowledge-lite approach, LWA can easily be extended for use with a number of different language pairs; current versions work for Swedish, English and French.
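A basic statistical approach to proposing link types can be illustrated with a co-occurrence measure. The Java sketch below uses the Dice coefficient over aligned sentence pairs; the choice of Dice is an assumption made for this example, since the description above does not say which statistics LWA uses, and the class name DiceSketch is hypothetical.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of a co-occurrence score for candidate translation pairs.
final class DiceSketch {
    // Dice coefficient 2 * c(s,t) / (c(s) + c(t)), where counts are the number of
    // aligned sentence pairs containing s (source side), t (target side), or both.
    // Both lists must have the same length: pair i is sentence i on each side.
    static double dice(List<String[]> sourceSentences, List<String[]> targetSentences,
                       String s, String t) {
        int both = 0, sourceCount = 0, targetCount = 0;
        for (int i = 0; i < sourceSentences.size(); i++) {
            boolean inSource = Arrays.asList(sourceSentences.get(i)).contains(s);
            boolean inTarget = Arrays.asList(targetSentences.get(i)).contains(t);
            if (inSource) sourceCount++;
            if (inTarget) targetCount++;
            if (inSource && inTarget) both++;
        }
        int total = sourceCount + targetCount;
        return total == 0 ? 0.0 : 2.0 * both / total;
    }

    public static void main(String[] args) {
        // Tiny made-up Swedish-English bi-text.
        List<String[]> swedish = List.of(
            new String[] {"ett", "hus"},
            new String[] {"ett", "barn"},
            new String[] {"hus", "och", "barn"});
        List<String[]> english = List.of(
            new String[] {"a", "house"},
            new String[] {"a", "child"},
            new String[] {"house", "and", "child"});
        System.out.println(dice(swedish, english, "hus", "house"));
        System.out.println(dice(swedish, english, "hus", "a"));
    }
}
```

Pairs that always occur together score highest, so ranking candidate pairs by such a score gives a first list of proposed translation pairs that linguistic filters can then refine.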
Contact: Lars Ahrenberg
LINLIN and MALIN
LINLIN-II is a Java implementation of a grammar-based dialogue management and specification tool. A new feature of the system, compared with the old Interlisp LINLIN system, is a high-level language for the common specification of allowable dialogue sequences and internal system actions linked to different dialogue states. MALIN is a multimodal application of LINLIN.
Contact: Arne Jönsson
Page responsible: Lars Ahrenberg
Last updated: 2004-11-25