Development of Generic Resources for Language Technology
The construction of an open source library for language technology
raises research issues concerning new research roles and how to
transform basic research ideas into reliable, usable systems. The
project will follow the same principles of open source development
that it recommends. In particular, we emphasise the use of bottom-up
development starting from existing research prototypes and working
gradually forward. We emphasise the need for executable, but perhaps
incomplete, code from the start.
The starting point of the work is ideas that include basic research
results based on empirical studies, theory and research prototypes.
The existing documentation consists mostly of research articles
and code. From these sources we aim to develop generically
designed resources that will aid future development of end-use
modules for language technology.
The research issues raised by the proposed work follow three lines:
implementation, system design, and methodology.
Issues of Implementation and System Design
The technical research issues are clustered in the following four
dimensions:

- interfaces: how complex should module interfaces be for language
technology modules (LTMs)? What are suitable representations for the
message passing formats?
- modularisation: on what level should modularisation be done?
What kind of LTMs are to be preferred: whole applications or
re-usable code segments? What are the characteristics of a good
- knowledge representation: to what degree can data formats be
shared (or standardised) between LTMs? How are format
transformations dealt with? What is a reasonable level of definition
and data typing for the data formats in this context?
- forms of re-use: which LTMs should be made into a framework, a tool
or a code pattern? How can we identify when there is a need for a
new tool with its own formalism? When is a framework feasible and
when is the pragmatic solution to accept copying and adjustments of

Moreover, for all four dimensions there is also an important common
question: to what extent is sharing of results between systems from
different research groups possible? What is a suitable common ground
of open software that different projects can share? These last
questions have, of course, no simple answer, but we hope to arrive
at a partial answer during this project, at least implicitly through
some successful open source designs.
Issues of Language Technology Development Methodology
For aspects more related to project methodology, we group our
open source development as follows:

- publicity: how can the research community best exploit the open
source communication channels for development of stable research
resources? How should these channels best be used for realisation
of research ideas, i.e. for turning an abstract research
architecture into a concrete system design?
- community cooperation: how can an open source community best be
used to increase cooperation between research groups and other
interested programmers such as students?
- user-driven: how can open source best be used to increase
cooperative exchange of language technology ideas between research

Finally, it is interesting to see how well an iterative method such as
ours survives in an open source environment. How can these two
methodologies best be combined into a fruitful synthesis? What is the
best iterative scheme when going all the way from research results,
through open source, into industrial applications? How fast can such
iterations be done?
Last updated: 2012-05-07