Development of Generic Resources for Language Technology

Research Issues

The construction of an open source library for language technology raises research issues concerning new research roles and how to transform basic research ideas into reliable, usable systems. The project will follow the same laws of open source development that it recommends. In particular, we emphasise bottom-up development that starts from existing research prototypes and works gradually forward. We also emphasise the need for executable, though perhaps incomplete, code from the start.

The starting point of the work is a set of ideas that include basic research results based on empirical studies, theory and research prototypes. The existing documentation consists mostly of research articles and code. From these sources we aim to develop generically designed resources that will aid the future development of end-user modules for language technology.

The research issues raised by the proposed work follow three lines: implementation, system design, and methodology.

Issues of Implementation and System Design

Our technical research issues are clustered in the following four dimensions:
  • interfaces: how complex should the interfaces of language technology modules (LTMs) be? What are suitable representations for the message passing formats?
  • modularisation: on what level should modularisation be done? What kind of LTMs are to be preferred: whole applications or re-usable code segments? What are the characteristics of a good modularisation?
  • knowledge representation: to what degree can data formats be shared (or standardised) between LTMs? How are format transformations dealt with? What is a reasonable level of definition and data typing for the data formats in this context?
  • forms of re-use: which LTMs should be made into a framework, a tool or a code pattern? How can we identify when there is a need for a new tool with its own formalism? When is a framework feasible, and when is the pragmatic solution to accept copying and adjustment of old code?
Moreover, an important question is common to all four dimensions: to what extent can results be shared between systems from different research groups? What is a suitable common ground of open software that different projects can share? These last questions have, of course, no simple answer, but we hope to arrive at a partial answer during this project, at least implicitly through some successful open source designs.
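To make the interface and data format questions above more concrete, the following is a minimal sketch of one possible answer, not the project's actual design. It assumes a deliberately small module interface (a list of tokens in, a list of tokens out), a single shared typed data format, and one explicit format transformation. All names (`Token`, `LTM`, `WhitespaceTokenizer`, `CapitalisationTagger`, `run_pipeline`, `to_tsv`) are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import List, Optional, Protocol


@dataclass(frozen=True)
class Token:
    """One unit of the shared data format (hypothetical)."""
    text: str
    tag: Optional[str] = None  # filled in by later modules, if at all


class LTM(Protocol):
    """Hypothetical minimal module interface: tokens in, tokens out."""

    def process(self, tokens: List[Token]) -> List[Token]: ...


class WhitespaceTokenizer:
    """Toy module: splits each incoming token on whitespace."""

    def process(self, tokens: List[Token]) -> List[Token]:
        return [Token(part) for tok in tokens for part in tok.text.split()]


class CapitalisationTagger:
    """Toy module: tags capitalised tokens 'CAP' and all others 'LOW'."""

    def process(self, tokens: List[Token]) -> List[Token]:
        return [
            replace(tok, tag="CAP" if tok.text[:1].isupper() else "LOW")
            for tok in tokens
        ]


def run_pipeline(modules: List[LTM], text: str) -> List[Token]:
    """Pass the shared format through each module in order."""
    tokens = [Token(text)]
    for module in modules:
        tokens = module.process(tokens)
    return tokens


def to_tsv(tokens: List[Token]) -> str:
    """Format transformation: shared format to tab-separated lines."""
    return "\n".join(f"{tok.text}\t{tok.tag or '_'}" for tok in tokens)


if __name__ == "__main__":
    pipeline = [WhitespaceTokenizer(), CapitalisationTagger()]
    print(to_tsv(run_pipeline(pipeline, "Open source works")))
```

In this sketch the modules are re-usable code segments rather than whole applications, and the only thing they share is the `Token` type. Whether such a narrow interface is expressive enough for real LTMs is exactly the kind of question the list above leaves open.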

Issues of Language Technology Development Methodology

For aspects more related to project methodology, we group our questions on open source development as follows:
  • publicity: how can the research community best exploit the open source communication channels for development of stable research resources? How should these channels best be used for realisation of research ideas, i.e. for turning an abstract research architecture into a concrete system design?
  • community cooperation: how can an open source community best be used to increase cooperation between research groups and other interested programmers such as students?
  • user-driven: how can open source best be used to increase cooperative exchange of language technology ideas between research and industry?
Finally, it is interesting to see how well an iterative method such as ours survives in an open source environment. How can these two methodologies best be combined into a fruitful synthesis? What is the best iterative scheme when going all the way from research results, through open source, into industrial applications? How fast can such iterations be done?


Page responsible: Webmaster
Last updated: 2012-05-07