Abstracts - GSLT seminars Week 43, 2008

Marcus Uneson: Functional transformation-based learning

Transformation-based learning (TBL) (Brill 1995) has been widely used for many natural language processing tasks. It is a simple yet flexible paradigm that achieves state-of-the-art performance in several areas and does not overtrain easily. It is especially good at capturing fixed-distance dependencies in a compact form that is, in addition, often interpretable by humans.

One major disadvantage of naive TBL implementations is their long training time. Several solutions have been proposed, including elaborate indexing schemes (Ramshaw & Marcus 1999); explicitly stated independence assumptions (Hepple 2000); and Monte Carlo sampling of the rule space (Samuel 1998).
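To make the training-time problem concrete, here is a minimal Python sketch of naive greedy TBL training in the style of Brill (1995): starting from an initial tagging, each iteration instantiates candidate rules at the current errors, rescores every candidate over the whole corpus, and keeps the rule with the best net gain. It assumes a single hypothetical rule template (change a tag depending on the previous tag) and treats the corpus as one flat tag sequence; `Rule`, `score`, `candidates` and `learn` are illustrative names, not part of any of the cited systems.

```python
from dataclasses import dataclass
from typing import List, Set, Tuple


@dataclass(frozen=True, order=True)
class Rule:
    """Hypothetical single template: change from_tag to to_tag
    when the tag immediately to the left is prev_tag."""
    from_tag: str
    to_tag: str
    prev_tag: str

    def apply(self, tags: List[str]) -> List[str]:
        # Conditions are read from the input list, so all matches are
        # rewritten simultaneously; a new list is returned.
        out = list(tags)
        for i in range(1, len(tags)):
            if tags[i] == self.from_tag and tags[i - 1] == self.prev_tag:
                out[i] = self.to_tag
        return out


def score(rule: Rule, current: List[str], gold: List[str]) -> int:
    """Net gain of a rule: errors it fixes minus errors it introduces."""
    new = rule.apply(current)
    fixed = sum(c != g and n == g for c, n, g in zip(current, new, gold))
    broken = sum(c == g and n != g for c, n, g in zip(current, new, gold))
    return fixed - broken


def candidates(current: List[str], gold: List[str]) -> Set[Rule]:
    """Instantiate the template at every position that is currently wrong."""
    return {Rule(current[i], gold[i], current[i - 1])
            for i in range(1, len(current)) if current[i] != gold[i]}


def learn(current: List[str], gold: List[str],
          min_score: int = 1) -> Tuple[List[Rule], List[str]]:
    """Greedy TBL: repeatedly pick and apply the best-scoring rule."""
    learned: List[Rule] = []
    while True:
        # sorted() makes tie-breaking deterministic.
        cands = sorted(candidates(current, gold))
        if not cands:
            break
        # The naive, expensive step: every candidate is rescored
        # against the whole corpus on every iteration.
        best = max(cands, key=lambda r: score(r, current, gold))
        if score(best, current, gold) < min_score:
            break
        learned.append(best)
        current = best.apply(current)
    return learned, current


initial = ["VBN", "TO", "NN", "DT", "NN", "TO", "NN"]
gold = ["VBN", "TO", "VB", "DT", "NN", "TO", "VB"]
rules, final = learn(initial, gold)
print(rules)  # [Rule(from_tag='NN', to_tag='VB', prev_tag='TO')]
```

Each iteration costs roughly (number of candidates) × (corpus length), and that repeated rescoring is the kind of cost the approaches above aim to reduce. The learned rule also illustrates why TBL output is readable: it simply says "change NN to VB after TO".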

In this talk, I present ongoing work on a reasonably efficient implementation of TBL in a functional paradigm, based on Florian & Ngai (2001). Typical benefits of the paradigm include higher-order functions and, crucially, referential transparency, which enables a wide array of compile-time program transformations and optimizations and makes programs easier to reason about.

 

Sara Stymne: Compound processing for statistical machine translation

In many languages, including German and Swedish, compounds are written as single words without spaces or other word boundaries. Compounds are productive and common, which makes them problematic for many applications, including statistical machine translation, mainly because they lead to sparse data problems. When translating into a compounding language, the output commonly contains separate words where a compound would be expected.

I will present my work on handling compounds in factored phrase-based statistical machine translation. Compounds are split into their component parts prior to training and translation, using an empirical corpus-based method. When translating into a compounding language, the parts in the output must also be merged back into compounds.
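The abstract does not spell out the splitting method. One well-known empirical corpus-based approach (Koehn & Knight 2003) considers every segmentation of a word into parts seen in a corpus, including the unsplit word itself, and picks the segmentation whose part frequencies have the highest geometric mean. The Python sketch below illustrates that idea only; it is not necessarily the method used in this work, and the filler list, minimum part length, function names and toy counts are assumptions.

```python
from math import prod
from typing import Dict, Iterator, List

FILLERS = ("", "s", "es")  # assumed German-style linking morphemes
MIN_PART = 3               # assumed minimum length of a compound part


def splits(word: str, freq: Dict[str, int]) -> Iterator[List[str]]:
    """Yield every segmentation of word into corpus words (fillers dropped)."""
    if word in freq:
        yield [word]  # the unsplit word is itself a candidate
    for i in range(MIN_PART, len(word) - MIN_PART + 1):
        head, rest = word[:i], word[i:]
        for filler in FILLERS:
            if not head.endswith(filler):
                continue
            part = head[: len(head) - len(filler)]
            if len(part) >= MIN_PART and part in freq:
                for tail in splits(rest, freq):
                    yield [part] + tail


def best_split(word: str, freq: Dict[str, int]) -> List[str]:
    """Pick the segmentation whose part frequencies have the highest geometric mean."""
    word = word.lower()
    candidates = list(splits(word, freq))
    if not candidates:
        return [word]  # no known parts: leave the word alone

    def geo_mean(parts: List[str]) -> float:
        return prod(freq[p] for p in parts) ** (1 / len(parts))

    return max(candidates, key=geo_mean)


freq = {"arbeit": 50, "markt": 80, "arbeitsmarkt": 5}  # toy counts
print(best_split("Arbeitsmarkt", freq))  # ['arbeit', 'markt']
```

Because the unsplit word competes with its segmentations, a compound that is itself frequent in the corpus (for example with a count of 500 here) is left unsplit.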

I will present experiments on splitting and merging German and Swedish compounds, and on markup schemes for compound parts. These experiments will form the basis of my licentiate thesis.
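Merging is easier when split parts carry a marker. The sketch below shows one simple, hypothetical markup scheme: a symbol is appended to every non-final compound part, and after translation each marked token is joined to the token that follows it. The `#` marker and the names `mark_split` and `merge` are assumptions for illustration, not the markup schemes evaluated in this work.

```python
from typing import List

MARK = "#"  # hypothetical marker appended to every non-final compound part


def mark_split(parts: List[str]) -> List[str]:
    """Turn a split compound into tokens, marking all but the last part."""
    return [p + MARK for p in parts[:-1]] + parts[-1:]


def merge(tokens: List[str]) -> List[str]:
    """Rejoin each run of marked parts with the token that follows it."""
    out: List[str] = []
    pending = ""
    for tok in tokens:
        if tok.endswith(MARK):
            pending += tok[: -len(MARK)]  # strip marker, wait for the head
        else:
            out.append(pending + tok)
            pending = ""
    if pending:  # dangling part at sentence end: emit it unmarked
        out.append(pending)
    return out


tokens = mark_split(["haus", "tür"])
print(merge(["die"] + tokens + ["ist", "offen"]))  # ['die', 'haustür', 'ist', 'offen']
```

A splitter that discards linking morphemes would make this merge produce *arbeitmarkt* instead of *arbeitsmarkt*; keeping the filler on the marked part (`arbeits#`) is one way to make merging reversible.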

 


Page created October 14th, 2008
Latest update October 9th, 2008