Animacy is an inherent property of the referents of nouns which has been claimed to figure as an influencing factor in a range of different grammatical phenomena in various languages. It is also correlated with central linguistic concepts such as agentivity and discourse salience. Knowledge about the animacy of a noun might therefore be relevant for several different kinds of NLP applications ranging from coreference resolution to parsing and generation.
In this talk I will motivate treating animacy as a disambiguating factor by briefly presenting a corpus study of simple transitive sentences in Norwegian which clearly shows the influence of the animacy of the arguments on word order variation and argument interpretation. A notion of typological markedness makes predictions regarding the linguistic behaviour of different constructions, including their distributional properties: an unmarked structure will typically be more frequent than its marked counterpart and, relatedly, will figure in a greater number of linguistic contexts. Even though knowledge about the animacy of a noun clearly has some interesting implications, little work has been done to acquire such knowledge automatically. I will therefore go on to present recent work on automatic animacy classification for Norwegian common nouns. In this study we make use of linguistically motivated morphosyntactic features which, in different ways, approximate the multi-faceted property of animacy and also result in real distributional differences between the nouns. Several experiments in the classification of Norwegian common nouns along the dimension of animacy will be presented, together with results approaching 90% accuracy. In the final part of the talk I will outline future work which further investigates the role of animacy in syntactic disambiguation.
Background reading:
The first part of this paper outlines the corpus study of Norwegian transitives: http://www.ling.helsinki.fi/kielitiede/20scl/Ovrelid.pdf
The work on automatic animacy classification is presented here: http://www.svenska.gu.se/~svelo/cl.pdf
Whereas full parsing of Swedish often involves a grammar for strings to match, and the generated language has infinitely many trees as its set of results, the level of constellations of primary constituents is different. In sentences without coordination of any primary constituents, the set of constellations of the constituents found in Diderichsen's sentence schema is finite. This allows for attempts at this coarse-grained level that rely more on exclusion and less on matching.
The method used here is licensing of non-primary constituents - particularly finite and non-finite verbs, and sentence adverbials. These 'short constituents' (meaning 'not potentially infinitely long'), when identified on the primary level, provide the starting point for a gap filling exercise.
Ambiguity still present at this primary level includes subject/object ambiguity and PP-attachment ambiguity. For the latter, the use of a valency lexicon is planned.
In this seminar I will address some thoughts and ideas I have regarding the existential construction (hereafter e-construction) in Swedish. Through the study of the e-construction both language specific and language universal questions have been raised.
I will present a case study of agent avoidance in e-constructions (Cavallin, 2005). In this study I concentrated on the more information-dense intransitive verbs in e-constructions (i.e. all intransitive verbs besides the prototypical verbs in the existential construction: vara, bliva and finnas). The focus of this paper was the question of agentivity for the post-verbal argument (a.k.a. "the semantic subject") in the normal form of an e-construction.
One question that has arisen from this study is how to deal with the tendency in Swedish to avoid agents in post-verbal position in these constructions. Swedish speakers tend to accept agents in post-verbal position, but they don't seem to produce these sentences (at least not in the corpora used for data collection). We don't want to rule out constructions that are intuitively grammatically sound merely because of their low frequency in corpora. But where do we want to implement these tendencies? In the grammar? In the lexicon? Do we want to implement this kind of restriction at all?
What conclusions, if any, can we draw from the fact that sparse corpus evidence in Swedish correlates with ungrammatical behaviour in other languages? Can the complete avoidance in other languages and the sparse (basically non-existent) evidence in Swedish show something about the mental lexicon? Can these observations somehow contribute to the discussion of intuition vs. empiricism as a method of studying syntax and semantics? And what are the consequences for implementing these restrictions in grammars to be used e.g. in translation and multilingual dialogue systems?
References
Karin Cavallin. Agent avoidance in existential constructions: A corpus study of existential sentences in Swedish. Course essay, 2005.
I describe the Ruby implementation by Hulsteijn & Wreeswijk (2003) of the turn-taking mechanism suggested by Sacks, Schegloff and Jefferson (1974). I also present my own implementation of a similar system in Oz, as well as some extensions and improvements beyond the scope of the article by Sacks et al. (1974).
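As a rough illustration of the kind of mechanism involved, the rule ordering that Sacks et al. (1974) propose for a transition-relevance place can be sketched in a few lines of Python. This is only a minimal sketch and not the Ruby or Oz implementation discussed in the talk: the function names (next_speaker, run_conversation) and the way each transition-relevance place is encoded as keyword arguments are assumptions made for this example.

```python
def next_speaker(current, selected=None, self_selectors=(), current_continues=False):
    """Decide who speaks after one transition-relevance place (TRP).

    All parameter names are hypothetical, chosen for this illustration.
    """
    # Rule 1a: the current speaker has selected the next speaker.
    if selected is not None:
        return selected
    # Rule 1b: otherwise another party may self-select; the first starter wins.
    for candidate in self_selectors:
        if candidate != current:
            return candidate
    # Rule 1c: otherwise the current speaker may, but need not, continue.
    if current_continues:
        return current
    # Nobody takes the turn: the talk lapses.
    return None


def run_conversation(first_speaker, trps):
    """Replay a list of TRPs and return the sequence of turn holders."""
    holders = [first_speaker]
    current = first_speaker
    for trp in trps:
        nxt = next_speaker(current, **trp)
        if nxt is None:
            break
        holders.append(nxt)
        current = nxt
    return holders


if __name__ == "__main__":
    trps = [
        {"selected": "B"},             # A addresses B directly
        {"self_selectors": ["C"]},     # nobody selected, C starts first
        {"current_continues": True},   # nobody else starts, C goes on
        {},                            # nobody speaks
    ]
    print(run_conversation("A", trps))
```

Because the three rules are checked in a fixed order, selection by the current speaker always overrides self-selection, which in turn overrides continuation. A fuller system would also have to model overlap and the timing of who starts first, which this sketch reduces to list order.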
Matrix dimensionality reduction is a powerful technique in which data is compressed and generalised. It has many applications within natural language processing, the best known of which is Latent Semantic Analysis (LSA). In this seminar I will present my thesis work, which comprises the application of the Generalized Hebbian Algorithm to incremental eigendecomposition, its extension to singular value decomposition and the application of these two techniques to LSA-style tasks. Developments of the work, including application to n-gram language model smoothing, will be discussed.
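To make the incremental idea concrete, here is a minimal pure-Python sketch of the textbook Generalized Hebbian Algorithm update (Sanger's rule), which estimates the leading eigenvectors of the data's correlation matrix one sample at a time. It is not the thesis implementation and does not cover the extension to singular value decomposition. The helper names (gha_step, train_gha, make_samples, cosine), the learning rate and the synthetic 2-D data are all assumptions made for this example.

```python
import math
import random


def gha_step(W, x, lr):
    """Apply one Generalized Hebbian Algorithm (Sanger's rule) update in place.

    W: list of k weight vectors (lists of floats), one per component.
    x: one input vector (list of floats).
    """
    # Outputs of the linear units, computed from the weights before the update.
    y = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    residual = list(x)
    for j in range(len(W)):
        old_w = W[j]
        # residual is now x - sum_{k <= j} y_k * w_k, using the old weights.
        residual = [r - y[j] * wi for r, wi in zip(residual, old_w)]
        # w_j += lr * y_j * residual (Hebbian term minus decay and deflation).
        W[j] = [wi + lr * y[j] * r for wi, r in zip(old_w, residual)]
    return W


def train_gha(samples, k, lr=0.005, seed=0):
    """Run one pass over the samples, starting from small random weights."""
    rng = random.Random(seed)
    dim = len(samples[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(k)]
    for x in samples:
        gha_step(W, x, lr)
    return W


def make_samples(n, seed=1):
    """Hypothetical zero-mean 2-D data with principal axes (0.6, 0.8) and (-0.8, 0.6)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        a = rng.gauss(0.0, 2.0)  # standard deviation 2.0 along the first axis
        b = rng.gauss(0.0, 0.5)  # standard deviation 0.5 along the second axis
        samples.append([0.6 * a - 0.8 * b, 0.8 * a + 0.6 * b])
    return samples


def cosine(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    W = train_gha(make_samples(10000), k=2)
    print("alignment with first axis:", abs(cosine(W[0], [0.6, 0.8])))
    print("alignment with second axis:", abs(cosine(W[1], [-0.8, 0.6])))
```

Each update touches only the current sample and the k weight vectors, so the full data matrix never has to be held in memory. The sign of each learned vector is arbitrary, which is why the usage example reports the absolute cosine.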
In this presentation I will discuss a contrastive study of technical designation in French and Swedish. Despite their exactitude, compound terms can still be ambiguous and thus correspond to several translations. A Swedish nominal compound often translates into a French prepositional phrase.
The use of the prepositions à or de in such translated terms can sometimes seem equally plausible: "bränslerör -- tube à carburant", "oljerör -- tube d'huile". The question is whether an analysis of the qualia structure of the Swedish term can predict which preposition to use, with the aim of improving results in MT and MAHT.
Page created 29 August 2005
Last updated 5 September 2005.