Peter Grünwald

Address     CWI
            Kruislaan 413
            1098 SJ Amsterdam
            The Netherlands
Office      M266
Telephone   020-5924078 (+31 20 5924124)
Telefax     020-5924199 (+31 20 5924199)
E-mail      pdg@cwi.nl


Ah yes, it's him again!



The name's Peter Grünwald. I am a Ph.D. student in the Algorithms and Complexity group of CWI (the Dutch Centre for Mathematics and Computer Science), Amsterdam, The Netherlands. My supervisor is Prof. Paul Vitányi; my research project concerns machine learning and the Minimum Description Length Principle.

Welcome to my home page! It contains some information on me, myself, and I. You may want to have a look at my bookmarks for some pointers to pages that are relevant to my research. You can also e-mail me directly.

New!!! I have co-organized a mini-conference called Methods for Model Selection. The main intention of the conference was to keep mathematical psychologists up-to-date about recent mathematical developments in the problem of choosing between competing models for the same data. Click here for the conference announcement.

Research Interests and Running Projects

I belong to the part of the department that is mainly involved in fundamental aspects of machine learning (more about this on CWI's Machine Learning and Genetic Algorithms Project Description Page).

The projects I am currently involved in are not all about machine learning, but they do share a common theme: non-deductive inference. Also, all my work involves the same technical framework: the powerful formalism of Bayesian Networks and Conditional Independence. For more about this formalism, you may want to visit the homepage of the Association for Uncertainty in Artificial Intelligence, a community that focuses much of its research on Bayesian Networks.

More specifically, I now work on the Minimum Description Length Principle and on recurrent neural networks. I also used to work on nonmonotonic temporal reasoning, though this work has been completed by now. I consider these three subjects in turn below. Links to my papers are here.

The Minimum Description Length Principle

Most of my time is devoted to studying different aspects, both theoretical and practical, of the Minimum Description Length (MDL) Principle.

Theoretical Considerations

On the learning side, I am interested in the relations between the MDL paradigm and other theoretical frameworks of learning, such as the PAC framework and the statistical physics framework. On the statistical side, I am interested in the exact senses in which the MDL Principle is optimal, and in the connections between MDL, Bayesian inference and Maximum Entropy methods. In general, I am fascinated by all information-theoretic approaches to non-deductive reasoning.
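
For concreteness, here is the crude two-part form of MDL (a textbook formulation, not something specific to my own papers), which also makes one connection to Bayesian inference visible:

    % Crude two-part MDL: choose the hypothesis H that minimizes the total
    % number of bits needed to describe first H itself, then the data D
    % with the help of H.
    H_{\mathrm{MDL}} = \arg\min_{H \in \mathcal{H}} \bigl[ L(H) + L(D \mid H) \bigr]
    % Taking L(H) = -\log P(H) and L(D \mid H) = -\log P(D \mid H) turns
    % this into maximization of the Bayesian posterior P(H \mid D).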

From Theory to Practice

If a learning methodology is theoretically optimal, then this should of course be reflected in practice. I am currently spending much of my time, together with the CoSCo group of the University of Helsinki, on a project in which we apply theoretically optimal methodologies for learning to real-world data. A large part of this work is still theoretical in nature, in that it involves finding explicit closed-form formulas for, or well-founded approximations to, the quantities involved. Recently, we implemented the Naive Bayes classification model and compared its predictive performance on real-world data using the Bayesian evidence, the (new definition of) stochastic complexity and the classical maximum likelihood method; a small illustration of two of these scores follows.
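
To give a flavour of the quantities being compared, here is a minimal sketch, reduced to a single Bernoulli attribute with a uniform prior rather than the full Naive Bayes model (the function names are mine, not from our code):

    # Sketch: two model-selection scores for k successes in n Bernoulli trials.
    from math import lgamma, log

    def log_max_likelihood(k, n):
        # log P(data | ML parameter estimate k/n)
        if k == 0 or k == n:
            return 0.0
        p = k / n
        return k * log(p) + (n - k) * log(1 - p)

    def log_evidence(k, n):
        # log of the Bayesian evidence (marginal likelihood) under a
        # uniform Beta(1,1) prior: Beta(k+1, n-k+1) = k!(n-k)!/(n+1)!
        return lgamma(k + 1) + lgamma(n - k + 1) - lgamma(n + 2)

    n, k = 20, 15
    print(log_max_likelihood(k, n))  # always >= the evidence: ML overfits
    print(log_evidence(k, n))

The evidence integrates the likelihood over the parameter instead of plugging in its maximizing value, which is why it penalizes the extra flexibility that maximum likelihood exploits.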

Grammar Inference

I have also applied MDL to the computational learning/inferring of grammars from large samples of input text. For now, mostly samples of natural language (`corpora') are studied. There are some nice results on using the MDL Principle for the task of classifying words into related classes, which can be seen as a `degenerate' case of grammar learning. This work can be seen as a generalization of several statistical approaches to word classification; a toy version of the underlying score is sketched below. In the near future I hope to extend this work to more complex kinds of grammars and to other applications (e.g. DNA sequences).
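
The flavour of the approach is a two-part code: the better a class assignment compresses the corpus, the better it scores. The coding scheme below is deliberately naive and made up for this page, not the one from the papers:

    # Sketch: two-part MDL score (in bits) for a word-class bigram model.
    from math import log2
    from collections import Counter

    def description_length(tokens, classes, vocab):
        # L(model): naively, one class id per vocabulary word.
        num_classes = len(set(classes.values()))
        model_bits = len(vocab) * log2(max(num_classes, 2))
        # L(data | model): code each word by a class-to-class transition
        # plus the word's probability within its class (ML estimates).
        class_seq = [classes[w] for w in tokens]
        class_counts = Counter(class_seq)
        data_bits = 0.0
        for (c1, c2), cnt in Counter(zip(class_seq, class_seq[1:])).items():
            data_bits -= cnt * log2(cnt / class_counts[c1])
        for w, cnt in Counter(tokens).items():
            data_bits -= cnt * log2(cnt / class_counts[classes[w]])
        return model_bits + data_bits

    tokens = "the cat sat on the mat the dog sat on the rug".split()
    classes = {'the': 0, 'cat': 1, 'dog': 1, 'mat': 1, 'rug': 1,
               'sat': 2, 'on': 3}
    print(description_length(tokens, classes, set(tokens)))

A search procedure (for instance, greedily moving words between classes) then looks for the assignment that minimizes this total description length.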

Recurrent Neural Networks

This is joint work with psychologist Mark Steijvers of Indiana University. In recent years, different types of recurrent neural networks have been used to model human cognitive phenomena. We think that if we want to take these networks seriously as candidates for psychological models, then we should be able to compare their representational capabilities to those of other, more traditional, `symbolic' formalisms. Until very recently, the general belief seems to have been that recurrent networks can only represent regular and maybe some context-free languages in an effective way. Recently, we have come up with a very simple recurrent network with only one parameter that is capable of representing the essential properties of a small context-sensitive language that is not context-free. While this does not prove anything about learnability, it does give new insights into what is representable (and how it is represented) by recurrent networks.
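
To illustrate the basic mechanism at work (a continuous state variable that counts), here is a toy example. It is emphatically not our one-parameter network, and it handles the context-free language a^n b^n rather than a context-sensitive one:

    # Toy illustration: a single recurrent state variable acting as a
    # counter, the basic mechanism by which recurrent dynamics can
    # represent non-regular languages. Accepts the language a^n b^n.
    def accepts(s):
        state = 0.0
        for ch in s:
            if ch == 'a':
                state += 1.0      # 'push': increase the counter
            elif ch == 'b':
                state -= 1.0      # 'pop': decrease the counter
                if state < 0:
                    return False  # a 'b' without a matching 'a'
            else:
                return False
        return state == 0.0

    assert accepts("aaabbb") and not accepts("aab") and not accepts("ba")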

Nonmonotonic Temporal Reasoning

I have developed a new theory of reasoning about action which covers a very broad range of problem classes (consisting of well-known AI problems like the Frame Problem and the Yale Shooting Problem). Later, I extended the theory and found that what it actually does is apply the Causal Network Theory of Causation (an extension of Bayesian network theory that has its origins in econometrics and statistics; it is mainly associated with the writings of J. Pearl). It turns out that our theory can be interpreted as a generalization and/or correction of most existing formalisms, including Chronological Minimization, Filter Preferential Entailment, Baker's solution to the frame problem, Lin's `embrace of causality', Morgenstern & Stein's Motivated Action Theory, McCain & Turner's theory of ramifications and qualifications, Lifschitz & Rabinov's causal minimization and Baral and Gelfond's L3-approach. For some of these formalisms, we have been able to actually prove this; for the other ones, we give a reinterpretation of what they do in terms of causal network theory, and we give distinguishing examples for which our theory gives better results than they do.
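
As a toy illustration of the causal reading of such scenarios, here is the Yale Shooting Problem rendered as causal rules plus inertia (the encoding is made up for this page, not the formal theory from the papers):

    # Toy sketch: Yale Shooting scenario with causal rules and inertia.
    # Fluents persist unless an action explicitly causes them to change.
    def step(state, action):
        state = dict(state)              # inertia: everything persists...
        if action == 'load':
            state['loaded'] = True       # ...unless caused otherwise
        elif action == 'shoot' and state['loaded']:
            state['alive'] = False       # shooting a loaded gun causes death
            state['loaded'] = False
        # 'wait' causes nothing, so all fluents simply persist
        return state

    s = {'loaded': False, 'alive': True}
    for a in ['load', 'wait', 'shoot']:
        s = step(s, a)
    print(s)  # {'loaded': False, 'alive': False}: waiting did not unload the gun

The point of the example is that nothing licenses the gun becoming unloaded during the `wait' action, so the intuitively correct conclusion (the victim dies) follows; formalisms without a causal reading of inertia famously admit an anomalous model here.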

Curriculum Vitae


Publications

Publications on Learning and MDL