The first point is certainly true; there are as yet no general techniques for interpreting the function that a trained network has implemented. The second point may also be true, perhaps even likely. And yet, I believe that Connectionist models can contribute, and have demonstrably contributed, to the discipline of cognitive modelling.
One particular area where such models have made an impact is that discussed by Coltheart in his submission: modelling the process of reading English text aloud. For many years now, a debate has raged as to whether a Dual Route (Coltheart, Curtis, Atkins, & Haller, in press) is required to handle both regular and exception words. Coltheart has consistently advanced the strong claim that not only is a Dual Route architecture used in human reading, but that a Dual Route is also necessary in any system that is to achieve skilled performance on both regular and exception words.
The Seidenberg and McClelland (1989) model, where a 3-layer network - without any explicit Dual Route - was trained to successfully pronounce English text, seriously tested this claim. However, it was found that this model performed poorly on nonwords (Besner, Twilley, McCann, & Seergobin, 1990), leading to the suggestion that it had achieved its performance by simply implementing one route, the lexicon.
Recently the Plaut and McClelland (1993) model has risen to the challenge, achieving near-human performance on nonwords as well as exceptions. Investigations into the network's internal structure (Plaut & McClelland, 1993) so far indicate that it has not implemented a Dual Route, although the actual mechanism whereby it achieves this mapping is still unclear.
So what contribution have Connectionist models made to the modelling of the text-to-speech task so far? The Plaut and McClelland (1993) model has cast serious doubt on Coltheart's strong claim that a Dual Route is necessary to achieve competent performance on reading exceptions and nonwords. That is, this process can apparently be implemented without a Dual Route.
I use the word `apparently' because, as Coltheart points out, there is as yet no way to determine what functional architecture the network is using. It will be the goal of future research to determine if the network is in fact implementing some kind of Dual Route, or whether it has developed some alternative organization that is able to handle rules and exceptions concurrently.
While it will be difficult to identify exactly what the functional architecture of this network is, it should be possible to confirm, or rule out, that the network has implemented a Dual Route. This could be studied through lesioning the network to determine if double dissociations arise (Bullinaria & Chater, 1993), or by direct inspection of the representations of exception and regular words in hidden unit space (e.g., via dimension reduction techniques such as PCA; Dennis & Phillips, 1991).
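The second of these approaches can be sketched concretely. The following is a minimal illustration, not a reconstruction of any published analysis: it fabricates a matrix of hidden-unit activations (in a real study these would be recorded from the trained network, one row per word) and uses PCA, computed via the singular value decomposition, to ask whether regular and exception words occupy separable regions of hidden unit space. The word counts, unit counts, and cluster means are all invented for the example.

```python
import numpy as np

# Hypothetical hidden-unit activation matrix: rows are words, columns are
# hidden units. Real data would come from the trained network; here the two
# word classes are fabricated with different means so that a separation exists.
rng = np.random.default_rng(0)
regular = rng.normal(loc=+1.0, scale=0.5, size=(20, 10))    # 20 "regular" words
exception = rng.normal(loc=-1.0, scale=0.5, size=(20, 10))  # 20 "exception" words
activations = np.vstack([regular, exception])

# PCA via singular value decomposition of the centred activation matrix.
centred = activations - activations.mean(axis=0)
_, _, vt = np.linalg.svd(centred, full_matrices=False)
projected = centred @ vt[:2].T   # coordinates on the first two components

# If the network segregates the two word classes, their projections separate
# along a leading component; the fabricated means guarantee it in this toy case.
reg_mean = projected[:20, 0].mean()
exc_mean = projected[20:, 0].mean()
print(reg_mean * exc_mean < 0)   # the two classes lie on opposite sides of PC1
```

For an actual network the interesting case is, of course, the one this toy example assumes away: whether any such separation emerges without being built in.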
The outcome will be important (if not decisive) for the Dual Route debate either way. If a Dual Route is discovered in the network (e.g., by finding that regular and exception patterns are segregated in the hidden unit space), then this will serve as a powerful vindication of the Dual Route theory. Being able to demonstrate that even Connectionist networks settle independently on a Dual Route architecture would strongly suggest that this is the most likely method of handling rules and exceptions within the one system. Furthermore, such a finding might suggest how a Dual Route architecture could be implemented in a parallel distributed system, perhaps even giving clues as to how this might be done in the brain.
If, on the other hand, an alternative functional architecture can be demonstrated, then the stage would be set for a showdown between the two theories. It would be hoped that this new theory would generate predictions about human reading behaviour that would allow researchers to determine if it is this architecture, or Dual Route, which is implemented in the human cognitive system.
Such are (and hopefully will be) the contributions of Connectionism to the Dual Route debate. Connectionist models are also being developed to model another language process, the generation of English past-tense forms (Seidenberg, 1992).
I now turn to Coltheart's second point, that is, his concern that, even though Connectionist models can handle regular and exception words correctly, they may not embody the functional architecture of the human reading process. He points out that the solution found by Connectionist networks is determined by random initial weights and arbitrary design decisions, and that the network opportunistically selects a solution that happens to be convenient in the search space of all possible solutions.
While it may be true that there are multiple minima in weight space, it is probable that many of these implement the same functional architecture. That is, the same mapping can be implemented by networks with different, but functionally equivalent, combinations of weights.
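The simplest case of this equivalence can be demonstrated directly: permuting the hidden units of a one-hidden-layer network, and permuting the outgoing weights to match, yields a different point in weight space that computes exactly the same mapping. The weights and sizes below are arbitrary, chosen only for illustration.

```python
import numpy as np

# Two distinct weight settings that implement the identical function.
rng = np.random.default_rng(1)
W1 = rng.normal(size=(4, 3))   # input -> hidden weights
W2 = rng.normal(size=(3, 2))   # hidden -> output weights

def forward(x, w1, w2):
    hidden = np.tanh(x @ w1)   # hidden layer with tanh activations
    return hidden @ w2

perm = [2, 0, 1]               # an arbitrary reordering of the hidden units
W1_p = W1[:, perm]             # permute the hidden-unit columns...
W2_p = W2[perm, :]             # ...and the matching outgoing rows

x = rng.normal(size=(5, 4))    # a batch of arbitrary inputs
same = np.allclose(forward(x, W1, W2), forward(x, W1_p, W2_p))
print(same)                    # True: different weights, same mapping
```

Permutation is only the trivial case; the point in the text is the stronger one, that qualitatively different weight configurations may still realise the same functional architecture.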
Whether the functional architectures found in this manner by a network are equivalent to those used by humans is naturally the important question, but it is the same question that can be asked of any computational model, including the Dual Route theory. Given the basic premise that all of these models are equally likely (i.e., they do not contradict any known properties of the human cognitive system), then this question can only be answered empirically.
The Dual Route theory does not have an inherent superiority over Connectionist networks, in terms of plausibility as a model, simply because its functional architecture is obvious in the implementation. The fact remains that the actual functional architecture of human reading is unknown, and so any model, no matter how clearly formulated, cannot claim a priori to be closer to the `real' functional architecture of the brain.
The most important criterion for judging between models is to measure how well they account for the human data; in this case, performance on nonwords and exceptions, dyslexia effects, and so on.
Computational models should therefore be criticised on their ability to replicate human performance, not on their origins. Whether the model is derived by "a cognitive psychologist after reflection upon the relevant empirical data", or by a learning algorithm after exposure to thousands of reading-behaviour examples, should not affect its a priori credibility. Let the data decide.
Implemented Connectionist models have, despite their `humble' origins, already made many important contributions to cognitive modelling. As I have attempted to demonstrate here, they can find solutions previously unsuspected, and can threaten theories previously thought unassailable.