This interpretation appears to deny the distinction between the details of a program's implementation and either the algorithm being implemented or the goals that the computations are designed to achieve (Marr, 1982). This Realist approach to interpreting the outcomes of computational modelling seems to me to run the risk of replacing the reification of hypothetical constructs with the reification of the computational models themselves. In my view, the models do occupy the "middle kingdom" in which Latimer feels unsatisfied - they are metaphors rather than realistic descriptions of cognitive operations. They are no more accurate or realistic as descriptions of processing by virtue of being realised in the physical form of a program, or of being built from elements derived from a neuronal metaphor. The ultimately correct description of cognitive processes will have to be implementable in neurons, but a model is not accurate merely because it is neurally plausible. There are, moreover, dangers in attributing reality to a model on the grounds of neural plausibility, because doing so encourages a literal interpretation that reduces awareness of the assumptions - both theoretically relevant and arbitrary - that necessarily contribute to any computational model.
I will illustrate these issues with an example from my own research area of visual word recognition that parallels the schema/feature issue discussed by Latimer.
The prototypical symbolic model is the "dual route model" which assumes that reading aloud reflects the outcome of a "race" between a lexical retrieval process and a procedure that assembles pronunciations from abstract spelling-sound rules (Coltheart, 1980). This model epitomises the "sacred cows" of verbal models of visual word recognition: the assumption that word identification reflects independent bodies of knowledge that are operated on by independent procedures (Besner et al., 1990).
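The race assumption can be made concrete with a toy sketch. Everything in it is hypothetical and chosen only for illustration: the two-word lexicon, the one-letter spelling-sound rules, the ad hoc pronunciation strings and the timing formulae are assumptions, not parameters of Coltheart's (1980) model or of any published implementation.

```python
import math

# Hypothetical toy lexicon: word -> (stored pronunciation, frequency per million).
LEXICON = {
    "mint": ("mInt", 500.0),
    "pint": ("paInt", 2.0),  # exception word: the rules below give "pInt"
}

# Hypothetical one-letter spelling-sound rules; real rule sets are far richer.
RULES = {"b": "b", "i": "I", "m": "m", "n": "n", "p": "p", "t": "t"}


def lexical_route(word):
    """Return (finish_time, route, pronunciation), or None for words not in the lexicon."""
    if word not in LEXICON:
        return None
    pronunciation, frequency = LEXICON[word]
    # Assumed timing: retrieval is faster for higher-frequency words.
    return 500.0 - 50.0 * math.log(frequency), "lexical", pronunciation


def rule_route(word):
    """Assemble a pronunciation letter by letter; assumed time grows with length."""
    return 100.0 * len(word), "rule", "".join(RULES[ch] for ch in word)


def read_aloud(word):
    """Whichever route finishes first determines the output: (route, pronunciation)."""
    candidates = [rule_route(word)]
    lexical = lexical_route(word)
    if lexical is not None:
        candidates.append(lexical)
    _, route, pronunciation = min(candidates)  # earliest finishing time wins
    return route, pronunciation


print(read_aloud("mint"))  # ('lexical', 'mInt')
print(read_aloud("pint"))  # ('rule', 'pInt'): the exception word is regularized
print(read_aloud("bint"))  # ('rule', 'bInt'): nonwords can only be read by rule
```

In this toy, a low-frequency exception word loses the race to the rule route and is regularized, while a nonword can only be pronounced by rule; the numbers are arbitrary and only the race logic matters.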
PDP models are argued to challenge this "central dogma" of symbolic models by showing that the behaviours assumed to demonstrate the existence of separate lexical and rule-based procedures can be simulated by a standard three-layer back-propagation network that contains neither a lexicon of entries corresponding to individual words nor a set of pronunciation rules (Seidenberg & McClelland, 1989). This class of models is claimed to "offer an alternative that dispenses with the two-route view" (p. 564) in favour of "a single uniform procedure that learns to process ...letter strings through experience with the spelling-sound correspondences implicit in the set of words from which it learns" (Seidenberg & McClelland, 1989, p. 525).
Thus, the knowledge learned by a PDP network reflects systematic relationships between orthographic and phonological word forms, but these relationships are not expressed as a system of rules. Indeed, according to Seidenberg and McClelland (1989) this is critical to the model's success because "English orthography...is not well captured by mechanisms involving systems of rules" (p. 564). But what does it mean to say that the PDP model does not contain knowledge of rules? To evaluate the validity of this claim it is necessary to specify precisely what does and does not constitute "a rule".
Tests of nonword performance by the Seidenberg and McClelland (1989) model showed very poor generalisation and led to the claim that, if anything, the model implemented the lexical component of a dual route model and that its poor performance on nonwords was proof of the need for a second route (Besner et al., 1990). But a revised version of the model (Plaut & McClelland, 1993) shows good generalisation to nonwords while maintaining accurate performance for words. So PDP models can show knowledge of rules at the level of generalisation performance. However, this is claimed to be an emergent property of the system, arising from a knowledge base that is fundamentally different from that of symbolic models (Seidenberg & McClelland, 1989). Paralleling Latimer's claims about schema and features, the construct of rules is claimed to be both unnecessary and inaccurate: an "imperfect generalization about the nature of the input and ...what is learned" (p. 549).
Though these models do not contain rules in the strong sense of explicit rule systems, they do demonstrate knowledge of rules in the weaker sense that the relationship between inputs and outputs can be fully specified in terms of component features of the input (i.e., given input feature P, infer output feature Q), so that components of the input have a causal status in determining the output (Davies, 1990). Both formalisms, then, describe input-output relationships in terms of causal connections between the components of orthographic and phonological word forms. In explicit rule systems, causality resides in the procedures that abstract, select and apply the rule knowledge, while in the multiple level models - and connectionist implementations in general - knowledge and procedures are intertwined.
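Davies's (1990) weaker sense of knowledge of rules can be illustrated with a minimal sketch. It assumes, purely for simplicity, that spellings and pronunciations are already aligned one letter to one phoneme (English orthography is not), and the function names and example pairs are hypothetical. A set of input-output pairs counts as rule-describable in this weak sense if a single fixed table, mapping each input component to an output component, accounts for all of them; the table then predicts outputs for novel strings from their components alone.

```python
# Hypothetical helper names; a toy illustration, not Davies's own formalism.

def induce_component_rules(pairs):
    """Return a grapheme -> phoneme table that accounts for every aligned pair,
    or None if some input component maps to two different output components."""
    table = {}
    for spelling, sound in pairs:
        if len(spelling) != len(sound):
            return None  # this toy version requires one phoneme per letter
        for grapheme, phoneme in zip(spelling, sound):
            if table.setdefault(grapheme, phoneme) != phoneme:
                return None  # "given P, infer Q" fails for this component
    return table


def apply_rules(table, spelling):
    """Pronounce a (possibly novel) string from its components alone."""
    return tuple(table[grapheme] for grapheme in spelling)


regular = [("mint", ("m", "I", "n", "t")), ("tin", ("t", "I", "n"))]
rules = induce_component_rules(regular)
print(apply_rules(rules, "nit"))  # ('n', 'I', 't'): a novel string, predicted from its parts

with_exception = regular + [("pint", ("p", "aI", "n", "t"))]
print(induce_component_rules(with_exception))  # None: 'i' maps to both 'I' and 'aI'
```

The sketch only shows what it means for input components to have causal status; it makes no claim about how either a symbolic or a connectionist model acquires such a table.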
It is on this dimension that PDP models are argued to differ from symbolic models. Once the model has been trained, repeated exposures to a particular word will give rise to the same pattern of activity and similar words will elicit similar patterns, but there is no sense in which the pattern for a word can be subdivided into constituents, nor any necessary relationship between the patterns elicited by the same constituent when it occurs in different words. These models do not, then, contain rules even in the weaker sense of a causal relationship between input and output elements. "The output that the model produces for a particular letter string is determined by the properties of all the words presented during training" (Seidenberg & McClelland, 1989, p. 549) and cannot be predicted from the relationship between a fixed set of components that apply to all words. Thus, though PDP models might show apparently rule-governed behaviour, they do not contain rules in the same sense as symbolic models, because the distributed patterns of activity elicited by words cannot be decomposed into a set of constituent units that are common across all words and that predict orthographic-phonological relationships.
In particular, the choice of coding scheme might account for the poor generalization shown by the model. Seidenberg and McClelland (1989) claimed successful generalization on the grounds that error scores derived from the model discriminated between nonwords that people pronounce with differing speed and accuracy (Glushko, 1979), but Besner et al. (1990) showed that generalization was very poor when assessed by the stricter criterion of activation of the appropriate output units. Seidenberg and McClelland (1990) claim that these limitations in the model's generalization performance reflect theory-irrelevant limitations of the implemented model, particularly the restricted training corpus and inadequacies in the Wickelfeature coding scheme.
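Why the two criteria can disagree is easy to show with invented numbers. In this sketch the four-unit output vectors, the target and the 0.5 threshold are hypothetical, and neither function reproduces the exact procedures used by Seidenberg and McClelland (1989) or Besner et al. (1990); the point is only that a lower error score does not guarantee that the appropriate output units, and only those, are active.

```python
def error_score(output, target):
    """Summed squared error between output activations and 0/1 target values."""
    return sum((o - t) ** 2 for o, t in zip(output, target))


def units_correct(output, target, threshold=0.5):
    """Stricter criterion (assumed form): every target-on unit above threshold,
    every target-off unit below it."""
    return all((o > threshold) == (t == 1) for o, t in zip(output, target))


target = [1, 1, 0, 0]             # hypothetical output units for one nonword
output_a = [0.6, 0.6, 0.1, 0.1]   # weakly but correctly activated
output_b = [0.95, 0.95, 0.55, 0.0]  # strongly activated, plus one wrong unit

for name, output in (("a", output_a), ("b", output_b)):
    print(name, round(error_score(output, target), 4), units_correct(output, target))
# output_b has the lower error score, yet it switches on a unit that should be off.
```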
It is unlikely that the limited training vocabulary provides a sufficient explanation of the poor generalization, because computational implementations of both the explicit rule system of the dual route model (Coltheart et al., in press) and a PDP model (Plaut & McClelland, 1993) succeed in producing rule-governed pronunciations of nonwords after training on essentially the same vocabulary.
The Plaut and McClelland (1993) simulation is particularly relevant to evaluating the determinants of effective nonword generalization because it uses the distributed representations and learning algorithm that Seidenberg and McClelland claim to be the theoretically critical aspects of their model. There are two major differences between the two implementations. First, rather than Wickelfeatures, Plaut and McClelland (1993) used a localised input and output coding scheme in which each input and output unit corresponds to a single position-specific grapheme or phoneme. Second, the architecture of the revised model includes interactivity at the hidden and output layers, allowing the network to develop "attractors" for recurring patterns of activity such as words. Plaut and McClelland imply that the improved performance for nonwords is primarily due to the inclusion of interactivity and the consequent development of stable attractors: "[the] results demonstrate that attractors can support effective generalization, challenging dual-route assumptions that multiple independent mechanisms are required for quasiregular tasks". However, given the earlier discussion of the sense in which rule-governed behaviour in symbolic models is due to the built-in constituent structure of words, it is equally plausible that the improved generalization performance is due entirely to the coding scheme rather than to the architectural modifications.
By using localised input features corresponding to the symbolic components of written and spoken words, Plaut and McClelland have created what can be seen as a hybrid of multiple level symbolic models and the PDP framework. Functionally, the model simulates development of the word level of a multiple levels model containing hard-wired lower level units corresponding to graphemes and phonemes. The model has the opportunity to create distributed representations of words at the hidden unit level, but the structure of the input and output coding schemes encourages development of patterns that reflect the set of common constituent symbols by which words have been coded. Such "componentiality" does appear to be characteristic of what the model learns, particularly for regular words (Plaut & McClelland, 1993).
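The contrast between coding schemes can be sketched as follows. Both functions are simplified, hypothetical stand-ins: the letter triples only gesture at the context-sensitive character of the Wickelfeature scheme, which is considerably more elaborate, and the left-aligned letter positions should not be read as the exact slotting that Plaut and McClelland (1993) used.

```python
def slot_code(word):
    """Localist, position-specific code: one active unit per (position, letter)."""
    return {(position, letter) for position, letter in enumerate(word)}


def triple_code(word):
    """Context-sensitive code: one active unit per letter triple, '#' marking word edges."""
    padded = "#" + word + "#"
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def overlap(a, b):
    """Proportion of active units shared by two codes (Jaccard index)."""
    return len(a & b) / len(a | b)


for code in (slot_code, triple_code):
    shared = code("mint") & code("pint")
    print(code.__name__, sorted(shared), round(overlap(code("mint"), code("pint")), 2))
```

Under the slot code, the vowel of "mint" and "pint" activates the very same unit, (1, 'i'), so a common constituent is available to every word that contains it. Under the triple code, the 'i' is carried by "min" in one word and by "pin" in the other, so no unit stands for the letter as such; this is the property that the argument above ties to generalization.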
So the generalisation capability of the Plaut and McClelland (1993) model can be seen as deriving from the same general characteristic as symbolic models: words are coded in terms of a common set of constituent units. The modelling exercise establishes that a neural net model can learn to respond appropriately to both word and nonword stimuli, but only if it is assumed that words are coded as independent abstract graphemes and phonemes that have equivalent identities across different letter cases, fonts, etc.
I am not trying to imply that Plaut and McClelland (1993) have in some sense "cheated" by using graphemic input units, although it is worth pointing out that attributing the improved generalization performance to the architectural rather than the coding assumptions smacks of the "vagueness and legerdemain" that Latimer attributes to verbal models. The assumption of abstract grapheme and phoneme units may be perfectly valid. It underlies the majority of symbolic models of word recognition and appears to receive independent support from recent applications of brain imaging technologies such as positron emission tomography (PET) to the study of visual word recognition (e.g., Petersen et al., 1990). But the point I am attempting to draw out here is that computational models are no less reliant on such assumptions than verbal models. The elements of a connectionist model provide no more direct insight into the neurophysiological basis of behaviour than the constructs described in verbal models, and the validity of the interpretation derived from a PDP simulation requires just as careful an evaluation of the assumptions built into the implementation as is required for a verbal model.
Contrary to the implications of Latimer's discussion, I think that the level of the hardware implementation of a model can and must be separated from the representational and computational levels (Marr, 1982). Failing to recognise these distinctions may lead us to grossly inaccurate conclusions about where credit for the success of a simulation should be assigned and therefore how the performance of the model should be related to brain function. We need to maintain awareness that the computational model is simply a metaphor and use convergent evidence concerning the relationship between the model and empirical data on the one hand, and between the model and brain structure and organisation on the other, to evaluate the validity and implications of the metaphor.
For example, one source of evidence that appears to unequivocally implicate knowledge of rules independently of lexical knowledge is provided by the selective preservation of rule-governed reading aloud in patients with "surface dyslexia" (Patterson, Coltheart & Marshall, 1985). This pattern of performance is claimed to provide incontrovertible evidence against PDP models that do not separate rule and lexical knowledge (Coltheart et al., in press). If knowledge of rules is equated with the constituent structure of word representations, then PDP models are functionally equivalent to multiple level models in this regard, and the critical issue for PDP models becomes whether they are sensitive to the number of words containing a particular component (type versatility) independently of the frequency of the word as a whole (token frequency) (Norris, submitted).
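The two measures that a PDP model would need to keep apart can be computed for a toy vocabulary. The word list and frequency counts are invented for illustration, and treating a word's "component" as its orthographic body (the letters from the first vowel onwards) is an assumption made only for this sketch.

```python
# Hypothetical toy vocabulary: word -> frequency per million (invented numbers).
CORPUS = {"mint": 10, "hint": 8, "tint": 2, "lint": 1, "the": 6000}
VOWELS = "aeiou"


def body(word):
    """Orthographic body, assumed here to be everything from the first vowel letter onwards."""
    for i, letter in enumerate(word):
        if letter in VOWELS:
            return word[i:]
    return word


def type_versatility(component, corpus):
    """Number of distinct words in the corpus that share this body."""
    return sum(1 for word in corpus if body(word) == component)


for word in ("lint", "the"):
    # token frequency of the whole word vs. type versatility of its body
    print(word, CORPUS[word], type_versatility(body(word), CORPUS))
```

Here "lint" is a rare word whose body is shared by four words in the list, whereas "the" is very frequent but its body is shared by no other word. The empirical question raised above is whether a model's performance tracks the second number independently of the first.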
However, it is critical to remain aware that the models are simply metaphors that are inherently no better or worse than verbal models. They are useful because the act of modelling requires refinement of the metaphor and precise specification of how the elements of the task map onto the elements of the metaphor embodied in the model. But these mappings do not acquire reality by virtue of their precision.
Unlike Latimer, I do not feel in the least uncomfortable in acknowledging that cognitive models, whether verbal or computational, lie in a metaphorical middle ground. In fact, I think that this is one of their major virtues in contributing to understanding brain-behaviour relationships. Essentially, cognitive models provide a "common metric" for relating knowledge of behaviour to knowledge of neurophysiology: a level of description that is appropriate and essential for the task of functional analysis of brain-behaviour relationships.
Coltheart, M. (1980) Lexical access in simple reading tasks. In G. Underwood (Ed.), Strategies for information processing. NY: Academic Press.
Coltheart, M., Curtis, B., Atkins, P. & Haller, M. (in press) Models of reading aloud: Dual route and parallel distributed processing approaches. Psychological Review.
Davies, M. (1990) Knowledge of rules in connectionist networks. Intellectica, 9-10, 81-126.
Dell, G. (1986) A spreading activation theory of retrieval in sentence production. Psychological Review, 93, 283-321.
Hinton, G.E. (1986) Learning distributed representations of concepts. Proceedings of the Eighth Annual Conference of the Cognitive Science Society. Hillsdale, NJ: Erlbaum.
Humphreys, G. & Evett, L. (1985) Are there independent lexical and nonlexical routes in word processing? An evaluation of the dual-route model of reading. Behavioral and Brain Sciences, 8, 689-740.
Marr, D. (1982) Vision. San Francisco: W.H. Freeman.
Maze, J.R. (1954) Do intervening variables intervene? Psychological Review, 61, 226-234.
McClelland, J.L. & Rumelhart, D.E. (1981) An interactive activation model of context effects in letter perception: Part 1. An account of basic findings. Psychological Review, 88, 375-407.
McClelland, J.L., Rumelhart, D.E. & Hinton, G.E. (1986) The appeal of parallel distributed processing. In D.E. Rumelhart, J.L. McClelland and the PDP Research Group (Eds.), Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1. Cambridge, MA: MIT Press.
Patterson, K. & Coltheart, V. (1987) Phonological processes in reading: A tutorial review. In M. Coltheart (Ed.), Attention and Performance XII. London: Erlbaum.
Patterson, K., Coltheart, M. & Marshall, J. (1985) Surface Dyslexia. London: Erlbaum.
Petersen, S.E., Fox, P.T., Snyder, A.Z. & Raichle, M. (1990) Activation of extrastriate and frontal cortical areas by visual words and word-like stimuli. Science, 249, 1041-1044.
Plaut, D. & McClelland, J.L. (1993) Generalization with componential attractors: Word and nonword reading in an attractor network. Proceedings of the 15th Annual Conference of the Cognitive Science Society.
Seidenberg, M. & McClelland, J. (1989) A distributed, developmental model of word recognition and naming. Psychological Review, 96, 523-568.
Seidenberg, M. & McClelland, J. (1990) More words but still no lexicon: Reply to Besner et al. (1990). Psychological Review, 97, 447-452.
Smolensky, P. (1988) On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.