Computer Modeling of Cognitive Processes.

Cyril Latimer

Department of Psychology,

University of Sydney.

Introduction

In this paper, I want to do three things:

1. Look briefly at the benefits of computer simulation and connectionist modeling in general;

2. Describe two examples of psychological theory which I believe are in need of modeling, and illustrate the confusions that arise when theorists avoid the discipline imposed by modeling techniques;

3. Consider some of what I regard as problems for computer simulation and neural network techniques currently in use.

The benefits of computer simulation of psychological theories were reviewed and debated at length in the 1960s and 1970s, (Newell, 1973 ; Reitman, 1965; Uhr, 1973) and, more recently, in the context of artificial neural networks, by Quinlan, (1991) so I shall simply summarise rather than dwell upon these aspects of the topic. Frijda (1967) describes how computer programs can provide unambiguous formulations of a theory as well as means for testing the sufficiency and consistency of the theory. Computer simulation may also serve as an heuristic in the search for models; the effort of programming a machine to perform a given task can lead to illuminating psychological hypotheses, even in the absence of behavioural evidence. Just to give an example of the latter, one of my fellow postgrads at the School of Artificial Intelligence in Edinburgh, was building a movement-detection device and noticed that it would occasionally signal movement when no movement was present in the stimulus field Finding no fault in his circuitry or programming, he recreated the stimulus conditions which had produced the movement detection response in the machine for human subjects, and discovered a new illusion of movement, (Lamontagne, 1973).

However, such serendipity is probably rare, and the main benefits of modeling arise from the discipline imposed by attempting to cast a theory within the explicit, logical framework of a machine. If done properly, there can be no room for the vagueness and legerdemain that can often surround the traditional verbal accounts of psychological theory. I would like to argue that many theories in psychology survive, and in some cases remain unfalsifiable, because they have not been subjected to this sort of analysis. John Stuart Mill once said, 'When I am wrong, I want to be wrong in such a way that everyone will know I am wrong." For any psychological theory, Mill's wish is, I believe, fulfilled by answers to the following questions:, "What sort of machine could acquire the behavioural repertoire that the theory is attempting to explain ?" "What would the machine's inputs have to be, and how exactly would they be processed in order to generate appropriate responses ?" Answers to these questions require a clear, detailed and precise delineation of the machine's components, their inter-relationships and the functions they serve. Theories embodied in such a framework are more informative, more easily understood and more vulnerable to experimental test.

Schema and Prototype Theories of Visual Pattern Recognition

To illustrate some of the above points, I shall use the example of schema and prototype theories. These theories have often been applied in the area of memory research, but I shall consider them in the context of visual pattern recognition. The notions of schema and prototype have a very long history in philosophy and psychology, and go back at least as far as Plato's universal forms. Despite this long history, debate continues on what exactly schemata are and to what the term schema refers, so it is important for points I wish to make later in the paper, to look briefly at the history of schema. The mediaeval scholastics addressed the issue under the dichotomy of nominalism and realism - did universals refer to real things or were they mere names ? John Locke's handling of the notion in his doctrine of abstract ideas is in some ways more philosophically sophisticated than its treatment in contemporary psychology (Locke, 1690/1961) . Referring to general or abstract ideas, Locke said,

"...they frame an idea which they find those many particulars do partake in, and to that they give, with others, the name man, for example. And thus they come to have a general name and a general idea, wherein they make nothing new, but only leave out of the complex idea they had of Peter and James, Mary and Jane, that which is peculiar to each and retain only that which is common to all." Vol 2, Book 3, p. 11.

The important point to note here is Locke's use of the words, "wherein they make nothing new", the idea being that an abstract idea is not another idea or entity over and above all the ideas of the single instances, but simply refers to relationships in which the single entities stand. Locke emphasizes this point in other parts of his essay; when constructing genera and species, he notes that "there is no new thing made." His critic, Berkeley, and to a large extent, contemporary cognitive theory, have misinterpreted Locke's original notion, (Berkeley, 1710/1965). Referring to Locke's abstract idea of a triangle, Berkeley notes,

"What more easy than for anyone to look a little into his own thoughts, and there try whether he has, or can attain to have, an idea that shall correspond with the description that is here given of the general idea of a triangle - which is neither oblique nor rectangle, equilateral, equicrural nor scalenon, but all and none of these at once ?", p. 52

Locke, of course, was careful not to fall into the trap of reifying his abstract ideas, but, as I shall argue, not so contemporary cognitive psychology ! Many claim that the term schema was introduced to psychology by Bartlett (1932) who took the notion on from Sir Henry Head the neurologist, and today, in journals and textbooks of cognitive psychology, schema and prototype theories of visual pattern recognition are often contrasted with feature theories. In feature theory, patterns are recognized by way of processes that extract and differentially weight their features or attributes. Indeed, when I have applied for ARC financial support in my investigations of feature theory, I have often been advised by my reviewers that schema and prototype theory would be a much more fruitful context for my research, and as I contemplate yet another year of academic penury, my thoughts, not unnaturally, turn to the exact nature of schemata.

Here then is a perfect test case for computer simulation. Can schema theory be programmed ? Would it be possible to embody the pattern recognition processes proposed by schema theorists in some clearly defined, unambiguous mechanism ? Among the benefits of this exercise would be a possible end to the debate on exactly what schemata are, and a much needed clear distinction between schema and feature theory. Feature theories have, of course, been embodied in many different computer systems. Where then does one begin in this attempt to invest schema theory with mechanism ? One approach, which I call the modern Berkeleyean approach, has been mechanised in the form of template or prototype theory. The approach is Berkeleyean in the sense that the templates, schemata or prototypes are actual entities that are meant to encapsulate the general properties within some pattern class. In the world of banking, such mechanisms have had limited success in the recognition of stylised characters on cheques, but are not taken seriously as explanations of human pattern recognition.

On the surface, the Lockean approach has more to offer; schemata are not entities, but rather descriptions of the relationships in which the members of a class of objects stand. However, attempts to develop a pattern recognition mechanism along these lines have tended to make schema theory suspiciously like - some would say indistinguishable from - feature theory. One early attempt to program schema theory was that of Evans, Hoffman, Arnoult, & Zinser (1968). Evans (1967) defined a schema as,

"...a characteristic of some population of objects. It is a set of rules which would serve as instructions for producing (in essential aspects) a population prototype and object typical of the population." p.87.

So, the schema is not the prototype itself, but rather the rules or instructions for producing one. How could such a theory serve as a blueprint for constructing a pattern recognition device ? Evans, et al., (1968) suggest at the outset of their paper that pattern perception might benefit from attention to small frequently occurring characteristics such as straight lines of various slopes, and make reference to the utility of devices that analyze patterns into component strokes. Since these components may be put together to form more complex schemata such as alphabetic characters, they employ the term subschemata to describe them. The authors then go on to describe a device which represents patterns on an input array of 48 x 48 binary units. and convolves this array with 5 x 5 subschematic operators seeking matches between the operators and the input pattern.

This supposed implementation of schema theory is distinguishable from simulations of feature theory in name only. The patterns for recognition have become schema and their features subschemata. The processes described are those used in the early stages of the Uhr & Vossler (1963) feature-extraction and feature-weighting model. One may well ask where the theoretical processes of prototype construction and the measures of the extent to which individual members of the schema family adhere to the schema rules have gone. Clearly, these vague notions do not fare well within the discipline imposed by simulation. In my opinion, attempted simulations of schema theory in memory research expose the same sorts of difficulties. For example, in McClelland and Rummelhart's (1985) network model of prototype extraction, objects such as DOG, CAT and BAGEL and their names are coded as vectors of binary features or attributes. Given instance-based training, the model supports the Lockean view of schema, by retaining information about the patterns comprising each class and their interrelationships in the weights on connections between units in the network. One may take the view that the set of weighted connections between input units and the output unit for each class constitute the schemata. However, such a position revives an old debate in psychology; are the weighted connections entities in their own right or just relationships in which units stand ? - see Maze (1954) and McCorquodale & Meehl (1948). The message is clear: what may appear to be a plausible theory when presented verbally, can often appear otherwise when attempts are made to embody the processes of the theory in mechanism.

Global Versus Local Precedence.

Yet another venerable binary opposition in experimental psychology is that of global vs local precedence in visual pattern recognition. It is proposed by one class of theorists that the global or holistic attributes of patterns like symmetry, curvedness, jaggedness and angularity are extracted prior to the extraction of local properties such as particular lines, curves angles etc., in local regions of the patterns. Navon (1977), using as stimuli large alphabetic characters constructed from smaller alphabetic characters, argued that the properties of the higher level unit (large character) are processed first, followed by analysis of the properties of lower level units (small characters). For a recent comprehensive reviews of this debate, see Kimchi, (1992).and Treisman (1986). My intention here is not to deny the findings of global-to-local interference and faster responses to global structure - both of which occur under particular experimental conditions. Rather, my intention is to question explanation of those findings in terms of a theory which proposes extraction of global properties prior to analysis of local properties.

A major difficulty in this area is the propensity for theorists to adopt different definitions of the terms whole and part.. Gibson, (1969, p. 89), describing her feature theory, cites evidence (Hubel & Weisel, 1965) in support of her use of discontinuity as a feature. Lockhead (1972, p. 416) arguing for an holistic model, cites the same evidence in support of his notion of blob processing. If these theories are accepted as exemplifying the putative local/global opposition, then the opposition may, in some cases, rest on no more than the theorists' definitions of whole and part, making it difficult for empirical evidence to be brought to bear on the issue. Nonetheless, some theorists genuinely believe that global properties are extracted prior to local properties. The question is, of course, what is the mechanism that could recognize patterns in this way ? Leaving aside the definitional issue of what will constitute a whole and what will be regarded as it's parts, I know of no computer program that is capable of extracting properties like symmetry directly from patterns. In all cases, it is first necessary to compute a rich description of local properties and their relationships to each other prior to the computation of global properties. (Marr & Nishihara, 1978; Fukushima, 1988; Uhr & Vossler, 1963). Even programs that are said to recognize patterns by gestalt methods are, on closer analysis, seen to be deriving global properties from the products of prior local analysis (Uhr, 1959; Guiliano, Jones, Kimball, Meyer, & Stein, 1961; Tunstall, 1975).

One rejoinder to this claim of necessary prior local analysis of patterns is that the brain is simply an analog device (Dreyfus, 1972), and is not to be understood by traditional, scientific, analytic methods. A related rebuttal is the notion that the brain is possessed of smart mechanisms which can extract global properties directly without recourse to local analysis (Pomerantz & Kubovy, 1981; Runeson, 1977). Examples of smart mechanisms are devices like speedometers and planimeters which are said to extract speed, distance and area directly from the environment. Such rejoinders arise from the tendency of some theorists to ignore the relativity of the notions of whole and part (Rescher & Oppenheim, 1955; Wenderoth & Latimer, 1978) and could, in some cases, be said to verge on obscurantism.

For example, one could take the view that the whole brain is a smart mechanism capable of knowing things and, leave matters there. Newton could have viewed the entire solar system as a smart mechanism incapable of analysis and explanation. He did not, and it is to be hoped that experimental psychologists continue to follow his example. While one can, arbitrarily, regard speedometers and planimeters as mysterious black boxes housing continuous unanalyzable processes, one may also, equally arbitrarily, explain their abilities analytically by reference to discrete wheel revolutions causing discrete cable revolutions within discrete distances such as metres and kilometres. One may even take the physicist's view and analyze the mechanism at a molecular level. Similarly, with the visual system, one may adopt views ranging all the way from the Gestaltist holistic conception through neurophysiology to molecular structure. The question is, of course, what analysis, if any, is appropriate at a psychological level ? It is argued by some (Pomerantz, 1978) that proposed units of analysis should have functional or psychological validity; a pixel-level analysis is unsuitable because it would not be used by subjects[1]. How then does one determine what analysis is used by the human visual system ?

Clearly, the argument has come full circle; one must construct clear, precise, logical, explicitly defined theories of the processes to be investigated, and one can be aided in this venture by computer simulation. The difficulty for global-precedence theory is that, while there are many existing mechanisms for the derivation of global properties from the products of prior local analysis, there are, to the best of my knowledge, no working systems that can extract global properties directly from patterns. This is not to say that such systems could not be devised. What the global-precedence theorists need is a demonstration that visual patterns have unconditionally global properties - properties that could not, under any conditions, be derived from the products of any local analysis. Here again, I contend, is another example of psychological theory surviving because it has been shielded from investigative methods which may expose illogicality - the methods of computer simulation.

Difficulties With Current Simulation Methods.

Others in this symposium have been charged with the task of exposing the deficiencies of computer simulation, so I shall not speak at length on these matters. There are, however, three main difficulties I wish to raise: the distinction between theory-relevant and theory-irrelevant components of simulations; how networks are trained; the level of analysis in neural-network simulations.

Theory-relevant and theory-irrelevant components

One of the benefits of computer simulation is the explicit delineation of processes posited in any theory. It is therefore essential that processes in a computer model which are intended as direct simulations of processes posited in the theory are distinguished from those which are mere fixes or arbitrary assumptions adopted to "get things running". Such distinctions are rarely made, and if computer models are to generate hypotheses for test, it is necessary to know just what aspects of the theory, and its embodiment in the model, are involved in predictions about human behaviour and which are not. For example, in neural network models of pattern recognition, the units of analysis and input array resolution are usually quite arbitrary and determined neither by theory nor psychological evidence. Theoretical claims are often compromised by the arbitrary psychophysics assumed. Massaro (1990) cites the example of the TRACE model by McClelland & Elman (1986) who acknowledge that although the features, phonemes and word levels were central to predictions, these may not be the units that are functional. They further assert that their model is still valid if different units are functional ! Massaro (1990 p. 406) summarizes his complaint by quoting an email request he received, "I'm trying to work up a proposal for research in classificatory neural networks. The computational end is easy, but I need an application that will provide the data for experimentation." Not only can the ease of computation preclude psychophysical enquiry, but it is essential that those aspects of the model that are based upon, or require further, psychophysical investigations are portrayed.

Network training

An issue that is often sidestepped in connectionist modeling is the passive training of neural networks. Even the so called "unsupervised" learning systems do not train themselves, and rely on a program to present inputs and generally manage affairs. Human behaviour is characterised by active seeking of stimulation, shifts in attention, and changes in the levels of motivation and arousal. Granted, there have been attempts to model attention and consider the effects of arousal (Mozer, 1988; Schreter & Latimer, 1992), but the usual response to this objection is that one must start with simple problems before attacking the more difficult ones. However, will the solutions developed in simple networks, generalize to the necessarily more complex systems ?

Connectionists often seek correspondence with neurophysiological evidence, but there is already a well established body of psychological evidence that attention, anxiety, arousal and motivation can affect learning and performance on cognitive tasks. If connectionists want psychologists to take modeling seriously, they may have to demonstrate that neural networks can incorporate mechanisms that simulate the effects of these important variables on behaviour.

Level of analysis

Occasionally, students ask me what the units in connectionist systems are meant to represent. Are they meant to represent real neurons or not ? The question raises the familiar Instrumentalist/Realist debate in the philosophy of science, and I, presumably like everyone else, find it extraordinarily difficult to answer: Do theoretical terms refer to real entities or are they mere convenient fictions or instruments of prediction ? In many simulations, the units in the network are meant to refer to real neurons, but in simulations of higher level cognition, this is said not to be the case, even though it must be admitted that the simulated units look suspiciously like and behave like real neurons. Smolensky, (1988) suggests that the level of analysis in connectionist systems is neither symbolic nor neurological, but what he calls subsymbolic. As a realist, I find this suggestion that my network simulations occupy a middle kingdom unsatisfying and unconvincing. The answer I give to students is that I believe my network models of pattern recognition work because, at some level of neurophysiology, similar processes of feature extraction and differential weighting are going on. Like everyone else, I do not know what these neurophysiological processes are. If I did, why would I be modeling ? In the absence of knowledge, one has to theorise and construct models of the processes one wants to study. None of this is meant to deny that cognition can be modeled by information processing diagrams, symbolic systems in computers or, indeed, in terms of molecules and electrons etc. Each of these methods has it's own strengths and weaknesses. Lest I be accused of relativism, let me conclude by saying that I believe that whatever the level of analysis, one accesses the same enduring reality.

Conclusions

Computer models can be useful in uncovering the implications of theory. They provide a necessary discipline in the interpretation and understanding of theoretical processes, and they can function to make psychological theories more falsifiable and more vulnerable to experimental test. Using examples, I think that I have shown how vagueness and illogicality in psychological theories can be revealed.

Modeling is not without difficulty. It is necessary to distinguish between theory-relevant and theory-irrelevant routines in models, and to state clearly where psychophysical evidence has been used to support assumptions and where psychophysical investigation is necessary.

In many cases, current connectionist systems ignore the influence of important factors such as attention, arousal and motivation. To be taken seriously in psychology, I believe that connectionist models need to address these issues.

To what do the units in connectionist systems refer ? I have suggested a realist interpretation of the constituents of models; connectionist systems work because, at some level of neurophysiology, processes similar to those modeled in neural networks are present.

References

Bartlett, F. C. (1932). Remembering: A study in experimental and social psychology. Cambridge: Cambridge University Press.

Berkeley, G. (1710/1965). Principles of human knowledge. In D. Armstrong (Ed.), Berkeley's philosophical writings. London: Collier-Macmillan.

Dreyfus, H. L. (1972). What computers can't do: A critique of artificial reason. New York: Harper & Row.

Evans, S. H. (1967). A brief statement of schema theory. Psychonomic Science, 8, 87-88.

Evans, S. H., Hoffman, A. A. J., Arnoult, M. D., & Zinser, O. (1968). Pattern enhancement with schematic operators. Behavioural Science, 13, 402-404.

Frijda, N. H. (1967). Problems of computer simulation. Behavioral Science, 12, 59-67.

Fukushima, K. (1988). Neocognitron: A hierarchical neural network capable of visual pattern recognition. Neural Networks, 1, 119-130.

Gibson, E. J. (1969). Principles of perceptual learning and development. New York: Appleton-Century-Crofts.

Guiliano, V. E., Jones, P. E., Kimball, G. E., Meyer, R. F., & Stein, B. A. (1961). Automatic pattern recognition by a gestalt method. Information and Control, 4, 332-345.

Hubel, D., & Weisel, T. N. (1965). Receptive fields and functional architecture in two non-striate visual areas (18 and 19) of the cat. Journal of Neurophysiology, 28, 229-289.

Kimchi, R.. (1992), Primacy of wholistic processing and global/local paradigm: A critical review. Psychological Bulletin, 112, 24-38.

Lamontagne, C. (1973). A new experimental paradigm for the investigation of human visual motion perception. Perception, 2, 167-180.

Locke, J. (1690/1961). An essay concerning human understanding. London: J. M. Dent & Sons.

Lockhead, G. R. (1972). Processing of dimensional stimuli: A note. Psychological Review, 79, 410-419.

Marr, D., & Nishihara, H. K. (1978). Representation and recognition of the spatial organization of three-dimensional shapes. Proceedings of the Royal Society of London, B, 200, 269-294.

Massaro, D. (1990). The psychology of connectionism. Behavioral and Brain Sciences, 13(2), 403-406.

Maze, J. R. (1954), Do intervening variables intervene ? Psychological Review, 61, 226-234.

McClelland, J. L., & Elman, J. L. (1986). Interactive processes in speech perception: The TRACE model. In J. L. McClelland, D. E. Rummelhart, & P. R. Group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition (pp. 58-121). Cambridge, MA: The MIT Press.

McClelland, J. L., & Rummelhart, D. E. (1985). Distributed memory and the representation of general and specific information. Journal of Experimental Psychology: General, 114, 159-188.

Mozer, M. C. (1988). A connectionist model of attention in visual perception (Technical Report No. GRG-TR-88-4). University of Toronto, Department of Computer Science.

Navon, D. (1977). Forest before trees:The precedence of global features in visual perception. Cognitive Psychology, 9, 353-383.

Newell, A. (1973). You can't play 20 questions with nature and win. In W. G. Chase (Ed.), Visual information processing (pp. 283-308). New York: Academic Press.

Pomerantz, J. R. (1978). Are complex visual features derived from simple ones ? In E. L. J. Leeuwenberg & H. F. J. M. Buffart (Eds.), Formal theories of visual perception (pp. 217-229). Chichester, England: Wiley.

Pomerantz, J. R., & Kubovy, M. (1981). Perceptual organization: An overview. In M. Kubovy, & J. R. Pomerantz (Eds.) Perceptual organization (pp. 423-456). Hillsdale, N.J.: Lawrence Erlbaum Associates.

Quinlan, P. T. (1991). Connectionism and psychology: A psychological perspective on new connectionist research. Chicago: University of Chicago Press.

Reitman, W. (1965). Cognition and thought:An information processing approach. New York: Wiley.

Rescher, N., & Oppenheim, P. (1955). Logical analysis of gestalt concepts. British Journal for the Philosophy of Science, 6(22), 89-106.

Runeson, S. (1977). On the possibility of "smart" perceptual mechanisms. Scandanavian Journal of Psychology, 18, 172-179.

Schreter, Z., & Latimer, C. R. (1992). A connectionist model of attentional learning using a sequentially allocatable 'spotlight of attention'. In Third Australian Conference on Neural Networks, (pp. 143-146). Sydney: School of Electrical Engineering, Sydney University.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and Brain Sciences, 11, 1-74.

Treisman, A. (1986). Properties, parts and objects. In K. R. Boff, L. Kaufman & J. P. Thomas (Eds.) Handbook of perception and human performance. Vol 2, Cognitive processes and performance. (pp. 35:1 - 35:70). New York: Wiley.

Tunstall, K. W. (1975). Recognizing patterns: Are there processes that precede feature analysis ? Pattern Recognition, 7, 95-106.

Uhr, L. (1959). Machine perception of printed and hand-written forms by means of procedures for assessing and recognizing gestalts. In National Conference of the Asociation for Computing Machinery, Preprint 34 .

Uhr, L. (1973). Pattern recognition, learning and thought: computer-programmed models of higher mental processes. Englewood Cliffs, N.J.: Prentice-Hall.

Uhr, L., & Vossler, C. (1963). A pattern-recognition program that generates, evaluates and adjusts its own operators. In E. A. Feigenbaum & J. Feldman (Eds.), Computers and thought (pp. 251-269). New York: McGraw-Hill.

Wenderoth, P. M., & Latimer, C. R. (1978). On the relationship between the psychology of visual perception and the neurophysiology of vision. In J. P. Sutcliffe (Ed.),, Festschrift in Honour of W.M. O'Neil Sydney: Sydney University Press.

[1] At least two issues are raised by the requirement that the units be functional. First, it is not a good idea to eliminate possible analyses without proper experimental investigation. One can conceive of tasks requiring same/different judgements where the patterns to be judged demand very close and detailed analysis. What the pattern recognition system makes use of as perceptual units may well be determined, in part, by task demands. Second, it is assumed here that the term functional does not mean conscious, although this distinction is not always clear in the literature.