Three connectionist models of single-word reading will be compared: McClelland and Rumelhart (1981, hereafter MR81), Seidenberg and McClelland (1989, hereafter SM89), and Plaut, McClelland, Seidenberg, and Patterson (1996, hereafter PMSP96).
Although they are all models of single-word reading, they were designed to account for rather different aspects of word reading:
MR81: Word superiority effect in letter perception.
SM89: Word naming and lexical decision.
PMSP96: Word and nonword naming.
In the following sections, each model is reviewed briefly, with emphasis on differences in network structure and on each model's strengths and weaknesses.
MR81 was developed before the rediscovery of the back-propagation algorithm (Rumelhart, Hinton, & Williams, 1986). Compared with SM89 and PMSP96, MR81 has more localized representations, and there is no learning component in the model.
MR81 has three layers: features (input), letters (output), and words ("hidden"). Feature nodes activate or inhibit letter nodes, and letter nodes activate or inhibit word nodes. Word nodes mutually inhibit each other and send feedback to letter nodes. Letter activations are transformed into response probabilities by Luce's choice rule, so that the model's performance can be compared directly with empirical data.
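Luce's choice rule can be sketched as follows. This is a minimal illustration, assuming that each letter's response strength is an exponential function of its activation with a hypothetical scaling parameter `mu`; MR81's full readout procedure is not reproduced here. The function name `luce_choice` and the example activations are illustrative only.

```python
import math


def luce_choice(activations, mu=10.0):
    """Turn unit activations into response probabilities (Luce's choice rule).

    Each alternative's response strength is exp(mu * activation), an assumed
    strength function; its probability is its strength divided by the sum of
    all strengths. `mu` is a hypothetical scaling parameter.
    """
    strengths = {unit: math.exp(mu * a) for unit, a in activations.items()}
    total = sum(strengths.values())
    return {unit: s / total for unit, s in strengths.items()}


# Hypothetical activations of three letter units competing for one position.
probs = luce_choice({"E": 0.6, "F": 0.2, "O": -0.1})
for letter, p in sorted(probs.items()):
    print(letter, round(p, 3))
```

Because the probabilities are normalized over all competing letters, a letter's chance of being reported depends on how its activation compares with its competitors', not on its activation alone.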
Strengths. MR81 successfully accounted for results in the literature on the word superiority effect in letter perception, and predictions made by the model have also been confirmed experimentally. Furthermore, even though the model contains no pronunciation rules, it accounted quite well for the data on the effect of pronounceability. Finally, the method of converting the model's output into responses is transparent and easy to follow.
Weaknesses. MR81 did not account for one important finding: the pseudoword superiority effect depends on subjects' expectations (Carr, Davidson, & Hawkins, 1978), and MR81 cannot simulate this result. MR81 also has more general limitations: it is restricted to four-letter words and a rigid encoding scheme, it does not learn, and it does not apply to major word recognition tasks such as naming and lexical decision.
SM89 is a three-layer back-propagation network with 400 orthographic input units, 200 hidden units, and 460 phonological output units. In addition, there are 400 orthographic output units, on which the network reproduces its orthographic input.
SM89 used Wickelgren's (1969; cited in Seidenberg & McClelland, 1989) triples scheme to represent a word's orthographic and phonological content. In essence, the scheme has a unit for every possible triple of letters or phonemes (an element together with its left and right neighbors), and a word is represented as the unordered set of its triples. The representation used at the phonological level is a set of Wickelfeatures, which are triples of phonetic features rather than of whole phonemes. The representation used at the orthographic level, however, is more complicated.
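The core idea of the triples scheme can be sketched in a few lines. This is a simplified illustration, assuming a boundary symbol `#` and plain letter strings; it does not reproduce SM89's actual orthographic units or its Wickelfeature coding. The function name `wickel_triples` is hypothetical.

```python
def wickel_triples(word, boundary="#"):
    """Return the unordered set of triples for a letter (or phoneme) string.

    Each triple is an element plus its left and right neighbors; `boundary`
    pads the string so the first and last elements also get triples.
    """
    padded = boundary + word + boundary
    return {padded[i - 1:i + 2] for i in range(1, len(padded) - 1)}


print(sorted(wickel_triples("make")))  # ['#ma', 'ake', 'ke#', 'mak']
```

Because each triple overlaps its neighbors, the unordered set still carries local order information: "tap" and "pat" contain the same letters but yield different sets of triples.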
The training set consisted of 2897 monosyllabic words. SM89 was trained on logarithmically compressed word frequencies rather than on actual frequencies. The phonological (or orthographic) error score, the sum over units of the squared differences between each unit's target activation and the activation actually computed by the network, is used as an indirect measure of reaction time and accuracy.
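The error score is a simple computation. The sketch below assumes that target and output activations are given as equal-length lists; the function name and the four-unit patterns are hypothetical. A lower score means the output is closer to the target, which is why the score could serve as an indirect index of reaction time and accuracy.

```python
def error_score(targets, outputs):
    """Sum of squared differences between target and actual unit activations."""
    if len(targets) != len(outputs):
        raise ValueError("targets and outputs must have the same length")
    return sum((t - o) ** 2 for t, o in zip(targets, outputs))


# Hypothetical four-unit target pattern and two network outputs.
target = [1.0, 0.0, 1.0, 0.0]
close_output = [0.9, 0.2, 0.6, 0.1]
poor_output = [0.4, 0.7, 0.3, 0.6]
print(error_score(target, close_output))
print(error_score(target, poor_output))
```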
Strengths. The most successful aspect of SM89 is its demonstration that a single computation mapping spelling patterns onto phonological codes is sufficient to account for the naming of both regular and exception words. This poses a serious challenge to traditional dual-route models of naming.
Weaknesses. First, the use of compressed frequencies distorted the actual frequency distribution of the words in the training set. Second, although the Wickelfeatures can be viewed as an improvement over the rigid four-letter encoding of MR81, they still lack psychological or linguistic reality. Third, the model's nonword reading was poor. Fourth, the model performs poorly on lexical decision: when the orthographic error score is used as the basis for discriminating words from nonwords, its performance is worse than that of human subjects in many conditions (Besner, Twilley, McCann, & Seergobin, 1990). Finally, the model does not generate actual reaction times, and using the error score as an indirect measure is unpersuasive.
The PMSP96 model (Simulation 3, the attractor network) is a three-layer network with 105 grapheme input units, 100 hidden units, and 61 phoneme units, combining feedforward and recurrent connections. Each input unit is connected to each hidden unit, and each hidden unit is connected to each phoneme unit. In addition, each phoneme unit is connected to every other phoneme unit and sends a connection back to each hidden unit. The weights on the two connections between a hidden unit and a phoneme unit (one in each direction) are trained separately. There is no feedback from hidden units to grapheme units.
The orthographic and phonological representations are more compact than those of SM89. The phonological representation consists of three groups of units, with one unit for each possible onset phoneme (23), vowel phoneme (14), and coda phoneme (24), for a total of 61. Reading out a pronunciation from this representation simply involves concatenating the active phonemes from left to right, including at most one phoneme from each mutually exclusive set. The 105 orthographic units are constructed on the same principle, with graphemes in place of phonemes.
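The readout procedure can be sketched as follows. The phoneme inventory here is a hypothetical miniature (PMSP96's real inventory has 23 onset, 14 vowel, and 24 coda units), the 0.5 activation threshold is an assumption, and `SLOTS` and `read_out` are illustrative names. Onset and coda units are keyed separately, so the same phoneme (here /t/) can have both an onset unit and a coda unit.

```python
# Hypothetical miniature inventory: each slot is an ordered list of mutually
# exclusive sets of phonemes. PMSP96's real inventory is much larger.
SLOTS = {
    "onset": [["s"], ["p", "t", "k"], ["l", "r"]],
    "vowel": [["a", "e", "i"]],
    "coda": [["n", "m"], ["t", "d"]],
}


def read_out(activations, threshold=0.5):
    """Concatenate active phonemes left to right, at most one per exclusive set.

    `activations` maps (slot, phoneme) pairs to unit activations; missing units
    count as 0.0. `threshold` is an assumed cutoff for a unit to count as active.
    """
    phonemes = []
    for slot in ("onset", "vowel", "coda"):
        for exclusive_set in SLOTS[slot]:
            # Pick the most active unit in the set; keep it only if it is active.
            best_act, best = max(
                (activations.get((slot, p), 0.0), p) for p in exclusive_set
            )
            if best_act > threshold:
                phonemes.append(best)
    return "".join(phonemes)


# Hypothetical activation pattern for the word "stand".
stand = {
    ("onset", "s"): 0.9, ("onset", "t"): 0.8, ("onset", "p"): 0.1,
    ("vowel", "a"): 0.95, ("vowel", "e"): 0.3,
    ("coda", "n"): 0.7, ("coda", "d"): 0.85,
}
print(read_out(stand))  # stand
```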
The network is trained with a version of back-propagation designed for recurrent networks, known as back-propagation through time. The motivation for a recurrent implementation is that its settling time can serve as a more direct measure of naming latency. The training corpus consists of 2998 monosyllabic words, and training reflects their actual frequencies: the frequency value of each word is used to scale the weight changes induced by that word.
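The effect of frequency scaling, and of the compression SM89 used instead, can be illustrated with a short sketch. It assumes that each word's scale factor is its frequency divided by the largest frequency in the corpus, and it uses log(f + 1) as a hypothetical stand-in for SM89's logarithmic compression. The word counts are invented for illustration, and the function names are hypothetical.

```python
import math


def frequency_scales(frequencies, compress=False):
    """Per-word scale factors for weight changes, normalized so the largest is 1.

    compress=False uses raw frequencies; compress=True uses log(f + 1), a
    hypothetical stand-in for SM89-style logarithmic compression. Normalizing
    by the largest value is an assumption made to keep updates bounded.
    """
    values = {
        word: math.log(f + 1) if compress else float(f)
        for word, f in frequencies.items()
    }
    largest = max(values.values())
    return {word: v / largest for word, v in values.items()}


def scaled_weight_change(raw_change, word, scales):
    """Scale the weight change induced by `word` by that word's factor."""
    return raw_change * scales[word]


# Invented frequency counts for a high- and a low-frequency word.
freqs = {"have": 1000, "pint": 1}
actual = frequency_scales(freqs)
compressed = frequency_scales(freqs, compress=True)
print(actual["have"] / actual["pint"])          # ratio with actual frequencies
print(compressed["have"] / compressed["pint"])  # ratio with log compression
```

With these invented counts, actual frequencies give the high-frequency word 1000 times the weight-change scale of the low-frequency word, whereas log compression shrinks that ratio to roughly 10. This is the kind of distortion of the frequency distribution for which SM89 is criticized.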
Strengths. The PMSP96 model successfully simulates the standard effects of frequency and consistency found in empirical studies. In addition, its ability to pronounce nonwords is comparable to that of skilled readers. It therefore demonstrates that a single mechanism, rather than two routes, is sufficient to account for the naming of regular words, irregular words, and nonwords. The representations are more psychologically and linguistically realistic, and they are also more distributed than those of SM89. Finally, the model generates reaction times directly, in the form of settling times.
Weaknesses. Unlike SM89, which simulates pseudohomophone effects in lexical decision, the PMSP96 model does not simulate any lexical decision data at all.
Psychologists interested in the role of morphology in lexical processing during reading ask the following questions:
Lexical representation: Are morphologically complex words represented as unanalyzed full forms or does the representation reflect their morphological structure?
Lexical processing: Are morphologically complex words processed as unanalyzed full forms or are they decomposed (parsed) into sub-lexical units according to their morphological structure prior to identification?
These two questions are not independent. Even a researcher who wants to investigate lexical processing has to make appropriate assumptions about how morphologically complex words are represented in the mental lexicon. In answering this question, it is crucial to distinguish claims about the lexical entry for a given word (its modality-independent lexical information) from claims about its access representation (the modality-specific perceptual target for word identification) (Marslen-Wilson, Tyler, Waksler, & Older, 1994). However, "psycholinguistic research into morphologically complex words often failed to maintain this distinction, making it hard to sort out whether claims and evidence for full-listing or morpheme-based accounts apply to the access representation, the lexical entry, or both" (Marslen-Wilson et al., 1994, p. 4, emphasis added).
The same problem arises in research on Chinese. Many researchers studying lexical processing in Chinese have inappropriately assumed that characters are morphemes. It is true that most characters map onto morphemes, but the relationship is not one-to-one. For example, 葡 in 葡萄 (pútáo, 'grape') is not a morpheme on its own, while a single character such as 行 can represent different morphemes (xíng, 'to walk'; háng, 'row, firm'). Characters themselves, therefore, are not morphemes and cannot be viewed as unambiguous symbols for morphemes. This distinction is particularly difficult for Chinese psychologists to maintain because of the perceptual and semantic salience of characters.
Another important issue is morphological category. In English, two categories of morphological structure have been studied extensively: inflectional and derivational. Inflectional morphology has a primarily grammatical function, and inflectional suffixes usually do not change the form of their stems. Derivational morphemes, on the other hand, alter the meaning and often the syntactic form class of the base forms to which they are attached.
In Chinese, the important morphological categories differ from those of English. Mandarin Chinese has very few affixes, and most multi-morphemic words are compounds. As a result, research on the role of morphology in lexical representation and processing in Chinese has also focused on compounds.
Despite the large differences in morphology between English and Chinese, the semantic transparency of morphologically complex words is equally important in studying both languages. In English, Marslen-Wilson et al. (1994) found that semantically opaque forms behave like monomorphemic words, and the same logic applies to Chinese as well. However, the relationship between the meaning of a Chinese compound and the meanings of its constituents is more complex, ranging from close to nonexistent. This makes it difficult to study the role of semantic transparency in morphological representation and processing in Chinese.
There are also semimorphological markers that are unique to Chinese, such as the object marker ba and the passive marker bei. Their special linguistic status makes these markers particularly useful, not only for understanding lexical and sentence processing in Chinese but also for testing general models of language processing (e.g., Li, Bates, & MacWhinney, 1993).
Now let us turn to word identification. As mentioned earlier, characters map onto spoken language mainly at the morphemic level. The constituents of written Chinese compounds are perceptually salient, and their morphological structure is fairly transparent to readers. This is quite different from English orthography, in which no physical boundaries mark the strings that represent morphemes. On the other hand, the absence of word boundaries in written Chinese makes words less salient. In sum, the perceptual properties of Chinese orthography make "morphological decomposition" more likely to occur in word identification.
In English, the role of morphology in learning to read has been under-represented in the reading literature. Instead of studying its role in learning to read, most researchers have studied the acquisition of morphology, typically asking questions such as: How do children acquire morphological rules? Are there "stages" in acquiring different morphological rules? These questions are universal and can be applied to Chinese as well.
In English, children begin to see inflected forms, such as past-tense verbs, very early. To recognize such a word, they must be able to parse the letter string and recognize its meaningful parts (i.e., stem and suffix). One would therefore expect that teaching beginning readers how to analyze words would help them recognize words and acquire vocabulary. However, results from reading instruction studies are inconclusive.
What about Chinese? Parsing a compound into meaningful constituents is not a problem, since the constituents are already physically segmented. However, word boundaries are absent from the text. Learning new words while reading Chinese is therefore quite different from learning new words while reading English.
What about reading instruction? Since compounds are made of meaningful constituents (either free or bound stems), should teachers explicitly explain the meanings of the constituents and how they relate to the meaning of the compound? Or should teachers simply treat compounds as character strings and let children learn the compound-constituent relationship implicitly? I suspect the best instructional method lies somewhere between these two extremes.
A word's neighborhood size is the number of orthographic neighbors that can be generated by changing exactly one letter of the word to another letter while preserving letter positions. Most current models of visual word recognition include a candidate activation (or generation) stage in which the visual input activates, or contacts, orthographically similar words (i.e., neighbors) in the lexicon. Under this assumption, it is reasonable to expect the number of neighbors to have a negative effect on the subsequent candidate selection process. However, Grainger, O'Regan, Jacobs, and Segui (1989) observed that the important factor is not the number of neighbors but the frequency of these neighbors relative to the stimulus word: neighbors of higher frequency than the stimulus word have an inhibitory effect on lexical decision latency to that word. They found that when a stimulus word had at least one higher-frequency neighbor, lexical decision performance was impaired. This is the basic neighborhood frequency effect.
Grainger et al. (1992) subsequently observed that words differing from a higher-frequency word in the fourth letter showed strong interference effects, whereas words differing in the second letter did not show significant interference. They also observed that interference effects were significantly reduced when participants initially fixated the critical disambiguating letter of the stimulus word; this reduction was found for the second letter, rather than the fourth letter, of the five-letter stimuli. They concluded that the effects of competition between lexical representations can be modified by the relative visibility of the stimulus word's letters, and that the position of the disambiguating letter interacts with its visibility.
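Neighborhood size and the number of higher-frequency neighbors can be computed directly from the definition above. The sketch assumes a small hypothetical lexicon with invented frequency counts; `neighbors` and `neighborhood_stats` are illustrative names, not part of any standard tool.

```python
def neighbors(word, lexicon):
    """Words in `lexicon` that differ from `word` in exactly one letter position.

    Only words of the same length qualify, so letter positions are preserved.
    """
    return [
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    ]


def neighborhood_stats(word, freq):
    """Return (neighborhood size N, number of higher-frequency neighbors).

    `freq` maps each word in the lexicon to its frequency count.
    """
    ns = neighbors(word, freq)
    higher = [w for w in ns if freq[w] > freq[word]]
    return len(ns), len(higher)


# Invented frequency counts for a tiny hypothetical lexicon.
lexicon = {"blur": 5, "blue": 120, "slur": 3, "blurb": 2, "burr": 4}
print(neighborhood_stats("blur", lexicon))  # (2, 1)
```

Here "blurb" is excluded because it has a different length, and "burr" differs from "blur" in two positions; of the two true neighbors, only "blue" is more frequent than the target, so "blur" would count as a word with a higher-frequency neighbor.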
Since the majority of Chinese words are multi-character words, the same concept of neighborhood can be applied to the reading of Chinese if orthographic neighbors are defined as the words that can be generated by changing only one character of the stimulus word to another character, preserving character positions. Indeed, strong (but indirect) evidence for neighborhood frequency effects was observed in my master's thesis (Tsai, 1994). In that study, the frequency of the constituent characters of two-character words was manipulated. Constituent frequency had a negative effect on lexical decision latencies, especially when the two constituent characters were presented sequentially. Furthermore, the negative effect of constituent frequency increased as the stimulus onset asynchrony (SOA) between the two constituents increased. This pattern appears to reflect competition among neighbors. A follow-up analysis showed that the two groups of words (high and low constituent frequency) did differ in the total number of neighbors and in the number of high-frequency neighbors. The SOA effect suggests that the negative effect of character frequency was more likely caused by competition among these high-frequency neighbors. Note that word frequency was also manipulated in this study. Low-frequency words, in general, have more higher-frequency neighbors than high-frequency words do. However, there was no interaction between word frequency and character frequency. The important factor, therefore, does not seem to be the number of neighbors with relatively higher frequencies (i.e., higher than that of the stimulus word). Rather, what matters is the number of high-frequency neighbors whose frequencies exceed a fixed criterion (the same criterion used to select the high-frequency stimulus words).
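The same definition carries over to two-character Chinese words by treating each character as one position. The sketch below contrasts the two ways of counting high-frequency neighbors discussed above: relative to the stimulus word's own frequency, or relative to a fixed criterion. The frequencies and the criterion value (100) are invented for illustration and are not data from Tsai (1994); the function names are hypothetical.

```python
def char_neighbors(word, lexicon):
    """Same-length words that differ from `word` in exactly one character position."""
    return [
        w for w in lexicon
        if len(w) == len(word) and sum(a != b for a, b in zip(w, word)) == 1
    ]


def count_high_frequency_neighbors(word, freq, criterion=None):
    """Count neighbors whose frequency exceeds a threshold.

    criterion=None: relative definition (more frequent than `word` itself).
    criterion=<number>: fixed definition (more frequent than the criterion).
    """
    threshold = freq[word] if criterion is None else criterion
    return sum(1 for w in char_neighbors(word, freq) if freq[w] > threshold)


# Invented frequency counts for a few two-character words.
freqs = {"學生": 300, "學校": 250, "學習": 400, "醫生": 80, "先生": 500, "學者": 20}
print(count_high_frequency_neighbors("學生", freqs))                 # 2
print(count_high_frequency_neighbors("學生", freqs, criterion=100))  # 3
```

With these invented counts, 學生 has two neighbors that are more frequent than itself but three that exceed the fixed criterion, so the two definitions can yield different counts for the same word.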
Part of the question is similar to, but not equivalent to, an old question: what are the relative strengths and weaknesses of representative design, compared to linear experimentation (Berkowitz & Donnerstein, 1982; Petrinovich, 1979)?
One of the most obvious strengths of eye movement methodology is its ecological validity. After all, if the goal is to study lexical or other cognitive processes during reading, it is best to make the experimental tasks as close to normal reading as possible, so that the results can be generalized to normal reading more easily.
Representative designs usually imply less control and manipulation. Experimental psychologists have traditionally held that representative designs are inadequate for making causal inferences, that ecological validity may facilitate the formulation of population estimates but is not necessary for causal inference, and that experiments are not conducted to establish population estimates.
Although the above statement still holds in many situations, modern eye movement methodology (specifically, the use of computers to monitor eye movements on-line and to make eye-movement-contingent display changes in real time) has given representative designs more inferential power (McConkie & Rayner, 1975, 1976; Rayner, 1978).
The second strength of eye movement methodology is that eye movements are perhaps the only observable behavior that can be used to make inferences about the underlying cognitive processes during reading. In general, eye movements respond to cognitive demands in real time. Consequently, for understanding the operation of a real-time system such as human visual information processing, eye movements are perhaps the best available measure.
Despite these technological advances, eye movement methodology also has disadvantages. The mechanism that controls eye movements during reading is extremely complex and not yet fully understood. It is therefore virtually impossible, and in fact incorrect, to use fixation times or gaze durations directly as estimates of lexical access or word identification times, or of the durations of any other cognitive processes during reading, for the following reasons. First, there is the problem of parafoveal preview: for many fixated words, some of the processing takes place during the prior fixation. Second, there is the problem of word skipping: when a word is skipped, some of the time spent fixating the prior word is spent processing the skipped word. Finally, higher-order processes may also affect fixation times.
Eye movement records must be interpreted with great caution, and the ability to interpret them correctly depends heavily on the researcher's knowledge of and experience with eye movement research. As a result, eye movement methodology is difficult for most reading researchers to learn and adopt.
Another disadvantage of using eye movement methodology to study reading is that it is still hard to manipulate independent variables effectively while preserving ecological validity, even with eye-movement-contingent display techniques.
Why do we need more control? Inferences about lexical processing require appropriate assumptions about how words are represented in the mental lexicon. Although eye movements reveal important aspects of the cognitive processes that occur during reading, reading researchers will at times focus on the lexical representations themselves. When the object of study is lexical representation rather than on-line processing during reading, single-word paradigms such as lexical decision or naming, combined with priming, masking, or other controlled presentation techniques, may be more suitable.
(Works cited but not found in the original Lexical Processing During Reading Reading List.)
Berkowitz, L., & Donnerstein, E. (1982). External validity is more than skin deep: Some answers to criticisms of laboratory experiments. American Psychologist, 37, 245-257.
Besner, D., Twilley, L., McCann, R. S., & Seergobin, K. (1990). On the connection between connectionism and data: Are a few words necessary? Psychological Review, 97, 432-446.
Carr, T. H., Davidson, B. J., & Hawkins, H. L. (1978). Perceptual flexibility in word recognition: Strategies affect orthographic computation but not lexical access. Journal of Experimental Psychology: Human Perception and Performance, 4, 674-690.
Li, P., Bates, E., & MacWhinney, B. (1993). Processing a language without inflections: A reaction time study of sentence interpretation in Chinese. Journal of Memory and Language, 32, 169-192.
McConkie, G. W., & Rayner, K. (1975). The span of the effective stimulus during a fixation in reading. Perception & Psychophysics, 17, 578-586.
McConkie, G. W., & Rayner, K. (1976). Asymmetry of the perceptual span in reading. Bulletin of the Psychonomic Society, 8, 365-368.
Petrinovich, L. (1979). Probabilistic functionalism: A conception of research method. American Psychologist, 34, 373-390.
Rayner, K. (1978). Eye movements in reading and information processing. Psychological Bulletin, 85, 618-660.
Rumelhart, D. E., Hinton, G. E., & Williams, R. J. (1986). Learning internal representations by error propagation. In D. E. Rumelhart, J. L. McClelland, & the PDP research group (Eds.), Parallel distributed processing: Explorations in the microstructure of cognition. Volume 1: Foundations (pp. 318-362). Cambridge, MA: MIT Press.
Tsai, C. H. (1994). Effects of semantic transparency on the recognition of Chinese two-character words: Evidence for a dual-process model. Unpublished master's thesis, National Chung-Cheng University, Chia-Yi, Taiwan.