Doctoral Dissertation of Chih-Hao Tsai

July 2001

Tsai, C.-H. (2001). Word identification and eye movements in reading Chinese: A modeling approach. Doctoral dissertation, University of Illinois at Urbana-Champaign.



Chapter 3
Mental Lexicon and Lexical Access

Now that the linguistic status of the word in Chinese is clarified, we can return to discuss one of the most relevant subjects in the psycholinguistic domain: mental lexicon and lexical access. Traditionally, lexical access has been conceptualized as follows. Lexical access begins with word identification (in the conventional sense). The orthographic or phonological input is encoded, and then the encoded input is used to find the best match in the mental lexicon. The identification of a visually presented word can be achieved via two routes. First, the encoded orthographic input is used to search the mental lexicon. Second, the encoded orthographic input is recoded to a phonological code, then the phonological code is used to search the mental lexicon. This is called the dual-route model of lexical access (Coltheart, 1978). After the identification of a word (i.e., a match in the mental lexicon is found), its syntactic categories, meanings, and other properties are retrieved or activated.
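The two routes described above can be illustrated with a minimal sketch. It is a toy under stated assumptions, not an implementation of Coltheart's (1978) model: the lexicon entries, the informal ASCII phonological codes, the recoding table, and the function names are all hypothetical, and trying the direct route before the indirect one is a simplification chosen here for readability. The made-up string "gaiv" is included only to show an input that reaches a lexical entry through its phonological code.

```python
from typing import Optional, Tuple

# Toy mental lexicon keyed by orthographic form (hypothetical entries).
# Phonological codes are informal ASCII transcriptions, not real IPA.
LEXICON = {
    "gave": {"phonology": "geIv", "category": "verb"},
    "have": {"phonology": "hav", "category": "verb"},
}

# Toy table standing in for orthography-to-phonology recoding (assumed).
RECODE = {"gave": "geIv", "have": "hav", "gaiv": "geIv"}


def direct_route(orth: str) -> Optional[dict]:
    """Search the lexicon with the encoded orthographic input."""
    return LEXICON.get(orth)


def indirect_route(orth: str) -> Optional[dict]:
    """Recode the input to a phonological code, then search with that code."""
    phon = RECODE.get(orth)
    if phon is None:
        return None
    for entry in LEXICON.values():
        if entry["phonology"] == phon:
            return entry
    return None


def identify(orth: str) -> Optional[Tuple[str, dict]]:
    """Return (route name, lexical entry), or None if no match is found."""
    entry = direct_route(orth)
    if entry is not None:
        return "direct", entry
    entry = indirect_route(orth)
    if entry is not None:
        return "indirect", entry
    return None


if __name__ == "__main__":
    for item in ("gave", "gaiv", "xyz"):
        print(item, identify(item))
```

Once a match is found, the returned entry stands for the point at which syntactic category, meaning, and other properties become available.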

There are three fundamental issues with respect to lexical representation and lexical access. (a) What is the role of the indirect (phonological) route in identifying visually presented words? (b) How are complex words represented in the mental lexicon? (c) Are complex words decomposed prior to lexical access?

Phonology and Visual Word Identification

There are several sub-questions related to the role of phonology in visual word identification. Early in lexical access research, researchers asked whether words are recoded into phonological codes at all during reading. The answer is "yes" for both English (Kleiman, 1975; Meyer, Schvaneveldt, & Ruddy, 1974; Rubenstein, Lewis, & Rubenstein, 1971; Van Orden, 1978) and Chinese (Tzeng, Hung, & Wang, 1977).

Researchers then began to ask whether phonological codes are used in visual word identification. This is a rather complicated issue, but the general impression is that a large pool of higher frequency words is recognized on a visual (i.e., direct) basis, without phonological mediation. Phonology enters only into the processing of lower frequency words (Seidenberg, 1985). In the same study, Seidenberg found the same pattern of interaction between word frequency and spelling-sound consistency in both English and Chinese, which led him to draw the same conclusion for both writing systems.

A related question is how the phonological code is computed. In traditional dual-route models, phonological codes for regular words (e.g., gave) can be either assembled by applying orthography-phonology conversion rules or retrieved from the mental lexicon. Phonological codes for nonwords (e.g., brane) can only be assembled. Phonological codes for irregular words (e.g., have) cannot be assembled and must be retrieved from the mental lexicon. However, it has been demonstrated that the regularity dimension is more graded, and that single-process connectionist models trained to encode the orthography-phonology relationship in a distributed representation are capable of generating pronunciations for both regular and irregular words as well as for nonwords (Plaut, McClelland, Seidenberg, & Patterson, 1996; Seidenberg & McClelland, 1991). In Chinese, although there is indeed regularity, the orthography-phonology mapping is nevertheless less transparent. The connectionist accounts have attracted many psychologists studying phonological processing in Chinese word identification, but the models proposed (e.g., Perfetti & Tan, 1999; Tan & Perfetti, 1997) are usually too vague to be tested.
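The traditional dual-route distinction between assembled and retrieved phonological codes can be made concrete with a small sketch. Everything in it is an assumption made for illustration: the onset and rime rule tables, the stored pronunciations, the informal ASCII transcriptions, and the function names are hypothetical and are not taken from any published model. The sketch also does not represent the connectionist alternative, which replaces both mechanisms with a single trained network.

```python
from typing import Optional

# Hypothetical orthography-phonology conversion rules (onset + rime).
# Transcriptions are informal ASCII stand-ins, not real IPA.
ONSET_RULES = {"g": "g", "h": "h", "br": "br"}
RIME_RULES = {"ave": "eIv", "ane": "eIn"}

# Hypothetical stored pronunciations in the mental lexicon.
STORED = {"gave": "geIv", "have": "hav"}


def assemble(orth: str) -> Optional[str]:
    """Build a phonological code by applying the conversion rules."""
    for i in range(1, len(orth)):
        onset, rime = orth[:i], orth[i:]
        if onset in ONSET_RULES and rime in RIME_RULES:
            return ONSET_RULES[onset] + RIME_RULES[rime]
    return None


def retrieve(orth: str) -> Optional[str]:
    """Look up a stored phonological code; fails for nonwords."""
    return STORED.get(orth)


def pronounce(orth: str) -> Optional[str]:
    """Prefer the stored code; fall back to assembly for unlisted strings."""
    stored = retrieve(orth)
    return stored if stored is not None else assemble(orth)


if __name__ == "__main__":
    for item in ("gave", "have", "brane"):
        print(item, "assembled:", assemble(item), "retrieved:", retrieve(item))
```

In this toy, the regular word gave receives the same code from both mechanisms, the irregular word have is regularized incorrectly by the rules and so depends on retrieval, and the nonword brane has no stored code and can only be assembled.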

The final question, which is also the core question, is whether phonological information is used in visual word identification at all. The research results have been inconclusive. Daneman, Reingold, and Davidson (1995) reported positive evidence, but Inhoff and Topolski (1994) and Rayner, Pollatsek, and Binder (1998) reported negative evidence. In Chinese, most studies have been done at the character level. Although it has been demonstrated that phonological codes are activated very early in the course of character identification (Perfetti & Zhang, 1991, 1995; Perfetti & Tan, 1998; Tan & Perfetti, 1997), researchers have found little evidence for the mediation of phonology in character identification.

Morphology, Semantics, and the Mental Lexicon

What is listed in the mental lexicon? Since the main word-formation mechanisms in English are inflection and derivation, naturally they have been extensively studied by psycholinguists. It has been found that in addition to morphological structure, word frequency and regularity (Stemberger & MacWhinney, 1988) and semantic opacity (Marslen-Wilson, Taylor, Waksler, & Older, 1994) also affect whether a morphologically complex word is listed in the mental lexicon. Stemberger and MacWhinney argued, from speech error data occurring naturally or in controlled experiments, that irregularly inflected forms (e.g., sang) are stored. High-frequency regularly inflected forms are also stored, but low-frequency ones are not. Marslen-Wilson et al. argued, from data collected with the cross-modal repetition priming task, that semantically transparent suffixed and prefixed words are represented in decomposed morphemic forms at the level of the lexical entry.

The above findings in English are not without controversy, but they adequately capture what most researchers believe and shed light on the Chinese situation. The predominant word-formation mechanism in Chinese is compounding, rather than inflection or derivation. The latter two do exist in Chinese, but there are very few inflectional and derivational affixes. Note that Packard (2000) uses grammatical/word-formation affix in place of inflectional/derivational affix, because "not all the world's languages possess the properties (such as agreement, paradigms and morphophonemic alternation) that are often associated with these affix categories" (p. 77).

The majority of Chinese polymorphemic words are compound words. However, Chinese compound words do not just consist of free morphemes. The compounding morphology in Chinese also takes roots, which are usually bound but have their own semantic and syntactic properties. A number of researchers became interested in the representation of Chinese compounds during 1992-1995 (S.-T. Chen, 1993; Lee, 1995; M.-Y. Liang, 1992; Tsai, 1994). They manipulated semantic opacity as well as word frequency, expecting to obtain effects of word frequency and semantic opacity similar to those found in English research. Unfortunately, the results were very inconsistent. (The only consistent effect was the word frequency effect.) Presumably, this was because the morphological structure of the compounds was not controlled at all, and because semantic opacity was not well defined: it was very difficult to set an unambiguous criterion for either transparency or opacity. Zhang and Peng (1992) and Peng, Liu, and Wang (1999) investigated the effects of semantic relations between the constituents of compounds on lexical access. However, they not only failed to control word structure based on the form classes of the constituents, but also mistakenly treated characters as morphemes. This makes their findings difficult to interpret.

Lexical Access of Complex Words

Taft (1979, 1992) and Taft and Forster (1975, 1976) reported evidence that in visual word recognition, morphologically complex words are analyzed into smaller units based on orthographic and syllabic structure before lexical access occurs. However, others (Jordan, 1986; Lima & Pollatsek, 1983; Prinzmetal, Treiman, & Rho, 1986) have found contradictory results. Some researchers (e.g., Seidenberg, 1987; Seidenberg & McClelland, 1989) argued that effects of various kinds of sublexical structures are merely emergent phenomena of orthographic redundancy. This is an area where there have been more disagreements than agreements.

I will not go into the details of these studies, as perceptual parsing is not relevant in Chinese: the constituent characters of Chinese words are already perceptually salient and do not need to be perceptually segmented. What is relevant in Chinese is whether a compound word is recognized as a whole or through the mediation of its constituent characters. It seems intuitively reasonable that the meanings of most compound words can be derived from the meanings of their constituents. However, this may be an illusion. Take idioms for example. An idiom is semantically compositional if its meaning is entirely derivable from knowledge of the meanings of its constituent words. The analyzability of an idiom, by contrast, is the extent to which a speaker of the language can trace the relations between the two levels of meaning (literal vs. figurative) (Cacciari, 1993). By such criteria, most Chinese compounds are indeed analyzable, but this does not imply that they are also semantically compositional. Even for truly transparent compounds, it is still questionable whether there is a need to compute the compound meaning from the meanings of the constituents. After all, if a word can be identified as a whole, why bother carrying out the much less efficient semantic computation? In fact, studies that manipulated the frequencies of constituents of Chinese compounds have failed to obtain a consistent pattern of constituent frequency effects. Some obtained positive effects of constituent frequency (Mattingly & Xu, 1993; Zhang & Peng, 1992), some obtained negative effects (S.-T. Chen, 1993; Tsai, 1994), and some obtained null, unreliable, or mixed effects (Taft, Huang, & Zhu, 1993). Therefore, there is no strong evidence for the involvement of sublexical semantic processing in identifying compounds.


© Copyright by Chih-Hao Tsai, 2001