Lexical Processing During Reading in Chinese

Ph.D. Specialized Area Qualifying Exam Questions and Answers

Chih-Hao Tsai

University of Illinois at Urbana-Champaign

June 22-29, 1998

Committee:

George McConkie
Kevin Miller
Jerome Packard

Question 1

Words in Chinese: Hoosain has argued that Chinese words are "flexible" compared to English words. Review evidence on (a) different methods of defining words in Chinese, (b) psychological data on Chinese speakers' conscious knowledge of and/or implicit use of "words" as a cognitive unit, and (c) how Chinese readers parse text into words as they read.

The phenomenon. It is well known that native speakers of Chinese do not have a clear concept of what a word is. People without experience or knowledge in linguistics believe that the character is the word-level linguistic unit. This belief, although linguistically incorrect, is nevertheless widely held by native speakers of Chinese. The general public's attachment to characters led Chao (1968) to use the term sociological word to describe this particular status of characters.

The Explanations. There are two possible explanations of the above phenomenon. First of all, the words themselves are not clearly definable units in Chinese. Secondly, words themselves are not concrete entities in the Chinese mental lexicon. The first viewpoint will be examined in Section (a), and the second point of view will be examined in Sections (b) and (c).

(a) Different Methods of Defining Words in Chinese

Words as orthographic framing units. In English orthography, there is a unit that roughly corresponds to the word in spoken language, and the writer must surround each unit with white space. In Chinese orthography, however, the 'framing' unit is the character, which in general corresponds to the morpheme in spoken language.

Words as listed entries in the lexicon. Lexical words are items listed in the lexicon. This definition is not very useful for either English or Chinese, both because some listed items are not words (e.g., idioms) and because some words are not listed owing to their regularity.

Words as concepts. The word can be viewed as a linguistic unit that represents the unit of concept in the non-linguistic, propositional structure. However, both English and Chinese face the problem that some concepts may need more than one word to express, and some words simply do not correspond to concepts at all.

Words as outputs of word formation rules (Packard, 2000). The word can be viewed as the output of a word formation rule. Words defined in this way overlap to a large extent with the set of wordlike entities defined using other criteria. However, since the Chinese language does not have a clear, discrete set of word formation rules, the output does not overlap completely with the set of wordlike units derived using other criteria.

Words as syntactic free forms (X0) (Packard, 2000). The word can be viewed as an independent occupant of a syntactic form class slot. This definition is more abstract than the orthographic, lexical, and morphological definitions, but more concrete than the conceptual definition. According to Packard (1998), this definition seems to serve as the basis for identifying orthographic and lexical words, and is the best definition for identifying and describing Chinese words.

Conclusion. In sum, not all methods of defining words are appropriate in all languages. In English, however, they seem to converge to a large extent. In Chinese, on the other hand, these methods produce more divergent results, which partially accounts for the argument that Chinese words are "flexible" or "fuzzy".

(b) Speakers' Conscious and Implicit Knowledge of Words

Explicit knowledge of words. As noted earlier, speakers of Chinese do not seem to possess an explicit concept of the word. Hoosain (1992) asked a group of college students to mark word boundaries on conventionally printed text and found that there were disagreements among his participants. Liu, Yeh, Wang, and Chang (1974) found that texts with spaces artificially added between words were read more slowly than regularly printed text. However, these data tell us only that speakers are not explicitly aware of words, and that making word boundaries explicit does not help readers. They say virtually nothing about the implicit knowledge of words.

Implicit knowledge of words. Although Hoosain (1992) showed some evidence that readers disagree with each other about where the words are, he did not provide a quantitative summary of the results. In a recent study (Tsai, 1998), two groups of Chinese readers (Mainland and Taiwanese Chinese) were asked to mark word boundaries on a 300-character text according to their subjective evaluation. The readers agreed with each other to a high degree about where the words were. Therefore, Chinese readers' concept of the word, although explicitly fuzzy, is nevertheless quite concrete and consistent. In addition, experience in reading the word-based hanyu pinyin orthography and the linguistic level and context of the boundaries all had effects on the boundary marking decisions. The findings suggest that the underlying concept of the word is concrete among readers, but that different readers may differ in their thresholds for word boundary decisions.
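
One simple way to quantify agreement in a boundary marking task of this kind is the proportion of inter-character gaps on which two readers make the same decision, averaged over all pairs of readers. The sketch below is only an illustration of that idea; it is not the measure used in Tsai (1998), the function names are hypothetical, and the markings are invented. Simple agreement of this sort is not corrected for chance.

```python
from itertools import combinations


def boundary_agreement(marks_a, marks_b, n_chars):
    """Proportion of the n_chars - 1 inter-character gaps on which two readers agree.

    marks_a and marks_b are sets of gap indices (1 .. n_chars - 1) at which
    each reader placed a word boundary. A gap counts as an agreement when
    both readers marked it or both left it unmarked.
    """
    gaps = range(1, n_chars)
    agree = sum((g in marks_a) == (g in marks_b) for g in gaps)
    return agree / (n_chars - 1)


def mean_pairwise_agreement(all_marks, n_chars):
    """Average boundary_agreement over every pair of readers."""
    pairs = list(combinations(all_marks, 2))
    return sum(boundary_agreement(a, b, n_chars) for a, b in pairs) / len(pairs)


# Hypothetical markings of a 6-character string by three readers
# (invented for illustration; not data from any study).
readers = [{2, 4}, {2, 4}, {2, 3, 4}]
print(mean_pairwise_agreement(readers, 6))
```

Here the third reader applies a lower threshold and marks one extra boundary, which lowers pairwise agreement without implying a different underlying concept of the word.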

(c) How Chinese Readers Parse Text into Words

Psychologists have long been interested in how words are identified during reading. Since written languages consist of strings of words, and the words are physically marked by spaces in English and most other writing systems, it is intuitive to assume that segmenting (parsing) the text into words should not be a problem for readers. Consequently, when psychologists talk about word identification, they usually do not distinguish between word identification during reading and word identification in isolation, unless they want to investigate the effects of sentence context on word identification.

In written Chinese, however, the nature of the task of word identification during reading is not the same as that of word identification in isolation. Word boundaries are not marked in written Chinese; therefore, readers must find where the words are before "word identification" (in the traditional sense) can take place.

Psychologists, being heavily influenced by the traditional definition of word identification, tend to assume that there should be a separate word segmentation process prior to word identification (Chang, 1993; Hoosain, 1992; Liu, 1974). In other words, word boundaries must be identified before words can be identified. If this were true, then we should observe a slower reading rate in Chinese than in English, owing to the extra processing cost of word segmentation. However, this is clearly not the case. When measured in average number of words read per minute, the reading rates in Chinese reported in most studies fall within the range of reading rates in English (Sun, Morita, & Stark, 1985; Tsai & McConkie, 1995).

What is wrong with the "boundary identification precedes word identification" assumption? First of all, boundary identification can be the by-product of word identification. By using either (a) a lexicon to match the input string and a set of algorithms to resolve ambiguity, (b) statistical methods, or (c) both, computational systems can in general achieve very high identification rates (usually above 95%) (Chen & Liu, 1992; Chiang, Chang, Lin, & Su, 1996; Fan & Tsai, 1988; Sproat & Shih, 1990; Sproat, Shih, Gale, & Chang, 1996; Yeh & Lee, 1991). Therefore, boundary identification is not necessarily a precondition of word identification.
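
The idea that boundaries can fall out of word identification can be illustrated with a greedy forward maximum matching procedure, a common lexicon-based technique. The sketch below is a minimal illustration under stated assumptions, not a reproduction of any of the systems cited above: the function names, the `max_len` parameter, and the toy lexicon are hypothetical, and Latin letters stand in for Chinese characters. No ambiguity resolution is attempted.

```python
def forward_maximum_matching(text, lexicon, max_len=4):
    """Segment text by repeatedly taking the longest lexicon entry at the current position.

    A character that starts no listed entry is emitted as a one-character word.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then progressively shorter ones.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if length == 1 or candidate in lexicon:
                words.append(candidate)
                i += length
                break
    return words


def boundaries(words):
    """Character offsets at which one word ends and the next begins."""
    offsets, position = [], 0
    for word in words[:-1]:
        position += len(word)
        offsets.append(position)
    return offsets


# Toy lexicon (hypothetical); letters stand in for characters.
toy_lexicon = {"ab", "abc", "cd", "de"}
segmented = forward_maximum_matching("abcde", toy_lexicon)
print(segmented)
print(boundaries(segmented))
```

The procedure never searches for boundaries as such. It only looks up words, and the boundary offsets are computed afterwards from the words it has found.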

Secondly, if we were not investigating the human mind, there would be nothing computationally wrong with proposing a separate word segmentation process. However, the human mind was designed to solve adaptive problems, not arbitrary tasks (Cosmides, 1989; Pinker & Bloom, 1990). We are more likely to discover the true cognitive mechanisms if we first understand the problem they are designed to solve (Marr, 1982). In the task of reading, it is word identification, not boundary identification, that is pertinent. No matter what kind of orthography is being read, and no matter what sub-processes might be involved, the initial goal is still to identify words, not word boundaries. Therefore, I argue that the right goal of a theory of word identification in reading Chinese text must not be boundary identification, although I do not preclude the possibility of boundary identification as a sub-process of word identification.

Question 2

Explain the different 'routes' of lexical access that have been posited in reading, including arguments for and against each, with emphasis on reading the Chinese orthographic system.

Different "Routes" of Lexical Access

The initial stage of lexical access is word identification, in which the reader finds the best match for the suitably encoded input (which can be either visual or auditory) among the many access representations (which can be either spatial or phonological) stored in the mental lexicon. Word identification, in turn, leads to retrieval (or activation) of learned phonological, semantic, syntactic, and other lexical information.

Since word identification is a perceptual process that is modality-sensitive, two fundamental routes of word identification have been posited in reading: the visual (direct) route and the phonological (indirect) route.

The visual route of visual word identification is direct because the modality of the access representations contacted (spatial; i.e., character strings) is the same as the modality of the stimulus (visual).

The phonological route of visual word identification is indirect because the visual stimulus is first recoded into its phonological form by rules or analogies, then the phonological form is used to match the phonological access representations.

The most common view today is that virtually all lexical access is by the fast, direct visual route; the route going through rules or analogies to sound and then to the lexical entry is slower and therefore plays a minor role in lexical access. One of the strongest arguments comes from the finding of a frequency by regularity interaction in naming English words (Seidenberg, 1985). For high-frequency words, whether the spelling-sound correspondence was regular did not affect naming latencies. For low-frequency words, however, naming latencies for irregular words were longer than naming latencies for regular words. The results indicate that a large pool of higher frequency words is recognized on a visual basis, without phonological mediation. Phonology enters only into the processing of lower frequency words.
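
The shape of a frequency by regularity interaction can be made concrete as a difference of differences over the four cell means of a 2 x 2 design. The sketch below assumes such a design; the function names are hypothetical, and the latencies are invented for illustration only and are not data from Seidenberg (1985) or any other study.

```python
def regularity_effect(latencies, frequency):
    """Irregular minus regular mean naming latency (ms) at one frequency level."""
    return latencies[(frequency, "irregular")] - latencies[(frequency, "regular")]


def interaction_contrast(latencies):
    """Regularity effect for low-frequency words minus that for high-frequency words."""
    return regularity_effect(latencies, "low") - regularity_effect(latencies, "high")


# Hypothetical cell means in ms, invented for illustration only.
hypothetical_means = {
    ("high", "regular"): 500,
    ("high", "irregular"): 500,
    ("low", "regular"): 550,
    ("low", "irregular"): 590,
}

print(regularity_effect(hypothetical_means, "high"))
print(regularity_effect(hypothetical_means, "low"))
print(interaction_contrast(hypothetical_means))
```

A regularity effect near zero for high-frequency words combined with a positive effect for low-frequency words yields a positive contrast, which is the pattern the argument above relies on.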

The Chinese Case

There is no doubt that there exists a direct visual route in word identification during reading. In addition, there is also no doubt that phonological recoding does occur in Chinese reading (Tzeng, Hung, & Wang, 1997).

The question to ask, then, is very similar to that in English: Can word identification be mediated by phonology?

Past studies have focused overwhelmingly on the role of phonology in character identification. It is true that the character is the basic writing unit in Chinese, but characters are not representative samples of Chinese words. The modern Chinese lexicon consists predominantly of multi-morphemic words, or multi-character words if described in terms of writing units. If the goal is to understand "word identification" in Chinese, researchers should use multi-character words as stimuli. Theories based on character identification experiments can hardly be viewed as "lexical access" or "word identification" theories.

Recent studies have found that the phonological code of a character is available very early in the course of character identification (Perfetti & Tan, 1998; Perfetti & Zhang, 1991, 1995; Perfetti, Zhang, & Berent, 1992; Tan & Perfetti, 1997). The general conclusion is that although the phonological code does not mediate character identification, it does become available at least by the point at which the character is identified, and possibly earlier. The activation of the phonological code thus does not mediate character identification but is early enough to mediate other lexical processes.

This general conclusion, however, cannot be generalized to the identification of multi-character words. The identification of a character string is totally different from the identification of a single character. In multi-character word identification, the pronunciation of each constituent is directly related to the pronunciation of the whole word. In other words, the mapping is transparent (i.e., regular). Given, in addition, that the constituents' pronunciations are activated quite early, phonology is more likely to take part in the identification of multi-character words.

Can word identification be mediated by phonology in Chinese? Very likely. But we will not know for sure until empirical data are collected.

Question 3

Lexical processing is about the processing of words. Lexical processing can usually be divided into two phases: pre-lexical and post-lexical. Pre-lexical processing refers to the processes that mediate word identification, and post-lexical processing refers to the processes that follow the identification of a word.

Phonological Processing

The term "pre-lexical" is actually ambiguous. It can refer to processes that mediate word identification, or it can merely refer to processes that occur prior to the identification of the word. Recently, researchers studying phonological processing in Chinese character identification have avoided using the former sense (mediation) of "pre-lexical processing" (e.g., Perfetti, Zhang, & Berent, 1992). One of the reasons for this shift is that the grapheme-phoneme correspondence of Chinese characters is, in general, so opaque that the phonological route to character identification is unlikely to win the dual-route competition. However, as mentioned in my answer to Question 2, recent research has found that the activation or synthesis of character pronunciation does begin pre-lexically, and the phonological code of a character becomes available at the point of identification.

Pre-lexical phonological processing, on the other hand, does play a role in character naming. Seidenberg (1985), for example, found a similar frequency by regularity interaction for both word naming data in English and character naming data in Chinese, indicating that phonology does mediate the naming of low-frequency characters. Fang, Horng, and Tzeng (1986) found consistency effects in a Chinese character naming task. Characters whose phonetic radicals are reliable cues to pronunciation were pronounced faster than characters whose phonetic radicals are unreliable cues to pronunciation. The consistency effect was also found in pseudo-characters. These findings are similar to those in English (Glushko, 1979; cited in Seidenberg, 1985).

Phonological processing has been studied extensively in both English and Chinese. However, as mentioned earlier, in Chinese, the role of pre-lexical phonology in the identification or pronunciation of multi-character words is still unclear.

Morphological Processing

Only a few studies have focused on morphological involvement in Chinese word identification. All of them adopted the word/constituent frequency paradigm, which has been used widely in studying the role of morphology in English word identification since Taft and Forster (1975) and Taft (1979). The common assumption is that if word identification is mediated by a word's morphological constituents, then constituent frequency should affect word identification time even when word frequency is held constant.

However, the results were quite inconsistent. Zhang and Peng (1992) claimed to have found that, when word frequency was controlled, the frequency of both constituents affected lexical decision latencies to coordinative words, but only the frequency of the second constituent affected lexical decision reaction times (RTs) to modifier words. Their design, however, was problematic: as Taft, Huang, and Zhu (1993) pointed out, the manipulation of character frequency was not equivalent for the two categories of words.

In fact, even Taft, Huang, and Zhu's (1993) study, which adopted a better frequency manipulation, failed to find a reliable effect of character frequency. They did not find any significant RT difference between compounds composed of low-frequency characters and those composed of high-frequency characters. As to the effect of morphological structure, they did obtain a very small effect of second character frequency for low-frequency non-binding words, but not for binding words.

To illustrate how inconsistent the findings on character frequency effects are, three more studies are reviewed. Mattingly and Xu (1993) reported a reliable, positive character frequency effect for lexical decision RTs on two-character words. However, reliable, negative character frequency effects were found in lexical decision tasks in which the two characters of a compound were presented sequentially at intervals of 50 ms and 100 ms (Tsai, 1994) and of 200 ms, 400 ms, and 600 ms (Chen, 1993). That is, the RTs for words composed of high-frequency characters were longer than the RTs for those composed of low-frequency ones.

To summarize, there is nominal evidence for the involvement of sub-lexical (or pre-lexical) morphological processing in recognizing Chinese compounds, but more information is needed to determine the nature of these processes. The effects of character frequency have been shown to be much more complex than was assumed in the studies reviewed. Researchers, therefore, cannot simply follow the standard assumption of English studies without carefully examining the nature of character frequency effects in Chinese.

Specialized Area References

Chen, S. T. (1993). Hanyu gouci zai yuedu licheng zhong dui yuyi cufa xiaoying de yingxiang. [Effects of Chinese morphology on semantic priming effects during reading]. Unpublished master's thesis, National Chung-Cheng University, Chia-Yi, Taiwan.

Cosmides, L. (1989). The logic of social exchange: Has natural selection shaped how humans reason? Studies with the Wason selection task. Cognition, 31, 187-276.

Fang, S. P., Horng, R. Y., & Tzeng, O. J. L. (1986). Consistency effects in the Chinese character and pseudo-character naming tasks. In H. S. R. Kao & R. Hoosain (Eds.), Linguistics, psychology, and the Chinese language (pp. 11-22). Hong Kong: The University of Hong Kong.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. New York: W. H. Freeman.

Mattingly, I. G., & Xu, Y. (1993, September). Word superiority in Chinese. Paper presented at the Sixth International Symposium on Cognitive Aspects of the Chinese Language, Taipei, Taiwan.

Packard, J. (2000). The morphology of Chinese. Cambridge, UK: Cambridge University Press.

Pinker, S., & Bloom, P. (1990). Natural language and natural selection. Behavioral and Brain Sciences, 13, 707-784.

Sun, F. C., Morita, M., & Stark, L. W. (1985). Comparative patterns of reading eye movement in Chinese and English. Perception and Psychophysics, 37, 502-506.

Taft, M. (1979). Lexical access via an orthographic code: The basic orthographic syllabic structure (BOSS). Journal of Verbal Learning & Verbal Behavior, 18, 21-39.

Taft, M., & Forster, K. I. (1975). Lexical storage and retrieval of prefixed words. Journal of Verbal Learning & Verbal Behavior, 14, 638-647.

Taft, M., Huang, J., & Zhu, X. (1993, September). The influence of character frequency on word recognition responses in Chinese. Paper presented at the Sixth International Symposium on Cognitive Aspects of the Chinese Language, Taipei, Taiwan.

Tsai, C. H. (1994). Effects of semantic transparency on the recognition of Chinese two-character words: Evidence for a dual-process model. Unpublished master's thesis, National Chung-Cheng University, Chia-Yi, Taiwan.

Tsai, C. H. (1998). Words and reading in Chinese. Unpublished manuscript, University of Illinois at Urbana-Champaign.

Tsai, C. H., & McConkie, G. W. (1995, December). The perceptual span in reading Chinese text: A moving window study. Paper presented at the Seventh International Conference on the Cognitive Processing of Chinese and Other Asian Languages, Hong Kong.

Zhang, B. Y., & Peng, D. L. (1992). Decomposed storage in the Chinese lexicon. In H. C. Chen & O. J. L. Tzeng (Eds.), Language processing in Chinese. North-Holland.