Doctoral Dissertation of Chih-Hao Tsai

July 2001

Tsai, C.-H. (2001). Word identification and eye movements in reading Chinese: A modeling approach. Doctoral dissertation, University of Illinois at Urbana-Champaign.


Chapter 1

Word Identification

In reading research, word identification has traditionally been defined as a pattern recognition process in which the visual input is encoded and looked up in the mental lexicon to find the best match. Although the tasks used in the majority of word identification studies in cognitive psychology were single-word reading tasks, such as lexical decision and naming, it is generally believed that the word identification process in online reading does not differ fundamentally from that in single-word reading tasks. In fact, researchers have assumed that the identification process is essentially the same, given that words are isolated by space boundaries in written English. What is different in identifying words online is not the identification process itself, but the quality of the visual input due to unevenly distributed visual acuity across the retina and peripheral preview (Rayner & Pollatsek, 1987). Consequently, most researchers studying online reading and word identification have focused on the effects that varying quality of visual input has on word identification, as reflected in phenomena such as word skipping, refixation, and information integration across fixations (e.g., McConkie & Hogaboam, 1985; McConkie, Kerr, Reddix, & Zola, 1988; McConkie, Kerr, Reddix, Zola, & Jacobs, 1989; Morrison, 1984; Rayner, 1978; Reichle, Pollatsek, Fisher, & Rayner, 1998).

In addition to identifying words in the mental lexicon, there is another sense of word identification in reading, which refers to the identification of word units from the text. This aspect of word identification is easy when word boundaries are present, because space boundaries for words make the word units perceptually salient. Past studies have found that boundary information for words is indeed picked up and used by readers (McConkie & Rayner, 1975, 1976; Rayner, Well, & Pollatsek, 1980; Rayner, Well, Pollatsek, & Bertera, 1982). When reading text with word boundaries removed, English readers experience significant difficulty. Experiments have shown that the mean reading speed is reduced by 30% to 50% (Epelboim, Booth, & Steinman, 1994; Spragins, Lefton, & Fisher, 1976).

Therefore, word identification has two faces. Traditionally, researchers have predominantly focused on one face, which is the single-word identification process described in the first sentence of the first paragraph. The other face, which is the identification of words from text, has received very little attention because the words are so perceptually salient that identifying them from the text seems effortless.

The Word Identification Problem in Reading Chinese

Exponential Complexity

The scenario of word identification in sentential context is dramatically different in Chinese from that in English. Characters are the basic writing units in Chinese, representing the spoken language at both syllable and morpheme levels. The modern Chinese lexicon consists predominantly of polymorphemic words. In other words, most written Chinese words are multi-character words. However, in conventionally written or printed text, characters are evenly spaced. There are no extra space boundaries for words. The absence of word boundaries would appear to make it very difficult to identify words from the text. For example, a sentence consisting of 10 characters has 9 internal character boundaries. Since each character boundary could also be a potential location for a word boundary, the 10-character sentence would generate 512 (2^9) different word strings, among which most of the time there is only one correct word string. Word identification uncertainty grows exponentially as the sentence lengthens.
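The count above follows from treating each of the n - 1 internal character boundaries as an independent yes/no choice. The following Python sketch is only an illustration of that arithmetic, not a procedure taken from this study: it uses Latin letters as placeholders for Chinese characters, and the function name `segmentations` is an assumption introduced here.

```python
from itertools import product


def segmentations(chars):
    """Yield every way to split `chars` into contiguous words.

    Each of the len(chars) - 1 internal boundaries is either a word
    boundary (True) or not (False), so there are 2 ** (len(chars) - 1)
    candidate word strings.
    """
    n = len(chars)
    if n == 0:
        return
    for cuts in product([False, True], repeat=n - 1):
        words, start = [], 0
        # cuts[i - 1] tells whether a word boundary follows character i.
        for i, cut in enumerate(cuts, start=1):
            if cut:
                words.append(chars[start:i])
                start = i
        words.append(chars[start:])
        yield words


# Placeholder "characters"; any 10-symbol string behaves the same way.
sentence = "ABCDEFGHIJ"
count = sum(1 for _ in segmentations(sentence))
print(count)  # 512, i.e. 2 ** 9
```

Adding one more character doubles the number of candidates, which is the exponential growth described above.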

The exponential nature of word identification ambiguity seems extremely challenging, yet Chinese readers do not appear to experience significant difficulty in word identification. In terms of reading speed measured by number of words read per minute, Chinese readers are as fast as English readers (Sun, Morita, & Stark, 1985). Moreover, Chinese readers make a proportion of regressive eye movements similar to that of English readers (Yang, 1994; Yang & McConkie, 1999). Since regressive saccades usually reflect processing difficulty, the eye movement pattern in Chinese reading, again, shows no evidence of excessive difficulty.

Perceptual Span and Eye Movements

The perceptual span is the extent into the periphery within which visual information is acquired and used during fixations in reading. Past research has found that readers have very limited perceptual spans. For example, on average, the perceptual span of English readers is about four character spaces to the left of fixation and 14 to the right (McConkie & Rayner, 1976). Underwood and McConkie (1985) further demonstrated that letters are distinguished only within eight character positions to the right. Beyond that point letters become indistinguishable, and only word boundary and word shape information is obtained. For Chinese readers, the perceptual span is about one character space to the left of fixation and two to the right (Inhoff & Liu, 1998; Tsai & McConkie, 1995). Because what can be seen in any single fixation is limited, the eyes need to move around to acquire information from different parts of the text.

What does word identification have to do with perceptual span? The perceptual span is limited, and the right boundary of the perceptual span does not always align with a word boundary. Readers may see incomplete words, and there must be uncertainty about the lengths of the incomplete words. Such uncertainty, in turn, could affect the planning of eye movements.

Word identification could also affect eye movements. Traditionally there have been different views regarding the role of word identification in eye movement control. The strongest position argues that word identification drives eye movements, and that the eye movements are targeting words (Morrison, 1984; Reichle et al., 1998). However, McConkie, Underwood, Zola, and Wolverton (1985) argue that word identification and other cognitive processes only have minimal control over eye movement planning.

The role of word identification in eye movement control in reading Chinese is further complicated by the fact that there are no space boundaries for words in the text. As a result, there are two layers of uncertainty: uncertainty about the length of words near the right boundary of the perceptual span, and uncertainty about where the word boundaries are within the perceptual span. The two layers are interrelated, because the way the string viewable in the perceptual span is tokenized (segmented) affects what is incomplete near the right of the perceptual span.
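The interaction between the two layers can be made concrete with a toy example. The Python sketch below is a hedged illustration only, not the model developed in this dissertation: the lexicon, the placeholder Latin "characters", and the names `visible_window`, `window_parses`, and `toy_lexicon` are all assumptions introduced here. The only values taken from the text are the span of about one character to the left of fixation and two to the right.

```python
def visible_window(text, fixation, left=1, right=2):
    """Return the characters inside the perceptual span around `fixation`."""
    return text[max(0, fixation - left):fixation + right + 1]


def window_parses(window, lexicon):
    """Return (complete_words, fragment) pairs for the visible window.

    complete_words: lexicon words that exactly cover the start of the window.
    fragment: the remaining visible characters, which must be the beginning
    of some longer lexicon word ("" when the window ends on a word boundary).
    """
    parses = []

    def extend(pos, words):
        rest = window[pos:]
        if not rest:
            parses.append((words, ""))
            return
        # Uncertainty at the right edge: the rest may be the visible
        # part of a word that continues beyond the perceptual span.
        if any(w.startswith(rest) and len(w) > len(rest) for w in lexicon):
            parses.append((words, rest))
        # Uncertainty inside the span: try every complete word here.
        for end in range(pos + 1, len(window) + 1):
            if window[pos:end] in lexicon:
                extend(end, words + [window[pos:end]])

    extend(0, [])
    return parses


# Hypothetical lexicon over placeholder symbols (an assumption).
toy_lexicon = {"A", "AB", "C", "CD", "CDE"}
window = visible_window("XABCDEY", fixation=2)  # "ABCD"
for words, fragment in window_parses(window, toy_lexicon):
    print(words, repr(fragment))
# ['AB'] 'CD'
# ['AB', 'CD'] ''
```

In this toy case the same visible string admits two readings: either the span ends on a word boundary after "CD", or "CD" is the visible beginning of the longer word "CDE". Which reading holds depends on how the characters inside the span are segmented, which is the interrelation described above.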

Goal of the Present Study

Word identification in reading Chinese is no doubt a complicated process, and the tightly bound factors of perceptual span and eye movements make it even more complicated. The goal of this study is to explore the nature of the problem, and to develop, implement, and test a model of word identification and eye movements in reading Chinese.

This dissertation is organized as follows. Chapters 2 to 4 review the linguistic status of the word in Chinese, conventional studies of lexical access in Chinese, and studies about word identification during reading in Chinese. Potential causes for the lack of significant progress in research on word identification in reading Chinese are examined in Chapter 5. In Chapter 6 I attempt to reanalyze the problem from several less conventional perspectives, and in Chapter 7 the research framework of this study is laid out. The study itself consists of two parts. Chapter 8 presents the first part, and Chapter 9 presents the second part. The contributions, implications, and limitations of this two-part study are discussed in Chapter 10.

© Copyright by Chih-Hao Tsai, 2001