Doctoral Dissertation of Chih-Hao Tsai

July 2001

Tsai, C.-H. (2001). Word identification and eye movements in reading Chinese: A modeling approach. Doctoral dissertation, University of Illinois at Urbana-Champaign.


p. 88

Table A1
Distribution of Word Lengths

Length     Unique   Percentage of        Word   Percentage of      Percentage of
            words    unique words      tokens     word tokens   character tokens
     1      3,863            2.94   2,242,590           46.18              28.50
     2     66,785           50.74   2,291,738           47.19              58.26
     3     45,381           34.48     258,173            5.32               9.84
     4     12,297            9.34      55,922            1.15               2.84
     5      1,878            1.43       5,563            0.12               0.36
     6        698            0.53       1,411            0.03               0.11
     7        385            0.29         481            0.01               0.04
     8        174            0.13         200          < 0.01               0.02
     9         90            0.07         106          < 0.01               0.01
    10         26            0.02          28          < 0.01             < 0.01
    11          3            0.01          13          < 0.01             < 0.01
    12         11            0.01          12          < 0.01             < 0.01
    13          7            0.01           8          < 0.01             < 0.01
    14          7            0.01           7          < 0.01             < 0.01
    15          1          < 0.01           1          < 0.01             < 0.01

Note. Percentage of character tokens = percentage of character tokens in the corpus occurring in words with the given length.
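
As an illustration of the definition in the note, the character-token percentage for a word length can be recomputed from the word-token column: a word of length n contributes n character tokens each time it occurs. The sketch below is not part of the dissertation; the names WORD_TOKENS and character_token_shares are hypothetical, and the counts are copied from Table A1 as printed here. The results should fall close to the last column of the table, though the final digit may differ slightly because of rounding.

```python
# Word tokens per word length, copied from Table A1 (length -> word tokens).
WORD_TOKENS = {
    1: 2_242_590, 2: 2_291_738, 3: 258_173, 4: 55_922, 5: 5_563,
    6: 1_411, 7: 481, 8: 200, 9: 106, 10: 28,
    11: 13, 12: 12, 13: 8, 14: 7, 15: 1,
}


def character_token_shares(word_tokens):
    """Map each word length to its percentage of all character tokens.

    Assumes every token of a word of length n contributes exactly n
    character tokens, as the note to Table A1 describes.
    """
    char_tokens = {length: length * count for length, count in word_tokens.items()}
    total = sum(char_tokens.values())
    return {length: 100.0 * n / total for length, n in char_tokens.items()}


shares = character_token_shares(WORD_TOKENS)
for length in sorted(shares):
    print(f"{length:>2}  {shares[length]:6.2f}")
```

Read this way, the table shows why one- and two-character words dominate the corpus: they account for most word tokens and, weighted by length, for most character tokens as well.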


© Copyright by Chih-Hao Tsai, 2001