July 2001

Tsai, C.-H. (2001). Word identification and eye movements in reading Chinese: A modeling approach. Doctoral dissertation, University of Illinois at Urbana-Champaign.

Appendix C
Examples of Errors in Disambiguating Disjunctive Ambiguity

Appendix C lists examples of errors in disambiguating critical fragments with disjunctive ambiguity in Part 1 (Chapter 8), including errors made by GMM, FMM, AWF, and MI. The examples listed here are those whose correct tokenization is among the critical tokenizations of the fragment. Cases in which the correct tokenization is merely covered (in the sense of the covering relationship defined in Guo, 1997) by at least one of the critical tokenizations are not listed. For example, the character string gou tong cai neng "ditch-connect-talent-ability" has two critical tokenizations: goutong caineng "communication-talent" and gou tongcai neng "ditch-versatile person-ability". However, the correct tokenization (in all three contexts where the critical fragment appears in ASBC) is goutong cai neng "communication-then and only then-can; '(something is) possible only via communication'", which is covered by the critical tokenization goutong caineng. Errors of this kind are not included in Appendix C.
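
The selection criterion above can be illustrated with a minimal Python sketch. It assumes, as a simplification, that one tokenization covers another when every word boundary of the first is also a word boundary of the second. This reading is consistent with the goutong caineng example, but it is an assumption made here for illustration; Guo (1997) should be consulted for the formal definition. The function names and the representation of words as tuples of pinyin syllables (one syllable per character) are hypothetical.

```python
def boundaries(tokenization):
    """Return the set of internal word-boundary positions, counted in characters."""
    positions, offset = set(), 0
    for word in tokenization[:-1]:
        offset += len(word)
        positions.add(offset)
    return positions


def covers(coarse, fine):
    """Assumed reading of 'covers': every boundary of `coarse` is also in `fine`."""
    return boundaries(coarse) <= boundaries(fine)


# The fragment gou tong cai neng, one syllable per character.
critical_1 = [("gou", "tong"), ("cai", "neng")]      # goutong caineng
critical_2 = [("gou",), ("tong", "cai"), ("neng",)]  # gou tongcai neng
correct = [("gou", "tong"), ("cai",), ("neng",)]     # goutong cai neng

critical = [critical_1, critical_2]
is_critical = correct in critical
is_covered = any(covers(c, correct) for c in critical)

print(is_critical)  # False: the correct tokenization is not itself critical
print(is_covered)   # True: it is covered by goutong caineng
```

Under this reading, the fragment is excluded from the listings because the correct tokenization is covered by a critical tokenization without being one of them.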

Note also that critical fragments are character strings delimited by critical points, that is, unambiguous word boundaries defined mechanically. Consequently, they do not necessarily correspond to any linguistic structure, and therefore do not necessarily have comprehensible meanings.

GMM Unique Resolution

This section lists examples of errors made by GMM in disambiguating critical fragments for which a single critical tokenization with the maximum average word length (AWL) can be identified (that is, there are no ties in AWL), but the correct tokenization does not have the maximum AWL. In each example, the unmarked tokenization is the correct one, and tokenizations marked with an asterisk are incorrect.
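
As a rough illustration of the AWL values in the listings, the following Python sketch computes AWL as the number of characters divided by the number of words and selects the tokenization with the maximum AWL. It is a sketch under stated assumptions, not the implementation used in Chapter 8: the function names are hypothetical, and each word is represented as a tuple of pinyin syllables, with one syllable standing for one character.

```python
def average_word_length(tokenization):
    """AWL: total number of characters divided by the number of words."""
    n_chars = sum(len(word) for word in tokenization)
    return n_chars / len(tokenization)


def gmm_unique_choice(candidates):
    """Return the single candidate with the maximum AWL, or None if AWL is tied."""
    best = max(average_word_length(t) for t in candidates)
    winners = [t for t in candidates if average_word_length(t) == best]
    return winners[0] if len(winners) == 1 else None


# Example (1): yi jiu shi ren
correct = [("yi",), ("jiu", "shi"), ("ren",)]  # yi jiushi ren
competitor = [("yi", "jiu"), ("shi", "ren")]   # *yijiu shiren

print(round(average_word_length(correct), 2))     # 1.33
print(round(average_word_length(competitor), 2))  # 2.0
print(gmm_unique_choice([correct, competitor]) is competitor)  # True
```

The last line shows the kind of error listed in this section: the maximum-AWL tokenization is unique, but it is not the correct one.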

(1) yi jiu shi ren          recall-old-time-people
1.  yi jiushi ren           recall-old times-people
                            AWL = 1.33
2.  *yijiu shiren           cherish memory of-contemporaries
                            AWL = 2.00

(2) li zhi shou shi         leave-job-keep-time
1.  li zhishou shi          leave-duty-when
                            AWL = 1.33
2.  *lizhi shoushi          leave office-show up on time
                            AWL = 2.00

(3) ji gu tou deng          chicken-bone-head-class
1.  ji gutou deng           chicken-bone-and so on
                            AWL = 1.33
2.  *jigu toudeng           chicken bone-first class
                            AWL = 2.00

(4) yi shu tuan ti neng     art-skill-group-body-can
1.  yishu tuanti neng       art-group-can
                            AWL = 1.67
2.  *yishutuan tineng       art group-physical strength
                            AWL = 2.50

(5) xue qi zhong xue sheng  learning-period-center-learning-life
1.  xueqi zhong xuesheng    semester-halfway between-students
                            AWL = 1.67
2.  *xueqi zhongxuesheng    semester-high school students
                            AWL = 2.50
3.  *xue qizhong xuesheng   learning-midterm-students
                            AWL = 1.67

GMM Tie

This section lists examples of errors resulting from at least one of the following heuristics: FMM, AWF, and MI. These heuristics apply only when the application of GMM results in a tie in AWL. The three heuristics were applied and evaluated independently, as described in Chapter 8. Since each heuristic could either succeed or fail, there are eight possible outcome combinations of the three heuristics. Excluding the case in which all three heuristics succeed leaves seven combinations in which at least one heuristic fails to pick the correct tokenization.
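
The count of seven can be checked with a short enumeration. The Python sketch below lists every success/failure combination of the three heuristics and drops the one in which all succeed; the variable and function names are hypothetical, and the heading format simply mirrors the sub-section headings used in this appendix.

```python
from itertools import product

HEURISTICS = ("FMM", "AWF", "MI")


def heading(outcome):
    """Format an outcome tuple as a heading such as 'FMM(+) AWF(+) MI(-)'."""
    return " ".join(
        f"{name}({'+' if ok else '-'})" for name, ok in zip(HEURISTICS, outcome)
    )


# Every combination of success (True) and failure (False): 2**3 = 8.
all_outcomes = list(product((True, False), repeat=len(HEURISTICS)))

# Keep only combinations in which at least one heuristic fails.
error_outcomes = [o for o in all_outcomes if not all(o)]

print(len(all_outcomes), len(error_outcomes))  # 8 7
for outcome in error_outcomes:
    print(heading(outcome))
```

The loop prints the seven headings in the same order as the sub-sections, from FMM(+) AWF(+) MI(-) to FMM(-) AWF(-) MI(-).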

Each sub-section lists examples of errors for one outcome combination, and the sub-section heading denotes that combination. Heuristics marked with a "(+)" sign succeeded, and those marked with a "(-)" sign failed, in picking the correct tokenization. The FMM score ranges from 1 to the number of competing tokenizations, and the FMM heuristic chooses the tokenization with the highest FMM score. To make the values easier to read, the averages of logarithmically transformed word frequencies (the AWF values) are scaled up by a factor of 10^6, and the sums of mutual information for characters (the MI values) are scaled up by a factor of 10^9.
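
To show how a listed example maps onto a sub-section heading, the following Python sketch compares the scores of the correct tokenization with those of its competitors. It assumes that each heuristic picks the tokenization with the highest score and that a heuristic succeeds only when the correct tokenization scores strictly higher than every competitor. The text states the highest-score rule only for FMM; applying it to AWF and MI is an assumption that agrees with the listed values, and the handling of ties is also an assumption. All names are hypothetical.

```python
def parse_scaled(text, scale):
    """Convert a printed score such as '9,435,103' back to its unscaled value."""
    return int(text.replace(",", "").replace(" ", "")) / scale


def outcome_heading(correct, competitors):
    """Build a heading such as 'FMM(+) AWF(+) MI(-)' from per-heuristic scores.

    Assumption: a heuristic succeeds when the correct tokenization has a
    strictly higher score than every competing tokenization.
    """
    parts = []
    for name in ("FMM", "AWF", "MI"):
        ok = all(correct[name] > other[name] for other in competitors)
        parts.append(f"{name}({'+' if ok else '-'})")
    return " ".join(parts)


# Example (6): wai guo xue, with scores as printed (scaled).
waiguo_xue = {"FMM": 2, "AWF": 9_435_103, "MI": -2_692_968}  # correct
wai_guoxue = {"FMM": 1, "AWF": 7_554_792, "MI": 152_131}     # competitor

print(outcome_heading(waiguo_xue, [wai_guoxue]))  # FMM(+) AWF(+) MI(-)
print(parse_scaled("9,435,103", 10**6))           # 9.435103
```

Because the scaling factors are positive constants, comparing the scaled values gives the same ordering as comparing the unscaled ones.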

FMM(+) AWF(+) MI(-)

(6) wai guo xue     outside-nation-learning
1.  waiguo xue      foreign country-learning
                    FMM = 2 AWF = 9,435,103 MI = -2,692,968
2.  *wai guoxue     outside-studies of ancient Chinese civilization
                    FMM = 1 AWF = 7,554,792 MI = 152,131

(7) chang di zu     field-land-rent
1.  changdi zu      place-rent
                    FMM = 2 AWF = 7,714,539 MI = -2,533,996
2.  *chang dizu     field-land rent
                    FMM = 1 AWF = 6,563,432 MI = -1,024,677

(8) lai zi jia ren  come-self-family-people
1.  laizi jiaren    come from-family members
                    FMM = 2 AWF = 9,166,129 MI = -4,998,085
2.  *lai zijiaren   come-people on our side
                    FMM = 2 AWF = 8,333,056 MI = 1,253,631

(9) chou bei chu yu prepare-prepare-place-in/at
1.  choubeichu yu   preparatory office-in/at
                    FMM = 2 AWF = 9,986,222 MI = 2,600,155
2.  *choubei chuyu  prepare-be (in a certain condition)
                    FMM = 1 AWF = 7,514,168 MI = 3,766,979

FMM(+) AWF(-) MI(+)

(10) bo chang duan      wave-long-short
1.  bochang duan        wavelength-short
                        FMM = 2 AWF = 6,462,777 MI = 1,007,424
2.  *bo changduan       wave-length
                        FMM = 1 AWF = 7,141,182 MI = -1,851,483

(11) shuo fa ze         speak-law-rule/in that case
1.  shuofa ze           statement-in that case
                        FMM = 2 AWF = 10,683,251 MI = 1,109,060
2.  *shuo faze          speak-standard method
                        FMM = 1 AWF = 10,918,540 MI = 330,736

(12) zuo wei shen me    make-do-what-suffix for interrogatives and adverbs
1.  zuowei shenme       serve as-what
                        FMM = 2 AWF = 10,929,141 MI = 2,849,043
2.  *zuo weishenme      make-why
                        FMM = 1 AWF = 11,817,185 MI = 2,841,067

(13) bi xia gong fu     pen-down-attack-husband
1.  bixia gongfu        ability to write-skill
                        FMM = 2 AWF = 6,178,638 MI = 2,043,163
2.  *bi xiagongfu       pen-put in time and energy
                        FMM = 1 AWF = 6,272,119 MI = 253,391

FMM(+) AWF(-) MI(-)

(14) di zhu yao             land-master-want
1.  dizhu yao               landlord-want
                            FMM = 2 AWF = 10,676,433 MI = 121,782
2.  *di zhuyao              land-main
                            FMM = 1 AWF = 12,134,349 MI = 1,613,268

(15) xie xia shan           write-down-mountain
1.  xiexia shan             write down-mountain
                            FMM = 2 AWF = 8,109,442 MI = -1,151,904
2.  *xie xiashan            write-descend hill
                            FMM = 1 AWF = 8,366,070 MI = 446,429

(16) bao zhuang he zhuang   wrap-load-box-load
1.  baozhuanghe zhuang      package box-load
                            FMM = 2 AWF = 4,438,257 MI = 285,822
2.  *baozhuang hezhuang     pack-boxed
                            FMM = 1 AWF = 4,628,693 MI = 1,579,818

(17) hua dong hai an        flower-east-sea-shore
1.  Huadong hai'an          Huadong-coast
                            FMM = 2 AWF = 5,208,926 MI = 1,810,962
2.  *hua donghai'an         flower-east coast
                            FMM = 1 AWF = 7,168,357 MI = 2,614,353

FMM(-) AWF(+) MI(+)

(18) ke ai qing             but/may-love-affection
1.  ke aiqing               but-love
                            FMM = 1 AWF = 10,507,047 MI = 216,744
2.  *ke'ai qing             lovely-affection
                            FMM = 2 AWF = 8,392,087 MI = -2,811,633

(19) cai mi yu              guess-riddle-language
1.  cai miyu                guess-riddle
                            FMM = 1 AWF = 5,810,796 MI = 1,561,359
2.  *caimi yu               guess riddle-language
                            FMM = 2 AWF = 5,435,182 MI = -3,751,759

(20) tai yang guang xian    too-sun-light-string
1.  taiyang guangxian       sun-ray
                            FMM = 1 AWF = 7,825,861 MI = 1,608,593
2.  *taiyangguang xian      sunlight-string
                            FMM = 2 AWF = 5,798,094 MI = 413,021

(21) diao cha biao shi      transfer-inspect-form/indicate-indicate
1.  diaocha biaoshi         investigate-indicate
                            FMM = 1 AWF = 8,454,764 MI = 3,309,182
2.  *diaochabiao shi        questionnaire-indicate
                            FMM = 2 AWF = 6,437,682 MI = -2,595,684

FMM(-) AWF(+) MI(-)

(22) yuan zuo zhe       original-writings-nominal suffix
1.  yuan zuozhe         original-author
                        FMM = 1 AWF = 8,933,609 MI = -1,219,412
2.  *yuanzuo zhe        original work-nominal suffix
                        FMM = 2 AWF = 8,259,210 MI = 2,718,467

(23) na shou qiang      to take-hand-gun
1.  na shouqiang        to take-pistol
                        FMM = 1 AWF = 8,251,819 MI = 628,786
2.  *nashou qiang       good at-gun
                        FMM = 2 AWF = 6,235,329 MI = 854,392

(24) zi da du hui       from-large-metropolis-meeting
1.  zi daduhui          from-metropolis
                        FMM = 1 AWF = 7,987,699 MI = -3,808,184
2.  *zida duhui         arrogant-metropolis
                        FMM = 2 AWF = 5,681,793 MI = -1,258,353

(25) dang ri ben ren    undertake-day-foundation-people
1.  dang ribenren       when-Japanese
                        FMM = 1 AWF = 10,034,895 MI = 383,639
2.  *dangri benren      the same day-oneself
                        FMM = 2 AWF = 6,935,471 MI = 602,685

FMM(-) AWF(-) MI(+)

(26) yi ding zhi        one-fixed-value
1.  yi dingzhi          one-constant
                        FMM = 1 AWF = 7,966,095 MI = 149,282
2.  *yiding zhi         must-value
                        FMM = 2 AWF = 9,364,571 MI = -1,183,193

(27) you xiao yong      have-effect-use
1.  you xiaoyong        have-effectiveness
                        FMM = 1 AWF = 10,637,599 MI = 2,639,721
2.  *youxiao yong       effective-use
                        FMM = 2 AWF = 10,969,295 MI = -1,216,556

(28) yi lan xian min    suitable-orchid-county-people
1.  Yilan xianmin       Yilan-county resident
                        FMM = 1 AWF = 6,328,658 MI = 905,642
2.  *Yilanxian min      Yilan county-people
                        FMM = 2 AWF = 6,738,372 MI = -2,393,206

(29) wu li xue hui      matter-law-learning-meeting
1.  wuli xuehui         physics-association
                        FMM = 1 AWF = 8,121,930 MI = 1,182,448
2.  *wulixue hui        physics-meeting
                        FMM = 2 AWF = 9,168,069 MI = 295,382

FMM(-) AWF(-) MI(-)

(30) ren kou cai        people-mouth-talent
1.  ren koucai          people-eloquence
                        FMM = 1 AWF = 9,888,140 MI = -1,483,541
2.  *renkou cai         population-talent
                        FMM = 2 AWF = 11,184,133 MI = 1,770,649

(31) lao shi fu         old-teacher-father
1.  lao shifu           old-master
                        FMM = 1 AWF = 8,710,946 MI = -3,404,322
2.  *laoshi fu          teacher-father
                        FMM = 2 AWF = 9,121,112 MI = -670,914

(32) te shu xing neng   special-unique-character-ability
1.  teshu xingneng      special-capability
                        FMM = 1 AWF = 8,771,001 MI = 1,639,140
2.  *teshuxing neng     specificity-ability
                        FMM = 2 AWF = 9,084,085 MI = 2,641,945

(33) ting che chang di  stop-car-field-land
1.  tingche changdi     parking-place
                        FMM = 1 AWF = 7,565,084 MI = -493,641
2.  *tingchechang di    parking lot-land
                        FMM = 2 AWF = 10,108,360 MI = 2,995,229

© Copyright by Chih-Hao Tsai, 2001