The word is commonly assumed to be the basic linguistic unit, but its definition has been controversial in Chinese. Chinese is written in Chinese characters with no spaces between words, so the inherent and (relatively) stable unit of the language is the character rather than the word. This study examines the nature of Chinese words, i.e., how Chinese words are formed from Chinese characters.
Experiment Data contains the results of a word segmentation test completed by native speakers of Chinese.
Correlation Analysis reports the correlation between the results of the word segmentation test and the conventionality index of Modern Mandarin.
In addition to the conventionality index of Modern Mandarin, the conventionality of frequent two-character combinations in Zuozhuan, Shishuoxinyu, Bianwen, Nogeoldae & Bak Tongsa, and Sanyanerpai is also presented in separate files.
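
As a rough illustration of the kind of comparison the correlation analysis involves, the sketch below computes Pearson and Spearman coefficients between two paired lists of scores. It is not the procedure used in the study, which this description does not specify. The function names, the column meanings, and the toy values are all hypothetical placeholders and are not taken from the data files.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two lists of equal length, at least 2 items")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant list")
    return cov / sqrt(var_x * var_y)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical toy rows, NOT values from the dataset:
# (item label, share of participants who kept the two characters together,
#  conventionality index of that two-character combination)
toy_rows = [
    ("item-1", 0.95, 0.90),
    ("item-2", 0.80, 0.70),
    ("item-3", 0.55, 0.60),
    ("item-4", 0.30, 0.20),
    ("item-5", 0.10, 0.15),
]

segmentation_scores = [row[1] for row in toy_rows]
conventionality_scores = [row[2] for row in toy_rows]

print("Pearson r:", round(pearson(segmentation_scores, conventionality_scores), 3))
print("Spearman rho:", round(spearman(segmentation_scores, conventionality_scores), 3))
```

Spearman's coefficient is included alongside Pearson's because it only assumes a monotonic relationship between the two scores, which may be a safer assumption when the index and the test results are on different scales.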
Article abstract: The Chinese language is defined on the basis of Chinese characters, which stabilize monosyllabic root morphemes across the countless varieties. As subsyllabic linguistic forms such as derivational morphology can hardly be represented by Chinese characters, compounding is preferred over derivation in Chinese. Compounds do not have fixed word boundaries. The wordhood of compounds pertains to the level of conventionality in language use, which is a continuum instantiated by synchronic gradience and diachronic gradualness. A perennial archaizing aesthetics further complicates the determination of Chinese words by preserving classical linguistic forms in formal and literary writing, thus making every synchronic stratum heterogeneous by blurring the distinction between historical strata. Therefore, the boundaries of words have always been fluid in native speakers’ mental lexicon.
(2021-07-19)