Automatic Clustering and Division of Chinese Dialects and Related Computational Methods

JIANG Di

Jinan Journal ›› 2022, Vol. 44 ›› Issue (3) : 10-23.

Abstract

This paper reviews three methods for measuring the relationships between Chinese dialects: feature statistics, etymological statistics and lexical similarity measures. It points out that all three are non-holistic methods of examination, constrained to phonetic or lexical evidence. The paper then presents a more broadly applicable computational model, the Levenshtein distance algorithm (also known as edit distance). When comparing linear strings across languages or dialects, this model integrates phonological similarity with lexical correspondence, and it implicitly incorporates both feature comparison and etymological probability. The automatic dialect classification experiments cover 78 dialects from eight groups in southern China (Wu, Min, Yue, Xiang, Ke, Gan, Hui and Huai) and 108 dialects from eight divisions of Mandarin (Dongbei, Beijing, Ji-Lu, Jiao-Liao, Zhongyuan, Lan-Yin, Xinan and Jin), for a total of 186 Chinese dialects. The Swadesh 100-word basic vocabulary list was collected for each dialect, and pairwise similarity was calculated between the dialects. The results are broadly consistent with the traditional classification, but more precise.
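The abstract names Levenshtein distance as the core measure but does not give its exact formulation. The following is a minimal Python sketch under stated assumptions, not the paper's implementation. Each Unicode character of a transcription is treated as one segment, and insertions, deletions and substitutions all cost 1. Each word-pair distance is normalized by the length of the longer transcription. A dialect-to-dialect distance is then the mean over the concepts (for example, Swadesh items) that both dialects attest. The function names, the normalization and the toy transcriptions are all hypothetical.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of unit-cost insertions, deletions and substitutions turning a into b."""
    # prev[j] is the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Edit distance divided by the longer string's length (assumed normalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def dialect_distance(words_x: dict, words_y: dict) -> float:
    """Mean normalized distance over concepts transcribed in both dialects.

    words_x, words_y map a concept label to a transcription string (hypothetical format).
    """
    shared = [c for c in words_x if c in words_y]
    if not shared:
        raise ValueError("the two dialects share no concepts")
    return sum(normalized_distance(words_x[c], words_y[c]) for c in shared) / len(shared)


# Toy transcriptions for three concepts (illustrative only, not data from the paper)
dialect_a = {"I": "wo", "you": "ni", "water": "ʂuei"}
dialect_b = {"I": "ŋo", "you": "nei", "water": "sui"}
print(round(dialect_distance(dialect_a, dialect_b), 3))  # 0.444
```

Treating each code point as a segment is a simplification. Real IPA transcriptions with tone letters or combining diacritics would first need to be split into phonetic segments, and weighted substitution costs (for example, cheaper costs for phonetically similar sounds) are a common refinement.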

Key words

automatic partitioning / Chinese dialects / clustering algorithm / Levenshtein distance
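The abstract does not say which clustering algorithm turns the pairwise dialect distances into a classification. Average-linkage agglomerative clustering (UPGMA) is one commonly used option in dialectometry, and it is substituted here purely for illustration. The sketch repeatedly merges the two closest clusters. It then sets the distance from the merged cluster to every other cluster to the size-weighted average of the two old distances. The function `upgma` and the four-dialect distance matrix are hypothetical, and the values are not results from the paper.

```python
def upgma(labels, dist):
    """Average-linkage (UPGMA) agglomerative clustering.

    labels: non-empty list of names; dist: symmetric distance matrix (list of lists).
    Returns (tree, merges): tree is nested tuples, merges lists (left, right, height).
    """
    n = len(labels)
    # Active clusters: id -> (subtree, number of leaves)
    clusters = {i: (labels[i], 1) for i in range(n)}
    # Pairwise distances between active clusters, keyed by (smaller id, larger id)
    d = {(i, j): dist[i][j] for i in range(n) for j in range(i + 1, n)}
    merges = []
    next_id = n
    while len(clusters) > 1:
        # Find the closest pair of active clusters
        (i, j), height = min(d.items(), key=lambda kv: kv[1])
        (ti, ni), (tj, nj) = clusters.pop(i), clusters.pop(j)
        merges.append((ti, tj, height))
        del d[(i, j)]
        # Distance from the new cluster to each remaining one: size-weighted average
        for k in clusters:
            dik = d.pop((min(i, k), max(i, k)))
            djk = d.pop((min(j, k), max(j, k)))
            d[(k, next_id)] = (ni * dik + nj * djk) / (ni + nj)
        clusters[next_id] = ((ti, tj), ni + nj)
        next_id += 1
    tree, _size = next(iter(clusters.values()))
    return tree, merges


# Hypothetical distance matrix for four dialects A-D (illustrative values only)
labels = ["A", "B", "C", "D"]
dist = [
    [0.0, 0.2, 0.6, 0.7],
    [0.2, 0.0, 0.5, 0.8],
    [0.6, 0.5, 0.0, 0.3],
    [0.7, 0.8, 0.3, 0.0],
]
tree, merges = upgma(labels, dist)
print(tree)  # (('A', 'B'), ('C', 'D'))
```

For a real data set of 186 dialects, SciPy's `scipy.cluster.hierarchy.linkage` with `method="average"` performs the same average-linkage clustering on a condensed distance matrix and pairs naturally with `dendrogram` for plotting.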

Cite this article

JIANG Di. Automatic Clustering and Division of Chinese Dialects and Related Computational Methods. Jinan Journal. 2022, 44(3): 10-23