Concepts and Realization of National Language Data and Language Knowledge Service in China

LONG Congjun

Jinan Journal, 2024, Vol. 46, Issue 6: 15-30.

DOI: 10.11778/j.jnxb.20240390


Abstract

Before the era of large language models, ethnolinguists obtained research data mainly by manually searching works that recorded the words and sentences of ethnic languages. From acquisition through collection, this process raised many practical problems: data were difficult to collect, the information obtained was incomplete, and the data were scattered and unsystematic. Language data processing has now entered the era of large models, and these problems can be effectively addressed by collecting, sorting, and storing data systematically in databases. However, ethnic language resources remain relatively scarce, access channels are limited, and interpreting and analyzing ethnic language data requires substantial linguistic expertise. As a result, large models perform poorly on ethnic language data. To date, academia has no publicly available, large-scale professional database of Chinese ethnic languages. To exploit the advantages of large models in processing ethnic languages while solving the problems they face, building ethnic language data and language knowledge services centered on human-computer collaboration should be an effective way to carry out ethnic language research and inheritance in the Internet era. Such services play an important role in humanities and social science research, traditional ethnic science and technology, cultural protection and inheritance, and the exploration of Chinese cultural genes.
Based on ethnic language data and knowledge services, this paper constructs professional data resources and a series of knowledge bases for research on ethnic languages and cultures. Digital humanities technology is used to digitize important literature in the field of ethnolinguistics, and knowledge graph technology is used to link domain knowledge, forming a literature retrieval and knowledge service platform. More than 150 works were collected in the literature database across categories including ethnic language dictionaries, language compendia, endangered-language studies, grammatical annotations, reference grammars, and theses; more than 200 grammatical category concepts from different ethnic languages were linked; and the results of knowledge association for the case category were analyzed.
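The association step described above can be pictured with a small sketch. It is not the paper's platform or data model: the `Source` record, the `KnowledgeIndex` class, the source identifiers, and the placeholder languages "Language A" and "Language B" are all hypothetical, and a plain in-memory index stands in for a real knowledge graph store. The sketch only shows how linking grammatical category concepts to literature records supports retrieval by concept and cross-language comparison.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Source:
    """One digitized work in a literature database (hypothetical record)."""
    source_id: str
    category: str   # e.g. "dictionary", "reference grammar"
    language: str   # placeholder language label, not real data


class KnowledgeIndex:
    """Links grammatical category concepts to the sources that discuss them."""

    def __init__(self):
        self._sources = {}                     # source_id -> Source
        self._by_concept = defaultdict(set)    # concept -> set of source_ids

    def add_source(self, source, concepts):
        """Register a source and associate it with each listed concept."""
        self._sources[source.source_id] = source
        for concept in concepts:
            self._by_concept[concept].add(source.source_id)

    def sources_for(self, concept):
        """Return all sources linked to a concept, ordered by source_id."""
        ids = self._by_concept.get(concept, ())
        return sorted((self._sources[i] for i in ids),
                      key=lambda s: s.source_id)

    def languages_sharing(self, concept):
        """Return the sorted set of languages in which a concept is attested."""
        return sorted({s.language for s in self.sources_for(concept)})


# Illustrative records only; none of these come from the paper.
index = KnowledgeIndex()
index.add_source(Source("src-001", "reference grammar", "Language A"),
                 ["ergative case", "genitive case"])
index.add_source(Source("src-002", "grammatical annotation", "Language B"),
                 ["ergative case", "dative case"])
index.add_source(Source("src-003", "dictionary", "Language A"),
                 ["genitive case"])

print(index.languages_sharing("ergative case"))
print([s.source_id for s in index.sources_for("genitive case")])
```

Querying by a case concept returns both the literature that documents it and the languages that share it, which is the kind of association a case-category analysis would start from. A production system would more likely use a graph database or RDF store with typed relations.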
Combining ethnic language data with digital humanities technology can effectively solve the problems ethnolinguists face in collecting materials and analyzing corpora. Mining, statistical analysis, and computation over the relevant data help ethnolinguists grasp, accurately and comprehensively, the syntagmatic and paradigmatic relations of phonetics and grammar within a given ethnic language system and the differences across such systems. Visualizing the results of data analysis also helps ethnolinguists build a systematic understanding of regularities in phonetics, grammar, vocabulary, and other levels within one or more ethnic languages, and promotes the in-depth development of related fields of ethnolinguistic research. This paper proposes only a model of professional data and language knowledge services for ethnolinguistic research, intended as a reference for building ethnic language data and language knowledge bases. At the same time, the accuracy, consistency, and standardization of ethnic language data deserve continued attention. China has a rich variety of ethnic languages, and linguistic diversity carries cultural diversity. The associations among items of language knowledge reveal commonalities and differences among ethnic languages and cultures and prompt researchers to explore the kinship of ethnic languages and mutual learning among cultures.

Key words

language data / language knowledge / case category / knowledge service / digital humanities

Cite this article

LONG Congjun. Concepts and Realization of National Language Data and Language Knowledge Service in China. Jinan Journal. 2024, 46(6): 15-30 https://doi.org/10.11778/j.jnxb.20240390