NATIONAL CORPUS OF THE KAZAKH LANGUAGE


Welcome to the website of the Onomastic Subcorpus of the Kazakh language!

GENERAL INFORMATION

The Onomastic Subcorpus is a tool that collects, systematizes, digitizes, and presents onyms (proper names). It makes it possible to trace changes in place names, their usage in texts, and their frequency of occurrence. It also gathers the proper names of the language into a structured database. Within the subcorpus, each type of onym is supplied with linguistic information, and onomastic texts are annotated with special metadata tags. The purpose of the Onomastic Subcorpus is to collect Kazakh toponyms, anthroponyms, and other types of onyms, and to provide them with geographic, cultural-semantic, and word-formation annotation. Currently, the Onomastic Subcorpus of the National Corpus of the Kazakh Language contains nearly 3,000 onyms and a text database of 15,000,000 word tokens.