NATIONAL CORPUS OF THE KAZAKH LANGUAGE

General Information

Parallel Subcorpus

The Parallel Subcorpus is a collection of source texts and their translations. It was created within the National Corpus of the Kazakh Language to build a database of aligned translated texts and to provide a linguistic platform for teaching the Kazakh language. It consists of aligned texts, annotation, metadata, and a search system. At the initial stage, the text base includes literary and official texts. Morphological analysis has been performed on the aligned texts in both Kazakh and Russian. The metadata for literary texts includes 28 parameters. Official texts account for 600,000 word usages and literary texts for 1,500,000 word usages, for a total of 2,100,000 word usages.

The Cultural-Representative Subcorpus is an electronic database that provides information about the cultural semantics of ethnocultural units. The total volume of texts in this subcorpus is 8 million word usages. The texts are drawn from four main areas: folklore, authorial oral heritage, ethnographic studies, and scientific works and articles. Users can search for ethnocultural units by thematic group, such as personal names, kinship terms, national cuisine, traditional clothing, jewelry, weapons, sacred numbers, and household items. The corpus also provides information on lexical layers, including religious vocabulary, archaic words, borrowed words, ethnographisms, variants, and cultural onyms.
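
As an illustration of how the components of the Parallel Subcorpus described above (aligned texts, metadata, and search) fit together, here is a minimal Python sketch. It is not the corpus's actual implementation: the `AlignedPair` class, the `search` function, the `"genre"` metadata key, and the sample sentences are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class AlignedPair:
    """One aligned unit: a Kazakh segment and its Russian counterpart (hypothetical model)."""
    kazakh: str
    russian: str
    # Hypothetical metadata; the real corpus records 28 parameters for literary texts.
    metadata: dict = field(default_factory=dict)


def search(pairs, word, lang="kazakh"):
    """Return the pairs whose segment in `lang` contains `word` as a whitespace-separated token."""
    word = word.lower()
    return [p for p in pairs if word in getattr(p, lang).lower().split()]


# Illustrative sample data, not taken from the corpus.
pairs = [
    AlignedPair("Мен кітап оқимын", "Я читаю книгу", {"genre": "literary"}),
    AlignedPair("Ол мектепке барады", "Он идёт в школу", {"genre": "literary"}),
]

# Searching on the Kazakh side returns the whole pair, including its translation.
hits = search(pairs, "кітап")
```

Because each hit carries both sides of the alignment, a query in one language gives immediate access to the corresponding translation, which is the property that makes a parallel corpus useful for language teaching.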