A parallel corpus consists of an original text and a set of its translations. The purpose of creating a parallel subcorpus on the basis of KTU is to provide a linguistic platform for teaching the Kazakh language by building a database of Kazakh texts with equivalent translations in other languages. The parallel subcorpus consists of a base of aligned texts, markup, metadata, and a search engine. At the first stage, the text base covers fiction and the official business style. The aligned texts carry morphological analysis in both languages. The metadata for a fiction text consists of 28 parameters. Eleven describe the original text: author, title, language, source, time of publication, number of pages, number of words, the author's year of birth, the author's gender, style, and type of text. The same eleven are recorded for the translation: author, title, language, source, time of publication, number of pages, number of words, the author's year of birth, the author's gender, style, and type of text. The remaining six describe the translation and its processing in the corpus: translation method, structural level of the translation, name of the person who aligned the text, time of alignment, name of the person who entered the text into the corpus, and time of entry into the corpus.
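The 28 metadata parameters can be pictured as a single record: the same 11 fields for the original and for the translation, plus 6 fields about the translation and its processing. The sketch below is only an illustration of that grouping. The class and field names (`TextInfo`, `ParallelTextMetadata`, `parameter_count`) and the field types are assumptions, not the corpus's actual schema.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class TextInfo:
    """The 11 parameters recorded for one side (original or translation)."""
    author: str
    title: str
    language: str
    source: str
    publication_time: str
    page_count: int
    word_count: int
    author_birth_year: Optional[int]
    author_gender: Optional[str]
    style: str
    text_type: str


@dataclass
class ParallelTextMetadata:
    """Hypothetical record for one aligned fiction text."""
    original: TextInfo
    translation: TextInfo
    # The 6 parameters that describe the translation and corpus processing.
    translation_method: str
    structural_level: str
    aligner_name: str
    alignment_time: str
    entered_by: str
    entry_time: str


def parameter_count() -> int:
    """Count the flat parameters: 11 per side plus the shared ones."""
    per_side = len(fields(TextInfo))
    # Subtract the two nested TextInfo fields, which are already counted per side.
    shared = len(fields(ParallelTextMetadata)) - 2
    return 2 * per_side + shared
```

Nesting the per-side fields keeps the original and the translation symmetric, so `parameter_count()` yields the 28 parameters named in the text without listing each field twice.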


General Information



Parallel Subcorpus

The Parallel Subcorpus is a collection of source texts and their translations. The purpose of creating the parallel subcorpus within the National Corpus of the Kazakh Language is to build a database of aligned translated texts and provide a linguistic platform for teaching the Kazakh language. The parallel subcorpus consists of aligned texts, annotation, metadata, and a search system. At the initial stage, the text base includes literary and official texts. Morphological analysis has been performed for aligned texts in both Kazakh and Russian. Metadata for literary texts includes 28 parameters. The official texts amount to 600,000 word usages and the literary texts to 1,500,000 word usages, for a total of 2,100,000 word usages.

Cultural-Representative Subcorpus

The Cultural-Representative Subcorpus is an electronic database that provides information about the cultural semantics of ethnocultural units. The total volume of texts in this subcorpus is 8 million word usages. The texts are collected from four main areas: folklore, authorial oral heritage, ethnographic studies, and scientific works and articles. Users can search for ethnocultural units by thematic groups such as personal names, kinship terms, national cuisine, traditional clothing, jewelry, weapons, sacred numbers, and household items. The corpus also provides information on lexical layers, including religious vocabulary, archaic words, borrowed words, ethnographisms, variants, and cultural onyms.
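Searching ethnocultural units by thematic group can be illustrated with a small lookup table. This is a minimal sketch under stated assumptions, not the corpus's search system. The function names (`build_index`, `search_by_group`) and the three sample entries are hypothetical. Only the group names come from the description above.

```python
from collections import defaultdict

# Thematic groups named in the description of the subcorpus.
THEMATIC_GROUPS = {
    "personal names", "kinship terms", "national cuisine",
    "traditional clothing", "jewelry", "weapons",
    "sacred numbers", "household items",
}

# Hypothetical sample entries: (ethnocultural unit, thematic group).
SAMPLE_UNITS = [
    ("қымыз", "national cuisine"),      # fermented mare's milk
    ("шапан", "traditional clothing"),  # a long robe
    ("жеті", "sacred numbers"),         # the number seven
]


def build_index(units):
    """Group units by thematic group, rejecting unknown group names."""
    index = defaultdict(list)
    for unit, group in units:
        if group not in THEMATIC_GROUPS:
            raise ValueError(f"unknown thematic group: {group!r}")
        index[group].append(unit)
    return index


def search_by_group(index, group):
    """Return the units filed under one thematic group (empty if none)."""
    return list(index.get(group, []))


index = build_index(SAMPLE_UNITS)
national_cuisine = search_by_group(index, "national cuisine")
```

A real search system would also attach the lexical-layer information (religious vocabulary, archaic words, and so on) to each unit. The sketch leaves that out to keep the lookup itself visible.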