Welcome to the Main Corpus of the Kazakh Language!
The Main Corpus is an electronic collection of texts representing the five functional styles of the Kazakh language (fiction, scientific, journalistic, official/business, and colloquial). It serves as an IT resource for research and education. Its purpose is to cover all stylistic layers of the Kazakh language and to present a unified picture of the language.
The corpus totals 31,105,900 word usages. It includes a search system that supports searching by word and by word form (inflected form).
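The difference between searching by word and searching by word form can be illustrated with a minimal sketch. The token list, the index names, and the `build_indexes` helper below are hypothetical and only show the idea; they are not the corpus's actual search engine.

```python
from collections import defaultdict

# Hypothetical pre-analysed tokens: (word form, lemma).
TOKENS = [
    ("балалар", "бала"),    # plural
    ("баланы", "бала"),     # accusative
    ("мектепке", "мектеп"),  # dative
    ("бала", "бала"),       # bare form
]

def build_indexes(tokens):
    """Map each word form and each lemma to the token positions where it occurs."""
    by_form = defaultdict(list)
    by_lemma = defaultdict(list)
    for position, (form, lemma) in enumerate(tokens):
        by_form[form].append(position)
        by_lemma[lemma].append(position)
    return by_form, by_lemma

by_form, by_lemma = build_indexes(TOKENS)
print(by_form["бала"])   # exact word form only: [3]
print(by_lemma["бала"])  # every inflected form of the word: [0, 1, 3]
```

Searching by word form returns only exact matches, while searching by word gathers all of its inflected forms.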
The Main Corpus supports morphological, semantic, lexical, and phonetic-phonological annotation. These annotations provide information about the searched word at all levels of the language:
In morphological annotation, the analyzer automatically splits the word or word form into its root and affixes (lemmatization), assigns a part of speech to the root (lemma), and gives the grammatical characteristics of each affix.
Lexical annotation shows all meanings of the word as given in explanatory dictionaries.
Phonetic annotation gives the orthoepy (standard pronunciation) of the word, automatically divides the word into syllables, and identifies the syllable types.
Phonological annotation provides phonemic characteristics of the sounds within the word.
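As a rough illustration of the annotation layers described above, the record for one word form might be modelled as follows. The `WordAnnotation` class, its field names, and the labels are assumptions made for this sketch and do not reflect the corpus's internal format. The consistency check assumes the root and affixes concatenate to the surface form unchanged.

```python
from dataclasses import dataclass, field

@dataclass
class WordAnnotation:
    """Hypothetical record combining morphological and phonetic annotation."""
    surface: str                                    # word form as it appears in the text
    root: str                                       # root (lemma)
    pos: str                                        # part of speech assigned to the root
    affixes: list = field(default_factory=list)     # (affix, grammatical label) pairs
    syllables: list = field(default_factory=list)   # syllable division

    def segmentation(self):
        """Return the root and affixes joined by hyphens."""
        return "-".join([self.root] + [affix for affix, _ in self.affixes])

    def is_consistent(self):
        """Check that both the morphemes and the syllables rebuild the surface form."""
        rebuilt = self.root + "".join(affix for affix, _ in self.affixes)
        return rebuilt == self.surface and "".join(self.syllables) == self.surface

ann = WordAnnotation(
    surface="балалар",
    root="бала",
    pos="noun",
    affixes=[("лар", "plural")],
    syllables=["ба", "ла", "лар"],
)
print(ann.segmentation())   # бала-лар
print(ann.is_consistent())  # True
```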
Each text in the Main Corpus is accompanied by source information (metadata). Pointing the cursor at the author's name opens the metadata window on a separate page.
Users of the corpus can narrow a word search by metadata type: text author, text title, author gender, text style, audience, distribution type, time period, topic, and full source.
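A metadata-restricted search of this kind can be sketched as below. The `CorpusText` class, the `search` function, the metadata keys, and the placeholder authors and titles are all hypothetical; the actual corpus interface may work differently.

```python
import re
from dataclasses import dataclass

@dataclass
class CorpusText:
    """Hypothetical corpus entry: raw text plus a metadata dictionary."""
    text: str
    metadata: dict

def search(texts, word, **filters):
    """Return texts that contain `word` as a token and match every metadata filter."""
    needle = word.lower()
    hits = []
    for entry in texts:
        # Skip the text if any requested metadata value differs.
        if any(entry.metadata.get(key) != value for key, value in filters.items()):
            continue
        tokens = re.findall(r"\w+", entry.text.lower())
        if needle in tokens:
            hits.append(entry)
    return hits

# Placeholder entries; authors and titles are invented labels, not real sources.
TEXTS = [
    CorpusText("Балалар мектепке барды.",
               {"author": "Author A", "title": "Text A", "style": "fiction"}),
    CorpusText("Балалар туралы заң.",
               {"author": "Author B", "title": "Text B", "style": "official/business"}),
    CorpusText("Мектеп ашылды.",
               {"author": "Author A", "title": "Text C", "style": "fiction"}),
]

for hit in search(TEXTS, "балалар", style="fiction"):
    print(hit.metadata["title"])  # Text A
```

Passing several keyword filters at once, for example `style` together with `author`, narrows the result further, because a text must match every filter to be returned.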