Welcome to the website of the main corpus of the Kazakh language! The main corpus is an electronic collection of texts drawn from five styles of the Kazakh language (literary, scientific, journalistic, business, and spoken), intended as an IT resource for research and education. Its goal is to serve as a source of texts that covers all stylistic layers of Kazakh and presents them as a unified picture of a single language. The total volume is 31,105,900 words.

The main corpus includes a search system that works by word and by word form (an altered form of a word). The corpus carries morphological, semantic, lexical, and phonetic-phonological annotations, which provide information about the searched word at every level of the language. In morphological annotation, the analyzer automatically splits the word or word form into root and affixes, reduces the word to its root form, the lemma (lemmatization), and gives a grammatical description of the affixes. In lexical annotation, all meanings of the word recorded in an explanatory dictionary are listed. In phonetic annotation, the word's orthoepy (pronunciation) is given, and the word is automatically divided into syllables whose types are described. In phonological annotation, a phonemic description of the sounds in the word is provided.

Each text included in the main corpus has source information (metadata). The metadata window opens on the second page when you hover over the author's name. A corpus user can search for a word using meta-categories such as text author, text title, author's gender, text style, audience, type of publication, publication period, topic, and full source.
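As a rough illustration of searching by word together with meta-categories, the sketch below filters a tiny in-memory list of records. It is not the corpus's actual search engine or API: the `TextRecord` class, its field names, the `search` function, and the sample sentences are all hypothetical and invented for the example. It matches exact tokens only and does no word-form analysis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextRecord:
    # Hypothetical subset of the meta-categories named above.
    author: str
    title: str
    style: str
    period: str
    text: str


def search(records, word, **meta):
    """Return records whose text contains `word` as a whole token
    and whose metadata equal every filter passed in `meta`."""
    needle = word.lower()
    hits = []
    for rec in records:
        # Whitespace tokenization; a real corpus would use a proper tokenizer.
        if needle not in rec.text.lower().split():
            continue
        if all(getattr(rec, key) == value for key, value in meta.items()):
            hits.append(rec)
    return hits


# Made-up sample data, not taken from the corpus.
records = [
    TextRecord("Author A", "Title 1", "literary", "2001-2010", "бала кітап оқыды"),
    TextRecord("Author B", "Title 2", "scientific", "2011-2020", "кітап туралы зерттеу"),
    TextRecord("Author A", "Title 3", "literary", "2011-2020", "қала үлкен"),
]

hits = search(records, "кітап", style="literary")
print([r.title for r in hits])
```

Passing more keyword filters (for example `author="Author B"`) narrows the result further, which mirrors combining several meta-categories in one query.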

