The National Corpus of the Kazakh language contains 30 million words. Texts totalling over 14 million words were marked up with 16 to 21 parameters (author information, theme, topic, style, genre, text type, and so on). Texts were collected for all five functional styles of speech (scientific, literary, journalistic, conversational, and official).
The literary subcorpus contains works by well-known Kazakh writers.
The journalistic subcorpus contains texts from the media and internet resources.
The scientific subcorpus consists mainly of papers and scholarly works from art and literary studies, since texts in the natural sciences rely heavily on graphical representation of information.
The official subcorpus was built from various types of official and legal documents available on the internet.
The conversational subcorpus is based on texts from suitable publications on the internet.
In addition, texts were collected from available schoolbooks, handbooks, and manuals.