Welcome to the National Corpus of the Kazakh Language website! The corpus website contains an electronic text collection of the Kazakh language. The total volume of texts in the corpus is 40 million. The texts are compiled from five functional styles of the Kazakh language (literary style, scientific style, journalistic style, official-business style, and colloquial style). The corpus allows users to perform searches by word and word form (inflected form) and to view a list of sentences in which the queried word is used, along with their sources. For any found word/word form or for any word in the example sentences, information covering all linguistic levels is provided. The corpus can be used by native speakers of Kazakh as well as learners of the Kazakh language.


NEWS



7.04.2026 – As part of the program-targeted funding project IRN BR21882227 “Enhancement of the National Corpus of the Kazakh Language (NCKL) as a tool of intercultural communication and expansion of its subcorpora” (2023–2025), an Internet Text Corpus has been developed (volume: 100 million word usages).

17.03.2025 – The interface of the interactive Learner’s Corpus, designed to teach Kazakh to English-speaking learners, has been developed. The work was carried out within the framework of the targeted program funding (2024–2026) under project IRN BR24993244 “Enhancing the National Corpus of the Kazakh Language as the foundation of the Smart-Texts megaproject and Kazakh-language artificial intelligence, and developing its subcorpora”.

09.12.2024 – A six-language parallel internal corpus has been created, containing a database that presents versions of the same text in Kazakh, English, Turkish, Uzbek, Uyghur, and Azerbaijani. The work was completed within the targeted program funding (2024–2026) under project IRN BR24993244 “Enhancing the National Corpus of the Kazakh Language as the foundation of the Smart-Texts megaproject and Kazakh-language artificial intelligence, and developing its subcorpora”.

18.11.2024 – Errors have been collected from various types of written work (dictations, essays, summaries, exam responses, textbooks, newspapers, social media). The errors were classified by type (orthographic, punctuation, grammatical, stylistic, lexical, technical, cognitive), provided with explanations, and published on the NCKL website as a separate internal corpus (the Errors Corpus). The work was carried out within the targeted program funding (2024–2026) under project IRN BR24993244 “Enhancing the National Corpus of the Kazakh Language as the foundation of the Smart-Texts megaproject and Kazakh-language artificial intelligence, and developing its subcorpora”.

16.10.2024 – Terminological texts have been summarized and included in the National Corpus of the Kazakh Language (NCKL) database (implemented under the program-targeted financing project IRN BR21882249 "Improvement and expansion of the sub-corpora of the National Corpus of the Kazakh Language (NCKL) as a means of intercultural communication" (2023–2025)).

8.10.2024 – The initial version of the Learning Sub-corpus has been launched (implemented under the program-targeted financing project IRN BR18574183 "Automatic Recognition of Kazakh Text: Development of Linguistic Modules and IT Solutions" (2023–2024)).

2.10.2024 – The modern poetic corpus has been developed (implemented under project IRN BR21882249 "Improving the National Corpus of the Kazakh Language (NCKL) as a tool for intercultural communication and expanding its internal corpora" (2023–2025)).

2.10.2024 - A search field by thematic and semantic groups has been added to the corpus of proverbs and sayings (implemented under project IRN BR21882227 "Development of linguistic tools and solutions for updating linguistic consciousness in the context of the new Kazakhstan" (2023–2025)).

25.09.2024 – The onomastic corpus database has been updated (implemented under project IRN BR21882249 "Improvement of the National Corpus of the Kazakh Language (NCKL) as a tool for intercultural communication and expansion of its internal corpora" (2023–2025)).

12.09.2024 – The historical and poetic corpus has been prepared and included in the database (implemented under project IRN BR21882249 "Improvement of the National Corpus of the Kazakh Language (NCKL) as a tool for intercultural communication and expansion of its internal corpora" (2023–2025)).

6.09.2024 – The website of the National Corpus of the Kazakh Language has been transferred to a new design and the interface has been improved.

22.08.2024 - Abbreviated region names have been expanded to their full forms, and the Dialectological Corpus database has been updated.

10.05.2024 – The phraseological unit corpus has been developed (implemented under project IRN BR21882227 "Development of linguistic tools and solutions for updating linguistic consciousness in the context of the new Kazakhstan" (2023–2025)).

05.02.2024 - Comparative business-style texts have been added to the parallel corpus.

19.01.2024 – The corpus of writers' texts has been included in the database of the National Corpus of the Kazakh Language (implemented under project IRN BR21882249 "Improvement of the National Corpus of the Kazakh Language (NCKL) as a tool for intercultural communication and expansion of its internal corpora" (2023–2025)).

15.05.2023 – A corpus based on the works of A. Baitursynuly has been created.

20.05.2023 – The corpus of advertising texts has been included in the database of the National Corpus of the Kazakh Language (implemented under project IRN BR18574132 "Development of cultural-representative and advertising corpora of texts" (2023–2024)).

29.05.2023 - Cultural-semantic marking has been introduced into the texts of the linguacultural corpus, and a function for uploading video recordings has been launched.

28.05.2023 - The linguacultural corpus has been created and launched online (implemented under project IRN BR18574132 "Development of cultural-representative and advertising corpora of texts" (2023–2024)).

20.05.2023 - The corpus of proverbs and sayings has been created and launched.

18.05.2023 - The initial website of the onomastic corpus has been created and included in the database of the National Corpus.

2.05.2023 - The website of the oral corpus has been created and included in the National Corpus of the Kazakh Language. Prosodic marking has been added to the oral texts (implemented under project IRN BR11765619 "Development of the National Corpus of the Kazakh Language as an information and innovative base of the state language: a research and educational Internet resource" (2022–2023)).

10.10.2022 - Texts of the historical corpus have been included in the database (implemented under project IRN BR11765619 "Development of the National Corpus of the Kazakh Language as an information and innovative base of the state language: a research and educational Internet resource" (2022–2023)).

5.05.2022 - A parallel corpus has been created and included in the artistic style database (implemented under project IRN BR11765619 "Development of the National Corpus of the Kazakh Language as an information and innovative base of the state language: a research and educational Internet resource" (2022–2023)).

1.02.2022 - The dialectological corpus has been created and launched.

12.11.2021 – The corpora classified by 5 styles of the Kazakh language have been combined and included in the main corpus.

15.11.2021 - The design of the National Corpus of the Kazakh Language website has been updated, and the interface has been improved for the convenience of users.

5.10.2021 - A register of word forms of the corpus texts has been created with the ability to search the register.

28.09.2021 - A search by inflectional forms of the Kazakh language has been implemented.

26.11.2020 - Current work on improving the corpus is reflected on a new page of the news section.

26.11.2020 - Additional information has been added to the guide to software methods for copying text when searching for the desired word.

25.08.2020 - A glossary page has been introduced that describes the concepts of corpus linguistics.

25.08.2020 - The ability to display statistics of the data found for the searched word has been provided.

19.08.2020 - The software now supports searching for examples not only by root words but also by their inflected forms: the "Search by word form" field has been added.

16.08.2020 - The search system for metamarking has been improved for each corpus.

15.08.2020 - The Kazakh-language texts collected in a common database have been distributed among separate corpora according to the five styles, and search has been implemented for each corpus.

15.08.2020 - The corpus interface has been launched in Russian and English.

1.08.2020 - Under the supervision of A. M. Fazylzhanova, director of the Institute of Linguistics, the design and interface of the corpus website have been updated and made publicly available under the domain www.qazcorpus.kz.