Resources

Project leader — Anatoly Ventsov.

The Corpus of Spoken Russian is being compiled at St. Petersburg State University as a collection of audio recordings accompanied by orthographic and acoustic-phonetic transcriptions. Its main aim is to provide data for studying spoken word recognition: the Corpus is used to test our functional model of spoken word recognition, and it shows what kind of speech signal native speakers encounter in natural communication.

Authors — Vladislav Zubov, Elena Riekhakaynen.

The list contains pairs of words and word combinations that differ in the presence or absence of a space and cause spelling difficulties. It comprises 201 pairs, along with part-of-speech annotation and information on how frequently the units are used in spoken and written speech. It can be used both in teaching Russian and as a source of stimuli for experiments.

Author — Vladislav Zubov.

The database contains information on the variable pronunciation of words in modern literary Russian. It combines data from three standard dictionaries with the author's own annotation. Each lexeme is annotated for part of speech and corpus frequency, and each pair of variants is classified by the scope of variation (entire paradigm, subparadigm, or single form) and by one or more of eight types of variation. The normative status of each variant in each source is also recorded. The database includes 2,794 pronunciation variants for 1,164 lexemes and can be used as a reference resource or as a tool for linguistic research.

State registration certificate for the database: No. 2025626096, dated 12 December 2025.