Recognizing speech in voice messages

Authors

  • M.P. Riadchenko State Higher Education Institution "Priazovskyi State Technical University", Dnipro, Ukraine
  • O.E. Piatykop State Higher Education Institution "Priazovskyi State Technical University", Dnipro, Ukraine https://orcid.org/0000-0002-7731-3051

DOI:

https://doi.org/10.31498/2225-6733.45.2022.276225

Keywords:

speech recognition, audio, anti-noise improvement, ASR, NodeJS, messengers, chat-bot app

Abstract

The current level of information technology makes it possible to use speech recognition in a wide range of human activities. A voice interface is convenient for many tasks: voice search for documents, dialing a phone number, controlling IoT devices, voice navigation, and simple text dictation. Because many people find a natural language interface more convenient than typing, sending voice messages has become common among messenger users. Such voice messages are audio files, but listening to them is not always possible or convenient for the recipient. This problem can be solved with an automatic speech recognition (ASR) system. The article describes the stages and elements of processing and recognizing natural language from an audio signal, and outlines modern ASR technologies and the problems of choosing among them. Modern ASR systems can recognize fully spontaneous speech, that is, speech that is natural rather than memorized and that may contain stuttering or minor errors. At the same time, such systems are still too expensive to develop from scratch, so companies face a choice between cloud ASR APIs developed by the tech giants and open source solutions. The latest research and publications on voice data processing are analyzed. A software solution for automatic conversion of voice messages into text is proposed, with the interface to the voice signal delivery system implemented as a chat bot in a messenger. The article presents the main components of the system, the chat bot algorithm, and modern technologies for developing, implementing, and configuring the chat bot in a messenger.

Author Biographies

M.P. Riadchenko, State Higher Education Institution "Priazovskyi State Technical University", Dnipro

Student

O.E. Piatykop, State Higher Education Institution "Priazovskyi State Technical University", Dnipro

Candidate of Technical Sciences, Associate Professor

Published

2022-12-29

How to Cite

Riadchenko, M., & Piatykop, O. (2022). Recognizing speech in voice messages. Reporter of the Priazovskyi State Technical University. Section: Technical Sciences, (45), 28–34. https://doi.org/10.31498/2225-6733.45.2022.276225