Recognizing speech in voice messages
DOI:
https://doi.org/10.31498/2225-6733.45.2022.276225
Keywords:
speech recognition, audio, anti-noise improvement, ASR, NodeJS, messengers, chat-bot app
Abstract
Information technology has advanced to the point where speech recognition can be applied across a wide range of human activities. Voice interfaces are convenient for searching for documents, dialing a phone number, controlling IoT devices, voice navigation, and simple text dictation. Because speaking is often more convenient than typing, sending voice messages has become common among users. Such voice messages are audio files, and the recipient is not always able or willing to listen to them. An automatic speech recognition (ASR) system can solve this problem. The article describes the stages and elements of processing and recognizing natural language from an audio signal. It outlines modern ASR technologies and the difficulties of choosing among them. Modern ASR systems understand fully spontaneous speech, that is, speech that is natural and unrehearsed and that may contain stuttering or minor errors. At the same time, such systems are still too expensive to develop from scratch, so companies must choose between cloud ASR APIs developed by the tech giants and open-source solutions. The article reviews recent research and publications on voice data processing and proposes a software solution for automatically converting voice messages into text. The interface to the voice signal delivery system is implemented as a chat bot in a messenger. The article presents the main components of the system, the chat bot's algorithm, and the technologies used to develop, deploy, and configure the chat bot in the messenger.
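The flow described in the abstract (receive a voice message, obtain the audio, pass it to an ASR service, reply with text) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `VoiceUpdate`, `BotDeps`, `formatReply`, `handleVoiceMessage`, and the fallback reply text are all hypothetical, and the messenger and ASR calls are left as injected functions rather than tied to a specific API.

```typescript
// Illustrative sketch only: every name here is hypothetical, not from the article.

// Reply used when the ASR service returns no usable text (assumed wording).
const EMPTY_TRANSCRIPT_REPLY = "No speech could be recognized in this message.";

// The part of an incoming messenger update that the bot needs.
interface VoiceUpdate {
  chatId: number; // chat to reply to
  fileId: string; // messenger-side identifier of the audio file
}

// External services, injected so the flow does not depend on a specific
// messenger API or ASR provider.
interface BotDeps {
  downloadVoice(fileId: string): Promise<Uint8Array>;
  transcribe(audio: Uint8Array): Promise<string>;
  sendText(chatId: number, text: string): Promise<void>;
}

// Normalize the raw transcript into the text that is sent back to the user.
function formatReply(transcript: string): string {
  const text = transcript.trim();
  return text.length > 0 ? text : EMPTY_TRANSCRIPT_REPLY;
}

// Voice message -> audio bytes -> transcript -> text reply.
async function handleVoiceMessage(update: VoiceUpdate, deps: BotDeps): Promise<string> {
  const audio = await deps.downloadVoice(update.fileId);
  const reply = formatReply(await deps.transcribe(audio));
  await deps.sendText(update.chatId, reply);
  return reply;
}
```

Injecting the three service functions keeps the choice between a cloud ASR API and an open-source recognizer outside the bot logic, so either option can be plugged in without changing the handler.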
License
The journal "Reporter of the Priazovskyi State Technical University. Section: Technical sciences" is published under the CC-BY license ("Attribution"). This license allows others to distribute, edit, correct, and build upon the work, even commercially, as long as authorship is credited. It is the most accommodating of the available licenses and is recommended for maximum dissemination and use of licensed materials.
Authors who publish in this journal agree to the following terms:
1. Authors retain authorship of their work and grant the journal the right of first publication under the terms of the Creative Commons Attribution License, which allows others to freely distribute the published work provided that they credit the authors of the original work and its first publication in this journal.
2. Authors have the right to enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published in the journal (for example, depositing it in an institutional repository or publishing it as part of a monograph), provided that they acknowledge its first publication in this journal.