Evaluating the effectiveness of a voice activity detector based on various neural networks
DOI: https://doi.org/10.15587/1729-4061.2025.321659
Keywords: convolutional neural network, recurrent neural network, voice activity detector
Abstract
This paper considers the efficiency of neural networks for human voice recognition. The objects of the study are artificial neural networks used for human voice recognition. Their ability to recognize a human voice reliably regardless of language, when trained on a small number of speakers under noisy conditions, is examined. The task being solved is to improve the accuracy of speech activity detection, which plays a significant role in the functioning of automatic speech recognition systems, especially at a low signal-to-noise ratio.
The findings showed that voice recognition accuracy can vary considerably across languages of different phonetic proximity. The recurrent neural network (RNN) demonstrated high voice recognition accuracy of 95 %, exceeding the 94 % accuracy achieved by the convolutional neural network (CNN). A distinctive feature of the results is the adaptation of the neural networks to multilingual data, which made it possible to increase their efficiency. The study confirmed that training neural networks on data covering different languages and speaker types significantly improves recognition accuracy. The results are an important contribution to the development of speech recognition technologies and have potential applications in fields where high accuracy of human voice recognition is required.
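The abstract compares detectors by their frame-level accuracy, i.e. the fraction of audio frames whose speech/non-speech label matches the reference. As a minimal, hypothetical illustration (an energy-threshold baseline, not the paper's CNN or RNN models, with made-up frame length and threshold values), the sketch below shows how such frame-level VAD accuracy is computed:

```python
import math

def frame_energies(signal, frame_len=160):
    """Split the signal into non-overlapping frames and return per-frame mean energy."""
    return [sum(x * x for x in signal[i:i + frame_len]) / frame_len
            for i in range(0, len(signal) - frame_len + 1, frame_len)]

def energy_vad(signal, frame_len=160, threshold=0.01):
    """Label each frame 1 (speech-like) if its energy exceeds the threshold, else 0."""
    return [1 if e > threshold else 0 for e in frame_energies(signal, frame_len)]

def frame_accuracy(pred, truth):
    """Fraction of frames where the predicted label matches the reference label."""
    return sum(p == t for p, t in zip(pred, truth)) / len(truth)

# Synthetic test signal: 50 frames of low-level noise followed by 50 frames of a tone.
frame_len = 160
noise = [0.001 * math.sin(0.9 * n) for n in range(50 * frame_len)]
tone = [0.5 * math.sin(0.2 * n) for n in range(50 * frame_len)]
signal = noise + tone

truth = [0] * 50 + [1] * 50   # reference labels: non-speech, then speech
pred = energy_vad(signal, frame_len)
acc = frame_accuracy(pred, truth)
```

In a low signal-to-noise setting, the energy threshold alone fails, which is precisely where learned CNN/RNN detectors of the kind studied here are expected to outperform such a baseline.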
License
Copyright (c) 2025 Bekbolat Medetov, Aigul Nurlankyzy, Timur Namazbayev, Aigul Kulakayeva, Ainur Akhmediyarova, Ainur Zhetpisbayeva, Nurtay Albanbay, Mussa Turdalyuly, Asset Yskak, Gulzhazira Uristimbek

This work is licensed under a Creative Commons Attribution 4.0 International License.





