Evaluating the effectiveness of a voice activity detector based on various neural networks

Authors

DOI:

https://doi.org/10.15587/1729-4061.2025.321659

Keywords:

convolutional neural network, recurrent neural network, voice activity detector

Abstract

This paper examines the effectiveness of neural networks for human voice recognition. The objects of the study are artificial neural networks used for human voice recognition, and in particular their ability to recognize the human voice regardless of language when trained on a small number of speakers under noisy conditions. The task addressed is to improve the accuracy of speech activity detection, which plays a significant role in the performance of automatic speech recognition systems, especially at a low signal-to-noise ratio.
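For context, speech activity detection assigns a speech/non-speech label to each short frame of the audio signal. The sketch below is an illustrative energy-threshold baseline in Python; the function name energy_vad, the frame sizes and the threshold are assumptions for illustration and are not taken from the paper. Hand-tuned rules of this kind degrade quickly at a low signal-to-noise ratio, which is the gap the neural detectors studied here are intended to close.

```python
import numpy as np

def energy_vad(signal, sample_rate, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Label each frame as speech (1) or non-speech (0) by short-time energy.

    Illustrative baseline only; frame sizes and threshold are assumed,
    not taken from the paper.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    hop_len = int(sample_rate * hop_ms / 1000)
    labels = []
    for start in range(0, len(signal) - frame_len + 1, hop_len):
        frame = signal[start:start + frame_len]
        # Short-time energy in dB, with a small floor to avoid log(0)
        energy_db = 10.0 * np.log10(np.mean(frame ** 2) + 1e-12)
        labels.append(1 if energy_db > threshold_db else 0)
    return np.array(labels)

# Example: 1 s of low-level noise with a louder tone burst in the middle
sr = 16000
x = 0.01 * np.random.randn(sr)
x[6000:10000] += 0.3 * np.sin(2 * np.pi * 220 * np.arange(4000) / sr)
print(energy_vad(x, sr).sum(), "frames labelled as speech")
```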

The findings showed that the accuracy of human voice recognition across languages of different phonetic proximity can vary considerably. The study found that the recurrent neural network (RNN) achieves high voice recognition accuracy of 95 %, exceeding the 94 % reached by the convolutional neural network (CNN). A distinctive feature of the results is the adaptation of the neural networks to multilingual input, which made it possible to increase the efficiency of their operation. An important conclusion is that training neural networks on data covering different languages and speaker types significantly improves recognition accuracy. The results are an important contribution to the development of speech recognition technologies and have the potential for application in fields where high accuracy of human voice recognition is required.
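As an illustration of the two model families compared, the sketch below builds minimal frame-level CNN and RNN (LSTM) voice activity classifiers in Keras. The input shape, layer sizes and feature choice (log-mel patches) are assumptions made for the example; the abstract does not specify the architectures or hyperparameters actually used in the study.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Assumed input: patches of 100 frames x 40 log-mel features (not from the paper)
N_FRAMES, N_MELS = 100, 40

def build_cnn_vad():
    """Convolutional detector: treats the feature patch as a 2-D image."""
    return models.Sequential([
        tf.keras.Input(shape=(N_FRAMES, N_MELS, 1)),
        layers.Conv2D(16, (3, 3), padding="same", activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(1, activation="sigmoid"),  # P(speech) for the patch
    ])

def build_rnn_vad():
    """Recurrent detector: models the frame sequence with an LSTM."""
    return models.Sequential([
        tf.keras.Input(shape=(N_FRAMES, N_MELS)),
        layers.LSTM(64),
        layers.Dense(1, activation="sigmoid"),
    ])

for model in (build_cnn_vad(), build_rnn_vad()):
    model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
    model.summary()
```

Training both sketches on the same multilingual, multi-speaker frame labels would allow the kind of accuracy comparison reported above (95 % for the RNN versus 94 % for the CNN).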

Author Biographies

Bekbolat Medetov, L.N. Gumilyov Eurasian National University

PhD

Department of Radio Engineering, Electronics and Telecommunications

Ainur Zhetpisbayeva, L.N. Gumilyov Eurasian National University

PhD, Associate Professor

Department of Radio Engineering, Electronics and Telecommunications

Ainur Akhmediyarova, Satbayev University

PhD

Department of Software Engineering

Aigul Nurlankyzy, Satbayev University; Almaty University of Power Engineering and Telecommunications

PhD Student

Timur Namazbayev, Al-Farabi Kazakh National University

Master, Senior Lecturer

Aigul Kulakayeva, International Information Technology University

PhD

Department of Radio Engineering, Electronics and Telecommunications

Nurtay Albanbay, Satbayev University

PhD

Department of Cybersecurity, Information Processing and Storage

Mussa Turdalyuly, Satbayev University

PhD, Associate Professor

Department of Software Engineering

Asset Yskak, Ghalam LLP

Lead Design Engineer

Department of Software Development

Gulzhazira Uristimbek, L.N. Gumilyov Eurasian National University

Master's Student

Department of Radio Engineering, Electronics and Telecommunications

Published

2025-02-28

How to Cite

Medetov, B., Zhetpisbayeva, A., Akhmediyarova, A., Nurlankyzy, A., Namazbayev, T., Kulakayeva, A., Albanbay, N., Turdalyuly, M., Yskak, A., & Uristimbek, G. (2025). Evaluating the effectiveness of a voice activity detector based on various neural networks. Eastern-European Journal of Enterprise Technologies, 1 (5 (133)), 19–28. https://doi.org/10.15587/1729-4061.2025.321659

Section

Applied physics