Development and increase of noise immunity of a model of biometric identification of a speaker based on mel-frequency cepstral coefficients and a convolutional neural network

Authors

DOI:

https://doi.org/10.15587/1729-4061.2025.347451

Keywords:

speaker identification, voice biometrics, Kazakh speech, mel-frequency cepstral coefficients, noise

Abstract

This study focuses on improving the noise robustness of a biometric speaker identification system based on mel-frequency cepstral coefficients (MFCC) and a convolutional neural network (CNN). The object of analysis is the acoustic structure of the Kazakh language under clean and noisy conditions. The experimental database consisted of 16 speakers, each represented by 12 audio recordings with a duration of approximately 1 s. The speech signals were corrupted by additive pink noise at several signal-to-noise ratio (SNR) levels.
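The corruption step described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the Voss-McCartney generator, the 16 kHz sampling rate, the 200 Hz test tone, and the 10 dB SNR are all assumptions chosen for the example, and the function names are hypothetical.

```python
import math
import random


def pink_noise(n, rows=16, seed=0):
    """Approximate pink (1/f) noise with the Voss-McCartney algorithm."""
    rng = random.Random(seed)
    sources = [rng.uniform(-1.0, 1.0) for _ in range(rows)]
    out = []
    for i in range(n):
        # Row k is refreshed every 2**k samples, so slower rows contribute
        # low-frequency energy and the sum has a roughly 1/f spectrum.
        for k in range(rows):
            if i % (1 << k) == 0:
                sources[k] = rng.uniform(-1.0, 1.0)
        out.append(sum(sources))
    mean = sum(out) / n
    return [x - mean for x in out]  # remove the DC offset


def power(x):
    """Mean power of a sequence of samples."""
    return sum(v * v for v in x) / len(x)


def add_noise_at_snr(signal, noise, snr_db):
    """Scale `noise` so that 10*log10(P_signal / P_noise) equals snr_db, then add it."""
    target_noise_power = power(signal) / (10.0 ** (snr_db / 10.0))
    gain = math.sqrt(target_noise_power / power(noise))
    return [s + gain * n for s, n in zip(signal, noise)]


def measured_snr_db(clean, noisy):
    """SNR in dB, treating (noisy - clean) as the noise component."""
    residual = [y - x for x, y in zip(clean, noisy)]
    return 10.0 * math.log10(power(clean) / power(residual))


# Stand-in for a 1 s recording: a 200 Hz tone at an assumed 16 kHz sampling rate.
fs = 16000
clean = [math.sin(2 * math.pi * 200 * t / fs) for t in range(fs)]
noisy = add_noise_at_snr(clean, pink_noise(fs), snr_db=10.0)
print(round(measured_snr_db(clean, noisy), 2))  # close to the requested 10 dB
```

Calling `add_noise_at_snr` with several `snr_db` values on each training recording is one straightforward way to build a noise-augmented training set of the kind the abstract refers to.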

Under clean signal conditions, the CNN-based classifier achieved a high recognition accuracy of approximately 96%, as confirmed by the confusion matrix with strong diagonal dominance. When exposed to noise, the classification accuracy decreased to about 69%, demonstrating the significant impact of acoustic interference on speaker identification performance. To improve noise immunity, noise augmentation was applied during training. After retraining on the augmented dataset, the classification accuracy under noisy conditions increased to approximately 89–90%.

The heatmaps of precision, recall, and F1-score show that after robustness enhancement most speaker classes achieve stable metric values in the range of 0.85–1.00, while the averaged accuracy reaches approximately 0.89–0.90, confirming consistent recognition across the entire dataset. The results show that MFCC features retain discriminative speaker-specific spectral characteristics even under noise and that CNN-based classification significantly outperforms traditional approaches in terms of robustness.
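For readers who want to reproduce per-class metrics of this kind, the sketch below derives precision, recall, and F1-score for each speaker class, plus overall accuracy, from a confusion matrix. The 3-speaker matrix is a made-up toy example and does not reflect the paper's data; the function names are hypothetical.

```python
def per_class_metrics(cm):
    """Return (precision, recall, f1) per class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    n = len(cm)
    rows = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp                        # true class c, predicted otherwise
        fp = sum(cm[r][c] for r in range(n)) - tp   # predicted c, true class differs
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        rows.append((precision, recall, f1))
    return rows


def accuracy(cm):
    """Fraction of samples on the main diagonal."""
    total = sum(sum(row) for row in cm)
    return sum(cm[i][i] for i in range(len(cm))) / total


# Toy confusion matrix for 3 speakers (illustrative counts only).
cm = [
    [4, 0, 0],
    [1, 3, 0],
    [0, 0, 4],
]

for speaker, (p, r, f1) in enumerate(per_class_metrics(cm)):
    print(f"speaker {speaker}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy(cm):.3f}")
```

Arranging the three values per speaker as rows of a matrix gives exactly the input a heatmap plot needs.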

The proposed MFCC–CNN approach provides high identification accuracy in clean environments and maintains competitive performance under noise after data augmentation. The obtained results confirm the practical applicability of the developed system for reliable speaker verification in acoustically unstable environments, including remote biometric authentication, access control, and intelligent communication systems.

Author Biographies

Muhabbat Khizirova, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

Candidate of Physico-Mathematical Sciences, Associate Professor

Department of Telecommunications Engineering

Katipa Chezhimbayeva, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

Candidate of Technical Sciences, Professor

Department of Telecommunications Engineering

Abdurazak Kassimov, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

Candidate of Technical Sciences, Associate Professor, Professor-Lecturer

Department of Telecommunications Engineering

Muratbek Yermekbaev, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

PhD, Associate Professor

Department of Telecommunications Engineering

Assiya Iskakova, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

Master of Technical Sciences, PhD-Student

Department of Telecommunications Engineering

Zhaina Abilkaiyr, Almaty University of Power Engineering and Telecommunications named after Gumarbek Daukeyev

Master of Science

Department of Telecommunications Engineering


Published

2025-12-30

How to Cite

Khizirova, M., Chezhimbayeva, K., Kassimov, A., Yermekbaev, M., Iskakova, A., & Abilkaiyr, Z. (2025). Development and increase of noise immunity of a model of biometric identification of a speaker based on mel-frequency cepstral coefficients and a convolutional neural network. Eastern-European Journal of Enterprise Technologies, 6 (9 (138)), 37–53. https://doi.org/10.15587/1729-4061.2025.347451

Section

Information and controlling system