Speaker recognition from ultra-short utterances
DOI: https://doi.org/10.15587/1729-4061.2025.327907

Keywords: speaker recognition, ultra-short utterances, phoneme-by-phoneme recognition, ECAPA-TDNN, phonemes of the Kazakh language

Abstract
The object of this study is the accuracy of speaker identification based on short utterances.
To solve the task of speaker identification from ultra-short speech utterances, this study proposes a phoneme-by-phoneme approach to constructing voice models. The rationale for this approach is that short utterances usually contain only a limited number of phonemes. A hypothesis was therefore put forward that the accuracy of speaker identification from short utterances can be increased by analyzing how specific phonemes are pronounced by different speakers.
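As a rough illustration of the phoneme-by-phoneme idea, the sketch below slices an utterance into per-phoneme segments before any speaker modelling is done. The file name and the alignment timestamps are hypothetical placeholders, not data from the paper; in practice the phoneme boundaries would come from a forced aligner or manual annotation of the monosyllabic-word recordings.

```python
# Sketch: group the samples of one utterance by phoneme, assuming
# (phoneme, start_sec, end_sec) annotations from a forced aligner.
# "utterance.wav" and the timestamps below are illustrative only.
import torchaudio

waveform, sr = torchaudio.load("utterance.wav")  # shape: (channels, samples)

alignment = [("E", 0.10, 0.32), ("L", 0.32, 0.45), ("E", 0.45, 0.70)]

segments_by_phoneme = {}
for phoneme, start, end in alignment:
    s, e = int(start * sr), int(end * sr)
    segments_by_phoneme.setdefault(phoneme, []).append(waveform[:, s:e])
```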
The experiments involved speech recordings of monosyllabic words containing the corresponding phonemes; on the basis of these recordings, speaker voice models were constructed using the ECAPA-TDNN neural network architecture. The experimental studies showed that voice models built from the sounds of only one phoneme provide higher speaker identification accuracy than generalized models built from all speech sounds.
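A minimal sketch of how such per-phoneme voice models could be built, assuming SpeechBrain's publicly available ECAPA-TDNN checkpoint (trained on VoxCeleb) as a stand-in for the models trained in the study. Averaging the embeddings of one speaker's segments of one phoneme into a single enrolment vector is one plausible reading of "voice model", not necessarily the authors' exact procedure.

```python
# Sketch: per-phoneme speaker enrolment with a pretrained ECAPA-TDNN.
# The public VoxCeleb checkpoint stands in for the Kazakh-phoneme models
# trained in the study.
import torch
from speechbrain.inference.speaker import EncoderClassifier  # SpeechBrain >= 1.0

encoder = EncoderClassifier.from_hparams(
    source="speechbrain/spkrec-ecapa-voxceleb",
    savedir="pretrained_ecapa",
)

def phoneme_voice_model(segments):
    """Average ECAPA-TDNN embeddings of one speaker's segments of one phoneme."""
    embeddings = [encoder.encode_batch(seg).squeeze() for seg in segments]
    return torch.stack(embeddings).mean(dim=0)

# e.g. enrol the current speaker on the phoneme "E":
# model_E = phoneme_voice_model(segments_by_phoneme["E"])
```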
It was also found that different phonemes yield different speaker identification accuracy. For example, with a speech signal duration of 2–3 seconds, the identification accuracy of the generalized model was 75 %, while a model built on the basis of only the phoneme "E", with the same input data, achieved 85 %, which is 10 percentage points higher than the generalized model.
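One plausible way to turn such embeddings into an identification decision is cosine scoring against each enrolled speaker, the standard back-end for ECAPA-TDNN verification. Note that the 75 % and 85 % figures above come from the paper's own experiments, not from this sketch.

```python
import torch.nn.functional as F

def identify(test_embedding, enrolled_models):
    """Return the enrolled speaker whose phoneme model is closest in cosine distance."""
    scores = {
        speaker: F.cosine_similarity(test_embedding, model, dim=0).item()
        for speaker, model in enrolled_models.items()
    }
    return max(scores, key=scores.get)
```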
Copyright (c) 2025 Bekbolat Medetov, Aigul Nurlankyzy, Timur Namazbayev, Ainur Akhmediyarova, Kairatbek Zhetpisbayev, Ainur Zhetpisbayeva, Aliya Kargulova

This work is licensed under a Creative Commons Attribution 4.0 International License.