Two-factor authentication based on keyword spotting and speaker verification
DOI: https://doi.org/10.30837/2522-9818.2025.3.005
Abstract
The subject matter of the article is the development and evaluation of a two-factor speaker authentication method based on voiceprint
identification and keyword spotting (KWS), designed for secure voice-based access in human-machine interfaces, especially for
users with limited mobility. The goal of the work is to create a method for managing speaker authentication using convolutional
neural networks (CNNs) and to compare the efficiency of two widely used spectral feature extraction techniques: Mel-Frequency
Cepstral Coefficients (MFCC) and Short-Time Fourier Transform (STFT) spectrograms. The following tasks were solved in the
article: a model of a two-factor authentication method that includes speaker identification and voice password recognition
was proposed; the quality of MFCC and STFT spectrogram features was compared; the influence of the number of epochs,
CNN architecture, and training parameters on system accuracy was evaluated; the effect of the sampling rate on model
performance was investigated. The following methods were used: deep learning with a CNN architecture, fine-tuning,
MFCC and STFT feature extraction, mathematical and statistical analysis of training efficiency, and system performance
metrics. The following results were obtained: the method achieved 97.95% accuracy in speaker identification using MFCCs
after 60 training epochs, and 99.82% accuracy in voice password verification using the same CNN structure after 20 epochs.
The average accuracy of the entire authentication process was 98.75%. Moreover, using MFCC features reduced training time
by a factor of 23 and memory consumption by a factor of 7 compared to STFT spectrograms. Conclusions: a two-factor voice
authentication method that combines speaker identification by acoustic voice characteristics with voice password
verification was implemented, and its effectiveness was studied. Further research directions include studying the impact
of alternative spectral features (in particular, CQCC, GFCC, and prosodic parameters) on accuracy and resistance to
spoofing. Special attention will be paid to optimizing the model for energy-efficient use on portable devices.
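
The two-factor decision described in the abstract can be illustrated with a minimal sketch. It assumes that each CNN outputs a probability vector (one over enrolled speakers, one over keywords) and that access is granted only when both factors pass. The function names, the threshold values, and the rule of combining the factors with a logical AND are illustrative assumptions, not details taken from the article.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class AuthDecision:
    """Outcome of the two checks (hypothetical structure for illustration)."""
    speaker_ok: bool
    keyword_ok: bool

    @property
    def granted(self) -> bool:
        # Assumption: access requires BOTH factors to pass.
        return self.speaker_ok and self.keyword_ok


def argmax(scores: Sequence[float]) -> int:
    """Index of the largest score."""
    return max(range(len(scores)), key=lambda i: scores[i])


def authenticate(
    speaker_probs: Sequence[float],
    keyword_probs: Sequence[float],
    claimed_speaker: int,
    expected_keyword: int,
    speaker_threshold: float = 0.9,   # assumed value, not from the article
    keyword_threshold: float = 0.9,   # assumed value, not from the article
) -> AuthDecision:
    """Combine speaker identification and voice password recognition.

    speaker_probs and keyword_probs stand in for the softmax outputs of the
    two CNN classifiers; here they are plain lists of floats.
    """
    spk = argmax(speaker_probs)
    kw = argmax(keyword_probs)
    # Factor 1: the top-scoring speaker is the claimed one, with enough confidence.
    speaker_ok = spk == claimed_speaker and speaker_probs[spk] >= speaker_threshold
    # Factor 2: the top-scoring keyword is the expected password, with enough confidence.
    keyword_ok = kw == expected_keyword and keyword_probs[kw] >= keyword_threshold
    return AuthDecision(speaker_ok, keyword_ok)


if __name__ == "__main__":
    decision = authenticate([0.02, 0.95, 0.03], [0.01, 0.99], 1, 1)
    print(decision, decision.granted)
```

In this sketch a correct voice with the wrong password, or the right password spoken by an unrecognized voice, is rejected, which is the property that makes the scheme two-factor.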
License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.