Development of security systems using DNN and i & x-vector classifiers

Orken Mamyrbayev; Aizat Kydyrbekova; Keylan Alimhan; Dina Oralbekova; Bagashar Zhumazhanov; Bulbul Nuranbayeva

doi:10.15587/1729-4061.2021.239186

Author(s)

Orken Mamyrbayev Institute of Information and Computational Technologies, Kazakhstan https://orcid.org/0000-0001-8318-3794
Aizat Kydyrbekova Al-Farabi Kazakh National University, Kazakhstan https://orcid.org/0000-0001-5740-4100
Keylan Alimhan L. N. Gumilyov Eurasian National University, Kazakhstan https://orcid.org/0000-0003-0766-2229
Dina Oralbekova Satbayev University, Kazakhstan https://orcid.org/0000-0003-4975-6493
Bagashar Zhumazhanov Institute of Information and Computational Technologies, Kazakhstan https://orcid.org/0000-0002-5035-9076
Bulbul Nuranbayeva Caspian University, Kazakhstan https://orcid.org/0000-0003-3426-1914

Keywords:

security system, voice identification, voice recognition, voice biometrics, x-vector, i-vector

Abstract

The widespread use of biometric systems has heightened cybercriminals' interest in mounting attacks to break them. Biometric identification systems must therefore be developed with protection against such attacks in mind. Developing new identification methods and algorithms that present randomly generated key features drawn from the biometric database of user templates will help minimize the shortcomings of existing biometric user identification methods. We present an implementation of a security system that uses voice identification as the access control key, together with a verification algorithm, built from MATLAB function blocks, that can recognize a person's identity by voice. Our study showed that this system identifies users by their individual voice characteristics with an accuracy of 90 %. It was shown experimentally that traditional MFCC features combined with DNN, i-vector, and x-vector classifiers can achieve good results. The paper reviews and analyzes the best-known approaches in the literature to voice-based user identification: dynamic programming methods, vector quantization, Gaussian mixture models, and the hidden Markov model. The developed software package for biometric voice identification, together with the method for forming user voice templates implemented in it, reduces the number of errors in identifying information-system users by voice by a factor of 1.5 on average. The proposed system is better suited to voice recognition in terms of accuracy, security, and complexity. Applying the results will improve the protection of the identification process in information systems against various attacks.
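The access decision described in the abstract can be illustrated with a minimal sketch. It assumes each utterance has already been reduced to a fixed-length speaker embedding (for example an i-vector or x-vector) and compares an enrolled embedding with a probe embedding by cosine similarity. Cosine scoring, the function names, the threshold of 0.7, and the three-dimensional toy vectors are all assumptions made for illustration; the abstract does not state which scoring back end or threshold the authors used.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def verify_speaker(enrolled, probe, threshold=0.7):
    """Grant access only if the probe is close enough to the enrolled template.

    The threshold is a hypothetical value; in practice it would be tuned
    on held-out data to balance false accepts against false rejects.
    """
    return cosine_similarity(enrolled, probe) >= threshold


# Toy embeddings (hypothetical; real i-/x-vectors have hundreds of dimensions).
enrolled = [0.9, 0.1, 0.3]       # template stored at enrollment
same_speaker = [0.8, 0.2, 0.35]  # new utterance from the same user
impostor = [-0.2, 0.9, 0.1]      # utterance from a different speaker

print(verify_speaker(enrolled, same_speaker))  # similar direction: accepted
print(verify_speaker(enrolled, impostor))      # dissimilar direction: rejected
```

Raising the threshold makes the system stricter (fewer false accepts, more false rejects), which is the trade-off any voice-based access control key has to settle.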

Author Biographies

Orken Mamyrbayev, Institute of Information and Computational Technologies

PhD, Associate Professor, Deputy General Director in Science

Laboratory of Computer Engineering of Intelligent Systems

Aizat Kydyrbekova, Al-Farabi Kazakh National University

Researcher

Department of Information Systems

Keylan Alimhan, L. N. Gumilyov Eurasian National University

Doctor of Science Degree in Mathematical Sciences, Professor

Department of Mathematical and Computer Modeling

Dina Oralbekova, Satbayev University

Researcher

Department of "Cybersecurity, Information Processing and Storage"

Bagashar Zhumazhanov, Institute of Information and Computational Technologies

Software Engineering

Bulbul Nuranbayeva, Caspian University

Professor, Leader of "Oil and Gas Business" Programs
