Development of security systems using DNN and i & x-vector classifiers

Orken Mamyrbayev; Aizat Kydyrbekova; Keylan Alimhan; Dina Oralbekova; Bagashar Zhumazhanov; Bulbul Nuranbayeva

doi:10.15587/1729-4061.2021.239186

Authors

Orken Mamyrbayev Institute of Information and Computational Technologies, Kazakhstan https://orcid.org/0000-0001-8318-3794
Aizat Kydyrbekova Al-Farabi Kazakh National University, Kazakhstan https://orcid.org/0000-0001-5740-4100
Keylan Alimhan L. N. Gumilyov Eurasian National University, Kazakhstan https://orcid.org/0000-0003-0766-2229
Dina Oralbekova Satbayev University, Kazakhstan https://orcid.org/0000-0003-4975-6493
Bagashar Zhumazhanov Institute of Information and Computational Technologies, Kazakhstan https://orcid.org/0000-0002-5035-9076
Bulbul Nuranbayeva Caspian University, Kazakhstan https://orcid.org/0000-0003-3426-1914


Keywords:

security system, voice identification, voice recognition, voice biometric, x-vector, i-vector

Abstract

The widespread use of biometric systems has increased the interest of cybercriminals in developing attacks to crack them. Biometric identification systems must therefore be designed with protection against these attacks in mind. New identification methods and algorithms based on presenting randomly generated key features from the biometric database of user standards will help to minimize the shortcomings of existing methods of biometric user identification. We present an implementation of a security system that uses voice identification as an access control key, together with a verification algorithm developed using MATLAB function blocks, which authenticates a person's identity by his or her voice. In our experiments, this user identification system achieved an accuracy of 90 % based on individual voice characteristics. It has been shown experimentally that traditional MFCC features combined with a DNN and i-vector and x-vector classifiers can achieve good results. The paper reviews and analyzes the best-known approaches in the literature to the problem of identifying users by voice: dynamic programming methods, vector quantization, Gaussian mixture models and hidden Markov models. The developed software package for biometric identification of users by voice, together with the method for forming users' voice standards implemented in it, reduces the number of errors in identifying users of information systems by voice by a factor of 1.5 on average. The proposed system performs better in terms of the accuracy, security and complexity of voice recognition. Applying these results will improve the security of the identification process in information systems against various attacks.
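The abstract describes verifying a speaker against stored user voice standards using i-vector and x-vector embeddings, but it does not state the scoring back-end. As a minimal sketch, assuming cosine similarity scoring (a common simple back-end; PLDA is another frequent choice) and a hypothetical acceptance threshold of 0.5, the enrollment, accept/reject decision and the false rejection and false acceptance rates could look like the code below. The toy three-dimensional vectors stand in for real embeddings, which would come from an i-vector extractor or a DNN; the names `enroll`, `cosine_score`, `verify` and `error_rates` are illustrative and are not the authors' MATLAB implementation.

```python
from __future__ import annotations

import math
from typing import Sequence

Vector = Sequence[float]


def length_normalize(v: Vector) -> list[float]:
    """Scale an embedding to unit Euclidean length (often applied to i-/x-vectors)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        raise ValueError("zero-length embedding")
    return [x / norm for x in v]


def enroll(embeddings: Sequence[Vector]) -> list[float]:
    """Form a user's voice template as the normalized mean of enrollment embeddings.

    Assumption: averaging is used here for illustration; the paper's own
    method for forming voice standards is not described in the abstract.
    """
    if not embeddings:
        raise ValueError("need at least one enrollment embedding")
    dim = len(embeddings[0])
    normed = [length_normalize(e) for e in embeddings]
    mean = [sum(e[i] for e in normed) / len(normed) for i in range(dim)]
    return length_normalize(mean)


def cosine_score(template: Vector, test: Vector) -> float:
    """Cosine similarity between the enrolled template and a test embedding."""
    t = length_normalize(template)
    x = length_normalize(test)
    return sum(a * b for a, b in zip(t, x))


def verify(template: Vector, test: Vector, threshold: float = 0.5) -> bool:
    """Accept the identity claim if the score reaches the threshold (0.5 is hypothetical)."""
    return cosine_score(template, test) >= threshold


def error_rates(genuine: Sequence[float], impostor: Sequence[float],
                threshold: float) -> tuple[float, float]:
    """Return (false rejection rate, false acceptance rate) at a given threshold."""
    frr = sum(s < threshold for s in genuine) / len(genuine)
    far = sum(s >= threshold for s in impostor) / len(impostor)
    return frr, far


if __name__ == "__main__":
    alice = enroll([[0.9, 0.1, 0.2], [1.0, 0.0, 0.3]])
    print(verify(alice, [0.95, 0.05, 0.25]))  # True: close to the template
    print(verify(alice, [-0.2, 1.0, 0.1]))    # False: points in another direction
```

Sweeping the threshold trades false rejections against false acceptances; a reduction in the number of identification errors, as reported in the abstract, would show up as lower values from `error_rates` at a comparable operating point.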

Author Biographies

Orken Mamyrbayev, Institute of Information and Computational Technologies

PhD, Associate Professor, Deputy General Director in Science

Laboratory of Computer Engineering of Intelligent Systems

Aizat Kydyrbekova, Al-Farabi Kazakh National University

Researcher

Department of Information Systems

Keylan Alimhan, L. N. Gumilyov Eurasian National University

Doctor of Science Degree in Mathematical Sciences, Professor

Department of Mathematical and Computer Modeling

Dina Oralbekova, Satbayev University

Researcher

Department of "Cybersecurity, Information Processing and Storage"

Bagashar Zhumazhanov, Institute of Information and Computational Technologies

Software Engineering

Bulbul Nuranbayeva, Caspian University

Professor, Leader of "Oil and Gas Business" Programs
