Development of security systems using DNN and i & x-vector classifiers

Authors

DOI:

https://doi.org/10.15587/1729-4061.2021.239186

Keywords:

security system, voice identification, voice recognition, voice biometric, x-vector, i-vector

Abstract

The widespread use of biometric systems has attracted growing interest from cybercriminals, who develop attacks aimed at cracking them. Biometric identification systems must therefore be designed with protection against such attacks in mind. New identification methods and algorithms based on the presentation of randomly generated key features drawn from a database of users' biometric templates help to minimize the shortcomings of existing biometric identification methods. We present a security system that uses voice identification as an access control key, together with a verification algorithm built from MATLAB function blocks that authenticates a person by his or her voice. Our experiments show that this system identifies users by their individual voice characteristics with an accuracy of 90 %, confirming that traditional MFCC features combined with DNN, i-vector and x-vector classifiers can achieve good results. The paper also reviews and analyzes the best-known approaches to user identification by voice reported in the literature: dynamic programming methods, vector quantization, Gaussian mixture models and hidden Markov models. The developed software package for biometric identification of users by voice, and the method of forming users' voice templates implemented in it, reduce the number of voice identification errors in information systems by a factor of 1.5 on average. The proposed system improves voice recognition in terms of accuracy, security and complexity, and the results obtained will strengthen the protection of the identification process in information systems against various attacks.
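Once an i-vector or x-vector embedding has been extracted for an utterance, verification typically reduces to comparing the test embedding against the enrolled speaker template with a similarity score and a decision threshold. The sketch below illustrates this scoring step with cosine similarity, a common back-end for such embeddings; the toy 4-dimensional vectors and the threshold value are illustrative assumptions, not values from the paper (real embeddings have hundreds of dimensions, and production systems often use PLDA scoring instead).

```python
import math

def cosine_score(enroll, test):
    """Cosine similarity between two fixed-length speaker embeddings
    (e.g. i-vectors or x-vectors)."""
    dot = sum(a * b for a, b in zip(enroll, test))
    norm = math.sqrt(sum(a * a for a in enroll)) * math.sqrt(sum(b * b for b in test))
    return dot / norm

def verify(enroll, test, threshold=0.7):
    """Accept the claimed identity if the similarity exceeds the threshold.
    The threshold is a tunable operating point (illustrative value here)."""
    return cosine_score(enroll, test) >= threshold

# Toy embeddings for illustration only.
claimed_template = [0.9, 0.1, 0.3, 0.2]
same_speaker     = [0.85, 0.15, 0.28, 0.22]
impostor         = [0.1, 0.9, 0.2, 0.7]

print(verify(claimed_template, same_speaker))  # high similarity -> True (accept)
print(verify(claimed_template, impostor))      # low similarity  -> False (reject)
```

In a deployed system the threshold would be calibrated on held-out trials to balance false acceptances against false rejections.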

Author Biographies

Orken Mamyrbayev, Institute of Information and Computational Technologies

PhD, Associate Professor, Deputy General Director in Science

Laboratory of Computer Engineering of Intelligent Systems

Aizat Kydyrbekova, Al-Farabi Kazakh National University

Researcher

Department of Information Systems

Keylan Alimhan, L. N. Gumilyov Eurasian National University

Doctor of Science Degree in Mathematical Sciences, Professor

Department of Mathematical and Computer Modeling

Dina Oralbekova, Satbayev University

Researcher

Department of "Cybersecurity, Information Processing and Storage"

Bagashar Zhumazhanov, Institute of Information and Computational Technologies

Software Engineering

Bulbul Nuranbayeva, Caspian University

Professor, Leader of "Oil and Gas Business" Programs



Published

2021-08-31

How to Cite

Mamyrbayev, O., Kydyrbekova, A., Alimhan, K., Oralbekova, D., Zhumazhanov, B., & Nuranbayeva, B. (2021). Development of security systems using DNN and i & x-vector classifiers. Eastern-European Journal of Enterprise Technologies, 4 (9 (112)), 32–45. https://doi.org/10.15587/1729-4061.2021.239186

Issue

Section

Information and controlling system