Development of a method for automatic speaker gender determination based on joint evaluation of statistical moments of the pitch frequency and the formant frequencies
Keywords: speaker gender recognition, formant-band features, skewness coefficient, pitch frequency
The object of research is methods for recognizing speaker gender from speech signals. One of the most problematic issues is the insufficiently studied choice of features and decision rules. This choice is needed to increase the probability of correct recognition and the noise immunity of gender recognition from voice signals under interference. It is also important to simplify the implementation of speaker gender recognition algorithms.
For recognition of speaker gender, a new set of classification features is selected, based on the joint use of estimates of the mean pitch frequency, its kurtosis coefficient, and estimates of the mean formant frequencies together with their skewness coefficients. In the course of the research, the method of statistical testing of the proposed algorithms on a personal computer is used. The experiments are carried out on real audio signals input from a microphone into a personal computer for both female and male speakers and recorded as separate files. For this purpose, 10 reference utterances of 10 words are used for each of the 5 female speakers and 5 male speakers.
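The feature set described above reduces to sample moments of per-frame frequency tracks. The sketch below (not the paper's implementation; the function name `sample_moments` and the toy input are illustrative) shows how the mean, skewness and kurtosis coefficients could be computed for one such track, assuming per-frame pitch or formant estimates in Hz are already available from some tracker:

```python
from statistics import mean

def sample_moments(values):
    """Mean, skewness and kurtosis coefficients of a feature track.

    `values` is a list of per-frame estimates (e.g. pitch or formant
    frequencies in Hz) produced by any pitch/formant estimator.
    """
    m = mean(values)
    n = len(values)
    var = sum((v - m) ** 2 for v in values) / n   # biased sample variance
    sd = var ** 0.5
    # Standardized third moment (skewness / asymmetry coefficient)
    skew = sum((v - m) ** 3 for v in values) / (n * sd ** 3)
    # Excess kurtosis (fourth standardized moment minus 3)
    kurt = sum((v - m) ** 4 for v in values) / (n * var ** 2) - 3.0
    return m, skew, kurt

# Toy example: a symmetric track, so its skewness is exactly zero
m, s, k = sample_moments([100.0, 110.0, 120.0, 130.0, 140.0])
```

A gender classifier along the lines of the paper would then compare such per-utterance moment vectors against decision thresholds or class references.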
Based on the results of statistical tests, the algorithm that jointly uses estimates of the mean pitch frequency, its kurtosis coefficient, and estimates of the mean formant frequencies with their skewness coefficients achieves an average probability of correct recognition of 1. With the additional action of additive white Gaussian noise at a signal-to-noise ratio q=20, the experimentally obtained probability of correct recognition for this algorithm is 0.8. For the decision algorithm that uses only estimates of the mean pitch frequency and its kurtosis coefficient, the average probability of correct recognition is estimated at 0.9. This indicates the greater noise immunity of the latter algorithm.
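The noise-immunity test above amounts to adding zero-mean white Gaussian noise scaled to a prescribed signal-to-noise power ratio q. As a minimal sketch (the function name and the synthetic sine-wave test signal are illustrative, not from the paper), this could be done as follows:

```python
import math
import random

def add_white_gaussian_noise(signal, q):
    """Add zero-mean white Gaussian noise so that the (linear)
    signal-to-noise power ratio equals q."""
    p_signal = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(p_signal / q)        # noise std for power p_signal / q
    rng = random.Random(0)                 # fixed seed for reproducibility
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Synthetic stand-in for a voiced segment: a 120 Hz tone at 8 kHz sampling
clean = [math.sin(2 * math.pi * 120 * n / 8000) for n in range(8000)]
noisy = add_white_gaussian_noise(clean, q=20)
```

Running the recognition algorithms on such noisy realizations and counting correct decisions gives the empirical probabilities of correct recognition reported above.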
In the future, it is planned to apply the obtained results not only to the Russian and Ukrainian languages, but also to a number of foreign languages.
Copyright (c) 2018 Sergey Omelchenko
This work is licensed under a Creative Commons Attribution 4.0 International License.