Development of a method for automatic speaker gender determination based on joint evaluation of statistical moments of the pitch frequency and the formant frequencies
Keywords: speaker gender recognition, formant-band features, skewness coefficient, pitch frequency
The object of research is methods for recognizing speaker gender from speech signals. One of the most problematic issues is the insufficiently studied choice of features and decision rules. This choice is needed to increase the probability of correct recognition and the noise immunity of gender recognition from voice signals under interference. It is also important to simplify the implementation of speaker gender recognition algorithms.
For recognition of speaker gender, a new set of classification features is selected, based on the joint use of estimates of the mean pitch frequency, its kurtosis coefficient, and estimates of the mean formant frequencies together with their skewness coefficients. In the course of the research, the method of statistical testing of the proposed algorithms on a personal computer is used. The experiments are carried out on real audio signals input from a microphone into a personal computer for both female and male speakers and recorded as separate files. For this purpose, 10 reference utterances of 10 words are used for each of the 5 female speakers and 5 male speakers.
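The feature set described above reduces to sample moments of per-frame frequency tracks. The sketch below (not the paper's implementation; the function name `sample_moments` and the toy input are illustrative) shows how the mean, skewness and kurtosis coefficients could be computed for one such track, assuming per-frame pitch or formant estimates in Hz are already available from some tracker:

```python
from statistics import mean

def sample_moments(values):
    """Mean, skewness and kurtosis coefficients of a feature track.

    `values` is a list of per-frame estimates (e.g. pitch or formant
    frequencies in Hz) produced by any pitch/formant estimator.
    """
    m = mean(values)
    n = len(values)
    var = sum((v - m) ** 2 for v in values) / n   # biased sample variance
    sd = var ** 0.5
    # Standardized third moment (skewness / asymmetry coefficient)
    skew = sum((v - m) ** 3 for v in values) / (n * sd ** 3)
    # Excess kurtosis (fourth standardized moment minus 3)
    kurt = sum((v - m) ** 4 for v in values) / (n * var ** 2) - 3.0
    return m, skew, kurt

# Toy example: a symmetric track, so its skewness is exactly zero
m, s, k = sample_moments([100.0, 110.0, 120.0, 130.0, 140.0])
```

A gender classifier along the lines of the paper would then compare such per-utterance moment vectors against decision thresholds or class references.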
Based on the results of statistical tests, the algorithm that jointly uses estimates of the mean pitch frequency, its kurtosis coefficient, and estimates of the mean formant frequencies with their skewness coefficients achieves an average probability of correct recognition of 1. With the additional action of additive white Gaussian noise at a signal-to-noise ratio q=20, the experimentally obtained probability of correct recognition for this algorithm is 0.8. For the decision algorithm that uses only estimates of the mean pitch frequency and its kurtosis coefficient, the average probability of correct recognition is estimated at 0.9. This indicates the greater noise immunity of the latter algorithm.
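The noise-immunity test above amounts to adding zero-mean white Gaussian noise scaled to a prescribed signal-to-noise power ratio q. As a minimal sketch (the function name and the synthetic sine-wave test signal are illustrative, not from the paper), this could be done as follows:

```python
import math
import random

def add_white_gaussian_noise(signal, q):
    """Add zero-mean white Gaussian noise so that the (linear)
    signal-to-noise power ratio equals q."""
    p_signal = sum(x * x for x in signal) / len(signal)
    sigma = math.sqrt(p_signal / q)        # noise std for power p_signal / q
    rng = random.Random(0)                 # fixed seed for reproducibility
    return [x + rng.gauss(0.0, sigma) for x in signal]

# Synthetic stand-in for a voiced segment: a 120 Hz tone at 8 kHz sampling
clean = [math.sin(2 * math.pi * 120 * n / 8000) for n in range(8000)]
noisy = add_white_gaussian_noise(clean, q=20)
```

Running the recognition algorithms on such noisy realizations and counting correct decisions gives the empirical probabilities of correct recognition reported above.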
In the future, it is planned to apply the obtained results not only to the Russian and Ukrainian languages, but also to a number of foreign languages.
Copyright (c) 2018 Sergey Omelchenko
This work is licensed under a Creative Commons Attribution 4.0 International License.