A comparison of convolutional neural networks for Kazakh sign language recognition
DOI: https://doi.org/10.15587/1729-4061.2021.241535

Keywords: hand gesture recognition, sign language recognition, convolutional neural network (CNN), deep learning

Abstract
For people with disabilities, sign language is the most important means of communication. Therefore, researchers around the world are increasingly proposing intelligent hand gesture recognition systems. Such a system is aimed not only at those who wish to understand a sign language, but also at those who wish to speak through gesture recognition software. In this paper, a new benchmark dataset for Kazakh fingerspelling, suitable for training deep neural networks, is introduced. The dataset contains more than 10,122 gesture samples for 42 letters of the alphabet. The alphabet has its own peculiarities, as some characters are shown in motion, which may influence sign recognition.
The paper describes research and analysis of convolutional neural networks: the comparison, testing, results, and analysis of the LeNet, AlexNet, ResNet, and EfficientNet (EfficientNetB7) architectures. EfficientNet is a state-of-the-art (SOTA) architecture and the most recent of those under consideration. On this dataset, we show that the LeNet and EfficientNet networks outperform the other competing algorithms. Moreover, EfficientNet can achieve state-of-the-art performance on other hand gesture datasets.
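What distinguishes EfficientNet from the older architectures compared here is its compound scaling rule (Tan & Le, 2019): depth, width, and input resolution are scaled jointly by a single coefficient rather than independently. A minimal sketch of that rule, using the base constants reported in the original paper (the exact coefficients used for the B7 variant are implementation-specific):

```python
# Sketch of EfficientNet's compound scaling rule (Tan & Le, 2019), not the
# authors' code. The base constants alpha = 1.2 (depth), beta = 1.1 (width),
# gamma = 1.15 (resolution) were found by grid search, constrained so that
# alpha * beta**2 * gamma**2 is approximately 2, i.e. each unit step of the
# compound coefficient phi roughly doubles the FLOPs of the network.

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# Example: scaling up from the B0 baseline (phi = 0); larger variants such
# as B7 correspond to a larger phi.
for phi in range(3):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

The joint constraint is what lets a single hyperparameter trade compute for accuracy in a balanced way, instead of hand-tuning three independent knobs.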
The architecture and operating principles of these algorithms reflect the effectiveness of their application to sign language recognition. The CNN models are evaluated using accuracy and a penalty matrix. Over the training epochs, LeNet and EfficientNet showed the best results: their accuracy and loss curves followed similar, close trends. The results of EfficientNet were interpreted with the SHapley Additive exPlanations (SHAP) framework. SHAP probed the model to detect complex relationships between features in the images. Focusing on the SHAP analysis may help to further improve the accuracy of the model.
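A minimal sketch of the two evaluation tools mentioned above, assuming the "penalty matrix" is the standard confusion matrix (this is an interpretation, not the authors' code); labels are integer class indices:

```python
# Hedged sketch: accuracy and a confusion matrix for a multi-class
# classifier, with labels encoded as integers 0 .. n_classes - 1.

def confusion_matrix(y_true, y_pred, n_classes):
    """matrix[i][j] = number of samples of true class i predicted as class j."""
    matrix = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Toy example with 3 classes (the paper's dataset has 42 fingerspelling classes).
y_true = [0, 1, 2, 2, 1, 0]
y_pred = [0, 1, 2, 1, 1, 0]
print(accuracy(y_true, y_pred))            # 5 of 6 predictions are correct
print(confusion_matrix(y_true, y_pred, 3))
```

Off-diagonal entries of the matrix show which gestures are confused with which, which is exactly the kind of per-class error a single accuracy number hides.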
Supporting Agency
- This research is funded by the Science Committee of the Ministry of Education and Science of the Republic of Kazakhstan (AP08053034)
References
- Bazarevsky, V., Fan, Zh. (2019). On-device, real-time hand tracking with mediapipe. Google AI Blog. Available at: https://ai.googleblog.com/2019/08/on-device-real-time-hand-tracking-with.html
- Lee, A., Cho, Y., Jin, S., Kim, N. (2020). Enhancement of surgical hand gesture recognition using a capsule network for a contactless interface in the operating room. Computer Methods and Programs in Biomedicine, 190, 105385. doi: https://doi.org/10.1016/j.cmpb.2020.105385
- Bilgin, M., Mutludogan, K. (2019). American Sign Language Character Recognition with Capsule Networks. 2019 3rd International Symposium on Multidisciplinary Studies and Innovative Technologies (ISMSIT). doi: https://doi.org/10.1109/ismsit.2019.8932829
- Adithya, V., Rajesh, R. (2020). A Deep Convolutional Neural Network Approach for Static Hand Gesture Recognition. Procedia Computer Science, 171, 2353–2361. doi: https://doi.org/10.1016/j.procs.2020.04.255
- Mantecón, T., del-Blanco, C. R., Jaureguizar, F., García, N. (2016). Hand Gesture Recognition Using Infrared Imagery Provided by Leap Motion Controller. Lecture Notes in Computer Science, 47–57. doi: https://doi.org/10.1007/978-3-319-48680-2_5
- Kumar, A., Thankachan, K., Dominic, M. M. (2016). Sign language recognition. 2016 3rd International Conference on Recent Advances in Information Technology (RAIT). doi: https://doi.org/10.1109/rait.2016.7507939
- Haberdar, H., Albayrak, S. (2005). Real Time Isolated Turkish Sign Language Recognition from Video Using Hidden Markov Models with Global Features. Lecture Notes in Computer Science, 677–687. doi: https://doi.org/10.1007/11569596_70
- Saykol, E., Türe, H. T., Sirvanci, A. M., Turan, M. (2016). Posture labeling based gesture classification for Turkish sign language using depth values. Kybernetes, 45 (4), 604–621. doi: https://doi.org/10.1108/k-04-2015-0107
- Kudubayeva, S. A., Ryumin, D. A., Kalzhanov, M. U. (2016). The method of basis vectors for recognition sign language by using sensor KINECT. Journal of Mathematics, Mechanics and Computer Science, 91 (3), 86–96. Available at: https://bm.kaznu.kz/index.php/kaznu/article/view/541
- Uskenbayeva, R. K., Mukhanov, S. B. (2020). Contour analysis of external images. Proceedings of the 6th International Conference on Engineering & MIS 2020. doi: https://doi.org/10.1145/3410352.3410811
- Lecun, Y., Bottou, L., Bengio, Y., Haffner, P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86 (11), 2278–2324. doi: https://doi.org/10.1109/5.726791
- Krizhevsky, A., Sutskever, I., Hinton, G. E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60 (6), 84–90. doi: https://doi.org/10.1145/3065386
- He, K., Zhang, X., Ren, S., Sun, J. (2016). Deep Residual Learning for Image Recognition. 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). doi: https://doi.org/10.1109/cvpr.2016.90
- Tan, M., Le, Q. V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. Proceedings of the 36th International Conference on Machine Learning. Available at: https://proceedings.mlr.press/v97/tan19a/tan19a.pdf
- Merembayev, T., Kurmangaliyev, D., Bekbauov, B., Amanbek, Y. (2021). A Comparison of Machine Learning Algorithms in Predicting Lithofacies: Case Studies from Norway and Kazakhstan. Energies, 14 (7), 1896. doi: https://doi.org/10.3390/en14071896
- Lundberg, S. M., Lee, S.-I. (2017). A unified approach to interpreting model predictions. 31st Conference on Neural Information Processing Systems (NIPS 2017). Available at: https://arxiv.org/pdf/1705.07874.pdf
License
Copyright (c) 2021 Chingiz Kenshimov, Samat Mukhanov, Timur Merembayev, Didar Yedilkhan
This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation and conditions for the transfer of copyright (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They also retain the right to conclude, on their own, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright to the work (manuscript, article, etc.).
By signing the License Agreement with TECHNOLOGY CENTER PC, the authors retain all rights to the further use of their work, provided that they link to the edition in which the work was published.
Under the terms of the License Agreement, the Publisher, TECHNOLOGY CENTER PC, does not take away the authors' copyrights; it receives permission from the authors to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or in the absence in this agreement of identifiers allowing the identity of the author to be established, the editors have no right to work with the manuscript.
It is important to remember that there is another type of agreement between authors and publishers, in which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.