Study of handwritten character recognition algorithms for different languages using the KAN Neural Network Model

Authors

  • A.V. Serhiienko State Higher Education Institution «Priazovskyi state technical university», Dnipro, Ukraine https://orcid.org/0000-0003-1328-2572
  • E.A. Kolomoichenko State Higher Education Institution «Priazovskyi state technical university», Dnipro, Ukraine

DOI:

https://doi.org/10.31498/2225-6733.49.1.2024.321184

Keywords:

optical character recognition, neural network, Kolmogorov-Arnold network, transformer architecture, Kolmogorov-Arnold transformer, rational functions

Abstract

The paper analyzes the most effective existing optical character recognition methods that employ deep learning neural networks. The analysis showed that the modern neural network architectures with the best recognition accuracy have effectively reached an accuracy plateau, and that each of the analyzed architectures contains a multilayer perceptron. To improve recognition performance, the Kolmogorov-Arnold network is proposed as an alternative to the multilayer perceptron. The architecture of the created model is a two-component transformer: the first component is a vision transformer used as an encoder, the second is a language transformer used as a decoder. In both the encoder and the decoder, the Kolmogorov-Arnold network replaces the feedforward network based on a multilayer perceptron. Improvement over existing neural network results is achieved through transfer learning, with group rational functions used as the main learnable elements of the Kolmogorov-Arnold network. The model was trained on sets of text-line images from three writing systems (alphabetic, abugida and logographic), represented by the English, Devanagari and Chinese scripts. Experimental studies showed that the model with the Kolmogorov-Arnold network achieved high character recognition rates on the Chinese and Devanagari data sets but low rates on the English script. The obtained results point to new possibilities for increasing the reliability and efficiency of modern handwriting recognition systems.
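
For illustration only (not part of the published paper), the sketch below shows, under stated assumptions, how a Kolmogorov-Arnold style feed-forward block built from grouped rational (Padé) functions could stand in for the multilayer-perceptron feed-forward block inside a transformer encoder or decoder layer. PyTorch is assumed; the module names (GroupRational, KANFeedForward), the polynomial orders and the number of groups are illustrative choices, not the authors' implementation.

# Minimal sketch, assuming PyTorch; illustrative only, not the authors' code.
import torch
import torch.nn as nn


class GroupRational(nn.Module):
    # Safe rational (Pade-style) activation P(x) / (1 + |Q(x)|) with one
    # learnable coefficient set per channel group.
    def __init__(self, groups: int, p_order: int = 5, q_order: int = 4):
        super().__init__()
        self.groups = groups
        self.a = nn.Parameter(torch.randn(groups, p_order + 1) * 0.1)  # numerator coefficients
        self.b = nn.Parameter(torch.randn(groups, q_order) * 0.1)      # denominator coefficients

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        *lead, dim = x.shape
        xg = x.reshape(*lead, self.groups, dim // self.groups)
        shape = [1] * len(lead) + [self.groups, 1, -1]
        powers_p = torch.stack([xg ** i for i in range(self.a.shape[1])], dim=-1)
        powers_q = torch.stack([xg ** i for i in range(1, self.b.shape[1] + 1)], dim=-1)
        num = (powers_p * self.a.reshape(shape)).sum(dim=-1)
        den = 1.0 + (powers_q * self.b.reshape(shape)).sum(dim=-1).abs()
        return (num / den).reshape(*lead, dim)


class KANFeedForward(nn.Module):
    # Stand-in for the MLP feed-forward block of a transformer layer:
    # a grouped rational activation followed by a linear projection, applied twice.
    def __init__(self, dim: int, hidden: int, groups: int = 8):
        super().__init__()
        self.block = nn.Sequential(
            GroupRational(groups), nn.Linear(dim, hidden),
            GroupRational(groups), nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.block(x)


# Usage: inside an encoder or decoder layer, this block would follow
# multi-head attention where the MLP feed-forward network normally sits.
ffn = KANFeedForward(dim=256, hidden=1024, groups=8)
tokens = torch.randn(2, 50, 256)   # (batch, sequence length, embedding dim)
out = ffn(tokens)                  # same shape as the input: (2, 50, 256)

The key design point the sketch tries to convey is that the learnable nonlinearity sits on the connections (per channel group) rather than on fixed node activations, which is what distinguishes the Kolmogorov-Arnold layer from a standard multilayer perceptron.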

Author Biographies

A.V. Serhiienko, State Higher Education Institution «Priazovskyi state technical university», Dnipro

PhD (Engineering), associate professor

E.A. Kolomoichenko, State Higher Education Institution «Priazovskyi state technical university», Dnipro

Master's student

Published

2024-12-26

How to Cite

Serhiienko, A., & Kolomoichenko, E. (2024). Study of handwritten character recognition algorithms for different languages using the KAN Neural Network Model. Reporter of the Priazovskyi State Technical University. Section: Technical Sciences, 1(49), 36-47. https://doi.org/10.31498/2225-6733.49.1.2024.321184