Organization of software and neural network algorithms for machine analysis of textual data presented in natural language

Authors

Viacheslav Shkurko, Andrii Poliakov

DOI:

https://doi.org/10.30837/2522-9818.2025.2.151

Keywords:

computer-aided analysis; text data; natural language; software algorithms; neural network architecture.

Abstract

The article addresses the problem of organizing software and neural network algorithms for natural language processing. Given the rapid growth of information flows and the limits of computational resources, optimizing methods for processing both short, unstructured messages and complex structured documents has become especially urgent. The study aims to develop a comprehensive method for organizing computer-aided analysis of text data that balances the accuracy of results against the efficient use of computational resources. The research systematically examines modern approaches to tokenization, clustering, semantically relevant search, and deep learning architectures, with particular attention to their adaptability under resource-constrained conditions. A multilevel methodology is proposed that combines preliminary classification of text arrays, semantic clustering, and a Bidirectional LSTM neural network model. The method was implemented and tested in an automated text analysis application, which demonstrated a stable reduction in the loss function and acceptable resource consumption. The main advantages of the developed approach are its ability to adapt to different types of text data, its reduced resource consumption while maintaining high-quality analysis, and its suitability for deployment in environments with low computing capacity. The scientific novelty of the article lies in integrating semantically relevant clustering and lightweight deep learning techniques into a single optimized framework. The practical value of the study lies in the applicability of the proposed methodology to real-world information systems where limited hardware capabilities require efficient and adaptive text processing solutions.

Author Biographies

Viacheslav Shkurko, Kharkiv National University of Radio Electronics

Postgraduate Student, Department of Applied Mathematics (AM)

Andrii Poliakov, Simon Kuznets Kharkiv National University of Economics

PhD (Engineering Sciences), Associate Professor; Associate Professor at the Department of Information Systems, Simon Kuznets Kharkiv National University of Economics; Associate Professor at the Department of Applied Mathematics, Kharkiv National University of Radio Electronics

Published

2025-07-08

How to Cite

Shkurko, V., & Poliakov, A. (2025). Organization of software and neural network algorithms for machine analysis of textual data presented in natural language. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (2(32)), 151–167. https://doi.org/10.30837/2522-9818.2025.2.151