Determining the effectiveness of GPT-4.1-mini for multiclass text categorization
DOI: https://doi.org/10.15587/1729-4061.2025.340492

Keywords: data analysis, large language model, GPT-4.1-mini, text categorization, multilingual evaluation, the Ukrainian language

Abstract
The object of this study is the process of automatic multiclass categorization of user queries by large language models when transitioning from English to Ukrainian.
The scientific problem stems from the fact that most modern large language models (LLMs) are optimized for English, while their effectiveness for morphologically complex, low-resource languages, particularly Ukrainian, remains insufficiently studied.
In this work, an experimental approach was devised and implemented to evaluate the transferability of the GPT-4.1-mini model from English to Ukrainian on the task of categorizing 11,047 user queries spanning nine applied domains. The analysis employed conventional metrics (Recall, Precision, Weighted-F1, Macro-F1) alongside a novel indicator, the Uncertainty/Error Rate (U/E), which captures the proportion of model refusals and “hallucinations.”
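To make the U/E indicator concrete, the sketch below computes it alongside the standard F1 scores. It is a minimal illustration, not the authors' implementation: the label set shown is an assumed subset of the nine domains, and the choice to exclude U/E cases (refusals and out-of-vocabulary labels) from the F1 computation is likewise an assumption.

```python
from sklearn.metrics import precision_recall_fscore_support

# Assumed label set: an illustrative subset of the nine applied domains.
VALID_LABELS = {"finance", "medicine", "law"}

def evaluate(y_true, y_pred):
    """Compute Weighted-F1, Macro-F1, and the Uncertainty/Error (U/E) rate.

    A prediction counts toward U/E when the model refuses to answer
    (empty output) or returns a label outside the known category set
    (a "hallucinated" label).
    """
    ue_count = sum(1 for p in y_pred if not p or p not in VALID_LABELS)
    ue_rate = 100.0 * ue_count / len(y_pred)

    # Assumption: U/E cases are excluded before scoring with standard metrics.
    scored = [(t, p) for t, p in zip(y_true, y_pred) if p in VALID_LABELS]
    if not scored:
        return {"Weighted-F1": 0.0, "Macro-F1": 0.0, "U/E": ue_rate}
    t, p = zip(*scored)
    _, _, weighted_f1, _ = precision_recall_fscore_support(
        t, p, average="weighted", zero_division=0)
    _, _, macro_f1, _ = precision_recall_fscore_support(
        t, p, average="macro", zero_division=0)
    return {"Weighted-F1": 100 * weighted_f1,
            "Macro-F1": 100 * macro_f1,
            "U/E": ue_rate}
```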
The findings demonstrate that the highest quality was achieved on the English dataset (Macro-F1 = 69.78%, U/E = 0.05%). When Ukrainian prompts were applied, Macro-F1 decreased to 63.73%; however, the U/E equaled 0%, indicating higher reliability of responses. Using English prompts with Ukrainian-language data preserved nearly the same level of accuracy (Macro-F1 = 69.66%), thereby revealing strong internal translation and generalization mechanisms.
The novelty of this study lies in the use of a large multidomain parallel corpus, the systematic comparison of prompts in two languages, the application of the state-of-the-art GPT-4.1-mini model, and the introduction of the U/E metric as a reliability criterion. The proposed approach demonstrates the feasibility of applying GPT-4.1-mini to Ukrainian-language information services without additional training, particularly for automatic query routing in financial, medical, legal, and other domains.
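For illustration, such zero-shot query routing could be wired up as in the following sketch. It uses the public OpenAI Python SDK, but the prompt wording, category names, and decoding settings are assumptions for the example rather than the study's actual configuration.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# Illustrative category names; the study used nine applied domains.
CATEGORIES = ["finance", "medicine", "law", "education", "travel",
              "technology", "sport", "entertainment", "other"]

def categorize(query: str) -> str:
    """Ask GPT-4.1-mini to assign exactly one category label to a user query."""
    response = client.chat.completions.create(
        model="gpt-4.1-mini",
        temperature=0,  # deterministic decoding for reproducible evaluation
        messages=[
            {"role": "system",
             "content": ("Classify the user query into exactly one of these "
                         f"categories: {', '.join(CATEGORIES)}. "
                         "Reply with the category name only.")},
            {"role": "user", "content": query},
        ],
    )
    return response.choices[0].message.content.strip().lower()

# Any reply outside CATEGORIES (or a refusal) would count toward the U/E rate.
```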
License
Copyright (c) 2025 Yurii Voloshchuk, Oleksandr Mitsa

This work is licensed under a Creative Commons Attribution 4.0 International License.