Evaluation of the efficiency of large language models for extracting entities from unstructured documents

Oleksandr Shyshatskyi; Borys Moroz; Maksym Ievlanov; Ihor Levykin; Dmytro Moroz

doi:10.15587/2706-5448.2025.341926

Authors

Oleksandr Shyshatskyi, Dnipro University of Technology, Ukraine, https://orcid.org/0009-0008-6008-7079
Borys Moroz, Dnipro University of Technology, Ukraine, https://orcid.org/0000-0002-5625-0864
Maksym Ievlanov, Kharkiv National University of Radio Electronics, Ukraine, https://orcid.org/0000-0002-6703-5166
Ihor Levykin, Kharkiv National University of Radio Electronics, Ukraine, https://orcid.org/0000-0001-8086-237X
Dmytro Moroz, Dnipro University of Technology, Ukraine, https://orcid.org/0000-0003-2577-3352

Keywords:

legal unstructured document, structured document annotation, token processing cost, GPT-4.1-mini

Abstract

The object of research is collections of unstructured documents published on the public websites of rural and urban communities of Ukraine.

The study addresses the problem of choosing the large language model (LLM) best suited to practical named entity recognition (NER) tasks in document processing. Researchers recognize that this choice is strongly influenced by the subject area and by the language in which the documents are written. However, studies of the feasibility of using LLMs for NER rarely take the operational characteristics of such models into account, and the evaluation of these characteristics remains largely unexplored.

A method for recognizing selected types of unstructured legal texts in Ukrainian is proposed. Unlike existing methods, it solves the NER problem for those documents that are subject to recognition/classification. Metrics for the cost of processing input and output tokens are proposed, and a methodology for evaluating the cost of using an LLM is developed. Based on these results, a comparative evaluation of common LLMs on the NER problem for Ukrainian texts that need to be recognized was conducted. The evaluation found that: (I) GPT-4o is the best in terms of accuracy and quality of processing (Precision = 0.919; Recall = 0.954; F1 = 0.936); (II) GPT-4o-mini with discounts is the best in terms of average document processing cost (0.00045 USD per document); (III) GPT-4.1-mini with discounts is the best in terms of quality/cost ratio (the indicator value is 0.938). GPT-4.1-mini is recommended as the best LLM for practical use.
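
The quality and cost figures above can be illustrated with a minimal Python sketch. It assumes the standard definition of F1 as the harmonic mean of precision and recall, and a simple per-document cost model in which input and output tokens are billed at separate per-million-token prices with an optional discount. The function names, the cost model, and the token counts, prices and discount in the usage lines are hypothetical placeholders; the abstract does not give the paper's actual metric definitions or provider prices, so none of this should be read as the authors' methodology.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def document_cost(
    input_tokens: int,
    output_tokens: int,
    input_price_per_million: float,
    output_price_per_million: float,
    discount: float = 0.0,
) -> float:
    """Hypothetical cost model: tokens billed per million, minus a discount.

    `discount` is a fraction in [0, 1]; 0.5 would mean a 50% reduction.
    This is an assumed model, not the metric defined in the paper.
    """
    gross = (
        input_tokens * input_price_per_million
        + output_tokens * output_price_per_million
    ) / 1_000_000
    return gross * (1 - discount)


# Precision and recall reported for GPT-4o in the abstract.
gpt4o_f1 = f1_score(0.919, 0.954)
print(round(gpt4o_f1, 3))  # 0.936, matching the reported F1

# Placeholder token counts and prices, chosen only to show the arithmetic.
example_cost = document_cost(
    input_tokens=2000,
    output_tokens=500,
    input_price_per_million=1.0,
    output_price_per_million=2.0,
    discount=0.5,
)
print(f"{example_cost:.4f} USD per document")
```

The first call reproduces the reported F1 from the reported precision and recall, which confirms that the three GPT-4o figures are mutually consistent. The second call only demonstrates how a per-document cost could be assembled from token counts and prices.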

The evaluation results significantly simplify the choice of an LLM for building information systems and technologies that process unstructured documents written in Ukrainian.

Author Biographies

Oleksandr Shyshatskyi, Dnipro University of Technology

PhD Student

Department of Software Engineering

Borys Moroz, Dnipro University of Technology

Doctor of Technical Sciences

Department of Software Engineering

Maksym Ievlanov, Kharkiv National University of Radio Electronics

Doctor of Technical Sciences

Department of Information Control Systems

Ihor Levykin, Kharkiv National University of Radio Electronics

Doctor of Technical Sciences

Department of Media Systems and Technologies

Dmytro Moroz, Dnipro University of Technology

PhD

Department of Software Engineering

