Accuracy evaluation and error analysis of dependency parsing of texts in Ukrainian

Kostiantyn Syrotkin

doi:10.30837/2522-9818.2025.2.102

Authors

Kostiantyn Syrotkin, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", Ukraine, https://orcid.org/0009-0008-4150-8325

Abstract

The subject of our research is dependency parsing of Ukrainian sentences within the Universal Dependencies framework. The goal of the work is to evaluate the accuracy of existing transition-based and graph-based parsing architectures, with and without deep word embeddings, on the Ukrainian dataset, and to analyze the error profiles of these parsers. The article addresses two tasks. The first is to evaluate the accuracy of several modern dependency parsing approaches on a hand-annotated gold standard dataset, using labeled and unlabeled attachment scores as the accuracy metrics. The second is to analyze and categorize the errors made by standard parsers; resolving these errors could make it possible to build a more accurate parser in the future. The error rate for each category is compared with the baseline error rate, and the statistical significance of each comparison is validated with the chi-square test. The key results are as follows. For Ukrainian, deep word embeddings greatly increase parsing accuracy. The transition-based parser with deep word embeddings achieves the highest labeled attachment score, 84.66% on the test dataset. For this parser, higher error rates are associated with non-projective dependencies, longer sentences, and greater distance between the dependent and its head. In addition, for pronouns and numerals the labeled attachment error rate is significantly higher than the baseline, while the unlabeled attachment error rate stays at the baseline. Conclusions: parsing accuracy on the Ukrainian dataset is lower than that reported for other languages, but the overall improvement in accuracy from deep word embeddings is consistent with existing research. To improve overall parsing accuracy, we must focus on problem areas such as non-projective dependencies, longer sentences, and greater distance between the head and the dependent.
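The labeled and unlabeled attachment scores used as metrics above can be illustrated with a minimal Python sketch. It assumes each sentence is a list of (head, deprel) pairs in CoNLL-U order (token IDs start at 1, head 0 is the root), that gold and predicted tokenization are identical, and that every token, punctuation included, is scored; the paper's exact scoring conventions may differ. UAS counts tokens with the correct head; LAS counts tokens with both the correct head and the correct relation. The `keep` filter and `is_nonprojective` are hypothetical helpers, not a library API: they restrict scoring to one error-analysis category, here arcs whose head does not dominate every token lying between head and dependent. The toy sentence is invented, not taken from the dataset.

```python
from typing import Callable, List, Optional, Tuple

# Assumed representation: one sentence is a list of (head, deprel) pairs;
# token i (1-based, as in CoNLL-U) is stored at index i - 1, head 0 is ROOT.
Sentence = List[Tuple[int, str]]
# keep(gold_heads, token_id) -> True if the token belongs to the analysed category.
Filter = Callable[[List[int], int], bool]


def attachment_scores(gold: List[Sentence], pred: List[Sentence],
                      keep: Optional[Filter] = None) -> Tuple[float, float]:
    """Return (UAS, LAS) in percent over all tokens, or only over tokens selected by keep."""
    total = uas_hits = las_hits = 0
    for g_sent, p_sent in zip(gold, pred):
        if len(g_sent) != len(p_sent):
            raise ValueError("gold and predicted tokenization must match")
        g_heads = [head for head, _ in g_sent]
        for tok_id, ((g_head, g_rel), (p_head, p_rel)) in enumerate(zip(g_sent, p_sent), start=1):
            if keep is not None and not keep(g_heads, tok_id):
                continue
            total += 1
            if g_head == p_head:        # unlabeled: correct head only
                uas_hits += 1
                if g_rel == p_rel:      # labeled: correct head and relation
                    las_hits += 1
    if total == 0:
        raise ValueError("no tokens to score")
    return 100.0 * uas_hits / total, 100.0 * las_hits / total


def is_nonprojective(heads: List[int], dep: int) -> bool:
    """True if the gold arc heads[dep - 1] -> dep is non-projective, i.e. some token
    strictly between head and dependent is not a descendant of the head.
    Assumes heads encodes a well-formed tree (no cycles)."""
    head = heads[dep - 1]
    lo, hi = sorted((head, dep))
    for k in range(lo + 1, hi):
        node = k
        while node != 0 and node != head:  # climb towards ROOT
            node = heads[node - 1]
        if node != head:                   # reached ROOT without passing the head
            return True
    return False


# Invented toy sentence: arc 3 -> 1 crosses arc 4 -> 2, so it is non-projective.
gold = [[(3, "nsubj"), (4, "obl"), (4, "obj"), (0, "root")]]
pred = [[(4, "nsubj"), (4, "obj"), (2, "obj"), (0, "root")]]

if __name__ == "__main__":
    print(attachment_scores(gold, pred))                         # -> (50.0, 25.0)
    print(attachment_scores(gold, pred, keep=is_nonprojective))  # -> (0.0, 0.0)
```

A category's error rate is 100 minus the corresponding score; the same filter pattern would extend to other categories, such as sentence-length or head-distance bins.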
In future work, we intend to explore ways to improve parsing accuracy by supplementing neural parsing with other approaches, such as formal rules or pre- and post-processing.
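The abstract states that each category's error rate is compared with the baseline error rate and that significance is checked with the chi-square method, but it does not spell out the exact test setup. One plausible reading, sketched below as an assumption, is a one-degree-of-freedom goodness-of-fit test of a category's error count against the count expected at the baseline rate; a 2×2 contingency table (category versus all other tokens) would be an equally valid alternative. The function name and the category counts are hypothetical. Only the 84.66% LAS comes from the abstract, which gives a labeled baseline error rate of 100 - 84.66 = 15.34% if the baseline is taken to be the parser's overall error rate.

```python
import math
from typing import Tuple


def chi_square_vs_baseline(errors: int, total: int, baseline_rate: float) -> Tuple[float, float]:
    """Goodness-of-fit chi-square test (1 degree of freedom): does a category's
    observed error count differ from total * baseline_rate?

    Returns (chi2 statistic, p-value). As a rule of thumb, both expected
    counts should be at least 5 for the chi-square approximation to hold.
    """
    if total <= 0 or not 0 <= errors <= total:
        raise ValueError("need total > 0 and 0 <= errors <= total")
    if not 0.0 < baseline_rate < 1.0:
        raise ValueError("baseline_rate must lie strictly between 0 and 1")
    expected_err = total * baseline_rate
    expected_ok = total - expected_err
    observed_ok = total - errors
    chi2 = ((errors - expected_err) ** 2 / expected_err
            + (observed_ok - expected_ok) ** 2 / expected_ok)
    # Survival function of the chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Assumed baseline: overall labeled error rate implied by LAS = 84.66%.
BASELINE_LAS_ERROR = (100.0 - 84.66) / 100.0

if __name__ == "__main__":
    # Hypothetical category (e.g. pronouns): 200 tokens, 50 with a labeled attachment error.
    chi2, p = chi_square_vs_baseline(errors=50, total=200, baseline_rate=BASELINE_LAS_ERROR)
    print(f"chi2 = {chi2:.2f}, p = {p:.2g}, significant at 0.05: {p < 0.05}")
```

The closed form `erfc(sqrt(chi2 / 2))` is exact only for one degree of freedom; for the contingency-table variant, `scipy.stats.chi2_contingency` is the usual library route.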

Author Biography

Kostiantyn Syrotkin, National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute"

Master
