Study of the process of identifying the authorship of texts written in natural language

Authors

Yuliia Ulianovska, Oleksandr Firsov, Victoria Kostenko, Oleksiy Pryadka

DOI:

https://doi.org/10.15587/2706-5448.2024.301706

Keywords:

normalization, tokenization, lemmatization, stop word, machine learning, classical model, deep model, LSTM, GRU, web application

Abstract

The object of the research is the process of identifying the authorship of a text using computer technologies and machine learning. The full process is considered, from text preparation to evaluation of the results. Identifying the authorship of a text is a complex and time-consuming task that requires close attention, because it always involves taking into account a large number of different factors and information specific to each author. As a result, errors caused by the human factor may arise during identification and ultimately degrade the results.
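The text-preparation stage (normalization, tokenization, stop-word removal) can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the authors' pipeline: the regular expression, the tiny English stop-word list STOP_WORDS and all function names are hypothetical, and lemmatization is left out because it requires a morphological analyser for the target language.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list for illustration only;
# a real system would use a curated list for the language being analysed.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def normalize(text: str) -> str:
    """Unicode-normalize (NFKC), lowercase and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Extract word tokens; apostrophes and hyphens inside a word are kept."""
    # \w is Unicode-aware for str patterns, so Cyrillic letters are matched too.
    return re.findall(r"\w+(?:['’-]\w+)*", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the (hypothetical) stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def prepare(text: str) -> list[str]:
    """Full preparation chain: normalize -> tokenize -> remove stop words."""
    return remove_stop_words(tokenize(normalize(text)))
```

For example, prepare("The  Cat sat in the HAT.") returns ['cat', 'sat', 'hat']. In stylometry, function words are often strong authorship markers, so whether to drop stop words is a design choice to evaluate on the data rather than a fixed default.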

The subject of the work is the methods and tools for analyzing the process of identifying the authorship of a text using existing computer technologies. As part of the work, the authors developed a web application for identifying the authorship of a text. The application is built with machine learning technologies, has a user-friendly interface and an advanced error tracking system, and can recognize both texts written by a single author and texts written in collaboration.
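One simple way a system could distinguish single-author texts from co-authored ones is to split the text into segments and attribute each segment independently. The sketch below assumes a classical, similarity-based model (character trigram profiles compared by cosine similarity), chosen only for illustration; it is not the model used in the described web application, and the function names, segment size and toy authors "A" and "B" are hypothetical.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams after lowercasing and collapsing whitespace."""
    text = " ".join(text.lower().split())
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_profiles(training: dict[str, list[str]], n: int = 3) -> dict[str, Counter]:
    """Aggregate all known texts of each author into one n-gram profile."""
    profiles = {}
    for author, texts in training.items():
        profile = Counter()
        for text in texts:
            profile.update(char_ngrams(text, n))
        profiles[author] = profile
    return profiles


def attribute(text: str, profiles: dict[str, Counter], n: int = 3) -> str:
    """Return the known author whose profile is most similar to the text."""
    grams = char_ngrams(text, n)
    return max(profiles, key=lambda author: cosine(grams, profiles[author]))


def attribute_segments(text: str, profiles: dict[str, Counter],
                       segment_words: int = 50, n: int = 3) -> list[str]:
    """Attribute consecutive windows of `segment_words` words independently."""
    words = text.split()
    segments = [" ".join(words[i:i + segment_words])
                for i in range(0, len(words), segment_words)]
    return [attribute(segment, profiles, n) for segment in segments]


# Usage with toy data for two hypothetical authors:
profiles = build_profiles({"A": ["aaaa bbbb aaaa bbbb"],
                           "B": ["xxxx yyyy xxxx yyyy"]})
labels = attribute_segments("aaaa bbbb xxxx yyyy", profiles, segment_words=2)
is_collaborative = len(set(labels)) > 1  # more than one author detected
```

Here labels is ['A', 'B'], so the toy text is flagged as possibly co-authored. Short segments give noisy profiles, so in practice the segment size and a minimum-similarity threshold would have to be tuned on held-out texts.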

The effectiveness of different types of machine learning models and data fitting tools is analyzed. Computer technologies for identifying the authorship of a text are defined. The main advantages of using computer technology to identify text authorship are:

– Speed: computer algorithms can analyze large amounts of text in an extremely short period of time.

– Objectivity: computer algorithms analyze text features using well-established methods and are not subject to emotional influence or preconceived opinions during the analysis.
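The two deep models, LSTM and GRU, differ mainly in their gating. For reference, a common textbook formulation is given below (the LSTM with a forget gate, the GRU as in Cho et al.), where x_t is the input at step t (for example, an embedding of the t-th token), h_t is the hidden state, sigma is the logistic sigmoid and the circled dot denotes element-wise multiplication. Libraries differ in small details, such as where the GRU reset gate is applied and whether z_t or 1 - z_t weights the previous state, so this is a sketch of the architectures rather than the exact variant used in the work.

```latex
\begin{aligned}
&\text{LSTM:} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t) \\[6pt]
&\text{GRU:} \\
z_t &= \sigma(W_z x_t + U_z h_{t-1} + b_z) \\
r_t &= \sigma(W_r x_t + U_r h_{t-1} + b_r) \\
\tilde{h}_t &= \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h) \\
h_t &= z_t \odot h_{t-1} + (1 - z_t) \odot \tilde{h}_t
\end{aligned}
```

With input size d and hidden size k, this formulation gives an LSTM layer 4(kd + k^2 + k) parameters and a GRU layer 3(kd + k^2 + k), so a GRU of the same width is about a quarter smaller and usually cheaper to train; which of the two works better for authorship identification remains an empirical question for the data at hand.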

The result of the work is a web application for identifying the authorship of a text, developed on the basis of the research into how computer technology can be applied to this task.

Author Biographies

Yuliia Ulianovska, University of Customs and Finance

PhD, Associate Professor

Department of Computer Science and Software Engineering

Oleksandr Firsov, University of Customs and Finance

PhD, Associate Professor

Department of Computer Science and Software Engineering

Victoria Kostenko, University of Customs and Finance

Senior Lecturer

Department of Computer Science and Software Engineering

Oleksiy Pryadka, University of Customs and Finance

Department of Computer Science and Software Engineering

Published

2024-04-15

How to Cite

Ulianovska, Y., Firsov, O., Kostenko, V., & Pryadka, O. (2024). Study of the process of identifying the authorship of texts written in natural language. Technology Audit and Production Reserves, 2(2(76)), 32–37. https://doi.org/10.15587/2706-5448.2024.301706

Section

Information Technologies