Study of the process of identifying the authorship of texts written in natural language
DOI: https://doi.org/10.15587/2706-5448.2024.301706
Keywords: normalization, tokenization, lemmatization, stop word, machine learning, classical model, deep model, LSTM, GRU, web application
Abstract
The object of the research is the process of identifying the authorship of a text using computer technologies and machine learning. The full process of solving the problem is considered, from text preparation to evaluation of the results. Identifying the authorship of a text is a complex and time-consuming task that demands close attention, because it requires taking into account a large number of different factors and information related to each specific author. As a result, errors caused by the human factor may arise during identification and ultimately degrade the results obtained.
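As an illustration of the text-preparation stage, the following is a minimal Python sketch of the preprocessing steps named in the keywords (normalization, tokenization, stop-word removal, lemmatization). It is not the authors' implementation: the stop-word list and the lemma dictionary are tiny hypothetical placeholders, and a real system would use a full language-specific stop-word list and a morphological analyzer for lemmatization.

```python
import re
import unicodedata

# Hypothetical, deliberately tiny stop-word list (assumption, not from the paper).
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

# Hypothetical lemma dictionary standing in for a real morphological analyzer.
LEMMAS = {"sat": "sit", "purred": "purr", "cats": "cat"}


def normalize(text: str) -> str:
    """Apply Unicode NFKC normalization, lowercase, and collapse whitespace."""
    text = unicodedata.normalize("NFKC", text).lower()
    return re.sub(r"\s+", " ", text).strip()


def tokenize(text: str) -> list[str]:
    """Extract word tokens (Unicode-aware, so Cyrillic words also match)."""
    return re.findall(r"\w+(?:'\w+)*", text)


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Drop tokens that appear in the stop-word list."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens: list[str]) -> list[str]:
    """Map each token to its dictionary lemma, leaving unknown tokens unchanged."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text: str) -> list[str]:
    """Run the full preparation pipeline on one text."""
    return lemmatize(remove_stop_words(tokenize(normalize(text))))


print(preprocess("The Cat sat on the   MAT, and it purred."))
# ['cat', 'sit', 'on', 'mat', 'purr']
```

Each step is a separate function so that individual stages can be switched off when comparing how preprocessing choices affect model accuracy.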
The subject of the work is the methods and tools for analyzing the process of identifying text authorship with existing computer technologies. As part of the work, the authors developed a web application for identifying the authorship of a text. The application is built with machine learning technologies, has a user-friendly interface and an advanced error-tracking system, and can recognize both texts written by a single author and texts written in collaboration.
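One possible way to distinguish single-author from co-authored texts, sketched here as an assumption rather than taken from the paper, is to split the text into fixed-size chunks, attribute each chunk to the closest known author by cosine similarity of character trigram profiles, and report every author that wins at least one chunk. The function names, the trigram size, the chunk length and the toy training texts are all hypothetical; the application described above is built on machine learning models rather than this nearest-profile heuristic.

```python
import math
from collections import Counter


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse frequency vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def attribute(text: str, profiles: dict[str, Counter]) -> str:
    """Return the known author whose profile is most similar to the text."""
    grams = char_ngrams(text)
    return max(profiles, key=lambda author: cosine(grams, profiles[author]))


def detect_authors(text: str, profiles: dict[str, Counter], chunk_words: int = 50) -> set[str]:
    """Attribute each chunk separately; more than one author suggests collaboration."""
    words = text.split()
    chunks = [" ".join(words[i:i + chunk_words]) for i in range(0, len(words), chunk_words)]
    return {attribute(chunk, profiles) for chunk in chunks}


# Toy training texts (hypothetical) used to build one profile per author.
profiles = {
    "alpha": char_ngrams("the quick brown fox jumps over the lazy dog " * 5),
    "beta": char_ngrams("lorem ipsum dolor sit amet consectetur adipiscing elit " * 5),
}

single = detect_authors("the quick brown fox jumps over the lazy dog", profiles, chunk_words=5)
mixed = detect_authors("the quick brown fox jumps lorem ipsum dolor sit amet", profiles, chunk_words=5)
print(sorted(single), sorted(mixed))
```

The chunk length trades off sensitivity against noise: short chunks can locate a change of author more precisely, but each one carries less stylistic evidence.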
The effectiveness of different types of machine learning models and data fitting tools is analyzed, and computer technologies for identifying the authorship of a text are defined. The main advantages of using computer technology to identify text authorship are:
– Speed: computer algorithms can analyze large amounts of text in a very short time.
– Objectivity: computer analysis relies only on proven algorithms to examine text features and is not subject to emotional influence or preconceived opinions.
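Comparing different model types comes down to training each candidate on the same labeled data and scoring it on the same held-out texts. The sketch below is an assumption rather than the authors' setup: it compares a multinomial Naive Bayes classifier (a classical model, with add-one smoothing) against a majority-class baseline using accuracy. The class names and the toy token lists are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAuthor:
    """Multinomial Naive Bayes over word tokens with add-one (Laplace) smoothing."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayesAuthor":
        self.doc_counts = Counter(labels)          # documents per author (priors)
        self.word_counts = defaultdict(Counter)    # token frequencies per author
        for tokens, label in zip(docs, labels):
            self.word_counts[label].update(tokens)
        self.vocab_size = len({w for counts in self.word_counts.values() for w in counts})
        self.totals = {a: sum(c.values()) for a, c in self.word_counts.items()}
        return self

    def predict(self, tokens: list[str]) -> str:
        n_docs = sum(self.doc_counts.values())

        def log_posterior(author: str) -> float:
            score = math.log(self.doc_counts[author] / n_docs)
            for w in tokens:
                p = (self.word_counts[author][w] + 1) / (self.totals[author] + self.vocab_size)
                score += math.log(p)
            return score

        return max(self.doc_counts, key=log_posterior)


class MajorityBaseline:
    """Always predicts the most frequent training label."""

    def fit(self, docs: list[list[str]], labels: list[str]) -> "MajorityBaseline":
        self.label = Counter(labels).most_common(1)[0][0]
        return self

    def predict(self, tokens: list[str]) -> str:
        return self.label


def accuracy(model, docs: list[list[str]], labels: list[str]) -> float:
    """Fraction of held-out documents whose author is predicted correctly."""
    return sum(model.predict(d) == y for d, y in zip(docs, labels)) / len(labels)


# Toy, already-preprocessed token lists (hypothetical data, not from the paper).
train_docs = [["whereupon", "thus", "indeed", "whereupon"], ["thus", "indeed", "verily"],
              ["lol", "totally", "awesome"], ["totally", "lol", "cool"]]
train_labels = ["alpha", "alpha", "beta", "beta"]
test_docs = [["indeed", "thus"], ["lol", "cool"]]
test_labels = ["alpha", "beta"]

results = {}
for model in (NaiveBayesAuthor(), MajorityBaseline()):
    model.fit(train_docs, train_labels)
    results[type(model).__name__] = accuracy(model, test_docs, test_labels)
print(results)
```

A deep model such as an LSTM or GRU network would be evaluated the same way, on the same split or with cross-validation; accuracy on two toy documents says nothing about real-world performance.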
The result of the work is a web application for identifying the authorship of a text, developed on the basis of this research into computer-based authorship identification.
License
Copyright (c) 2024 Yuliia Ulianovska, Oleksandr Firsov, Victoria Kostenko, Oleksiy Pryadka
This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation of copyright and the conditions for its transfer (identification of authorship) are set out in the License Agreement. In particular, the authors retain the right to the authorship of their manuscript and grant the journal the right of first publication of this work under the terms of the Creative Commons CC BY license. The authors may also enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.