Development of methods, models, and means for the author attribution of a text

Authors

Iryna Khomytska, Vasyl Teslyuk, Andriy Holovatyy, Oleksandr Morushko

DOI:

https://doi.org/10.15587/1729-4061.2018.132052

Keywords:

mean frequencies of groups of consonant phonemes, style, substyle, and author differentiation of texts, software system, method, phoneme, phonological level

Abstract

Author attribution of a text is not sufficiently accurate at the lexical and syntactic levels of a language, because these levels are not strictly organized systems. In this study, author attribution is instead based on differentiating the phonostatistical structures of styles.

We have developed a system for differentiating the phonostatistical structures of styles. It differs from existing systems in the language level at which it operates: the phonological level, where results can be obtained with greater accuracy. In addition, the system is built on a modular principle, which makes it possible to modify the developed software rapidly.

We have developed methods and models that are based on the theory of mathematical statistics and improve the accuracy of differentiating the phonostatistical structures of styles. We devised a method for the comprehensive analysis of phonostatistical structures of styles, as well as a multifactor method for determining the degree to which the factors of style, substyle, and the author's manner of presentation act on a text. We have constructed a statistical model of stylistic differentiation using the ranking method, and a statistical model for determining the general stylistic markedness of the examined text. A software system for the differentiation of texts was designed.
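
As an illustration of the kind of statistical comparison such methods rest on, the sketch below computes a two-sample statistic for the mean frequencies of one phoneme group in two sets of text samples. The abstract does not name the tests used, so Welch's t statistic is an assumption chosen for illustration, and the class name `MeanFrequencyComparison`, its method names, and the input values are all hypothetical.

```java
// Illustrative sketch only: Welch's t statistic is an assumed stand-in,
// not the test reported by the authors.
final class MeanFrequencyComparison {

    // Arithmetic mean of the per-sample frequencies.
    static double mean(double[] x) {
        double sum = 0.0;
        for (double v : x) {
            sum += v;
        }
        return sum / x.length;
    }

    // Unbiased sample variance (denominator n - 1); needs at least two values.
    static double sampleVariance(double[] x) {
        if (x.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        double m = mean(x);
        double sumSq = 0.0;
        for (double v : x) {
            sumSq += (v - m) * (v - m);
        }
        return sumSq / (x.length - 1);
    }

    // Welch's t statistic for the difference between the two means.
    static double welchT(double[] a, double[] b) {
        double standardError = Math.sqrt(
                sampleVariance(a) / a.length + sampleVariance(b) / b.length);
        if (standardError == 0.0) {
            throw new IllegalArgumentException("both groups have zero variance");
        }
        return (mean(a) - mean(b)) / standardError;
    }

    public static void main(String[] args) {
        // Hypothetical per-sample frequencies of one phoneme group in two authors' texts.
        double[] authorA = {0.21, 0.24, 0.22, 0.25};
        double[] authorB = {0.30, 0.28, 0.31, 0.29};
        System.out.println("t = " + welchT(authorA, authorB));
    }
}
```

A large absolute value of the statistic suggests that the two sets of samples differ in that phoneme group; the threshold for significance depends on the degrees of freedom and the chosen significance level, which this sketch leaves out.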

Texts are differentiated by the mean frequencies of groups of consonant phonemes.
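
A minimal sketch of how such a criterion could be computed is given below. It assumes that each text sample has already been transcribed into a sequence of phoneme symbols and that a phoneme group is a plain set of symbols. The class name `PhonemeGroupFrequency`, its method names, and the simplified ASCII symbols are hypothetical and do not come from the described software system.

```java
import java.util.List;
import java.util.Set;

// Illustrative sketch only: names and data are assumptions, not the authors' code.
final class PhonemeGroupFrequency {

    // Relative frequency of the group in one sample:
    // occurrences of group members divided by the sample length.
    static double relativeFrequency(List<String> sample, Set<String> group) {
        if (sample.isEmpty()) {
            throw new IllegalArgumentException("sample must not be empty");
        }
        long hits = sample.stream().filter(group::contains).count();
        return (double) hits / sample.size();
    }

    // Mean of the per-sample relative frequencies over all samples of a text.
    static double meanFrequency(List<List<String>> samples, Set<String> group) {
        if (samples.isEmpty()) {
            throw new IllegalArgumentException("at least one sample is required");
        }
        double sum = 0.0;
        for (List<String> sample : samples) {
            sum += relativeFrequency(sample, group);
        }
        return sum / samples.size();
    }

    public static void main(String[] args) {
        // Hypothetical sonorant group written with simplified ASCII symbols.
        Set<String> sonorants = Set.of("m", "n", "l", "r", "w", "j");
        List<List<String>> samples = List.of(
                List.of("m", "a", "n"),        // two sonorants out of three phonemes
                List.of("k", "a", "t", "s"));  // no sonorants out of four phonemes
        System.out.println(meanFrequency(samples, sonorants));
    }
}
```

Averaging per-sample relative frequencies, rather than pooling all phonemes into one count, gives every sample equal weight regardless of its length; whether the described system weights samples this way is not stated in the abstract.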

The system was implemented in the Java programming language, which makes the software platform-independent.

This study reports the results of applying the developed methods, models, and software tools. The results confirm that author attribution of a text is more effective at the phonological level than at the lexical and syntactic levels. The developed methods, models, and means for the author attribution of a text could be used to determine the percentage of creative contribution of each co-author of a scientific paper.

Author Biographies

Iryna Khomytska, Lviv Polytechnic National University, Bandery str., 12, Lviv, Ukraine, 79013

Assistant

Department of Applied Linguistics

Vasyl Teslyuk, Lviv Polytechnic National University, Bandery str., 12, Lviv, Ukraine, 79013

Doctor of Technical Sciences, Professor

Department of Automated Control Systems

Andriy Holovatyy, Ukrainian National Forestry University, Henerala Chuprynky str., 103, Lviv, Ukraine, 79057

PhD, Associate Professor

Department of Information Technologies

Oleksandr Morushko, Lviv Polytechnic National University, Bandery str., 12, Lviv, Ukraine, 79013

PhD, Associate Professor

Department of Social Communication and Information Activity

Published

2018-05-24

How to Cite

Khomytska, I., Teslyuk, V., Holovatyy, A., & Morushko, O. (2018). Development of methods, models, and means for the author attribution of a text. Eastern-European Journal of Enterprise Technologies, 3(2 (93)), 41–46. https://doi.org/10.15587/1729-4061.2018.132052