Development of methods, models, and means for the author attribution of a text
DOI: https://doi.org/10.15587/1729-4061.2018.132052
Keywords: mean frequencies of groups of consonant phonemes, style, substyle, and author differentiation of texts, software system, method, phoneme, phonological level
Abstract
The accuracy of author attribution of a text is not high enough at the lexical and syntactic levels of a language, because these levels are not strictly organized systems. In this study, author attribution of a text is based on the differentiation of the phonostatistical structures of styles.
We have developed a system for the differentiation of phonostatistical structures of styles, which differs from existing systems in the level of language it operates at: the phonological level, at which results can be obtained with greater accuracy. In addition, the system is built on a modular principle, which makes it possible to modify the software rapidly.
We have developed methods and models that are based on the theory of mathematical statistics and improve the accuracy of differentiation of phonostatistical structures of styles. A method was devised for the comprehensive analysis of phonostatistical structures of styles, as well as a multifactor method for determining the degree of influence of the factors of style, substyle, and the author's manner of presentation. We have constructed a statistical model of stylistic differentiation using the ranking method and a statistical model for determining the overall stylistic markedness of the examined text. A software system for the differentiation of texts was designed.
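The abstract does not state which statistics the multifactor method relies on. One standard way to express how strongly a single factor (style, substyle, or author) influences a measured value is one-way analysis of variance, where η² = SS_between / SS_total gives the share of total variation attributable to the factor and the F statistic tests whether that share is significant. The Java sketch below illustrates this swapped-in technique on made-up per-sample counts of one consonant group; the class and method names are hypothetical, and a genuinely multifactor design would add further factors and their interactions.

```java
import java.util.Locale;

/**
 * Hypothetical sketch: one-way analysis of variance used to estimate how strongly
 * a single factor (for example, style) influences the per-sample count of a
 * consonant phoneme group. An illustrative stand-in, not the authors' method.
 */
public class FactorInfluenceSketch {

    /** Returns {SS_between, SS_within}; levels[i] holds the per-sample values at factor level i. */
    static double[] sumsOfSquares(double[][] levels) {
        double grandSum = 0.0;
        int n = 0;
        for (double[] level : levels) {
            for (double x : level) {
                grandSum += x;
                n++;
            }
        }
        double grandMean = grandSum / n;

        double ssBetween = 0.0;
        double ssWithin = 0.0;
        for (double[] level : levels) {
            double levelSum = 0.0;
            for (double x : level) levelSum += x;
            double levelMean = levelSum / level.length;
            // Variation of the level means around the grand mean, weighted by level size.
            ssBetween += level.length * (levelMean - grandMean) * (levelMean - grandMean);
            // Variation of individual samples around their own level mean.
            for (double x : level) ssWithin += (x - levelMean) * (x - levelMean);
        }
        return new double[] {ssBetween, ssWithin};
    }

    /** Degree of influence of the factor: eta^2 = SS_between / (SS_between + SS_within). */
    static double etaSquared(double[][] levels) {
        double[] ss = sumsOfSquares(levels);
        return ss[0] / (ss[0] + ss[1]);
    }

    /** One-way ANOVA F statistic with (k - 1, N - k) degrees of freedom. */
    static double fStatistic(double[][] levels) {
        double[] ss = sumsOfSquares(levels);
        int k = levels.length;
        int n = 0;
        for (double[] level : levels) n += level.length;
        return (ss[0] / (k - 1)) / (ss[1] / (n - k));
    }

    public static void main(String[] args) {
        // Made-up per-sample counts of one consonant group in three styles (three samples each).
        double[][] byStyle = {
            {52, 48, 50},
            {40, 42, 38},
            {45, 47, 43}
        };
        System.out.println(String.format(Locale.ROOT, "eta^2 = %.4f, F = %.2f",
                etaSquared(byStyle), fStatistic(byStyle)));
        // prints: eta^2 = 0.8621, F = 18.75
    }
}
```

With these invented numbers the style factor accounts for about 86% of the total variation; F = 18.75 with (2, 6) degrees of freedom would then be compared with a tabulated critical value of F.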
The criterion for the differentiation of texts is the mean frequencies of groups of consonant phonemes.
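To make this criterion concrete, the following Java sketch counts how often the phonemes of one consonant group occur in consecutive equal-size samples of a text, takes the mean of these per-sample counts, and compares two texts with a pooled-variance two-sample Student's t statistic. All of this is an assumption rather than the authors' implementation: the abstract does not name the statistical test, the text is assumed to be already transcribed into ARPAbet-style phoneme symbols, and the class name, the `LABIAL` group, and the sample counts are hypothetical.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

/**
 * Hypothetical sketch: mean per-sample frequency of one consonant phoneme group
 * as a differentiation criterion, with two texts compared by a pooled-variance
 * two-sample Student's t statistic. Not the authors' implementation.
 */
public class ConsonantGroupCriterion {

    // Illustrative labial consonant group in ARPAbet-style symbols (an assumption).
    static final Set<String> LABIAL = Set.of("P", "B", "M", "F", "V");

    /** Counts group members in each consecutive sample; a trailing partial sample is dropped. */
    static double[] perSampleCounts(List<String> phonemes, int sampleSize, Set<String> group) {
        int samples = phonemes.size() / sampleSize;
        double[] counts = new double[samples];
        for (int s = 0; s < samples; s++) {
            for (String p : phonemes.subList(s * sampleSize, (s + 1) * sampleSize)) {
                if (group.contains(p)) counts[s]++;
            }
        }
        return counts;
    }

    static double mean(double[] xs) {
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Unbiased sample variance (divides by n - 1). */
    static double variance(double[] xs) {
        double m = mean(xs);
        double sum = 0.0;
        for (double x : xs) sum += (x - m) * (x - m);
        return sum / (xs.length - 1);
    }

    /** Pooled-variance two-sample t statistic with n1 + n2 - 2 degrees of freedom. */
    static double tStatistic(double[] a, double[] b) {
        int n1 = a.length;
        int n2 = b.length;
        double pooled = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / (n1 + n2 - 2);
        return (mean(a) - mean(b)) / Math.sqrt(pooled * (1.0 / n1 + 1.0 / n2));
    }

    public static void main(String[] args) {
        // ARPAbet-style transcription of "cat back map give", split into samples of 4 phonemes.
        List<String> text = List.of("K", "AE", "T", "B", "AE", "K", "M", "AE", "P", "G", "IH", "V");
        double[] counts = perSampleCounts(text, 4, LABIAL);
        System.out.println(Arrays.toString(counts) + " mean = " + mean(counts));
        // prints: [1.0, 1.0, 2.0] mean = 1.3333333333333333

        // Made-up per-sample counts of the group in two texts.
        double[] textA = {52, 48, 50, 55, 45};
        double[] textB = {40, 42, 38, 44, 36};
        System.out.println(String.format(Locale.ROOT, "t = %.3f, df = %d",
                tStatistic(textA, textB), textA.length + textB.length - 2));
        // prints: t = 4.518, df = 8
    }
}
```

With these invented counts, |t| ≈ 4.52 exceeds the tabulated two-tailed critical value of about 2.306 for 8 degrees of freedom at the 0.05 level, so the two mean frequencies would be judged significantly different. In practice each sample would be far longer than four phonemes.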
The system was implemented in the Java programming language, which makes the software platform-independent.
This study reports the results of applying the developed methods, models, and software tools. The results confirm that author attribution of a text is more effective at the phonological level than at the lexical and syntactic levels. The developed methods, models, and means for the author attribution of a text could be used to determine the percentage of creative contribution of each co-author of a scientific paper.
License
Copyright (c) 2018 Iryna Khomytska, Vasyl Teslyuk, Andriy Holovatyy, Oleksandr Morushko
This work is licensed under a Creative Commons Attribution 4.0 International License.
The securing of copyright (identification of authorship) and the conditions for its transfer are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of this work under the terms of the Creative Commons CC BY license. The authors may also enter into separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.
A license agreement is a document in which the author warrants that he/she owns all copyright in the work (manuscript, article, etc.).
The authors, by signing the License Agreement with TECHNOLOGY CENTER PC, retain all rights to the further use of their work, provided that they link to the edition of our journal in which the work was published.
Under the terms of the License Agreement, the Publisher TECHNOLOGY CENTER PC does not take away the authors' copyrights and receives permission from the authors to use and disseminate the publication through the world's scientific resources (its own electronic resources, scientometric databases, repositories, libraries, etc.).
In the absence of a signed License Agreement, or if the agreement lacks the identifiers needed to establish the identity of the author, the editors have no right to work with the manuscript.
Note that there is another type of agreement between authors and publishers, under which copyright is transferred from the authors to the publisher. In that case, the authors lose ownership of their work and may not use it in any way.