Authenticity of authorship of scientific publications using latent semantic analysis
DOI: https://doi.org/10.15587/1729-4061.2014.23942
Keywords: identification, publication, indexing, latent, semantic analysis, classification, information, singular, matrix
Abstract
This study applies latent semantic analysis to the problem of identifying the authorship of scientific publications. The method identifies the keywords associated with a particular subject, and these keywords are then used to find similar publications.
This makes it possible to distinguish publications by different authors who share the same surname, or even the same initials. Many scientometric databases contain publication records with an identical “author” field but completely different fields of activity, sources, etc. The study found that publications by namesakes have different sets of keywords and, accordingly, different subject matter.
Latent semantic analysis can therefore be used to classify such publications. Extracting the keywords tied to an author, in turn, makes it possible to identify that author's publications with some accuracy. The results allow automating the creation of local databases of researchers together with lists of their scientific papers.
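The idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the four publication titles, the stop-word list, the function names, and the choice of k = 2 latent dimensions are all assumptions made for illustration. The sketch builds a term-document matrix, applies a truncated singular value decomposition with NumPy, reads keywords off the leading singular vectors, and compares publications by cosine similarity in the latent space.

```python
import numpy as np

# Hypothetical stop-word list; a real system would use a fuller one.
STOP_WORDS = {"of", "in", "the"}

def tokenize(text):
    """Lowercase, split on whitespace, drop stop words."""
    return [w for w in text.lower().split() if w not in STOP_WORDS]

def term_document_matrix(docs):
    """Return (vocabulary, A) where A[i, j] counts term i in document j."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in tokenize(d):
            A[index[w], j] += 1.0
    return vocab, A

def lsa(A, k):
    """Truncated SVD: term vectors (terms x k) and document vectors (docs x k)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    term_vectors = U[:, :k]
    doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T
    return term_vectors, doc_vectors

def top_terms(vocab, term_vectors, dim, n):
    """Terms with the largest absolute weight in one latent dimension."""
    order = np.argsort(-np.abs(term_vectors[:, dim]))[:n]
    return [vocab[i] for i in order]

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Invented titles: two namesakes with the same "author" field,
# one working on text analysis, the other on materials science.
titles = [
    "latent semantic analysis of scientific publications",
    "semantic indexing of publications in scientometric databases",
    "corrosion of steel pipes in sea water",
    "steel corrosion protection in sea water pipelines",
]

vocab, A = term_document_matrix(titles)
term_vectors, doc_vectors = lsa(A, k=2)

same_subject = cosine(doc_vectors[0], doc_vectors[1])
different_subject = cosine(doc_vectors[0], doc_vectors[2])
keywords = top_terms(vocab, term_vectors, dim=0, n=4)

print("same subject:     ", round(same_subject, 3))
print("different subject:", round(different_subject, 3))
print("keywords of the leading latent dimension:", sorted(keywords))
```

In this toy corpus the two subject areas share no terms after stop-word removal, so the records separate cleanly. Real scientometric records are noisier, and the number of latent dimensions and the similarity threshold would have to be tuned empirically.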
License
Copyright (c) 2014 Андрей Сергеевич Коляда, Виктор Дмитриевич Гогунский
This work is licensed under a Creative Commons Attribution 4.0 International License.