Authenticity of authorship of scientific publications using latent semantic analysis

Authors

DOI:

https://doi.org/10.15587/1729-4061.2014.23942

Keywords:

identification, publication, indexing, latent, semantic analysis, classification, information, singular, matrix

Abstract

In this research, latent semantic analysis (LSA) is applied to the problem of identifying the authorship of scientific publications. LSA makes it possible to identify the keywords associated with a particular subject area, and these keywords are proposed as the basis for finding similar publications.
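
The abstract does not spell out the pipeline, so the following is only a minimal sketch of how LSA can surface subject keywords. It assumes raw term counts (no TF-IDF weighting), whitespace tokenization, k = 2 latent dimensions and a hypothetical five-title corpus. It builds the term-document matrix A, takes the truncated singular value decomposition A ≈ U_k Σ_k V_kᵀ, and treats the terms with the largest absolute loadings on each latent dimension as that dimension's keywords.

```python
import numpy as np

# Hypothetical toy corpus: titles that might sit under one shared "author" field.
DOCS = [
    "risk management safety project",
    "risk management safety assessment",
    "risk safety hazard",
    "latent semantic indexing matrix",
    "semantic indexing singular matrix",
]


def term_document_matrix(docs):
    """Raw term-frequency matrix A (terms x documents); whitespace tokenization."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    return A, vocab


def lsa_keywords(docs, k=2, top_n=3):
    """Top terms by absolute loading on each of the first k left singular vectors."""
    A, vocab = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)  # singular values come sorted descending
    keywords = []
    for dim in range(k):
        # abs() because the sign of each singular vector is arbitrary
        order = np.argsort(-np.abs(U[:, dim]))[:top_n]
        keywords.append([vocab[i] for i in order])
    return keywords


print(lsa_keywords(DOCS))
```

On this toy corpus the first latent dimension is dominated by "risk", "safety" and "management" and the second by "semantic", "indexing" and "matrix"; the order within a dimension can vary where loadings tie.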

This makes it possible to distinguish the publications of different authors who share a surname or even the same initials. Many scientometric databases contain publication records with an identical “author” field that in fact belong to people with entirely different research activities, publication sources, and so on. The study found that the publications of such namesakes have different sets of keywords and, accordingly, different subject matter.
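
A minimal sketch of how the namesake observation could be turned into a procedure, assuming raw term counts, whitespace tokenization, k = 2 latent dimensions and a hypothetical five-title corpus filed under one "author" field. Each publication is mapped to its LSA coordinates (the rows of V_kΣ_k), publications are linked when their cosine similarity reaches a threshold, and each connected group is treated as a candidate distinct author. The helper names, the threshold of 0.5 and the grouping rule are illustrative choices, not values or methods taken from the study.

```python
import numpy as np

# Hypothetical publications that share the same "author" field.
DOCS = [
    "risk management safety project",
    "risk management safety assessment",
    "risk safety hazard",
    "latent semantic indexing matrix",
    "semantic indexing singular matrix",
]


def term_document_matrix(docs):
    """Raw term-frequency matrix A (terms x documents); whitespace tokenization."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    return A, vocab


def latent_doc_vectors(A, k=2):
    """Rows are documents in the k-dimensional LSA space (V_k scaled by the singular values)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return Vt[:k].T * s[:k]


def cosine_matrix(X):
    """Pairwise cosine similarity between the rows of X."""
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return Y @ Y.T


def group_by_similarity(sim, threshold=0.5):
    """Connected components of the graph linking documents with cosine >= threshold."""
    n = sim.shape[0]
    labels = [-1] * n
    current = 0
    for start in range(n):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if labels[j] == -1 and sim[i, j] >= threshold:
                    labels[j] = current
                    stack.append(j)
        current += 1
    return labels


A, _ = term_document_matrix(DOCS)
sim = cosine_matrix(latent_doc_vectors(A, k=2))
print(group_by_similarity(sim))
```

Here the three safety-related titles fall into one group and the two indexing-related titles into another, mirroring the finding that namesakes' publications carry different keyword sets.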

Latent semantic analysis can therefore be used to classify these publications, and highlighting the keywords associated with a particular author makes it possible, in turn, to identify that author's publications with a certain degree of accuracy. The results of the research make it possible to automate the creation of local databases of researchers, each with a list of their scientific papers.
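
Assigning a new publication to one of several namesakes can be sketched as a nearest-centroid rule in the LSA space. This assumes that some publications have already been attributed (the labels author_A and author_B below are hypothetical), raw term counts, k = 2 and a toy corpus. The new title's count vector q is projected as q̂ = U_kᵀq, which is the usual LSA folding-in Σ_k⁻¹U_kᵀq rescaled by Σ_k so that it shares coordinates with the document columns U_kᵀA = Σ_kV_kᵀ. The title goes to the author whose mean document vector has the highest cosine similarity; words outside the corpus vocabulary are ignored, and a title with no known words is left unclassified.

```python
import numpy as np

# Hypothetical attributed publications for two namesakes.
DOCS = [
    "risk management safety project",
    "risk management safety assessment",
    "risk safety hazard",
    "latent semantic indexing matrix",
    "semantic indexing singular matrix",
]
AUTHORS = ["author_A", "author_A", "author_A", "author_B", "author_B"]


def term_document_matrix(docs):
    """Raw term-frequency matrix A (terms x documents); whitespace tokenization."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.lower().split():
            A[index[w], j] += 1
    return A, index


def classify(title, docs, authors, k=2):
    """Return the author whose LSA centroid is most cosine-similar to title, or None."""
    A, index = term_document_matrix(docs)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    U_k = U[:, :k]
    doc_vecs = (U_k.T @ A).T  # rows: documents in latent space (equals V_k * S_k)
    q = np.zeros(len(index))
    for w in title.lower().split():
        if w in index:  # words absent from the corpus vocabulary are ignored
            q[index[w]] += 1
    q_hat = U_k.T @ q
    if np.linalg.norm(q_hat) == 0:
        return None  # nothing to compare: leave unclassified
    best, best_sim = None, -np.inf
    for name in sorted(set(authors)):
        rows = [i for i, a in enumerate(authors) if a == name]
        centroid = doc_vecs[rows].mean(axis=0)
        sim = q_hat @ centroid / (np.linalg.norm(q_hat) * np.linalg.norm(centroid))
        if sim > best_sim:
            best, best_sim = name, sim
    return best


print(classify("risk assessment safety", DOCS, AUTHORS))
```

A nearest-centroid rule is the simplest choice for this step; any practical use would also need a similarity threshold below which a publication is not attributed to any known author.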

Author Biographies

Андрей Сергеевич Коляда, Odessa National Polytechnic University, Shevchenko Ave 1, Odessa, Ukraine, 65044

Graduate student

Department of Systems Management Life Safety

Виктор Дмитриевич Гогунский, Odessa National Polytechnic University, Shevchenko Ave 1, Odessa, Ukraine, 65044

Doctor of Technical Sciences, Professor

Department of Systems Management Life Safety

Published

2014-06-25

How to Cite

Коляда, А. С., & Гогунский, В. Д. (2014). Authenticity of authorship of scientific publications using latent semantic analysis. Eastern-European Journal of Enterprise Technologies, 3(2(69)), 36–40. https://doi.org/10.15587/1729-4061.2014.23942