A method for the identification of scientists' research areas based on a cluster analysis of scientific publications

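Authors

Andrii Biloshchytskyi, Alexander Kuchansky, Yurii Andrashko, Svitlana Biloshchytska, Oleksandr Kuzka, Yevheniia Shabala, Tamara Lyashchenko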

DOI:

https://doi.org/10.15587/1729-4061.2017.112323

Keywords:

clustering, area of scientific research, co-citation graph, locality-sensitive hashing

Abstract

A method for clustering scientific publications is proposed in order to identify scientists' research areas. In this method, the links between scientific publications through citations are represented as a directed graph. Two techniques are proposed for finding the distance between publications. The first is based on calculating the length of the shortest route between the corresponding vertices of the graph of citation links between publications. The second is based on calculating the degree of closeness between the abstracts of the publications, using the Hamming distance over hashes produced by a locality-sensitive hashing method. After the graph has been clustered, and considering the specificity of the input data, it is proposed to merge clusters by the criterion of proximity of their centers of gravity.
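The two distance measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that citation links may be traversed in either direction when measuring route length, and it uses a SimHash-style fingerprint as one possible locality-sensitive hashing scheme; the function names, the whitespace tokenizer, and the 64-bit fingerprint size are all hypothetical choices made for illustration.

```python
from collections import deque
import hashlib


def citation_distance(edges, source, target):
    """Length of the shortest route between two publications.

    `edges` holds directed (citing, cited) pairs. Assumption: a link can be
    followed in either direction when measuring distance.
    """
    adjacency = {}
    for citing, cited in edges:
        adjacency.setdefault(citing, set()).add(cited)
        adjacency.setdefault(cited, set()).add(citing)
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:  # breadth-first search
        node, dist = queue.popleft()
        for neighbor in adjacency.get(node, ()):
            if neighbor == target:
                return dist + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, dist + 1))
    return None  # no route connects the two publications


def simhash(text, bits=64):
    """SimHash-style fingerprint of an abstract (illustrative LSH choice)."""
    weights = [0] * bits
    for token in text.lower().split():
        digest = hashlib.md5(token.encode("utf-8")).digest()
        value = int.from_bytes(digest[: bits // 8], "big")
        for i in range(bits):
            weights[i] += 1 if (value >> i) & 1 else -1
    fingerprint = 0
    for i, weight in enumerate(weights):
        if weight > 0:
            fingerprint |= 1 << i
    return fingerprint


def hamming_distance(a, b):
    """Number of bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

With fingerprints of this kind, abstracts that share most of their words tend to differ in only a few bits, so a small Hamming distance suggests closeness in content.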

To identify scientists' research areas, it is proposed first to use one of the expert methods to establish a correspondence between the built clusters and the appropriate verbal descriptions of scientific areas. Next, a set of research areas is formed for each scientist, taking into account the mapping of the set of scientists onto the set of scientific areas.
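The mapping step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' procedure: the three dictionaries (`authorship`, `publication_cluster`, `cluster_area`) and the function name are hypothetical, and the cluster labels are taken to have been assigned by experts beforehand.

```python
def research_areas_by_scientist(authorship, publication_cluster, cluster_area):
    """Form a set of research areas for each scientist.

    authorship:          scientist -> list of publication ids (hypothetical input)
    publication_cluster: publication id -> cluster id from the clustering step
    cluster_area:        cluster id -> expert-assigned name of a scientific area
    """
    areas = {}
    for scientist, publications in authorship.items():
        found = set()
        for publication in publications:
            cluster = publication_cluster.get(publication)
            # Skip publications that were not clustered or whose cluster
            # has no expert-assigned label.
            if cluster in cluster_area:
                found.add(cluster_area[cluster])
        areas[scientist] = found
    return areas
```

A scientist whose publications fall into several labeled clusters is thus assigned several research areas, one per distinct cluster label.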

The proposed methods could be used in scientific and educational institutions, as well as in private companies engaged in the creation of science-intensive technologies.

Author Biographies

Andrii Biloshchytskyi, Taras Shevchenko National University of Kyiv, Volodymyrska str., 60, Kyiv, Ukraine, 01033

Doctor of Technical Sciences, Professor

Department of Network and Internet Technologies

Alexander Kuchansky, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037

PhD, Associate Professor

Department of Cybersecurity and Computer Engineering

Yurii Andrashko, Uzhhorod National University, Narodna sq., 3, Uzhhorod, Ukraine, 88000

Lecturer

Department of System Analysis and Optimization Theory 

Svitlana Biloshchytska, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037

PhD, Associate Professor

Department of Information Technology Designing and Applied Mathematics

Oleksandr Kuzka, Uzhhorod National University, Narodna sq., 3, Uzhhorod, Ukraine, 88000

PhD, Associate Professor

Department of System Analysis and Optimization Theory 

Yevheniia Shabala, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037

PhD, Associate Professor

Department of Cybersecurity and Computer Engineering

Tamara Lyashchenko, Kyiv National University of Construction and Architecture, Povitroflotskyi ave., 31, Kyiv, Ukraine, 03037

Senior Lecturer

Department of Information Technologies

Published

2017-10-30

How to Cite

Biloshchytskyi, A., Kuchansky, A., Andrashko, Y., Biloshchytska, S., Kuzka, O., Shabala, Y., & Lyashchenko, T. (2017). A method for the identification of scientists’ research areas based on a cluster analysis of scientific publications. Eastern-European Journal of Enterprise Technologies, 5(2 (89)), 4–11. https://doi.org/10.15587/1729-4061.2017.112323