Development of an algorithm for code clone detection in source code based on abstract syntax tree

Authors

DOI:

https://doi.org/10.15587/2706-5448.2023.286472

Keywords:

clone detection, abstract syntax tree, AST, hashing, vulnerability search, false alarms

Abstract

The object of research in this work is an algorithm for detecting duplicates in program code based on the abstract syntax tree (AST). The study addresses two main tasks: detecting duplicate code and searching for vulnerabilities in program code.

The results show that the proposed algorithm is robust to type 1 and type 2 clones, that is, it effectively detects similar code fragments whose text is identical or varies only superficially. For type 3 and type 4 clones, the algorithm may be less effective, because these clone types change the structure of the AST.
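
To illustrate why hashing an AST tolerates type 1 and type 2 changes but not type 3 changes, here is a minimal sketch using Python's standard `ast` and `hashlib` modules. It is not the authors' algorithm: the normalization rules, the placeholder strings `_ID`, `_LIT` and `_FN`, and the function name `ast_fingerprint` are assumptions made for this example only.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Replace names and literal values with fixed placeholders (assumed rules)."""

    def visit_Name(self, node):
        node.id = "_ID"          # variable and function references
        return node

    def visit_arg(self, node):
        node.arg = "_ID"         # parameter names
        node.annotation = None   # ignore type annotations
        return node

    def visit_Constant(self, node):
        node.value = "_LIT"      # numbers, strings, booleans, None
        node.kind = None
        return node

    def visit_FunctionDef(self, node):
        node.name = "_FN"        # function name
        self.generic_visit(node) # keep normalizing the function's children
        return node


def ast_fingerprint(source: str) -> str:
    """Hash the normalized AST; comments and layout never reach the tree."""
    tree = _Normalizer().visit(ast.parse(source))
    canonical = ast.dump(tree, annotate_fields=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


original = "def add(x, y):\n    # sum two values\n    return x + y\n"
type1 = "def add(x,   y):\n\n    return x + y   # layout changed\n"
type2 = "def plus(a, b):\n    return a + b\n"
type3 = "def add(x, y):\n    z = x + y\n    return z\n"

print(ast_fingerprint(original) == ast_fingerprint(type1))  # layout and comments only
print(ast_fingerprint(original) == ast_fingerprint(type2))  # renamed identifiers
print(ast_fingerprint(original) == ast_fingerprint(type3))  # added statement
```

In this sketch the first two comparisons match because parsing discards layout and the normalizer discards names, while the third differs because the extra assignment adds nodes to the tree. Attribute names such as `obj.method` are left as they are, which is a simplification.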

Experiments showed that the algorithm can also report matches between unrelated files, because some typical AST chains occur in many programs. This leads to a certain level of false positives in duplicate detection.
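
The effect of common AST chains can be reproduced on a small scale. The sketch below assumes that a "chain" is a downward parent-to-child sequence of node type names of fixed length, and that similarity is the Jaccard overlap of two chain sets. Both are assumptions for illustration and may differ from the definitions used in the study; the names `node_type_chains` and `chain_overlap` are hypothetical.

```python
import ast


def node_type_chains(source: str, length: int = 3) -> set:
    """Collect every downward chain of `length` consecutive node type names."""
    chains = set()

    def walk(node, path):
        # Keep only the last `length` ancestors, including the current node.
        path = (path + (type(node).__name__,))[-length:]
        if len(path) == length:
            chains.add(path)
        for child in ast.iter_child_nodes(node):
            walk(child, path)

    walk(ast.parse(source), ())
    return chains


def chain_overlap(a: str, b: str, length: int = 3) -> float:
    """Jaccard similarity of the chain sets of two source fragments."""
    ca, cb = node_type_chains(a, length), node_type_chains(b, length)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


# Two fragments written for unrelated purposes.
counting = "for i in range(10):\n    print(i)\n"
logging_users = "for name in users:\n    log(name)\n"

shared = node_type_chains(counting) & node_type_chains(logging_users)
print(sorted(shared))
print(chain_overlap(counting, logging_users))
```

Both fragments contain a loop whose body is a single call, so chains such as `For`, `Expr`, `Call` appear in each of them even though the code is unrelated. A detector that counts such shared chains needs a threshold or a filter for very common chains to keep false positives down.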

Testing of the algorithm in the task of finding vulnerabilities showed that:

  1. The best recognition is observed for the «SQL injection» vulnerability, but it also has the highest number of false positives.
  2. «Memory leak» and «null pointer dereference» vulnerabilities are detected with equal effectiveness and the same level of false positives.
  3. «Buffer overflow» has the lowest recognition rate but fewer false positives than «SQL injection».

The study showed that using the AST allows effective detection of duplicate code and vulnerabilities in program code. The developed tool can help software developers reduce maintenance effort, improve code quality, and ensure the security of software products.

Author Biographies

Yevhenii Kubiuk, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Department of System Design

Gennadiy Kyselov, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

PhD

Department of System Design


Published

2023-08-29

How to Cite

Kubiuk, Y., & Kyselov, G. (2023). Development of an algorithm for code clone detection in source code based on abstract syntax tree. Technology Audit and Production Reserves, 4(2(72)), 33–36. https://doi.org/10.15587/2706-5448.2023.286472

Section

Information Technologies