Development of an algorithm for code clone detection in source code based on abstract syntax tree

Yevhenii Kubiuk; Gennadiy Kyselov

doi:10.15587/2706-5448.2023.286472

Authors

Yevhenii Kubiuk National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Ukraine http://orcid.org/0000-0002-7086-0976
Gennadiy Kyselov National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute», Ukraine https://orcid.org/0000-0003-2682-3593

DOI:

https://doi.org/10.15587/2706-5448.2023.286472

Keywords:

clone detection, abstract syntax tree, AST, hashing, vulnerability search, false alarms

Abstract

The object of this research is an algorithm for detecting duplicates in program code based on the Abstract Syntax Tree (AST). The study addresses two main tasks: detecting duplicate code and finding vulnerabilities in program code.

The results show that the proposed algorithm is robust for Type 1 and Type 2 clones, meaning that it effectively detects similar code fragments whose text is identical or only slightly varied. For Type 3 and Type 4 clones, however, the algorithm may be less effective, because these clone types change the structure of the AST.
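The behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes Python's standard `ast` module as the parser and a SHA-256 hash over node types only, and the names `shape` and `structural_fingerprint` are hypothetical. Because identifier names, literal values, comments, and layout never reach the hash, Type 1 and Type 2 variants collide, while a Type 3 variant with an added statement does not.

```python
import ast
import hashlib


def shape(node: ast.AST) -> str:
    """Serialize a subtree as nested node-type names, dropping names and values."""
    children = ",".join(shape(child) for child in ast.iter_child_nodes(node))
    return f"{type(node).__name__}({children})"


def structural_fingerprint(source: str) -> str:
    """Hash the shape of the AST of `source` (hypothetical helper, not from the paper)."""
    return hashlib.sha256(shape(ast.parse(source)).encode("utf-8")).hexdigest()


original = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        s += x\n"
    "    return s\n"
)

# Type 1: same code, different layout and an added comment.
type1 = (
    "def total(xs):\n"
    "\n"
    "    s = 0  # accumulator\n"
    "    for x in xs:\n"
    "            s += x\n"
    "    return s\n"
)

# Type 2: identifiers and a literal renamed, structure unchanged.
type2 = (
    "def add_all(values):\n"
    "    acc = 10\n"
    "    for v in values:\n"
    "        acc += v\n"
    "    return acc\n"
)

# Type 3: a statement was added, so the tree itself changes.
type3 = (
    "def total(xs):\n"
    "    s = 0\n"
    "    for x in xs:\n"
    "        if x > 0:\n"
    "            s += x\n"
    "    return s\n"
)

fp_original = structural_fingerprint(original)
print(fp_original == structural_fingerprint(type1))
print(fp_original == structural_fingerprint(type2))
print(fp_original == structural_fingerprint(type3))
```

Hashing whole-file shapes is the simplest possible choice; a practical detector would more likely hash subtrees or chains so that partial clones can also be matched.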

Experiments also showed that the algorithm can report matches between unrelated files, because typical AST chains occur in many programs. This can lead to a certain level of false positives when detecting duplicates.
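The following sketch shows how such matches can arise. It assumes, purely for illustration, that an "AST chain" is the sequence of node types along a root-to-leaf path; the abstract does not define the term, and `ast_chains` is a hypothetical helper built on Python's standard `ast` module.

```python
import ast
from typing import Set, Tuple


def ast_chains(source: str) -> Set[str]:
    """Collect root-to-leaf paths of node-type names (assumed chain definition)."""
    chains: Set[str] = set()

    def visit(node: ast.AST, prefix: Tuple[str, ...]) -> None:
        path = prefix + (type(node).__name__,)
        children = list(ast.iter_child_nodes(node))
        if not children:
            # A node without child nodes ends a chain.
            chains.add(">".join(path))
        for child in children:
            visit(child, path)

    visit(ast.parse(source), ())
    return chains


# Two functions that are unrelated in purpose.
file_a = "def area(w, h):\n    return w * h\n"
file_b = "def greet(name):\n    return 'hi ' + name\n"

shared = ast_chains(file_a) & ast_chains(file_b)
for chain in sorted(shared):
    print(chain)
```

Both snippets define a function with parameters and return a binary expression over a variable, so chains describing those constructs appear in both sets even though the code is unrelated. Down-weighting or filtering very common chains is one possible mitigation, though the abstract does not say whether the authors do this.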

Testing the algorithm on the vulnerability detection task showed that:

The best recognition is observed for the «SQL injection» vulnerability, but it also has the highest number of false positives.
«Memory leak» and «null pointer dereference» vulnerabilities are detected with equal effectiveness and the same level of false positives.
«Buffer overflow» has the lowest recognition rate but fewer false positives compared to «SQL injection».

The study showed that using an AST enables effective detection of duplicate code and vulnerabilities in software. The developed tool can help software developers reduce maintenance effort, improve code quality, and ensure the security of software products.

Author Biographies

Yevhenii Kubiuk, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

Department of System Design

Gennadiy Kyselov, National Technical University of Ukraine «Igor Sikorsky Kyiv Polytechnic Institute»

PhD

Department of System Design

