Development of an algorithm for code clone detection in source code based on abstract syntax tree
DOI:
https://doi.org/10.15587/2706-5448.2023.286472
Keywords:
clone detection, abstract syntax tree, AST, hashing, vulnerability search, false alarms
Abstract
The object of research is an algorithm for detecting duplicate code based on the abstract syntax tree (AST). The study addresses two main tasks: detecting duplicate code and searching for vulnerabilities in program code.
The results show that the proposed algorithm is robust to type 1 and type 2 clones, that is, it detects similar code fragments whose text is identical or differs only in details such as renamed identifiers, literals, or layout. For type 3 and type 4 clones the algorithm may be less effective, because these clone types change the AST structure.
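The behaviour described above can be illustrated with a minimal sketch. It is not the authors' implementation: the choice of Python's `ast` module, SHA-256 as the hash, and the names `_Normalizer` and `structure_hash` are assumptions made only for illustration. The sketch hashes an AST after replacing identifiers and literals with placeholders, so type 1 and type 2 clones collide while a type 3 clone (an added statement) does not.

```python
import ast
import hashlib


class _Normalizer(ast.NodeTransformer):
    """Hypothetical helper: erase names and literal values, keep structure."""

    def visit_FunctionDef(self, node):
        node.name = "_"            # function name is irrelevant for type 2 clones
        self.generic_visit(node)   # continue into arguments and body
        return node

    def visit_arg(self, node):
        node.arg = "_"             # parameter names
        self.generic_visit(node)
        return node

    def visit_Name(self, node):
        node.id = "_"              # variable names
        return node

    def visit_Constant(self, node):
        node.value = 0             # literal values
        node.kind = None
        return node


def structure_hash(source):
    """Return a hash of the normalized AST of `source` (illustrative only)."""
    tree = _Normalizer().visit(ast.parse(source))
    # Comments and whitespace never reach the AST, and line numbers are
    # excluded from the dump, so only the tree shape is hashed.
    dump = ast.dump(tree, annotate_fields=False, include_attributes=False)
    return hashlib.sha256(dump.encode("utf-8")).hexdigest()


original = "def add(x, y):\n    return x + y  # sum\n"
type1_clone = "def add(x, y):\n\n    return x + y\n"
type2_clone = "def plus(first, second):\n    return first + second\n"
type3_clone = "def add(x, y):\n    z = x + y\n    return z\n"

print(structure_hash(original) == structure_hash(type1_clone))
print(structure_hash(original) == structure_hash(type2_clone))
print(structure_hash(original) == structure_hash(type3_clone))
```

Hashing the whole normalized tree is the simplest possible design; it makes the loss of sensitivity for type 3 clones easy to see, since a single inserted statement changes the dump and therefore the hash.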
Experiments showed that the algorithm can also report matches between unrelated files, because many programs contain the same typical AST chains. This leads to a certain level of false positives in duplicate detection.
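The source of such false positives can be shown with a small sketch, again under stated assumptions: the function `subtree_hashes`, the two sample snippets, and the `min_nodes` size threshold are hypothetical and are not taken from the paper. Two unrelated snippets share small subtrees such as `range(10)`, so subtree-level hashing reports matches between them; a size threshold is one possible way to suppress these matches.

```python
import ast
import hashlib


def subtree_hashes(source, min_nodes=1):
    """Hash every AST subtree with at least `min_nodes` nodes (illustrative)."""
    hashes = set()
    for node in ast.walk(ast.parse(source)):
        size = sum(1 for _ in ast.walk(node))  # nodes in this subtree
        if size >= min_nodes:
            dump = ast.dump(node, annotate_fields=False)
            hashes.add(hashlib.sha256(dump.encode("utf-8")).hexdigest())
    return hashes


# Two unrelated snippets that happen to use the same loop header.
file_a = "total = 0\nfor i in range(10):\n    total += i\n"
file_b = "names = []\nfor i in range(10):\n    names.append(str(i))\n"

# Without a threshold, common fragments such as `range(10)` match.
shared_small = subtree_hashes(file_a) & subtree_hashes(file_b)

# Assumption: ignoring subtrees with fewer than 5 nodes removes these matches.
shared_large = subtree_hashes(file_a, min_nodes=5) & subtree_hashes(file_b, min_nodes=5)

print(len(shared_small) > 0, len(shared_large) == 0)
```

The threshold trades recall for precision: raising it removes matches on typical short chains but also hides genuinely duplicated small fragments.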
Testing of the algorithm in the task of finding vulnerabilities showed that:
- The best recognition is observed for the «SQL injection» vulnerability, but it also has the highest number of false positives.
- Memory leak and null pointer dereference vulnerabilities are detected with equal effectiveness and the same level of false positives.
- «Buffer overflow» has the lowest recognition rate but fewer false positives compared to «SQL injection».
The study showed that the use of AST allows for the effective detection of duplicate code and vulnerabilities in the software code. The developed tool can help software developers reduce maintenance efforts, improve code quality, and ensure software product security.
License
Copyright (c) 2023 Yevhenii Kubiuk, Gennadiy Kyselov
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment of copyright and its conditions (identification of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication of the work under the terms of the Creative Commons CC BY license. They may also conclude separate additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.