Transformer-based models application for bug detection in source code
DOI: https://doi.org/10.15587/2706-5448.2024.310822
Keywords: transformers, large language models, bug detection, defect detection, static code analysis, neural networks
Abstract
This paper explores the use of transformer-based models for bug detection in source code, aiming to better understand the capacity of these models to learn complex patterns and relationships within code. Traditional static analysis tools are limited in their ability to detect semantic errors, so numerous defects slip through to the code execution stage. This research represents a step towards enhancing static code analysis with neural networks.
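As a hypothetical illustration (not drawn from the paper), the snippet below shows the kind of semantic defect that syntax-oriented static analysis typically accepts: the code is well-formed Python, and the error only surfaces at runtime.

    # Hypothetical example: syntactically valid code that a typical linter
    # accepts, yet the TypeError materializes only during execution.
    def total_price(prices):
        total = 0
        for p in prices:
            total += p  # fails when p is a string, e.g. "9.99" read from a file
        return total

    total_price([4.50, "9.99"])  # TypeError: unsupported operand type(s) for +=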
The experiments were designed as binary classification tasks for detecting buggy code snippets, each targeting a specific defect type: NameError, TypeError, IndexError, AttributeError, ValueError, EOFError, SyntaxError, or ModuleNotFoundError. Using the «RunBugRun» dataset, whose labels are derived from code execution results, four models (BERT, CodeBERT, GPT-2, and CodeT5) were fine-tuned and compared under identical conditions and hyperparameters. Performance was evaluated using F1-Score, Precision, and Recall.
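A minimal sketch of this kind of setup, assuming the Hugging Face Transformers and scikit-learn libraries; the checkpoint, toy snippets, and hyperparameters below are illustrative placeholders rather than the authors' exact configuration.

    # Sketch: fine-tune an encoder (here CodeBERT) as a binary classifier
    # over code snippets (1 = buggy, 0 = correct) and report Precision,
    # Recall, and F1. Assumes: pip install torch transformers scikit-learn
    import torch
    from sklearn.metrics import precision_recall_fscore_support
    from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                              Trainer, TrainingArguments)

    tokenizer = AutoTokenizer.from_pretrained("microsoft/codebert-base")
    model = AutoModelForSequenceClassification.from_pretrained(
        "microsoft/codebert-base", num_labels=2)

    class SnippetDataset(torch.utils.data.Dataset):
        """Tokenized (code snippet, label) pairs."""
        def __init__(self, snippets, labels):
            self.enc = tokenizer(snippets, truncation=True, padding=True,
                                 max_length=512)
            self.labels = labels
        def __len__(self):
            return len(self.labels)
        def __getitem__(self, i):
            item = {k: torch.tensor(v[i]) for k, v in self.enc.items()}
            item["labels"] = torch.tensor(self.labels[i])
            return item

    def compute_metrics(eval_pred):
        preds = eval_pred.predictions.argmax(axis=-1)
        p, r, f1, _ = precision_recall_fscore_support(
            eval_pred.label_ids, preds, average="binary", zero_division=0)
        return {"precision": p, "recall": r, "f1": f1}

    # Toy placeholder data; in practice, buggy/fixed snippets come from RunBugRun.
    train_ds = SnippetDataset(["def f(x): return x + 1",
                               "def g(x): return x + y"], [0, 1])
    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="out", num_train_epochs=3,
                               per_device_train_batch_size=8),
        train_dataset=train_ds,
        eval_dataset=train_ds,  # placeholder; use a held-out split in practice
        compute_metrics=compute_metrics,
    )
    trainer.train()
    print(trainer.evaluate())

Under the setup described in the abstract, a loop of this kind would be repeated per defect type (one binary classifier per exception class) and per model, keeping the hyperparameters identical so the models remain comparable.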
The results indicated that transformer-based models, especially CodeT5 and CodeBERT, were effective in identifying various defects, demonstrating their ability to learn complex code patterns. However, performance varied by defect type, with some defects like IndexError and TypeError being more challenging to detect. The outcomes underscore the importance of high-quality, diverse training data and highlight the potential of transformer-based models to achieve more accurate early defect detection.
Future research should further explore advanced transformer architectures for detecting the more challenging defect types and investigate integrating additional contextual information into the detection process. This study highlights the potential of modern machine learning architectures to advance software engineering practice, leading to more efficient and reliable software development.
References
- Tassey, G. (2002). The Economic Impacts of Inadequate Infrastructure for Software Testing (NIST Planning Report 02-3). RTI International. National Institute of Standards and Technology. Available at: https://www.nist.gov/system/files/documents/director/planning/report02-3.pdf Last accessed: 22.07.2024.
- Nachtigall, M., Schlichtig, M., Bodden, E. (2022). A large-scale study of usability criteria addressed by static analysis tools. Proceedings of the 31st ACM SIGSOFT International Symposium on Software Testing and Analysis. https://doi.org/10.1145/3533767.3534374
- Vaswani, A., Shazeer, N. M., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N. et al. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems. https://doi.org/10.48550/arXiv.1706.03762
- Bahdanau, D., Cho, K., Bengio, Y. (2014). Neural Machine Translation by Jointly Learning to Align and Translate. CoRR, abs/1409.0473. https://doi.org/10.48550/arXiv.1409.0473
- Hou, X., Zhao, Y., Liu, Y., Yang, Z., Wang, K., Li, L. et al. (2023). Large Language Models for Software Engineering: A Systematic Literature Review. ArXiv, abs/2308.10620. https://doi.org/10.48550/arXiv.2308.10620
- Sun, Z., Li, L., Liu, Y., Du, X. (2022). On the Importance of Building High-quality Training Datasets for Neural Code Search. 2022 IEEE/ACM 44th International Conference on Software Engineering (ICSE), 1609–1620. https://doi.org/10.1145/3510003.3510160
- Prenner, J. A., Robbes, R. (2023). RunBugRun – An Executable Dataset for Automated Program Repair. ArXiv, abs/2304.01102. https://doi.org/10.48550/arXiv.2304.01102
- Hu, J. E., Shen, Y., Wallis, P., Allen-Zhu, Z., Li, Y., Wang, S., Chen, W. (2021). LoRA: Low-Rank Adaptation of Large Language Models. ArXiv, abs/2106.09685. https://doi.org/10.48550/arXiv.2106.09685
- Built-in Exceptions. Python Documentation. Python Software Foundation. Available at: https://docs.python.org/3/library/exceptions.html Last accessed: 22.07.2024.
- Marjanov, T., Pashchenko, I., Massacci, F. (2022). Machine Learning for Source Code Vulnerability Detection: What Works and What Isn’t There Yet. IEEE Security & Privacy, 20 (5), 60–76. https://doi.org/10.1109/msec.2022.3176058
- Fang, C., Miao, N., Srivastav, S., Liu, J., Zhang, R., Fang, R. et al. (2023). Large Language Models for Code Analysis: Do LLMs Really Do Their Job? ArXiv, abs/2310.12357. https://doi.org/10.48550/arXiv.2310.12357
- Xiao, Y., Zuo, X., Xue, L., Wang, K., Dong, J. S., Beschastnikh, I. (2023). Empirical Study on Transformer-based Techniques for Software Engineering. ArXiv, abs/2310.00399. https://doi.org/10.48550/arXiv.2310.00399
License
Copyright (c) 2024 Illia Vokhranov, Bogdan Bulakh
This work is licensed under a Creative Commons Attribution 4.0 International License.
The assignment and terms of copyright transfer (attribution of authorship) are set out in the License Agreement. In particular, the authors retain authorship of their manuscript and grant the journal the right of first publication under the terms of the Creative Commons CC BY license. They may also enter into separate, additional agreements for the non-exclusive distribution of the work in the form in which it was published by this journal, provided that a link to the first publication of the article in this journal is preserved.