Detection and classification of threats and vulnerabilities on hacker forums based on machine learning

Authors

DOI:

https://doi.org/10.15587/1729-4061.2024.306522

Keywords:

cybersecurity, hacker forum, threats identification, data classification, machine learning

Abstract

The object of this study is the process of detecting threats and vulnerabilities on hacker forums, which are a well-known source of potential dangers for Internet users. Analyzing and classifying data from these forums is difficult, however, because participants use specific slang, jargon, and similar language features, so modern processing tools are required. This paper explores the application of machine learning to devise an effective method for analyzing sentiment and trends on hacker forums in order to identify potential threats and vulnerabilities in cyberspace. All necessary stages of the detection process have been developed, from data collection and preprocessing to the training of a model capable of processing “raw” unstructured data from hacker forums. Six popular machine learning algorithms, namely k-Nearest Neighbors (kNN), Random Forest, Naive Bayes, Logistic Regression, Support Vector Machines (SVM), and Decision Tree, have been studied to determine their efficiency in detecting and classifying threats and vulnerabilities. The experiments were conducted on real data (150,000 messages). The Random Forest algorithm performed best (accuracy=0.89, recall=0.84, precision=0.91, F1-score=0.87, and ROC-AUC=0.89). The proposed machine learning tool not only collects data that poses a potential threat but also processes and classifies it according to specified keywords, which allows threats and vulnerabilities to be detected quickly. The results of the study make it possible to identify potential trends in threats and vulnerabilities, which will contribute to the improvement of cybersecurity systems and to more reliable protection of information resources.
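
The reported metrics are related to one another in a way that is easy to check by hand. The sketch below is illustrative only and is not the authors' evaluation code: it computes accuracy, precision, recall, and F1 from binary labels (1 = threat-related post, an assumed encoding), and then recomputes F1 from the precision and recall quoted in the abstract. The function names and the toy labels are hypothetical and were invented for this example.

```python
from typing import Dict, Sequence


def f1_from(precision: float, recall: float) -> float:
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for binary labels (1 = positive)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


# Toy labels invented for illustration; they are NOT data from the study.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
toy = binary_metrics(y_true, y_pred)

# Consistency check on the values quoted in the abstract for Random Forest.
reported_f1 = f1_from(precision=0.91, recall=0.84)
print(toy)
print(round(reported_f1, 2))
```

With precision 0.91 and recall 0.84, the harmonic mean is about 0.874, which agrees with the F1-score of 0.87 given in the abstract. ROC-AUC is not covered here because it requires per-message scores rather than hard labels.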

Author Biographies

Saken Mambetov, Al-Farabi Kazakh National University

PhD Student

Department of Information Systems

Ihor Ilhe, Kharkiv National Automobile and Highway University

Associate Professor

Department of Automation and Computer-Aided Technologies

Vitalina Babenko, Kharkiv National Automobile and Highway University; Daugavpils University

Doctor of Economic Sciences, PhD, Professor, Head of Department

Department of Computer Systems

Department of Law, Management & Economics

Bakytzhan Kulambayev, Turan University

Candidate in Technical Sciences

Department of Radio Engineering, Electronics and Telecommunications

Olena Fridman, V. N. Karazin Kharkiv National University

Associate Professor

Department of Economics and Management

Serik Joldasbayev, International IT University

Master of Science

Department of Computer Engineering

Hanna Doroshenko, V. N. Karazin Kharkiv National University

Doctor of Economic Sciences, Professor, Head of Department

Department of Economics and Management

Oleksandr Gurko, Kharkiv National Automobile and Highway University

Doctor of Technical Sciences, Head of Department

Department of Automation and Computer-Aided Technologies

Yenlik Begimbayeva, AUPET named after Gumarbek Daukeyev

PhD, Head of Department

Department of Cybersecurity

Serhii Neronov, Kharkiv National Automobile and Highway University

Senior Lecturer

Department of Computer Systems

Published

2024-06-28 — Updated on 2024-07-09

How to Cite

Mambetov, S., Ilhe, I., Babenko, V., Kulambayev, B., Fridman, O., Joldasbayev, S., Doroshenko, H., Gurko, O., Begimbayeva, Y., & Neronov, S. (2024). Detection and classification of threats and vulnerabilities on hacker forums based on machine learning. Eastern-European Journal of Enterprise Technologies, 3(9 (129)), 16–27. https://doi.org/10.15587/1729-4061.2024.306522

Section

Information and controlling system