Binary classification based on a combination of rough set theory and decision trees

Authors

Dmytro Chernyshov, Dmytro Sytnikov

DOI:

https://doi.org/10.30837/ITSSI.2023.26.087

Keywords:

decision tree classification, rough set theory, algebraic approach, machine learning

Abstract

The subject of this study is improving the accuracy and efficiency of decision tree classification algorithms by integrating the principles of rough set theory, a mathematical approach to approximating sets. The aim of the study is to develop a hybrid model that combines rough set theory with decision tree algorithms, thereby addressing the inherent limitations of these algorithms in dealing with uncertainty in data. This integration should significantly improve the accuracy and efficiency of binary classification based on decision trees, making them more robust to different inputs. The research objectives include a deep study of possible synergies between rough set theory and decision tree algorithms. For this purpose, we conduct a comprehensive study of the integration of rough set theory within decision tree algorithms. This includes the development of a model that applies the principles and algebraic tools of rough set theory to select features more effectively in decision tree-based systems. The model uses rough set theory to handle uncertainty and vagueness efficiently, which improves and extends the feature selection process in decision tree systems. A series of experiments is conducted on different datasets to demonstrate the effectiveness and practicality of this approach. These datasets are chosen to represent a range of complexities and uncertainties, providing a thorough and rigorous evaluation of the model's capabilities. The methodology uses advanced algebraic tools of rough set theory, including the formulation of algebraic expressions and the development of new rules and techniques, to simplify data classification with decision tree systems and improve its accuracy. The findings of the study are important because they show that integrating rough set theory into decision tree algorithms can indeed provide more accurate and efficient classification results. The hybrid model demonstrates significant advantages in dealing with data containing embedded uncertainty, a common challenge in many applied scenarios. The versatility and effectiveness of the integrated approach are demonstrated by its successful application to credit scoring and cybersecurity tasks, which underscores its potential as a versatile tool in data mining and machine learning. The conclusions show that integrating rough set theory can lead to more accurate and efficient classification results. By improving the ability of decision trees to account for uncertainty and imprecision in data, the research opens up new possibilities for robust and sophisticated data analysis and interpretation across industries, from healthcare to finance and beyond. The integration of rough set theory and decision trees is an important step toward more advanced, efficient, and accurate classification tools in the era of big data.
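
To make the general idea concrete, the following minimal sketch (in Python, assuming pandas and scikit-learn) shows one common way rough set notions can drive feature selection before a decision tree is trained: a dependency degree is computed as the fraction of objects in the positive region of the partition induced by a set of attributes, an approximate reduct is built greedily, and a standard decision tree is then fitted on the selected attributes. This is an illustration under stated assumptions, not the algebraic formulation developed in the paper; the helper names dependency_degree and greedy_reduct and the toy credit-style data are hypothetical.

# Minimal sketch: rough-set-style feature selection followed by decision tree training.
# Assumes discrete-valued attributes in a pandas DataFrame and a binary decision attribute.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

def dependency_degree(X: pd.DataFrame, y: pd.Series, attrs: list[str]) -> float:
    """Fraction of objects in the positive region of the partition induced by `attrs`,
    i.e. objects whose equivalence class is consistent with a single decision value."""
    if not attrs:
        return 0.0
    groups = y.groupby([X[a] for a in attrs])
    consistent = sum(len(g) for _, g in groups if g.nunique() == 1)
    return consistent / len(y)

def greedy_reduct(X: pd.DataFrame, y: pd.Series) -> list[str]:
    """Greedily add the attribute that raises the dependency degree the most,
    stopping when no attribute improves it (an approximate reduct)."""
    selected: list[str] = []
    best = 0.0
    candidates = list(X.columns)
    while candidates:
        gains = {a: dependency_degree(X, y, selected + [a]) for a in candidates}
        a_best = max(gains, key=gains.get)
        if gains[a_best] <= best:
            break
        best = gains[a_best]
        selected.append(a_best)
        candidates.remove(a_best)
    return selected

# Toy discrete dataset (hypothetical values, for illustration only)
X = pd.DataFrame({
    "credit_history": ["good", "bad", "good", "bad", "good", "bad"],
    "income_band":    ["high", "low", "low", "high", "high", "low"],
    "has_phone":      ["yes", "yes", "no", "no", "yes", "no"],
})
y = pd.Series([1, 0, 1, 0, 1, 0], name="default")

reduct = greedy_reduct(X, y)
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(pd.get_dummies(X[reduct]), y)
print("selected features:", reduct)

In this sketch a simple greedy search stands in for the algebraic reduct construction; any subset-evaluation strategy based on the positive region could be substituted without changing the overall pipeline.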

Author Biographies

Dmytro Chernyshov, Kharkiv National University of Radio Electronics

Bachelor of Computer Science

Dmytro Sytnikov, Kharkiv National University of Radio Electronics

PhD (Engineering Sciences), Associate Professor, Professor at the Department of System Engineering

References

Costa, V. G. and Pedreira, C. E. (2022), “Recent advances in decision trees: an updated survey”. Artificial Intelligence Review, Springer Science and Business Media LLC. Vol. 56, No. 5. P. 4765–4800. DOI: 10.1007/s10462-022-10275-5.

Hafeez, M. A., Rashid, M., Tariq, H., Abideen, Z. U., Alotaibi, S. S., and Sinky, M. H. (2021), “Performance Improvement of Decision Tree: A Robust Classifier Using Tabu Search Algorithm”. Applied Sciences, MDPI AG. Vol. 11, No. 15. Article 6728. DOI: 10.3390/app11156728.

Wang, Z., Zhang, X., and Deng, J. (2020), “The uncertainty measures for covering rough set models”. Soft Computing, Springer Science and Business Media LLC. Vol. 24, No. 16. P. 11909–11929. DOI: 10.1007/s00500-020-05098-x.

Geetha, M. A., Acharjya, D. P., and Iyengar, N. Ch. S. N. (2013), “Algebraic properties and measures of uncertainty in rough set on two universal sets based on multi-granulation”. Proceedings of the 6th ACM India Computing Convention, ACM. P. 1–8. DOI: 10.1145/2522548.2523168.

Qian, Y., Xu, H., Liang, J., Liu, B., and Wang, J. (2015), “Fusing Monotonic Decision Trees”. IEEE Transactions on Knowledge and Data Engineering. Vol. 27, No. 10. P. 2717–2728. DOI: 10.1109/TKDE.2015.2429133.

Sitnikov, D. and Ryabov, O. (2004), “An Algebraic Approach to Defining Rough Set Approximations and Generating Logic Rules”. Data Mining V, WIT Press. 10 p. DOI: 10.2495/data040171.

Sitnikov, D., Titova, O., Romanenko, O., and Ryabov, O. (2009), “A method for finding minimal sets of features adequately describing discrete information objects”. Data Mining X, WIT Press. 8 p. DOI: 10.2495/data090141.

Wang, D., Liu, X., Jiang, L., Zhang, X., and Zhao, Y. (2012), “Rough Set Approach to Multivariate Decision Trees Inducing”. Journal of Computers, International Academy Publishing (IAP). Vol. 7, No. 4. P. 870–879. DOI: 10.4304/jcp.7.4.870-879.

Blockeel, H., Devos, L., Frénay, B., Nanfack, G., and Nijssen, S. (2023), “Decision trees: from efficient prediction to responsible AI”. Frontiers in Artificial Intelligence, Frontiers Media SA. Vol. 6. Jul. 26. DOI: 10.3389/frai.2023.1124553.

Hu, X., Rudin, C., and Seltzer, M. (2019), “Optimal Sparse Decision Trees”. arXiv. DOI: 10.48550/ARXIV.1904.12847.

Chiaselotti, G., Gentile, T., and Infusino, F. (2019), “Decision systems in rough set theory: A set operatorial perspective”. Journal of Algebra and Its Applications, World Scientific Pub Co Pte Ltd. Vol. 18, No. 01. Article 1950004. DOI: 10.1142/s021949881950004x.

Xu, J., Qu, K., Meng, X., Sun, Y., and Hou, Q. (2022), “Feature selection based on multiview entropy measures in multiperspective rough set”. International Journal of Intelligent Systems, Hindawi Limited. Vol. 37, No. 10. P. 7200–7234. DOI: 10.1002/int.22878.

Duan, G., Ding, D., Tian, Y., and You, X. (2016), “An Improved Medical Decision Model Based on Decision Tree Algorithms”. 2016 IEEE International Conferences on Big Data and Cloud Computing (BDCloud), Social Computing and Networking (SocialCom), Sustainable Computing and Communications (SustainCom) (BDCloud-SocialCom-SustainCom), IEEE. P. 151–156. DOI: 10.1109/BDCloud-SocialCom-SustainCom.2016.33.

Cukierski, W. (2012), “Titanic - Machine Learning from Disaster”. Kaggle. available at: https://kaggle.com/competitions/titanic.

Ronen, R., Radu, M., Feuerstein, C., Yom-Tov, E., and Ahmadi, M. (2018), “Microsoft Malware Classification Challenge”. arXiv. DOI: 10.48550/ARXIV.1802.10135.

Montoya, A., Odintsov, K., and Kotek, M. (2018), “Home Credit Default Risk”. Kaggle. available at: https://kaggle.com/competitions/home-credit-default-risk.

Published

2023-12-27

How to Cite

Chernyshov, D., & Sytnikov, D. (2023). Binary classification based on a combination of rough set theory and decision trees. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (4(26)), 87–94. https://doi.org/10.30837/ITSSI.2023.26.087