NEW ORGANIZATION PROCESS OF FEATURE SELECTION BY FILTER WITH CORRELATION-BASED FEATURES SELECTION METHOD

Authors

Olga Solovei

DOI:

https://doi.org/10.30837/ITSSI.2022.21.039

Keywords:

Correlation-based Feature Selection (CFS), symmetrical uncertainty (SU), Pearson Correlation (PearCorr), merit, accuracy, determination coefficient

Abstract

The subject of the article is the feature selection techniques used at the data preprocessing step before machine learning models are built. The focus is on the Filter technique when it uses Correlation-based Feature Selection (hereafter CFS) with the symmetrical uncertainty method (hereafter CFS-SU) or CFS with Pearson correlation (hereafter CFS-PearCorr). The goal of the work is to increase the efficiency of feature selection by Filter with CFS by proposing a new organization process of feature selection. The tasks solved in the article are: to review and analyze the existing organization process of feature selection by Filter with CFS; to identify the root causes of the performance degradation; to propose a new approach; and to evaluate the proposed approach. To carry out these tasks, the following methods were used: information theory, process theory, algorithm theory, statistics theory, sampling techniques, data modeling theory, and scientific experiments. Results. The obtained results show that: 1) the evaluation function for the chosen feature subset cannot be based only on the CFS merit, because this degrades the results of the learning algorithm; 2) the accuracy of the classification learning algorithms improved and the determination coefficient of the regression learning algorithms increased when features were selected according to the proposed new organization process. Conclusions. The new organization process for feature selection proposed in this work combines filter and learning algorithm properties in its evaluation strategy, which helps to choose the optimal feature subset for a predefined learning algorithm. The computational complexity of the proposed approach does not depend on the dataset's dimensions, which makes it robust to different kinds of data; it also eliminates the time needed to search for feature subsets, because subsets are selected randomly. The conducted experiments showed that classification and regression learning algorithms with features selected according to the new flow outperformed the same learning algorithms built without the new process applied at the data preprocessing step.
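For readers unfamiliar with the two quantities named in the abstract, the following is a minimal sketch of the commonly stated CFS merit heuristic (here with absolute Pearson correlation) and of symmetrical uncertainty for discrete variables. It is an illustration only, not the article's implementation: the function names `cfs_merit` and `symmetrical_uncertainty`, the toy data, and the choice of NumPy are assumptions, and the article's proposed organization process (random subset selection combined with learning-algorithm evaluation) is not reproduced here.

```python
import numpy as np


def cfs_merit(X, y, subset):
    """CFS merit of a feature subset (hypothetical helper).

    merit = k * r_cf / sqrt(k + k * (k - 1) * r_ff), where r_cf is the mean
    absolute feature-target Pearson correlation and r_ff is the mean absolute
    feature-feature Pearson correlation over distinct pairs in the subset.
    """
    k = len(subset)
    r_cf = np.mean([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in subset])
    if k == 1:
        return r_cf
    corr = np.abs(np.corrcoef(X[:, subset], rowvar=False))
    r_ff = corr[np.triu_indices(k, k=1)].mean()
    return k * r_cf / np.sqrt(k + k * (k - 1) * r_ff)


def symmetrical_uncertainty(a, b):
    """SU(a, b) = 2 * I(a; b) / (H(a) + H(b)) for discrete-valued arrays."""

    def entropy(*cols):
        # Joint entropy (in bits) of one or more discrete columns.
        _, counts = np.unique(np.column_stack(cols), axis=0, return_counts=True)
        p = counts / counts.sum()
        return -np.sum(p * np.log2(p))

    h_a, h_b = entropy(a), entropy(b)
    if h_a + h_b == 0:
        return 0.0  # both variables are constant
    mutual_info = h_a + h_b - entropy(a, b)
    return 2.0 * mutual_info / (h_a + h_b)


# Toy data (assumed for illustration): x0 and x1 are uncorrelated and both
# drive y; x2 is an exact copy of x0, so it is fully redundant.
x0 = np.array([0, 0, 1, 1, 0, 0, 1, 1], dtype=float)
x1 = np.array([0, 1, 0, 1, 0, 1, 0, 1], dtype=float)
X = np.column_stack([x0, x1, x0])
y = x0 + x1

print("merit of complementary pair [0, 1]:", cfs_merit(X, y, [0, 1]))
print("merit of redundant pair     [0, 2]:", cfs_merit(X, y, [0, 2]))
print("SU(x0, x1):", symmetrical_uncertainty(x0, x1))
```

On this toy data the redundant pair receives a lower merit than the complementary pair, which is the behaviour the merit is designed to have. The merit alone says nothing about how a particular learning algorithm performs on the subset, which is the limitation the abstract points to.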

Author Biography

Olga Solovei, Kyiv National University of Construction and Architecture

PhD (Technical Sciences), Associate Professor

Published

2022-11-18

How to Cite

Solovei, O. (2022). NEW ORGANIZATION PROCESS OF FEATURE SELECTION BY FILTER WITH CORRELATION-BASED FEATURES SELECTION METHOD. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (3 (21)), 39–50. https://doi.org/10.30837/ITSSI.2022.21.039