Hybrid gene selection method based on mutual information technique and dragonfly optimization algorithm

Authors

Sarah Ghanim Mahmood, Raed Sabeeh Karyakos, Ilham M. Yacoob

DOI:

https://doi.org/10.15587/1729-4061.2021.233382

Keywords:

GS «gene selection», MI «mutual information technique», BDA «binary dragonfly optimization algorithm»

Abstract

One of the most prevalent problems with big data is that many of the features are irrelevant. Gene selection has been shown to improve the outcomes of many algorithms, but it is a difficult task in microarray data mining because most microarray datasets have only a few hundred records but thousands of variables. With so few samples and so many variables, a model is more likely to produce predictions that appear correct only by chance. Finding the most relevant genes is generally the most difficult part of creating a reliable classification model, and irrelevant and redundant attributes reduce the accuracy of classification algorithms. Many machine-learning-based gene selection methods have been explored in the literature with the aim of improving the precision of dimensionality reduction. Gene selection is a technique for extracting the most relevant genes from a dataset. Classification methods, which are used in machine learning, pattern recognition, and signal processing, will benefit from further developments in gene selection. The goal of feature selection is to choose the smallest subset of features that carries as much information about the class as possible. This paper models gene selection as a binary optimization problem in discrete space, which is solved with the binary dragonfly optimization algorithm «BDA» and evaluated with a fitness function based on the classification accuracy of a k-nearest neighbors classifier on the dataset. The experimental results show that the proposed algorithm, dubbed MI-BDA, outperforms other algorithms in terms of computational cost and classification accuracy.
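
The two building blocks named in the abstract, a mutual information filter and a k-nearest neighbors fitness for a binary gene mask, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the toy data, the leave-one-out evaluation, and the weight `alpha` that trades accuracy against subset size are all hypothetical, and the feature values are assumed to be already discretized so that mutual information can be estimated from counts. The dragonfly search itself is omitted.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X; Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    count_x, count_y = Counter(xs), Counter(ys)
    count_xy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x, y) * log2(p(x, y) / (p(x) * p(y))), with probabilities as counts / n
        mi += (c / n) * math.log2(c * n / (count_x[x] * count_y[y]))
    return mi


def top_k_by_mi(rows, labels, k):
    """Filter stage: indices of the k features with the highest MI with the label."""
    n_features = len(rows[0])
    scores = [mutual_information([r[j] for r in rows], labels)
              for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)[:k]


def knn_loo_accuracy(rows, labels, mask, k=1):
    """Leave-one-out accuracy of k-NN using only features where mask[j] == 1."""
    selected = [j for j, bit in enumerate(mask) if bit]
    if not selected:
        return 0.0  # an empty subset cannot classify anything
    correct = 0
    for i, row in enumerate(rows):
        dists = []
        for m, other in enumerate(rows):
            if m == i:
                continue  # hold out the sample being classified
            d = sum((row[j] - other[j]) ** 2 for j in selected)
            dists.append((d, labels[m]))
        dists.sort(key=lambda t: t[0])
        votes = Counter(label for _, label in dists[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(rows)


def fitness(rows, labels, mask, alpha=0.99, k=1):
    """Wrapper fitness (assumed form): reward accuracy, lightly reward small subsets."""
    accuracy = knn_loo_accuracy(rows, labels, mask, k)
    selected_ratio = sum(mask) / len(mask)
    return alpha * accuracy + (1 - alpha) * (1 - selected_ratio)


# Toy data (hypothetical): feature 0 equals the label, feature 1 is unrelated,
# feature 2 is weakly related.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
rows = [[0, 0, 0], [0, 1, 0], [0, 0, 0], [0, 1, 1],
        [1, 0, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0]]

print(top_k_by_mi(rows, labels, 2))
print(fitness(rows, labels, [1, 0, 0]), fitness(rows, labels, [1, 1, 1]))
```

In a hybrid scheme of this kind, the filter would first shrink the candidate gene set, and a binary optimizer would then search over masks of the surviving genes, scoring each mask with the fitness function.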

Author Biographies

Sarah Ghanim Mahmood, University of AL-Hamdaniya

Master in Mathematics Sciences, Assistant Lecturer

Department of Mathematics

College of Education

Raed Sabeeh Karyakos, University of AL-Hamdaniya

Master in Mathematics, Assistant Lecturer

Department of Mathematics

College of Education

Ilham M. Yacoob, University of AL-Hamdaniya

Master in Mathematics Application Sciences, Assistant Lecturer

Department of Mathematics

College of Education

Published

2021-06-30

How to Cite

Mahmood, S. G., Karyakos, R. S., & Yacoob, I. M. (2021). Hybrid gene selection method based on mutual information technique and dragonfly optimization algorithm. Eastern-European Journal of Enterprise Technologies, 3(3 (111)), 64–69. https://doi.org/10.15587/1729-4061.2021.233382

Section

Control processes