Hybrid gene selection method based on mutual information technique and dragonfly optimization algorithm
DOI:
https://doi.org/10.15587/1729-4061.2021.233382
Keywords:
GS «gene selection», MI «mutual information technique», BDA «binary dragonfly optimization algorithm»
Abstract
One of the most prevalent problems with big data is that many of the features are irrelevant. Gene selection has been shown to improve the outcomes of many algorithms, but it is a difficult task in microarray data mining because most microarray datasets have only a few hundred records but thousands of variables. With so few samples relative to variables, the chance of finding predictions that look correct purely by coincidence increases. Finding the most relevant genes is generally the most difficult part of creating a reliable classification model, since irrelevant and redundant attributes reduce the accuracy of classification algorithms. Many machine learning-based gene selection methods have been explored in the literature with the aim of improving the precision of dimensionality reduction. Gene selection is a technique for extracting the most relevant genes from a dataset. Classification, which is used in machine learning, pattern recognition, and signal processing, will benefit from further developments in gene selection. The goal of feature selection is to select the smallest subset of features that carries as much information about the class as possible. This paper models gene selection as a binary optimization problem in discrete space, which is searched by the binary dragonfly optimization algorithm «BDA» and evaluated with a fitness function based on the accuracy of a k-nearest neighbors classifier on the dataset. The experimental results show that the proposed algorithm, dubbed MI-BDA, outperforms other algorithms in terms of both computational cost and classification accuracy.
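The abstract names the two ingredients of the approach (a mutual information score and a k-nearest neighbors based fitness function) without giving formulas. The following is a minimal sketch of how such pieces might look, not the authors' implementation. Everything specific in it is an assumption made for illustration: the function names, the discretized toy data, leave-one-out evaluation, k = 1, and the weight `alpha` that trades classification error against the number of selected genes. The dragonfly search itself is omitted; it would propose the binary masks that `fitness` scores.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """I(X;Y) in bits for two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
    return sum(
        (c / n) * math.log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def top_genes_by_mi(X, y, m):
    """Filter step (assumed): indices of the m genes with the highest MI with y."""
    scores = [mutual_information([row[j] for row in X], y)
              for j in range(len(X[0]))]
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:m]


def knn_accuracy(X, y, mask, k=1):
    """Leave-one-out k-NN accuracy using only genes where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # an empty gene subset cannot classify anything
    correct = 0
    for i, row in enumerate(X):
        # squared Euclidean distance to every other sample, paired with its label
        neighbours = sorted(
            (sum((row[j] - other[j]) ** 2 for j in cols), y[t])
            for t, other in enumerate(X) if t != i
        )
        votes = Counter(label for _, label in neighbours[:k])
        correct += votes.most_common(1)[0][0] == y[i]
    return correct / len(X)


def fitness(X, y, mask, alpha=0.99):
    """Wrapper step (assumed form): lower is better.

    Weighted sum of k-NN error and the fraction of genes kept;
    alpha is a hypothetical weight, not a value from the paper.
    """
    error = 1.0 - knn_accuracy(X, y, mask)
    return alpha * error + (1 - alpha) * sum(mask) / len(mask)


# Toy data: 4 samples, 3 discretized genes; gene 0 matches the class exactly.
X = [[0, 1, 0],
     [0, 0, 1],
     [1, 1, 0],
     [1, 0, 1]]
y = [0, 0, 1, 1]

best = top_genes_by_mi(X, y, 1)
good = fitness(X, y, [1, 0, 0])
bad = fitness(X, y, [0, 1, 0])
```

On this toy data the informative gene is ranked first by the filter, and a mask selecting it gets a lower (better) fitness than a mask selecting an uninformative gene, which is the signal a binary search over masks would exploit.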
License
Copyright (c) 2021 Sarah Ghanim Mahmood, Raed Sabeeh Karyakos, Ilham M. Yacoob
This work is licensed under a Creative Commons Attribution 4.0 International License.