Development of a sample censoring method for neural network model synthesis
DOI: https://doi.org/10.15587/1729-4061.2014.28027

Keywords: sample, instance selection, data reduction, neural network, dimensionality reduction

Abstract
A method of training sample formation is proposed. It characterizes the informativity of individual instances relative to the centers and boundaries of feature intervals, which makes it possible to automate the analysis of the sample and its partition into sub-samples and, as a result, to reduce the dimensionality of the training data. A computer program implementing the proposed method has been developed and used in the experiments. The software was evaluated on the problem of diagnosing chronic obstructive bronchitis from experimentally obtained clinical laboratory test data of patients. The experiments showed that even a modest reduction of the original sample by 25 % (to 75 % of the original volume) preserved acceptable accuracy and cut training time by more than a factor of 1.32, while halving the sample (to 50 % of the original volume) gave a speedup of 1.99 times. This confirms the usefulness of the proposed mathematical support for constructing neural network models from precedents.
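The abstract does not spell out the informativity measure itself, but the general scheme it describes (score each instance by its position relative to class centers and feature-interval boundaries, then keep the most informative fraction of each class) can be sketched in Python. The scoring rule below, the function names informativity_scores and censor_sample, and the keep_fraction parameter are all illustrative assumptions, not the paper's algorithm.

import numpy as np

def informativity_scores(X, y):
    """Score each instance by its position relative to its class center.
    Assumption for this sketch: instances far from their class center
    (i.e., near the boundaries of the class's feature intervals) are
    treated as more informative. The paper's exact measure is not
    given in the abstract."""
    scores = np.zeros(len(X))
    # Normalize features to [0, 1] so distances are comparable across features.
    lo, hi = X.min(axis=0), X.max(axis=0)
    Xn = (X - lo) / np.where(hi > lo, hi - lo, 1.0)
    for c in np.unique(y):
        mask = y == c
        center = Xn[mask].mean(axis=0)  # center of the class's feature intervals
        d = np.linalg.norm(Xn[mask] - center, axis=1)
        # Rank within the class: larger distance to the center means closer
        # to the interval boundary, hence a higher assumed informativity.
        scores[mask] = d / (d.max() + 1e-12)
    return scores

def censor_sample(X, y, keep_fraction=0.75):
    """Keep the keep_fraction most informative instances of each class,
    mirroring the 75 % / 50 % reduction levels reported in the abstract."""
    keep = np.zeros(len(X), dtype=bool)
    s = informativity_scores(X, y)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        k = max(1, int(round(keep_fraction * len(idx))))
        keep[idx[np.argsort(-s[idx])[:k]]] = True
    return X[keep], y[keep]

# Usage: censor a synthetic two-class sample to 50 % of its volume.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, (100, 4)), rng.normal(2.0, 1.0, (100, 4))])
y = np.repeat([0, 1], 100)
X_red, y_red = censor_sample(X, y, keep_fraction=0.5)
print(X_red.shape)  # (100, 4)

In this interpretation, the neural network is then trained on the censored set (X_red, y_red) instead of the full sample, which is the source of the training-time savings reported in the abstract.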
License
Copyright (c) 2014 Сергей Александрович Субботин
This work is licensed under a Creative Commons Attribution 4.0 International License.