DOI: https://doi.org/10.15587/1729-4061.2014.28027

The sample censoring method development for neural network model synthesis

Sergey Aleksandrovich Subbotin

Abstract


A method of training sample formation is proposed. It characterizes the informativity of individual instances relative to the centers and boundaries of feature intervals. This makes it possible to automate the analysis of the sample and its separation into sub-samples and, as a result, to reduce the dimensionality of the training data. A computer program implementing the proposed method has been developed and used in the experiments. The developed software was tested on the problem of diagnosing chronic obstructive bronchitis from experimentally obtained data of clinical laboratory tests of patients. The experiments showed that even a slight reduction of the original sample volume by 25 % (to 75 % of the original volume) preserved acceptable accuracy and reduced training time by more than 1.32 times. Halving the original sample volume (to 50 % of the original volume) gave a speed gain of 1.99 times. This confirms the usefulness of the proposed mathematical support in the construction of neural network models from precedents.
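The abstract does not spell out the algorithm, so the following is only a rough illustration of the general idea of sample censoring (instance selection): score each instance's informativity and keep a fraction of the sample. The scoring here — distance to the instance's own class center — is a simplified stand-in, not the paper's method, and the names `censor_sample`, `class_centers`, and `keep_fraction` are hypothetical:

```python
# Hedged sketch of instance selection by a simple informativity score.
# NOT the paper's algorithm: we keep, per class, the keep_fraction of
# instances nearest to that class's center in feature space.

def euclid(a, b):
    """Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

def class_centers(X, y):
    """Mean feature vector (center) of each class."""
    sums, counts = {}, {}
    for x, label in zip(X, y):
        if label not in sums:
            sums[label] = list(x)
            counts[label] = 1
        else:
            sums[label] = [s + xi for s, xi in zip(sums[label], x)]
            counts[label] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}

def censor_sample(X, y, keep_fraction=0.75):
    """Reduce (X, y) to roughly keep_fraction of its volume,
    preferring instances close to their class center."""
    centers = class_centers(X, y)
    by_class = {}
    for i, label in enumerate(y):
        by_class.setdefault(label, []).append(i)
    kept = []
    for label, idxs in by_class.items():
        # rank instances of this class by distance to its center
        idxs.sort(key=lambda i: euclid(X[i], centers[label]))
        n_keep = max(1, round(keep_fraction * len(idxs)))
        kept.extend(idxs[:n_keep])
    kept.sort()
    return [X[i] for i in kept], [y[i] for i in kept]
```

With `keep_fraction=0.75` this mirrors the abstract's first experiment (reducing the sample to 75 % of its volume); any real implementation of the paper's method would also weigh instance positions relative to feature-interval boundaries, which this sketch omits.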


Keywords


sample; instance selection; data reduction; neural network; dimensionality reduction

References


1. Engelbrecht, A. (2007). Computational intelligence: an introduction. Sidney, John Wiley & Sons, 597. doi: 10.1002/9780470512517

2. Jankowski, N., Grochowski, M. (2004). Comparison of instance selection algorithms I. Algorithms survey. Presented at 7th International Conference on Artificial Intelligence and Soft Computing, Zakopane, Poland. Lecture Notes in Computer Science, 3070, 598–603. doi:10.1007/978-3-540-24844-6_90

3. Reinartz, T. (2002). A unifying view on instance selection. Data Mining and Knowledge Discovery, 6, 191–210. doi:10.1023/A:1014047731786

4. Hart, P. E. (1968). The condensed nearest neighbor rule. IEEE Transactions on Information Theory, 14, 515–516. doi:10.1109/TIT.1968.1054155

5. Aha, D. W., Kibler, D., Albert, M. K. (1991). Instance-based learning algorithms. Machine Learning, 6, 37–66. doi:10.1023/A:1022689900470

6. Brighton, H., Mellish, C. (2002). Advances in instance selection for instance-based learning algorithms. Data Mining and Knowledge Discovery, 6, 153–172. doi:10.1023/A:1014043630878

7. Wilson, D. R., Martinez, T. R. (1997). Instance pruning techniques. Presented at Fourteenth International Conference on Machine Learning, Nashville, 403–411.

8. Kibler, D., Aha, D. W. (1987). Learning representative exemplars of concepts: an initial case study. Presented at 4th International Workshop on Machine Learning, Irvine, 24–30. doi:10.1016/b978-0-934613-41-5.50006-4

9. Gates, G. (1972). The reduced nearest neighbor rule. IEEE Transactions on Information Theory, 18 (3), 431–433. doi:10.1109/TIT.1972.1054809

10. Wilson, D. L. (1972). Asymptotic properties of nearest neighbor rules using edited data. IEEE Transactions on Systems, Man, Cybernetics, 2 (3), 408–421. doi:10.1109/TSMC.1972.4309137

11. Wilson, D. R., Martinez, T. R. (2000). Reduction techniques for instance-based learning algorithms. Machine Learning, 38 (3), 257–286. doi:10.1023/A:1007626913721

12. Ritter, G. L., Woodruff, H. B., Lowry, S. R., Isenhour, T. L. (1975). An algorithm for a selective nearest neighbor decision rule. IEEE Transactions on Information Theory, 21 (6), 665–669. doi:10.1109/TIT.1975.1055464

13. Li, X. (2002). Data reduction via adaptive sampling. Communications in Information and Systems, 2 (1), 5–38. doi:10.4310/cis.2002.v2.n1.a3

14. Domingo, C., Gavaldà, R., Watanabe, O. (1999). Adaptive sampling methods for scaling up knowledge discovery algorithms. Presented at Second International Conference on Discovery Science, Tokyo, 172–183. doi:10.1007/3-540-46846-3_16

15. Li, B., Chi, M., Fan, J., Xue, X. (2007). Support cluster machine. Presented at 24th International Conference on Machine Learning, Corvallis, 505–512. doi:10.1145/1273496.1273560

16. Evans, R. (2008). Clustering for classification: using standard clustering methods to summarise datasets with minimal loss of classification accuracy. Saarbrücken: VDM Verlag, 108.

17. Madigan, D., Raghavan, N., DuMouchel, W., Nason, M., Posse, C., Ridgeway, G. (2002). Likelihood-based data squashing: a modeling approach to instance construction. Data Mining and Knowledge Discovery, 6 (2), 173–190. doi:10.1023/A:1014095614948

18. Kohonen, T. (1988). Learning vector quantization. Neural Networks, 1, 303. doi: 10.1016/0893-6080(88)90334-6

19. Sane, S. S., Ghatol, A. A. (2007). A Novel supervised instance selection algorithm. International Journal of Business Intelligence and Data Mining, 2 (4), 471–495. doi:10.1504/IJBIDM.2007.016384

20. Subbotin, S. (2013). The neuro-fuzzy network synthesis and simplification on precedents in problems of diagnosis and pattern recognition. Optical Memory and Neural Networks (Information Optics), 22 (2), 97–103. doi: 10.3103/s1060992x13020082

21. Subbotin, S. A. (2013). Methods of sampling based on exhaustive and evolutionary search. Automatic Control and Computer Sciences, 47 (3), 113–121. doi: 10.3103/s0146411613030073

22. Kolisnyk, N. V., Subbotin, S. O. (2009). Modeling of immunopathogenesis of chronic obstructive bronchitis using neural networks. Presented at II International Conference "Modern problems of biology, ecology and chemistry", Zaporizhzhya, Ukraine, 124–125.


GOST Style Citations


1. Engelbrecht, A. Computational intelligence: an introduction [Text] / A. Engelbrecht. – Sidney: John Wiley & Sons, 2007. – 597 p. doi: 10.1002/9780470512517

2. Jankowski, N. Comparison of instance selection algorithms I. Algorithms survey [Text] / N. Jankowski, M. Grochowski // Artificial Intelligence and Soft Computing : 7th International Conference ICAISC-2004, Zakopane, 7–11 June, 2004 : proceedings. – Berlin : Springer, 2004. – P. 598–603. – (Lecture Notes in Computer Science, Vol. 3070). doi:10.1007/978-3-540-24844-6_90

3. Reinartz, T. A unifying view on instance selection [Text] / T. Reinartz // Data Mining and Knowledge Discovery. – 2002. – № 6. – P. 191–210. doi:10.1023/A:1014047731786

4. Hart, P. E. The condensed nearest neighbor rule [Text] / P. E. Hart // IEEE Transactions on Information Theory. – 1968. – Vol. 14. – P. 515–516. doi:10.1109/TIT.1968.1054155

5. Aha, D. W. Instance-based learning algorithms [Text] / D. W. Aha, D. Kibler, M. K. Albert // Machine Learning. – 1991. – № 6. – P. 37–66. doi:10.1023/A:1022689900470

6. Brighton, H. Advances in instance selection for instance-based learning algorithms [Text] / H. Brighton, C. Mellish // Data Mining and Knowledge Discovery. – 2002. – № 6. – P. 153–172. doi:10.1023/A:1014043630878

7. Wilson, D. R. Instance pruning techniques [Text] / D. R. Wilson, T. R. Martinez // Machine Learning : Fourteenth International Conference ICML-1997, Nashville, 8–12 July 1997 : proceedings. – Burlington : Morgan Kaufmann, 1997. – P. 403–411.

8. Kibler, D. Learning representative exemplars of concepts: an initial case study [Text] / D. Kibler, D. W. Aha // Machine Learning : 4th International Workshop, Irvine, 22–25 June 1987 : proceedings. – Burlington : Morgan Kaufmann, 1987. – P. 24–30. doi:10.1016/b978-0-934613-41-5.50006-4

9. Gates, G. The reduced nearest neighbor rule  [Text] / G. Gates // IEEE Transactions on Information Theory. – 1972. – Vol. 18, Issue 3. – P. 431–433. doi:10.1109/TIT.1972.1054809

10. Wilson, D. L. Asymptotic properties of nearest neighbor rules using edited data [Text] / D. L. Wilson // IEEE Transactions on Systems, Man, Cybernetics. – 1972. – Vol. 2, Issue 3. – P. 408–421. doi:10.1109/TSMC.1972.4309137

11. Wilson, D. R. Reduction techniques for instance-based learning algorithms [Text] / D. R. Wilson, T. R. Martinez // Machine Learning. – 2000. – Vol. 38, Issue 3. – P. 257–286. doi:10.1023/A:1007626913721

12. Ritter, G. L. An algorithm for a selective nearest neighbor decision rule [Text] / G. L. Ritter, H. B. Woodruff, S. R. Lowry, T. L. Isenhour // IEEE Transactions on Information Theory. – 1975. – Vol. 21, Issue 6. – P. 665–669. doi:10.1109/TIT.1975.1055464

13. Li, X. Data reduction via adaptive sampling [Text] / X. Li // Communications in Information and Systems. – 2002. – Vol. 2, Issue 1. – P. 5–38. doi:10.4310/cis.2002.v2.n1.a3

14. Domingo, C. Adaptive sampling methods for scaling up knowledge discovery algorithms [Text] / C. Domingo, R. Gavaldà, O. Watanabe // Discovery Science : Second International Conference DS'99, Tokyo, 1999 : proceedings. – Berlin : Springer, 1999. – P. 172–183. doi:10.1007/3-540-46846-3_16

15. Li, B. Support cluster machine [Text] / B. Li, M. Chi, J. Fan, X. Xue // Machine Learning : 24th International Conference, Corvallis, 20–24 June 2007 : proceedings. – New York, 2007. – P. 505–512. doi:10.1145/1273496.1273560

16. Evans, R. Clustering for classification: using standard clustering methods to summarise datasets with minimal loss of classification accuracy [Text] / R. Evans. – Saarbrücken: VDM Verlag, 2008. – 108 p.

17. Madigan, D. Likelihood-based data squashing: a modeling approach to instance construction [Text] / D. Madigan, N. Raghavan, W. DuMouchel, M. Nason, C. Posse, G. Ridgeway // Data Mining and Knowledge Discovery. – 2002. – Vol. 6, Issue 2. – P. 173–190. doi:10.1023/A:1014095614948

18. Kohonen, T. Learning vector quantization [Text] / T. Kohonen // Neural Networks. – 1988. – Vol. 1. – P. 303. doi: 10.1016/0893-6080(88)90334-6

19. Sane, S. S. A Novel supervised instance selection algorithm [Text] / S. S. Sane, A. A. Ghatol // International Journal of Business Intelligence and Data Mining. – 2007. – Vol. 2, Issue 4. – P. 471–495. doi:10.1504/IJBIDM.2007.016384

20. Subbotin, S. The neuro-fuzzy network synthesis and simplification on precedents in problems of diagnosis and pattern recognition [Text] / S. Subbotin // Optical Memory and Neural Networks (Information Optics). – 2013. – Vol. 22, Issue 2. – P. 97–103. doi: 10.3103/s1060992x13020082

21. Subbotin, S. A. Methods of sampling based on exhaustive and evolutionary search [Text] / S. A. Subbotin // Automatic Control and Computer Sciences. – 2013. – Vol. 47, Issue 3. – P. 113–121. doi: 10.3103/s0146411613030073

22. Колісник, Н. В. Моделювання імунопатогенезу хронічного обструктивного бронхіту за допомогою нейромереж [Текст] : зб. матер. / Н. В. Колісник, С. О. Суботин // Сучасні проблеми біології, екології та хімії : ІІ Міжнародна конференція. – Запоріжжя : ЗНУ, 2009. – С. 124–125.






Copyright (c) 2014 Sergey Aleksandrovich Subbotin

Creative Commons License
This work is licensed under a Creative Commons Attribution 4.0 International License.

ISSN (print) 1729-3774, ISSN (on-line) 1729-4061