DOI: https://doi.org/10.30837/2522-9818.2018.6.042

### KERNEL FUZZY CLUSTERING OF DATA STREAMS BASED ON THE ENSEMBLE OF NEURAL NETWORKS

#### Abstract

The **subject matter** of the study is data clustering based on an ensemble of neural networks. The

**goal** of the work is to create a new approach to solving clustering tasks in data streams when information is fed observation-by-observation in online mode. The following

**tasks** were solved in the article: a model of neural network ensembles for data clustering was created; methods of data clustering for processing mass data were developed; and methods of online data clustering using neural network ensembles working in parallel mode were developed. The following

**results** were obtained: the operation principles of ensembles of Kohonen neural networks were formulated, and practical requirements for dealing with mass data were specified. Probable approaches to solving these problems were indicated. The operation principle of an ensemble of Kohonen clustering networks tuned in parallel was studied. Procedures based on the WTA ("winner takes all") and WTM ("winner takes most") principles were used to train the layers of the neural network ensemble. Radial basis functions were used to increase the dimension of the input space. A mathematical model was developed for solving the problem of data clustering in online mode, and a mathematical model for assessing clustering quality was developed using the Davies-Bouldin index rewritten for online mode.
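The WTA training rule and the Davies-Bouldin quality estimate mentioned above can be sketched as follows. This is a minimal illustration, not the authors' exact procedure: the WTM neighborhood updates and the radial-basis-function expansion of the input space are omitted, and the two-blob stream, the seeding of prototypes from the first observations, and the `1/(k+1)` learning-rate schedule are illustrative assumptions.

```python
import numpy as np

def wta_update(centroids, x, lr):
    """One WTA ("winner takes all") step: only the winning prototype moves toward x."""
    j = int(np.argmin(np.linalg.norm(centroids - x, axis=1)))  # index of the winner
    centroids[j] += lr * (x - centroids[j])
    return j

def davies_bouldin(X, labels, centroids):
    """Davies-Bouldin index: lower values mean more compact, better-separated clusters."""
    m = len(centroids)
    # within-cluster dispersion S_i: mean distance of members to their prototype
    s = np.array([np.linalg.norm(X[labels == i] - centroids[i], axis=1).mean()
                  if np.any(labels == i) else 0.0
                  for i in range(m)])
    total = 0.0
    for i in range(m):
        total += max((s[i] + s[j]) / np.linalg.norm(centroids[i] - centroids[j])
                     for j in range(m) if j != i)
    return total / m

rng = np.random.default_rng(0)
# two well-separated Gaussian blobs as a stand-in for a data stream
stream = np.vstack([rng.normal(0.0, 0.3, (200, 2)),
                    rng.normal(5.0, 0.3, (200, 2))])
rng.shuffle(stream)

centroids = stream[:2].copy()                  # seed prototypes with the first observations
for k, x in enumerate(stream):                 # observation-by-observation processing
    wta_update(centroids, x, 1.0 / (k + 1))    # decaying learning rate

# final crisp assignment and quality estimate
labels = np.argmin(np.linalg.norm(stream[:, None] - centroids[None], axis=2), axis=1)
print(round(davies_bouldin(stream, labels, centroids), 3))
```

On well-separated blobs the index comes out small, which is the criterion the ensemble can use to compare candidate partitions.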

**Conclusions**. The paper proposes a new approach to solving the problem of clustering data streams when information is fed observation-by-observation in online mode, provided that the number and shape of the clusters are not known in advance. The main idea of this approach rests on an ensemble of neural networks consisting of Kohonen self-organizing maps. All members of the ensemble process, in parallel, the information that is sequentially fed into the system. Experimental results confirmed that the considered system can be used to solve a wide range of Data Stream Mining tasks.
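The ensemble idea — several Kohonen-style clusterers processing the same stream in parallel, with the best partition chosen afterwards — might be sketched like this. It is a simplified illustration under stated assumptions, not the authors' system: each member is a plain WTA clusterer (no WTM neighborhood, no kernel expansion), the members differ only in their assumed cluster count, and the `OnlineKohonen` class and the candidate counts `(2, 3, 4, 5)` are illustrative choices.

```python
import numpy as np

class OnlineKohonen:
    """One ensemble member: a WTA clusterer with a fixed set of prototypes."""
    def __init__(self, init_w):
        self.w = init_w          # prototype vectors, one row per cluster
        self.k = 0               # number of observations seen so far
    def partial_fit(self, x):
        j = int(np.argmin(np.linalg.norm(self.w - x, axis=1)))
        self.w[j] += (x - self.w[j]) / (self.k + 1)   # decaying learning rate
        self.k += 1
    def predict(self, X):
        return np.argmin(np.linalg.norm(X[:, None] - self.w[None], axis=2), axis=1)

def davies_bouldin(X, labels, w):
    """Davies-Bouldin index used to compare ensemble members (lower is better)."""
    m = len(w)
    s = np.array([np.linalg.norm(X[labels == i] - w[i], axis=1).mean()
                  if np.any(labels == i) else 0.0 for i in range(m)])
    return sum(max((s[i] + s[j]) / np.linalg.norm(w[i] - w[j])
                   for j in range(m) if j != i) for i in range(m)) / m

rng = np.random.default_rng(1)
stream = np.vstack([rng.normal(c, 0.2, (150, 2)) for c in (0.0, 3.0, 6.0)])
rng.shuffle(stream)

# members differ only in their assumed number of clusters
ensemble = {m: OnlineKohonen(stream[:m].copy()) for m in (2, 3, 4, 5)}
for x in stream:                         # every member sees every observation
    for member in ensemble.values():
        member.partial_fit(x)

scores = {m: davies_bouldin(stream, net.predict(stream), net.w)
          for m, net in ensemble.items()}
best = min(scores, key=scores.get)
print(best, {m: round(v, 3) for m, v in scores.items()})
```

The member with the lowest index supplies the final partition, which is how an unknown number of clusters can be handled without rerunning the stream.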

#### References

Gan, G., Ma, Ch., Wu, J. (2007), Data Clustering: Theory, Algorithms and Applications, Philadelphia : SIAM.

Xu, R., Wunsch, D. C. (2009), Clustering, Hoboken, NJ : John Wiley & Sons, Inc., IEEE Press Series on Computational Intelligence.

Aggarwal, C. C., Reddy, C. K. (2014), Data Clustering, Algorithms and Application, Boca Raton : CRC Press.

Pelleg, D., Moore, A. (2000), "X-means: extending K-means with efficient estimation of the number of clusters", In: Proc. 17th Int. Conf. on Machine Learning, Morgan Kaufmann, San Francisco, P. 727–730.

Ishioka, T. (2005), "An expansion of X-means for automatically determining the optimal number of clusters", In: Proc. 4th IASTED Int. Conf. Computational Intelligence, Calgary, Alberta, P. 91–96.

Rutkowski, L. (2008), Computational Intelligence. Methods and Techniques, Berlin-Heidelberg: Springer-Verlag.

Mumford, C. and Jain, L. (2009), Computational Intelligence. Collaboration, Fusion and Emergence, Berlin : Springer-Verlag.

Kruse, R., Borgelt, C., Klawonn, F., Moewes, C., Steinbrecher, M. and Held, P. (2013), Computational Intelligence. A Methodological Introduction, Berlin : Springer.

Du, K. L. and Swamy, M. N. S. (2014), Neural Networks and Statistical Learning, London : Springer-Verlag.

Kohonen, T. (1995), Self-Organizing Maps, Berlin : Springer-Verlag.

Strehl, A., Ghosh, J. (2002), "Cluster ensembles – A knowledge reuse framework for combining multiple partitions", Journal of Machine Learning Research, P. 583–617.

Topchy, A., Jain, A. K., Punch, W. (2005), "Clustering ensembles: models of consensus and weak partitions", IEEE Transactions on Pattern Analysis and Machine Intelligence, No. 27, P. 1866–1881.

Alizadeh, H., Minaei-Bidgoli, B., Parvin, H. (2013), "To improve the quality of cluster ensembles by selecting a subset of base clusters", Journal of Experimental & Theoretical Artificial Intelligence, No. 26, P. 127–150.

Charkhabi M., Dhot T., Mojarad S. A. (2014), "Cluster ensembles, majority vote, voter eligibility and privileged voters", Int. Journal of Machine Learning and Computing, No. 4, P. 275–278.

Bodyanskiy, Ye. V., Deineko, A. A., Zhernova, P. Ye., Riepin, V. O. (2017), "Adaptive modification of X-means method based on the ensemble of the T. Kohonen’s clustering neural networks", Materials of the VI Int. Sci. Conf. "Information Management Systems and Technologies", Odessa, P. 202–204.

Bezdek, J. C., Keller, J., Krishnapuram, R., Pal, N. (1999), Fuzzy Models and Algorithms for Pattern Recognition and Image Processing, The Handbooks of Fuzzy Sets Series, Vol. 4, Dordrecht, Netherlands : Kluwer.

Cover, T. M. (1965), "Geometrical and statistical properties of systems of linear inequalities with applications in pattern recognition", IEEE Trans. on Electronic Computers, No. 14, P. 326–334.

Girolami, M. (2002), "Mercer kernel-based clustering in feature space", IEEE Trans. on Neural Networks, Vol. 13, No. 3, P. 780–784.

MacDonald, D., Fyfe, C. (2002), "Clustering in data space and feature space", ESANN'2002 Proc. European Symp. on Artificial Neural Networks, Bruges (Belgium), P. 137–142.

Camastra, F., Verri, A. (2005), "A novel kernel method for clustering", IEEE Trans. on Pattern Analysis and Machine Intelligence, No. 5, P. 801–805.

Bodyanskiy, Ye. V., Deineko, A. A., Kutsenko, Y. V. (2017), "On-line kernel clustering based on the general regression neural network and T. Kohonen’s self-organizing map", Automatic Control and Computer Sciences, No. 51 (1), P. 55–62.

Davies, D. L., Bouldin, D. W. (1979), "A Cluster Separation Measure", IEEE Transactions on Pattern Analysis and Machine Intelligence, No. 4, P. 224–227.

Murphy, P. M., Aha, D. (1994), UCI Repository of machine learning databases, available at : http://www.ics.uci.edu/mlearn/MLRepository.html, Department of Information and Computer Science, CA : University of California.


Copyright (c) 2018 Поліна Євгеніївна Жернова, Євгеній Володимирович Бодянський

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

All the articles published in ITSSI journal are licensed under CC BY-NC-SA 4.0