ONLINE FUZZY CLUSTERING OF HIGH DIMENSION DATA STREAMS BASED ON NEURAL NETWORK ENSEMBLES

Authors

Yevgeniy Bodyanskiy, Iryna Perova, Polina Zhernova

DOI:

https://doi.org/10.30837/2522-9818.2019.7.016

Keywords:

clustering, fuzzy C-means method, sequential analysis of principal components, ensemble of neuro-fuzzy networks, T. Kohonen’s neural network, self-learning

Abstract

The subject matter of the article is fuzzy clustering of high-dimensional data based on the ensemble approach when the number and shape of clusters are not known in advance. The goal of the work is to create a neuro-fuzzy approach for clustering data that arrive as a stream for online processing when the number and shape of clusters are unknown. The following tasks are solved in the article: the input feature space is compressed in online mode; a model of neural network ensembles for data clustering is built; an ensemble of neuro-fuzzy networks for clustering high-dimensional data is developed; an approach for clustering data in online mode is worked out. The following results are obtained: the proposed approach is based on a modification of the fuzzy C-means algorithm. To reduce the dimension of the input space, a modified Hebb-Sanger network built from modified Oja neurons is proposed, together with a speed-optimized learning algorithm for the Oja neuron. Such a network performs principal component analysis in online mode at high speed. Conclusions. For cases where the reduction-compression procedure cannot be used because the physical meaning of the original space may be lost, a new clustering criterion is introduced; it combines the well-known polynomial fuzzifier with a weighting of the individual components of the deviations of the presented patterns from the cluster centroids. A recurrent modification based on the algorithms proposed in the article is introduced. A mathematical model is developed to assess clustering quality using the Xie-Beni index, modified for online mode.
The experimental results confirm that the proposed system can solve a wide range of Data Mining tasks when data sets are processed online, the number and shape of clusters are not known, and the number of observations is large.
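
The building blocks named in the abstract can be illustrated with a minimal sketch. It is not the authors' method: it uses the classic Oja rule rather than the speed-optimized variant, the standard fuzzy C-means membership formula rather than the modified criterion, and the batch Xie-Beni index rather than its online modification. The function names, the synthetic two-dimensional stream, the learning rate and the fuzzifier value m = 2 are all assumptions made for illustration.

```python
import math
import random


def oja_step(w, x, lr):
    """One online step of the classic Oja rule (not the paper's optimized variant)."""
    y = sum(wi * xi for wi, xi in zip(w, x))  # neuron output y = w^T x
    return [wi + lr * y * (xi - y * wi) for wi, xi in zip(w, x)]


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def fcm_memberships(x, centroids, m=2.0):
    """Standard fuzzy C-means memberships of one sample (fuzzifier m > 1)."""
    d2 = [sq_dist(x, c) for c in centroids]
    # A sample lying exactly on a centroid belongs fully to that cluster.
    if any(d == 0.0 for d in d2):
        hits = [1.0 if d == 0.0 else 0.0 for d in d2]
        return [h / sum(hits) for h in hits]
    inv = [d ** (-1.0 / (m - 1.0)) for d in d2]
    total = sum(inv)
    return [v / total for v in inv]


def xie_beni(data, centroids, m=2.0):
    """Batch Xie-Beni index: compactness / (N * minimal centroid separation).

    Lower values indicate a better partition. The original index uses m = 2.
    """
    compactness = 0.0
    for x in data:
        u = fcm_memberships(x, centroids, m)
        for uj, c in zip(u, centroids):
            compactness += (uj ** m) * sq_dist(x, c)
    separation = min(
        sq_dist(ci, cj)
        for i, ci in enumerate(centroids)
        for cj in centroids[i + 1:]
    )
    return compactness / (len(data) * separation)


# Hypothetical 2-D stream whose variance lies mostly along the direction (1, 1).
random.seed(0)
w = [1.0, 0.0]
for _ in range(2000):
    t = random.uniform(-2.0, 2.0)  # dominant component
    e = random.uniform(-0.2, 0.2)  # small orthogonal noise
    x = [(t - e) / math.sqrt(2), (t + e) / math.sqrt(2)]
    w = oja_step(w, x, lr=0.02)

# Cosine between w and the true principal direction (sign is arbitrary).
alignment = abs(w[0] + w[1]) / math.sqrt(2) / math.hypot(w[0], w[1])
```

In this sketch each observation is seen once and then discarded, which is the property that makes such rules usable on a data stream. Several Oja-type neurons trained with Sanger's deflation step would extract further principal components in the same online fashion.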

Author Biographies

Yevgeniy Bodyanskiy, Kharkiv National University of Radio Electronics

Doctor of Sciences (Engineering), Professor, Professor at the Department of Artificial Intelligence, Scientific Head at the CSRL

Iryna Perova, Kharkiv National University of Radio Electronics

PhD (Engineering Sciences), Senior Researcher, Associate Professor, Associate Professor at the Department of Biomedical Engineering

Polina Zhernova, Kharkiv National University of Radio Electronics

Assistant Lecturer at the Department of System Engineering

Published

2019-03-22

How to Cite

Bodyanskiy, Y., Perova, I., & Zhernova, P. (2019). ONLINE FUZZY CLUSTERING OF HIGH DIMENSION DATA STREAMS BASED ON NEURAL NETWORK ENSEMBLES. INNOVATIVE TECHNOLOGIES AND SCIENTIFIC SOLUTIONS FOR INDUSTRIES, (1 (7)), 16–24. https://doi.org/10.30837/2522-9818.2019.7.016

Issue

No. 1 (7), 2019

Section

Peer-reviewed Article