Preliminary data classification by multilevel segmentation of histograms for clustering of hypercubes
DOI:
https://doi.org/10.15587/2706-5448.2020.220428Keywords:
cumulative histogram, multilevel segmentation, piecewise linear approximation, hierarchical clustering, decomposition of data space.Abstract
The object of research is an algorithm for the classification of large data based on the hierarchical clustering algorithm. The nonlinear complexity of the clustering algorithm does not allow for data samples of 5–10 thousand and above. To classify data, it is necessary to pre-group them. Therefore, the subject of research is the data segmentation algorithm based on piecewise linear approximation.
In the course of the study, let’s use hierarchical clustering algorithms, the method of piecewise linear approximation of the cumulative histogram, calculated by a special procedure, and the procedure for searching for segmentation thresholds.
The computational complexity of the classical hierarchical algorithm reaches the value of O(N3), and certain steps to limit the search can achieve the value of O(N2), which is confirmed by experiments to study the dependence of the hierarchical tree on the initial sample. An approximate approach to key clustering with partitioning of a set of basic keys is implemented. To reduce further the complexity of the hierarchical clustering algorithm, a decomposition approach based on splitting the initial sample of large data into a number of subsets is proposed. In this article to use the hierarchical clustering algorithm for big data classification the preliminary decomposition method is proposed. It is based on multilevel segmentation of cumulative or ordinary histograms obtained for every feature coordinate characterizing object of data. Thresholds of multilevel segmentation are obtained by piecewise linear approximation of histogram functions. Build hypercubes of data are being accepted as objects for three stages clustering algorithm.
Powerful tool for data classification is obtained, the use of which allows carrying out many experiments with data of various types to identify patterns among the data features. Its application is intended for the processing of patient data, molecular structures, economic problems for making optimal treatment decisions, diagnostics and modeling. Thanks to this approach, data classification can be performed online to obtain the results of direct analysis when data is received, for example, from spacecraft.
References
- Yip, A. M., Ding, C., Chan, T. F. (2006). Dynamic cluster formation using level set methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (6), 877–889. doi: http://doi.org/10.1109/tpami.2006.117
- Viattchenin, D. (2009). Developments in fuzzy clustering. The collection of papers. Minsk: Vever, 216.
- Sandeep, V. (2010). Effective level sets and shape detection: an application to natural images. Gulbarga: Electronics and Communications Engineering, 132.
- Grady, L., Schwartz, E. L. (2006). Isoperimetric graph partitioning for image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28 (3), 469–475. doi: http://doi.org/10.1109/tpami.2006.57
- Pavan, M., Pelillo, M. (2007). Dominant Sets and Pairwise Clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29 (1), 167–172. doi: http://doi.org/10.1109/tpami.2007.250608
- Foggia, P., Percannella, G., Vento, M. (2014). Graph matching and learning in pattern recognition in the last 10 years. International Journal of Pattern Recognition and Artificial Intelligence, 28 (1), 1450001. doi: http://doi.org/10.1142/s0218001414500013
- Dong, X., Shen, J., Shao, L., Van Gool, L. (2016). Sub-Markov Random Walk for Image Segmentation. IEEE Transactions on Image Processing, 25 (2), 516–527. doi: http://doi.org/10.1109/tip.2015.2505184
- Buluç, A., Meyerhenke, H., Safro, I., Sanders, P., Schulz, C. (2016). Recent Advances in Graph Partitioning. Lecture Notes in Computer Science, 117–158. doi: http://doi.org/10.1007/978-3-319-49487-6_4
- Wang, N., Gao, X., Tao, D., Yang, H., Li, X. (2018). Facial feature point detection: A comprehensive survey. Neurocomputing, 275 (31), 50–65. doi: http://doi.org/10.1016/j.neucom.2017.05.013
- Ding, C., He, X. (2005). Cluster aggregate inequality and multilevel hierarchical clustering. Proc. 9th European Conf. Principles of Data Mining and Knowledge Discovery, 71–83. doi: http://doi.org/10.1007/11564126_12
- Melnyk, R., Tushnytskyy, R. (2009). Algorithm accuracy and complexity optimization by inequality merging for data clustering. Proc. of the 10-th Intern. Conf. The Experience of Designing and Application of CAD Systems in Microelectronics (CADSM), 453–455.
- Melnyk, R., Aleekseev, O. (2006). Clustering of pattern keys base on decomposition of their set. Editing and processing of information, 24 (100), 110−114.
- Melnyk, R., Tushnytskyy, R. (2008). Pattern keys clustering using large-scale dataset cascading decomposition. Computer Science and Information Technology, 604, 249−254.
- Shenkoya, T., Kim, E. (2019). A case study of the daedeok innopolis innovation cluster and its implications for Nigeria. World Technopolis Review, 8 (2), 104−119. doi: http://dx.doi.org/10.7165/wtr19a1218.21
- Ramer, U. (1972). An iterative procedure for the polygonal approximation of plane curves. Computer Graphics and Image Processing, 1 (3), 244–256. doi: http://doi.org/10.1016/s0146-664x(72)80017-0
- Douglas, D., Peucker, T. (1973). Algorithms for the reduction of the number of points required to represent a digitized line or its caricature. Cartographica: The International Journal for Geographic Information and Geovisualization, 10 (2), 112–122.
Downloads
Published
How to Cite
Issue
Section
License
Copyright (c) 2020 Roman Melnyk, Ruslan Tushnytskyy, Roman Kvit, Tetyana Salo
This work is licensed under a Creative Commons Attribution 4.0 International License.
The consolidation and conditions for the transfer of copyright (identification of authorship) is carried out in the License Agreement. In particular, the authors reserve the right to the authorship of their manuscript and transfer the first publication of this work to the journal under the terms of the Creative Commons CC BY license. At the same time, they have the right to conclude on their own additional agreements concerning the non-exclusive distribution of the work in the form in which it was published by this journal, but provided that the link to the first publication of the article in this journal is preserved.