New data clustering heuristic algorithm

Authors

  • Volodymyr Mosorov, Lodz University of Technology, Stefanowskiego str. 18/22, Lodz, 90-924, Poland
  • Taras Panskyi, Lodz University of Technology, Stefanowskiego str. 18/22, Lodz, 90-924, Poland

DOI:

https://doi.org/10.15587/1729-4061.2015.39785

Keywords:

clustering method, cluster, heuristic algorithm, density distribution, density based

Abstract

Clustering is a data mining technique that places objects into groups such that objects in the same group are more similar or related to one another than to objects in other groups. These groups are called clusters: objects within a cluster resemble each other but differ from the objects in other clusters. This article proposes a data clustering method and demonstrates it on random data with a uniform distribution. Data mining often requires clustering large data sets with different data types and properties. The main task of the research was to investigate data clustering and to determine how many clusters a data set contains. In particular, we were interested in whether a data set contains more than one cluster. The new method includes a decision rule that uses the following parameters: the area of the regions found from the density distribution of the input data, the number and magnitude of the local maxima (peaks) found in each region, and the number of elements (out of the total number of primary elements) that fall into each region. The proposed clustering method differs from existing ones in that the data set is its only input parameter, and in that the criterion for evaluating its correctness is an objective assessment by a person or group of people based on visual logical analysis. All data manipulations described in this article were performed in Matlab.
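
The three quantities named in the decision rule can be illustrated with a short sketch. This is not the authors' implementation (the article reports using Matlab, and the abstract does not give the rule itself). It is a minimal Python illustration under stated assumptions: the density distribution is approximated by a two-dimensional histogram, a cell counts as dense when it holds more points than the mean count per cell, regions are 4-connected groups of dense cells, and a peak is a cell strictly higher than its in-grid neighbours. The function names and the `bins` parameter are hypothetical.

```python
from collections import deque


def density_grid(points, bins):
    """Count points per cell of a bins x bins grid over the bounding box."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, y0 = min(xs), min(ys)
    # Fall back to a unit cell size if all points share a coordinate.
    wx = (max(xs) - x0) / bins or 1.0
    wy = (max(ys) - y0) / bins or 1.0
    grid = [[0] * bins for _ in range(bins)]
    for x, y in points:
        i = min(int((x - x0) / wx), bins - 1)
        j = min(int((y - y0) / wy), bins - 1)
        grid[i][j] += 1
    return grid, wx * wy


def neighbours(i, j, bins):
    """Yield the 4-connected neighbours of cell (i, j) that lie in the grid."""
    for ni, nj in ((i + 1, j), (i - 1, j), (i, j + 1), (i, j - 1)):
        if 0 <= ni < bins and 0 <= nj < bins:
            yield ni, nj


def region_features(points, bins=10):
    """Return area, peak magnitudes and element count for each dense region."""
    grid, cell_area = density_grid(points, bins)
    # Assumed cutoff: a cell is dense if it exceeds the mean count per cell.
    threshold = len(points) / (bins * bins)
    seen = [[False] * bins for _ in range(bins)]
    regions = []
    for si in range(bins):
        for sj in range(bins):
            if seen[si][sj] or grid[si][sj] <= threshold:
                continue
            # Breadth-first flood fill over connected dense cells.
            queue, cells = deque([(si, sj)]), []
            seen[si][sj] = True
            while queue:
                i, j = queue.popleft()
                cells.append((i, j))
                for ni, nj in neighbours(i, j, bins):
                    if not seen[ni][nj] and grid[ni][nj] > threshold:
                        seen[ni][nj] = True
                        queue.append((ni, nj))
            # A peak is strictly higher than every in-grid neighbour,
            # so flat plateaus produce no peak in this simple sketch.
            peaks = [
                grid[i][j]
                for i, j in cells
                if all(grid[i][j] > grid[ni][nj] for ni, nj in neighbours(i, j, bins))
            ]
            regions.append({
                "area": len(cells) * cell_area,
                "peaks": peaks,
                "elements": sum(grid[i][j] for i, j in cells),
            })
    return regions
```

Calling `region_features(points)` on a list of `(x, y)` pairs returns one dictionary per dense region. How these values are combined to decide whether the data set contains more than one cluster is not specified in the abstract, so the sketch stops at computing them.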

Author Biographies

Volodymyr Mosorov, Lodz University of Technology, Stefanowskiego str. 18/22, Lodz, 90-924, Poland

Doctor of Technical Sciences

Institute of Applied Computer Science

Taras Panskyi, Lodz University of Technology, Stefanowskiego str. 18/22, Lodz, 90-924, Poland

graduate student

Institute of Applied Computer Science


Published

2015-04-20

How to Cite

Mosorov, V., & Panskyi, T. (2015). New data clustering heuristic algorithm. Eastern-European Journal of Enterprise Technologies, 2(9(74)), 10–16. https://doi.org/10.15587/1729-4061.2015.39785

Section

Information and controlling system