Privacy protection based distributed clustering with deep learning algorithm for distributed data mining

Authors

DOI:

https://doi.org/10.15587/1729-4061.2022.263692

Keywords:

deep learning, privacy protection, distributed clustering, distributed data mining

Abstract

Distributed data mining (DDM) is vital in many applications that process large volumes of data. The datasets are stored in local databases and operated by local communities, yet DDM provides solutions both locally and globally. However, because the datasets are stored in a distributed manner, scalability and reliability suffer. In addition, locally stored data is exposed to security and privacy challenges, and third parties may access the DDM, which raises authorization issues. The DDM process fuses sensor data from different sources to improve knowledge discovery. During this process, DDM faces several issues, such as security concerns, privacy restrictions, technical barriers, and trust issues. To address these issues, DDM should be improved to handle both homogeneous and heterogeneous data. This work uses the privacy protection-based distributed clustering (PPDC) algorithm to handle the privacy and security challenges that arise while analyzing distributed data. The clustering algorithm generates semi-trusted third parties to form the clusters, which protects the data from unauthorized users. The semi-trusted party protects the locally analyzed solution by creating a random vector-based trusted process. Further, the process combines an optimized deep learning approach with clustering to improve heterogeneous data analysis. The effectiveness of the PPDC method is then compared with existing methods; the PPDC algorithm achieves an error rate of 0.202 and an accuracy of 0.95 % while maintaining data security.
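The abstract does not spell out how the random vector-based process works, so the following is only a minimal sketch of one common way random vectors can hide locally analyzed results from a semi-trusted third party. It assumes pairwise additive masking over one k-means style update and is not the authors' PPDC algorithm. All names (`local_stats`, `pairwise_masks`, `aggregate`) and the toy data are hypothetical.

```python
import random

def local_stats(points, centroids):
    """Per-cluster coordinate sums and counts for one site's private points."""
    k, dim = len(centroids), len(centroids[0])
    # Layout: k blocks of [sum_1, ..., sum_dim, count].
    stats = [0.0] * (k * (dim + 1))
    for p in points:
        nearest = min(
            range(k),
            key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
        )
        base = nearest * (dim + 1)
        for d in range(dim):
            stats[base + d] += p[d]
        stats[base + dim] += 1.0
    return stats

def pairwise_masks(n_sites, length, rng):
    """Random vectors that cancel across sites: site i adds r_ij, site j subtracts it."""
    masks = [[0.0] * length for _ in range(n_sites)]
    for i in range(n_sites):
        for j in range(i + 1, n_sites):
            r = [rng.uniform(-1e3, 1e3) for _ in range(length)]
            for t in range(length):
                masks[i][t] += r[t]
                masks[j][t] -= r[t]
    return masks

def aggregate(masked_vectors, k, dim):
    """Third party: sum the masked vectors; the masks cancel, leaving global sums and counts."""
    total = [sum(col) for col in zip(*masked_vectors)]
    centroids = []
    for c in range(k):
        base = c * (dim + 1)
        count = total[base + dim]
        if count > 0.5:  # tolerate floating-point residue from mask cancellation
            centroids.append([total[base + d] / count for d in range(dim)])
        else:
            centroids.append(None)  # no site assigned a point to this cluster
    return centroids

# Toy data (assumed): three sites, two clusters, two-dimensional points.
rng = random.Random(0)
sites = [
    [(0.0, 0.0), (1.0, 0.0)],
    [(10.0, 10.0), (0.0, 1.0)],
    [(11.0, 10.0), (10.0, 11.0)],
]
centroids = [(0.0, 0.0), (10.0, 10.0)]
masks = pairwise_masks(len(sites), length=2 * (2 + 1), rng=rng)
masked = [
    [s + m for s, m in zip(local_stats(pts, centroids), masks[i])]
    for i, pts in enumerate(sites)
]
new_centroids = aggregate(masked, k=2, dim=2)
```

In this sketch the third party only ever sees each site's statistics plus a random offset, yet the summed result equals the unmasked global update. A real deployment would also need secure pairwise key exchange and a cryptographically strong random source, neither of which is shown here.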

Supporting Agency

  • The authors would like to thank the Technical Instructors Training Institute, Middle Technical University, for its support and encouragement and for providing the infrastructure to carry out our research work.

Author Biographies

Alaa Thamer Mahmood, Technical Instructors Training Institute

Master of Information Technology, Assistant Lecturer

Middle Technical University

Raed Kamil Naser, Ministry of Defense, Iraq

Computer Officer, Military Training Directorate

Sura Khalil Abd, Dijlah University College

Doctor of Network and Communication Systems Engineering, Lecturer

Department of Computer Techniques Engineering

Published

2022-08-31

How to Cite

Mahmood, A. T., Kamil Naser, R., & Khalil Abd, S. (2022). Privacy protection based distributed clustering with deep learning algorithm for distributed data mining. Eastern-European Journal of Enterprise Technologies, 4(9(118)), 48–58. https://doi.org/10.15587/1729-4061.2022.263692

Section

Information and controlling system