Privacy protection based distributed clustering with deep learning algorithm for distributed data mining
DOI:
https://doi.org/10.15587/1729-4061.2022.263692
Keywords:
deep learning, privacy protection, distributed clustering, distributed data mining
Abstract
Distributed Data Mining (DDM) is vital in many applications that process large volumes of data. The datasets are stored in local databases and operated by local communities, yet DDM must provide solutions both locally and globally. Storing the datasets in a distributed manner, however, raises scalability and reliability issues. In addition, locally stored data is exposed to security and privacy challenges, and third parties may access the DDM, which causes authorization issues. The DDM process fuses sensor data from different sources to improve knowledge discovery, and during this process it faces several issues such as security concerns, privacy restrictions, technical barriers, and trust issues. To address these issues, DDM should be improved to handle both homogeneous and heterogeneous data. This work uses the privacy protection-based distributed clustering (PPDC) algorithm to handle the privacy and security challenges that arise while analyzing distributed data. The clustering algorithm generates semi-trusted third parties to form the clusters, which protects the data from unauthorized users. The semi-trusted party protects the locally analyzed solution by creating a random vector-based trusted process. Further, the process uses an optimized deep learning approach together with clustering to improve heterogeneous data analysis. The effectiveness of the introduced PPDC method is compared with existing methods; the PPDC algorithm achieves an error rate of 0.202 and an accuracy of 0.95 % while maintaining data security.
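The abstract does not specify how the random vector-based process works, so the following is only a rough illustration of the general idea and not the PPDC algorithm itself. It assumes a generic zero-sum masking scheme applied to a single k-means update: a semi-trusted party hands each site a random vector, the vectors sum to zero, and each site adds its vector to its local cluster statistics before sharing them. The aggregator then sees only masked values, yet the masks cancel in the total. All function names are hypothetical, and the scheme only hides local results if the party that issues the masks does not collude with the aggregator.

```python
import random


def zero_sum_masks(n_sites, dim, rng):
    """Hypothetical helper: one random vector per site; the vectors sum to zero."""
    masks = [[rng.uniform(-1000.0, 1000.0) for _ in range(dim)]
             for _ in range(n_sites - 1)]
    masks.append([-sum(m[i] for m in masks) for i in range(dim)])
    return masks


def nearest(point, centroids):
    """Index of the centroid closest to point (squared Euclidean distance)."""
    return min(range(len(centroids)),
               key=lambda j: sum((a - b) ** 2 for a, b in zip(point, centroids[j])))


def local_stats(points, centroids):
    """Per-site step: flattened per-cluster coordinate sums, then per-cluster counts."""
    k, d = len(centroids), len(centroids[0])
    stats = [0.0] * (k * d + k)
    for p in points:
        j = nearest(p, centroids)
        for i in range(d):
            stats[j * d + i] += p[i]
        stats[k * d + j] += 1.0
    return stats


def update_centroids(stat_vectors, centroids):
    """Aggregator step: add up the submitted vectors and recompute centroids."""
    k, d = len(centroids), len(centroids[0])
    total = [sum(col) for col in zip(*stat_vectors)]
    new_centroids = []
    for j in range(k):
        count = round(total[k * d + j])  # rounding removes float residue left by masks
        if count == 0:
            new_centroids.append(list(centroids[j]))  # empty cluster: keep old centroid
        else:
            new_centroids.append([total[j * d + i] / count for i in range(d)])
    return new_centroids


def masked_round(sites, centroids, seed=0):
    """One k-means update in which the aggregator only sees masked statistics."""
    k, d = len(centroids), len(centroids[0])
    rng = random.Random(seed)  # NOT cryptographically secure; illustration only
    masks = zero_sum_masks(len(sites), k * d + k, rng)
    submitted = [[s + m for s, m in zip(local_stats(pts, centroids), mask)]
                 for pts, mask in zip(sites, masks)]
    return update_centroids(submitted, centroids)
```

Calling masked_round(sites, centroids) with a list of per-site point lists should return, up to floating-point rounding, the same centroids as update_centroids applied to the unmasked local_stats of every site. A real deployment would need a cryptographically secure random source and a threat model that this sketch does not attempt to cover.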
Supporting Agency
- The authors would like to thank the Technical Instructors Training Institute, Middle Technical University, for its support and encouragement and for providing the infrastructure to carry out our research work.
License
Copyright (c) 2022 Alaa Thamer Mahmood, Raed Kamil Naser, Sura Khalil Abd
This work is licensed under a Creative Commons Attribution 4.0 International License.