Improving efficiency of providing data group anonymity by automating data modification quality evaluation

Authors

DOI:

https://doi.org/10.15587/1729-4061.2017.113046

Keywords:

memetic algorithm, group anonymity, microfile, outlier, modified Thompson tau technique

Abstract

This work proposes a modification of the method for solving the task of providing data group anonymity that selects solutions automatically, without expert participation. The modification identifies solutions in which the automatically detected outliers do not coincide with the outliers in the initial distribution of information about the group of respondents. Automating solution selection thus improves the efficiency of data group anonymization by reducing the time needed to analyze candidate solutions for masking sensitive features of the distribution.

The modification was tested by solving the task of masking the regional distribution of military personnel in the state of New York. Solving the corresponding group anonymization task yielded 1,000 solutions. Only 24 of them (2.4%) are feasible, i.e., solutions in which all the outliers are masked. Automatically selecting such a small subset is significantly faster than doing so manually, which supports the proposed modification as a means of improving data group anonymization efficiency.
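The selection rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the significance level of 0.05, and the toy distributions are assumptions made for illustration only, and the feasibility criterion (no outlier position of the initial distribution remains an outlier after modification) is one reading of the abstract. Outliers are detected with the modified Thompson tau technique named in the keywords; the Student's t critical value is computed numerically so that the sketch needs only the standard library.

```python
import math
from functools import lru_cache
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=1000):
    """CDF for x >= 0 via Simpson's rule on [0, x] (steps must be even)."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


@lru_cache(maxsize=None)
def t_critical(alpha, df):
    """Two-sided critical value: the t with P(|T| > t) = alpha."""
    target = 1 - alpha / 2
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # grow the bracket until it contains t
        hi *= 2
    for _ in range(50):            # bisection on the bracket
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def thompson_tau_outliers(values, alpha=0.05):
    """Return the set of positions flagged by the modified Thompson tau test.

    The most deviant point is tested, removed if rejected, and the test is
    repeated on the remaining points until no point is rejected.
    """
    remaining = list(enumerate(values))
    outliers = set()
    while len(remaining) >= 3:
        xs = [v for _, v in remaining]
        n = len(xs)
        m, s = mean(xs), stdev(xs)
        if s == 0:
            break
        idx, val = max(remaining, key=lambda p: abs(p[1] - m))
        t = t_critical(alpha, n - 2)
        tau = t * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t * t))
        if abs(val - m) <= tau * s:
            break
        outliers.add(idx)
        remaining = [p for p in remaining if p[0] != idx]
    return outliers


def is_feasible(initial, modified, alpha=0.05):
    """Assumed criterion: no initial outlier position is still an outlier."""
    return thompson_tau_outliers(initial, alpha).isdisjoint(
        thompson_tau_outliers(modified, alpha))


# Hypothetical regional counts (not data from the paper); position 6 stands out.
initial = [10, 12, 11, 13, 12, 11, 95, 10, 12, 11]
candidates = [
    [18, 22, 18, 22, 18, 21, 20, 18, 22, 18],  # peak at position 6 flattened
    [12, 14, 13, 15, 14, 13, 79, 12, 14, 11],  # peak at position 6 still visible
]
feasible = [c for c in candidates if is_feasible(initial, c)]
print(len(feasible), "of", len(candidates), "candidate solutions are feasible")
```

Filtering a pool of candidate solutions this way replaces the expert's visual inspection of each modified distribution with a single pass over the pool.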

Author Biographies

Oleg Chertov, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Peremohy ave., 37, Kyiv, Ukraine, 03056

Doctor of Technical Sciences, Associate Professor

Department of Applied Mathematics 

Dan Tavrov, National Technical University of Ukraine “Igor Sikorsky Kyiv Polytechnic Institute”, Peremohy ave., 37, Kyiv, Ukraine, 03056

PhD

Department of Applied Mathematics 

Published

2017-10-30

How to Cite

Chertov, O., & Tavrov, D. (2017). Improving efficiency of providing data group anonymity by automating data modification quality evaluation. Eastern-European Journal of Enterprise Technologies, 5(4 (89)), 31–39. https://doi.org/10.15587/1729-4061.2017.113046

Issue

Section

Mathematics and Cybernetics - applied aspects