Improving efficiency of providing data group anonymity by automating data modification quality evaluation
Keywords: memetic algorithm, group anonymity, microfile, outlier, modified Thompson tau technique
This work proposes a modification of the method for solving the task of providing data group anonymity that selects solutions automatically, without expert participation. The modification identifies those solutions in which the automatically detected outliers do not match the outliers in the initial distribution of information about the group of respondents. Automating solution selection thus improves the efficiency of data group anonymization by reducing the time needed to check whether candidate solutions mask the sensitive features of the distribution.
The modification was tested by solving the task of masking the regional distribution of military personnel in the state of New York. Solving the corresponding group anonymization task yielded 1,000 solutions. Only 24 of them (2.4% of the total) proved feasible, i.e., all the outliers in them are masked. Selecting such a small number of solutions automatically is significantly faster than doing so manually, which speaks in favor of the proposed modification as a way to improve data group anonymization efficiency.
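The selection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes the standard iterative form of the modified Thompson tau test, a significance level of 0.05, and a reading of "feasible" as "no outlier position of the modified distribution coincides with an outlier position of the initial one". The function names (`t_quantile`, `thompson_tau_outliers`, `is_feasible`) and the sample data are hypothetical. The Student t quantile is computed numerically so that the sketch depends only on the standard library.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """CDF for x >= 0, via Simpson's rule on [0, x] plus the symmetric half."""
    h = x / steps
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 + total * h / 3


def t_quantile(p, df):
    """Quantile for 0.5 < p < 1, found by bisection on t_cdf."""
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < p:
        hi *= 2
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def thompson_tau_outliers(values, alpha=0.05):
    """Indices flagged by the iterative modified Thompson tau test.

    Assumption: one point (the largest absolute deviation) is tested and
    removed per pass, and the statistics are recomputed each time.
    """
    remaining = dict(enumerate(values))
    outliers = set()
    while len(remaining) >= 3:
        n = len(remaining)
        m = mean(remaining.values())
        s = stdev(remaining.values())
        if s == 0:
            break
        idx, val = max(remaining.items(), key=lambda kv: abs(kv[1] - m))
        t = t_quantile(1 - alpha / 2, n - 2)
        tau = t * (n - 1) / (math.sqrt(n) * math.sqrt(n - 2 + t * t))
        if abs(val - m) <= tau * s:
            break
        outliers.add(idx)
        del remaining[idx]
    return outliers


def is_feasible(initial, modified, alpha=0.05):
    """Assumed criterion: outlier positions of the two distributions differ."""
    before = thompson_tau_outliers(initial, alpha)
    after = thompson_tau_outliers(modified, alpha)
    return before.isdisjoint(after)


# Hypothetical regional counts: position 6 is an extreme value initially.
initial = [10, 12, 11, 12, 12, 11, 95, 10, 12, 11]
masked = [10, 12, 11, 12, 12, 11, 12, 10, 12, 11]
unmasked = [10, 12, 11, 12, 12, 11, 90, 10, 12, 11]
feasible = [s for s in (masked, unmasked) if is_feasible(initial, s)]
```

Filtering a list of candidate solutions with `is_feasible` in this way replaces the manual inspection of each solution; under the assumptions above, only `masked` survives the filter in this toy example.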
Copyright (c) 2017 Oleg Chertov, Dan Tavrov
This work is licensed under a Creative Commons Attribution 4.0 International License.