An Efficient Guilt Detection Approach for Identifying Data Leakages

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-25 Number-2
Year of Publication : 2015
Authors : Anand Kiran
DOI :  10.14445/22312803/IJCTT-V25P111


Anand Kiran "An Efficient Guilt Detection Approach for Identifying Data Leakages". International Journal of Computer Trends and Technology (IJCTT) V25(2):62-67, July 2015. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
In this paper we develop a model for assessing the “guilt” of agents. We also present algorithms for distributing objects to agents in a way that improves our chances of identifying a leaker. Finally, we consider the option of adding “fake” objects to the distributed set. Such objects do not correspond to real entities but appear realistic to the agents. In a sense, the fake objects act as a type of watermark for the entire set, without modifying any individual members. If it turns out that an agent was given one or more fake objects that were leaked, then the distributor can be more confident that the agent was guilty.

A distributor owns a set T = {t1, t2, ..., tm} of valuable data objects. The distributor wants to share some of the objects with a set of agents U1, U2, ..., Un, but does not wish the objects to be leaked to third parties. The objects in T could be of any type and size, e.g., they could be tuples in a relation, or relations in a database. An agent Ui receives a subset of objects Ri ⊆ T, determined either by a sample request or an explicit request:
• Sample request Ri = SAMPLE(T, mi): any subset of mi records from T can be given to Ui.
• Explicit request Ri = EXPLICIT(T, condi): agent Ui receives all the objects in T that satisfy condi.

A data distributor has given sensitive data to a set of supposedly trusted agents (third parties). Some of the data is leaked and found in an unauthorized place (e.g., on the web or on somebody’s laptop). The distributor must assess the likelihood that the leaked data came from one or more agents, as opposed to having been independently gathered by other means.
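The two request types and the fake-object idea described above can be illustrated with a minimal Python sketch. It is not the paper's algorithm: the function names (`sample_request`, `explicit_request`, `add_fake_objects`, `agents_with_leaked_fakes`), the string representation of objects, and the naming scheme for fake objects are all assumptions made for illustration. The last helper only reports which agents' fake objects appear in a leaked set. It does not compute a guilt probability.

```python
import random

def sample_request(T, m, rng=None):
    """SAMPLE(T, m_i): return any subset of m objects from T.
    Assumes objects are sortable so a seeded RNG gives repeatable results."""
    rng = rng or random.Random()
    return set(rng.sample(sorted(T), m))

def explicit_request(T, cond):
    """EXPLICIT(T, cond_i): return every object in T that satisfies cond."""
    return {t for t in T if cond(t)}

def add_fake_objects(R, agent_id, count):
    """Return (R plus fakes, fakes). The 'fake-<agent>-<k>' naming is a
    hypothetical scheme; realistic fakes would mimic genuine objects."""
    fakes = {f"fake-{agent_id}-{k}" for k in range(count)}
    return R | fakes, fakes

def agents_with_leaked_fakes(leaked, fakes_by_agent):
    """Map each agent to the fake objects of theirs found in the leaked set."""
    return {agent: fakes & leaked
            for agent, fakes in fakes_by_agent.items()
            if fakes & leaked}

# Hypothetical data: eight objects t1..t8 and two agents U1, U2.
T = {f"t{j}" for j in range(1, 9)}
R1 = sample_request(T, 3, random.Random(0))              # sample request
R2 = explicit_request(T, lambda t: int(t[1:]) % 2 == 0)  # even-numbered objects

R1_out, F1 = add_fake_objects(R1, "U1", 1)
R2_out, F2 = add_fake_objects(R2, "U2", 1)

# Suppose this set is later found in an unauthorized place.
leaked = {"t2", "t4", "fake-U2-0"}
suspects = agents_with_leaked_fakes(leaked, {"U1": F1, "U2": F2})
print(suspects)
```

In this toy run only U2's fake object appears in the leaked set, so the distributor would have more reason to suspect U2. A leaked set containing no fake objects would leave `suspects` empty, and the distributor would have to rely on the overlap between the leaked set and each Ri instead.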


Keywords -
Fake Object, Guilty Agent, Data Object, Third Party, Watermark, Data Warehousing.