Date Available

12-14-2011

Year of Publication

2008

Degree Name

Doctor of Philosophy (PhD)

Document Type

Dissertation

College

Engineering

Department

Computer Science

First Advisor

Dr. Jun Zhang

Abstract

Access to large volumes of data containing private information creates a dual demand for preserving data privacy and ensuring correct knowledge discovery, two apparently contradictory tasks. Low-rank approximations generated by matrix decompositions are the fundamental element of the privacy-preserving data mining (PPDM) applications in this dissertation. Two categories of PPDM are studied: data value hiding (DVH) and data pattern hiding (DPH). A matrix-decomposition-based framework is designed to incorporate matrix decomposition techniques into data preprocessing to distort original data sets. To address the central challenge in DVH, namely how to protect sensitive or confidential attribute values without jeopardizing the underlying data patterns, we propose models based on singular value decomposition (SVD) and nonnegative matrix factorization (NMF). Data distortion and data utility metrics are also discussed. Experimental results on benchmark data sets demonstrate that the proposed models have the potential to outperform standard data perturbation models in balancing data privacy and data utility.
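As an illustration of the low-rank idea described above, the following minimal sketch distorts a data matrix by replacing it with its rank-k truncated SVD. It is not the dissertation's model: the function name `svd_distort`, the random test matrix, the choice k = 2, and the relative Frobenius-norm change used as a rough distortion measure are all assumptions made for this example.

```python
import numpy as np

def svd_distort(A, k):
    """Return the rank-k truncated-SVD approximation of A (hypothetical helper)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular triplets.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

# Illustrative data: 8 records with 5 attributes (random, not a benchmark set).
rng = np.random.default_rng(0)
A = rng.normal(size=(8, 5))

A_k = svd_distort(A, k=2)

# Relative change between original and distorted data, used here as a
# simple stand-in for a data distortion metric.
rel_change = np.linalg.norm(A - A_k) / np.linalg.norm(A)
```

A smaller k discards more of the fine detail in individual values while retaining the dominant structure of the data, which is the trade-off between privacy and utility that the abstract refers to.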

Based on an equivalence between NMF and K-means clustering, a simultaneous data value and pattern hiding strategy is developed for data mining activities that use K-means clustering. Three schemes are designed to slightly alter submatrices so that user-specified cluster properties of data subjects are hidden. Performance evaluation demonstrates the efficacy of the proposed strategy, since some optimal solutions can be computed with zero side effects on nonconfidential memberships. Privacy protection is thereby simplified: a single modified data set provides both forms of protection, with enhanced performance.

In addition, an improved incremental SVD-updating algorithm is applied to improve the real-time performance of the SVD-based model under frequent data updates. The performance and effectiveness of the improved algorithm are examined on synthetic and real data sets. Experimental results indicate that incremental matrix decomposition produces a significant speedup. It also provides potential support for the use of the SVD technique in On-Line Analytical Processing (OLAP) for business data analysis.
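To make the idea of incremental updating concrete, here is a minimal sketch of a standard row-append SVD update, which revises an existing rank-k factorization when new records arrive instead of decomposing the enlarged matrix from scratch. This is a generic textbook-style update for the generic case, not the improved algorithm developed in the dissertation; the function name `svd_append_rows` and the synthetic data are assumptions.

```python
import numpy as np

def svd_append_rows(U, s, Vt, D, k):
    """Update a thin SVD (U, s, Vt) of A after appending rows D (hypothetical helper)."""
    V = Vt.T
    kk, p = s.size, D.shape[0]
    L = D @ V                     # part of the new rows inside the current row space
    H = D - L @ Vt                # residual orthogonal to the current row space
    Q, R = np.linalg.qr(H.T)      # orthonormal basis Q for the residual
    r = Q.shape[1]
    # Small core matrix whose SVD yields the updated factors.
    K = np.zeros((kk + p, kk + r))
    K[:kk, :kk] = np.diag(s)
    K[kk:, :kk] = L
    K[kk:, kk:] = R.T
    Uk, sk, Vkt = np.linalg.svd(K, full_matrices=False)
    U_big = np.block([[U, np.zeros((U.shape[0], p))],
                      [np.zeros((p, kk)), np.eye(p)]])
    U_new = U_big @ Uk
    V_new = np.hstack([V, Q]) @ Vkt.T
    # Truncate back to rank k.
    return U_new[:, :k], sk[:k], V_new[:, :k].T

# Synthetic example: an exactly rank-2 matrix A, then two new rows D.
rng = np.random.default_rng(1)
A = rng.normal(size=(6, 2)) @ rng.normal(size=(2, 5))
D = rng.normal(size=(2, 5))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
U, s, Vt = U[:, :2], s[:2], Vt[:2, :]

U2, s2, Vt2 = svd_append_rows(U, s, Vt, D, k=2)
```

The expensive decomposition is applied only to the small core matrix `K`, whose size depends on the retained rank and the number of new rows rather than on the full data set, which is where the speedup of an incremental approach comes from.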

Recommended Citation

Wang, Jie, "MATRIX DECOMPOSITION FOR DATA DISCLOSURE CONTROL AND DATA MINING APPLICATIONS" (2008). University of Kentucky Doctoral Dissertations. 677.
https://uknowledge.uky.edu/gradschool_diss/677

Included in

Computer Engineering Commons

University of Kentucky Doctoral Dissertations

MATRIX DECOMPOSITION FOR DATA DISCLOSURE CONTROL AND DATA MINING APPLICATIONS
