Privacy-Optimized Randomized Response for Sharing Multi-Attribute Data
CoRR(2024)
Abstract
With the growing amount of data in society, privacy concerns in data sharing
have become widely recognized. In particular, protecting personal attribute
information is essential for aims ranging from crowdsourcing to personalized
medicine. Although various differentially private methods based on randomized
response have been proposed for single-attribute information or for specific
analysis purposes such as frequency estimation, few studies have addressed
mechanisms for sharing individuals' multiple categorical attributes themselves.
The existing randomized response mechanism for sharing multi-attribute data uses
the Kronecker product to perturb each attribute in turn according to its
respective privacy level, but it achieves only a weak privacy level for the
entire dataset. Therefore, in this study, we propose a privacy-optimized
randomized response that guarantees the strongest privacy in sharing
multi-attribute data. Furthermore, we present an efficient heuristic algorithm
for constructing a near-optimal mechanism. The time complexity of our algorithm
is O(k^2), where k is the number of attributes, and it runs in about 1 second
even for large datasets with k = 1,000.
The experimental results demonstrate that both of our methods provide
significantly stronger privacy guarantees for the entire dataset than the
existing method. In addition, we present an example analysis of genome
statistics confirming that our methods can achieve less than half the output
error of the existing method. Overall, this study is an important step toward
trustworthy sharing and analysis of multi-attribute data. The Python
implementation of our experiments and supplemental results are available at
https://github.com/ay0408/Optimized-RR.
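The weakness of the Kronecker-product baseline can be illustrated with a minimal sketch. It assumes that each attribute is perturbed by generalized (d-ary) randomized response. The helper names (`krr_matrix`, `kron`, `epsilon_of`, `perturb`) and the attribute sizes and ε values are hypothetical and are not taken from the authors' repository. Under these assumptions, the local privacy level of the combined mechanism equals the sum of the per-attribute ε values. The sketch does not reproduce the proposed privacy-optimized mechanism or its O(k^2) heuristic.

```python
import math
import random

def krr_matrix(d, eps):
    """Generalized (d-ary) randomized response as a d x d matrix M[out][in]:
    keep the true category with prob e^eps / (e^eps + d - 1), otherwise
    report each other category with prob 1 / (e^eps + d - 1)."""
    denom = math.exp(eps) + d - 1
    p_keep, p_other = math.exp(eps) / denom, 1.0 / denom
    return [[p_keep if o == i else p_other for i in range(d)] for o in range(d)]

def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [
        [a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
        for i in range(len(a))
        for k in range(len(b))
    ]

def epsilon_of(m):
    """Local DP level of mechanism M[out][in]: the smallest eps with
    M[o][x] <= e^eps * M[o][x'] for every output o and inputs x, x'."""
    return math.log(max(max(row) / min(row) for row in m))

def perturb(record, attrs, rng):
    """Perturb each attribute of a record independently with its own k-RR;
    this is the per-record view of the Kronecker-product mechanism."""
    noisy = []
    for value, (d, eps) in zip(record, attrs):
        column = [row[value] for row in krr_matrix(d, eps)]  # P(out | in=value)
        noisy.append(rng.choices(range(d), weights=column)[0])
    return noisy

# Hypothetical attributes: (number of categories, per-attribute epsilon).
attrs = [(3, 1.0), (2, 0.5), (4, 2.0)]
mech = krr_matrix(*attrs[0])
for d, eps in attrs[1:]:
    mech = kron(mech, krr_matrix(d, eps))

print(len(mech), round(epsilon_of(mech), 6))  # 24 3.5
print(perturb([2, 0, 3], attrs, random.Random(0)))
```

In each output row of the Kronecker product, the worst-case ratio between inputs is the product of the per-attribute ratios. Under this construction, ε therefore grows linearly with the number of attributes k.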