Watermarks in the Sand: Impossibility of Strong Watermarking for Generative Models.
CoRR (2023)
Abstract
Watermarking generative models consists of planting a statistical signal
(watermark) in a model's output so that it can be later verified that the
output was generated by the given model. A strong watermarking scheme satisfies
the property that a computationally bounded attacker cannot erase the watermark
without causing significant quality degradation. In this paper, we study the
(im)possibility of strong watermarking schemes. We prove that, under
well-specified and natural assumptions, strong watermarking is impossible to
achieve. This holds even in the private detection algorithm setting, where the
watermark insertion and detection algorithms share a secret key, unknown to the
attacker. To prove this result, we introduce a generic efficient watermark
attack; the attacker is not required to know the private key of the scheme or
even which scheme is used. Our attack is based on two assumptions: (1) The
attacker has access to a "quality oracle" that can evaluate whether a candidate
output is a high-quality response to a prompt, and (2) The attacker has access
to a "perturbation oracle" which can modify an output with a nontrivial
probability of maintaining quality, and which induces an efficiently mixing
random walk on high-quality outputs. We argue that both assumptions can be
satisfied in practice by an attacker with weaker computational capabilities
than the watermarked model itself, to which the attacker has only black-box
access. Furthermore, our assumptions will likely only be easier to satisfy over
time as models grow in capabilities and modalities. We demonstrate the
feasibility of our attack by instantiating it to attack three existing
watermarking schemes for large language models: Kirchenbauer et al. (2023),
Kuditipudi et al. (2023), and Zhao et al. (2023). The same attack successfully
removes the watermarks planted by all three schemes, with only minor quality
degradation.
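
The attack loop described above can be illustrated with a minimal sketch. This is not the authors' implementation: the toy synonym table, the garbage token, the function names (`random_walk_attack`, `toy_perturb`, `toy_quality_oracle`), and the step count are all hypothetical stand-ins chosen for illustration. The sketch only shows the general shape: repeatedly ask a perturbation oracle for a modified output, and keep the candidate only when a quality oracle accepts it.

```python
import random

# Hypothetical toy data: groups of interchangeable words and a token that
# stands for a quality-destroying edit. None of this comes from the paper.
GROUPS = [("big", "large", "huge"), ("fast", "quick", "rapid")]
GROUP_OF = {word: group for group in GROUPS for word in group}
GARBAGE = "???"


def toy_perturb(words, rng):
    """Toy perturbation oracle: edit one random position.

    With probability 0.3 the edit ruins the output; otherwise a word with
    known synonyms is swapped for a different member of its group.
    """
    out = list(words)
    i = rng.randrange(len(out))
    if rng.random() < 0.3:
        out[i] = GARBAGE
    else:
        group = GROUP_OF.get(out[i])
        if group is not None:
            out[i] = rng.choice([w for w in group if w != out[i]])
    return out


def toy_quality_oracle(prompt, words):
    """Toy quality oracle: accept any output without the garbage token."""
    return GARBAGE not in words


def random_walk_attack(prompt, response, perturb, is_high_quality,
                       steps=200, seed=0):
    """Random walk over high-quality outputs, starting at `response`.

    Candidates rejected by the quality oracle are discarded, so the walk
    never leaves the set of outputs the oracle considers acceptable.
    """
    rng = random.Random(seed)
    current = list(response)
    for _ in range(steps):
        candidate = perturb(current, rng)
        if is_high_quality(prompt, candidate):
            current = candidate
    return current


original = ["the", "big", "dog", "runs", "fast"]
attacked = random_walk_attack("describe the dog", original,
                              toy_perturb, toy_quality_oracle)
print(" ".join(attacked))
```

In this toy setting the walk needs neither a secret key nor any knowledge of the watermarking scheme, which mirrors the black-box nature of the attack; whether a watermark is actually erased depends on how well the walk mixes, which the sketch does not measure.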
Keywords
strong watermarking, sand, models