Editing Arbitrary Propositions in LLMs without Subject Labels
CoRR(2024)
摘要
Large Language Model (LLM) editing modifies factual information in LLMs.
Locate-and-Edit (L&E) methods accomplish this by finding where relevant
information is stored within the neural network, and editing the weights at
that location. The goal of editing is to modify the response of an LLM to a
proposition independently of its phrasing, while not modifying its response to
other related propositions. Existing methods are limited to binary
propositions, which represent straightforward binary relations between a
subject and an object. Furthermore, existing methods rely on semantic subject
labels, which may not be available or even be well-defined in practice. In this
paper, we show that both of these issues can be effectively skirted with a
simple and fast localization method called Gradient Tracing (GT). This
localization method allows editing arbitrary propositions instead of just
binary ones, and does so without the need for subject labels. As propositions
always have a truth value, our experiments prompt an LLM as a boolean
classifier, and edit its T/F response to propositions. Our method applies GT
for location tracing, and then edit the model at that location using a mild
variant of Rank-One Model Editing (ROME). On datasets of binary propositions
derived from the CounterFact dataset, we show that our method – without access
to subject labels – performs close to state-of-the-art L&E methods which has
access subject labels. We then introduce a new dataset, Factual Accuracy
Classification Test (FACT), which includes non-binary propositions and for
which subject labels are not generally applicable, and therefore is beyond the
scope of existing L&E methods. Nevertheless, we show that with our method
editing is possible on FACT.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要