An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications
arXiv (2024)
Abstract
Automatic Program Repair (APR) has garnered significant attention as a
practical research domain focused on automatically fixing bugs in programs.
While existing APR techniques primarily target imperative programming languages
like C and Java, there is a growing need for effective solutions applicable to
declarative software specification languages. This paper presents a systematic
investigation into the capacity of Large Language Models (LLMs) for repairing
declarative specifications in Alloy, a declarative formal language used for
software specification. We propose a novel repair pipeline that integrates a
dual-agent LLM framework, comprising a Repair Agent and a Prompt Agent. Through
extensive empirical evaluation, we compare the effectiveness of LLM-based
repair with state-of-the-art Alloy APR techniques on a comprehensive set of
benchmarks. Our study reveals that LLMs, particularly GPT-4 variants,
outperform existing techniques in terms of repair efficacy, albeit with a
marginal increase in runtime and token usage. This research contributes to
advancing the field of automatic repair for declarative specifications and
highlights the promising potential of LLMs in this domain.