Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study
arXiv (2023)
Abstract
Automated logging statement generation supports developers in documenting
critical software runtime behavior. Given the great success in natural language
generation and programming language comprehension, large language models (LLMs)
might help developers generate logging statements, but this has not yet been
investigated. To fill the gap, this paper presents the first study of
LLMs for logging statement generation. We first build a logging statement
generation dataset, LogBench, with two parts: (1) LogBench-O: logging
statements collected from GitHub repositories, and (2) LogBench-T: the
transformed unseen code from LogBench-O. Then, we leverage LogBench to evaluate
the effectiveness and generalization capabilities (using LogBench-T) of eleven
top-performing LLMs. In addition, we examine the performance of these LLMs
against classical retrieval-based and machine learning-based logging methods
from the era preceding LLMs. We further evaluate LLMs' logging generalization
capabilities using unseen data (LogBench-T) derived from code transformation
techniques. While existing LLMs deliver decent predictions on logging levels
and logging variables, our study indicates that they only achieve a maximum
BLEU score of 0.249, thus calling for improvements. The paper also highlights
the importance of prompt construction and external factors (e.g., programming
contexts and code comments) for LLMs' logging performance. Based on these
findings, we identify five implications and provide practical advice for future
logging research. Our empirical analysis discloses the limitations of current
logging approaches while showcasing the potential of LLM-based logging tools,
and provides actionable guidance for building more practical models.
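
To give a sense of what a BLEU score such as 0.249 measures, the following is a minimal sketch of an unsmoothed, single-reference, sentence-level BLEU in Python. It is not the paper's evaluation script: the function names (`ngrams`, `simple_bleu`), the whitespace tokenization, the choice of up to 4-grams, and the example log messages are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Unsmoothed sentence-level BLEU against one reference, in [0, 1].

    Illustrative only: whitespace tokenization and max_n=4 are assumptions.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any zero precision gives BLEU 0
        log_precisions.append(math.log(matched / total))
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if len(cand) >= len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    # Hypothetical generated vs. ground-truth logging messages.
    generated = "failed to open file config"
    ground_truth = "failed to open file settings"
    print(round(simple_bleu(generated, ground_truth), 3))
```

Because this sketch applies no smoothing, a generated message that shares no 4-gram with the reference scores 0, which is one reason short logging texts tend to receive low BLEU values.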