ChatGPT: Understanding Code Syntax and Semantics

Wei Ma, Shangqing Liu, Wenhan Wang, Qiang Hu, Ye Liu, Cen Zhang, Liming Nie, Yang Liu

arXiv (Cornell University) (2023)


Abstract

ChatGPT demonstrates significant potential to revolutionize software engineering (SE) by exhibiting outstanding performance in SE tasks such as code and document generation. However, the high reliability and risk control requirements in software engineering raise concerns about the lack of interpretability of ChatGPT. To address this concern, we conducted a study to evaluate the capabilities of ChatGPT and its limitations for code analysis in SE. We break down the abilities needed for artificial intelligence (AI) models to address SE tasks related to code analysis into three categories: 1) syntax understanding, 2) static behavior understanding, and 3) dynamic behavior understanding. Our investigation focused on the ability of ChatGPT to comprehend code syntax and semantic structures, which include abstract syntax trees (AST), control flow graphs (CFG), and call graphs (CG). We assessed the performance of ChatGPT on cross-language tasks involving C, Java, Python, and Solidity. Our findings revealed that while ChatGPT has a talent for understanding code syntax, it struggles with comprehending code semantics, particularly dynamic semantics. We conclude that ChatGPT possesses capabilities similar to an AST parser, demonstrating initial competencies in static code analysis. Furthermore, our study highlights that ChatGPT is susceptible to hallucination when interpreting code semantic structures, fabricating nonexistent facts. These results indicate the need to explore methods to verify the correctness of ChatGPT output to ensure its dependability in SE. More importantly, our study provides an initial answer to why code generated by large language models (LLMs) is usually syntactically correct but vulnerable.
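To make the structures named in the abstract concrete, the following is a minimal sketch of what an AST and a call graph look like for a small Python snippet. It is an illustration only, not the paper's evaluation method: the sample source, the helper names `build_call_graph` and `count_node_types`, and the simplification of tracking only direct calls to plain names are all assumptions made for this example. It uses Python's standard `ast` module.

```python
import ast

# Hypothetical sample program, invented for this illustration.
SOURCE = '''
def square(x):
    return x * x

def sum_of_squares(values):
    total = 0
    for v in values:
        total += square(v)
    return total
'''


def count_node_types(source):
    """Count how often each AST node type occurs in the parsed source."""
    tree = ast.parse(source)
    counts = {}
    for node in ast.walk(tree):
        name = type(node).__name__
        counts[name] = counts.get(name, 0) + 1
    return counts


def build_call_graph(source):
    """Map each top-level function to the plain names it calls directly.

    Simplification (assumption): method calls, aliases, and dynamic
    dispatch are ignored, so this is only a rough static approximation.
    """
    tree = ast.parse(source)
    graph = {}
    for node in tree.body:
        if isinstance(node, ast.FunctionDef):
            callees = set()
            for inner in ast.walk(node):
                if isinstance(inner, ast.Call) and isinstance(inner.func, ast.Name):
                    callees.add(inner.func.id)
            graph[node.name] = callees
    return graph


node_counts = count_node_types(SOURCE)
call_graph = build_call_graph(SOURCE)
```

Here `node_counts` summarizes the syntax level (for instance, how many `FunctionDef` and `For` nodes the tree contains), while `call_graph` captures one static semantic structure. Dynamic behavior, such as which branch executes for a given input, cannot be read off either structure without running or simulating the program.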


Keywords

ChatGPT, software engineering
