Learning lexical features of programming languages from imagery using convolutional neural networks.

Jordan Ott, Abigail Atchison, Paul Harnack, Natalie Best, Haley Anderson, Cristiano Firmani, Erik Linstead

ICPC (2018)

Abstract

We demonstrate the ability of deep architectures, specifically convolutional neural networks, to learn and differentiate the lexical features of different programming languages presented in coding video tutorials found on the Internet. We analyze over 17,000 video frames containing examples of Java, Python, and other textual and non-textual objects. Our results indicate that computer vision models based on deep architectures can not only be taught to differentiate among programming languages with over 98% accuracy, but can also learn language-specific lexical features in the process. This provides a powerful mechanism for carrying out program comprehension research on repositories where source code is represented with imagery rather than text, while simultaneously avoiding the computational overhead of optical character recognition.
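As a rough illustration of the kind of model the abstract refers to, the sketch below runs the forward pass of a tiny convolutional classifier over a single grayscale frame and returns a probability for each class. It is not the authors' architecture, which the abstract does not specify: the label set `CLASSES`, the helper names, the 16x16 frame size, and the random, untrained weights are all assumptions made for this example.

```python
import math
import random

# Hypothetical label set; the abstract mentions Java, Python, and other content.
CLASSES = ["java", "python", "other"]


def conv2d_valid(image, kernel):
    """Slide a kernel over a 2-D grayscale image (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for di in range(kh):
                for dj in range(kw):
                    acc += image[i + di][j + dj] * kernel[di][dj]
            row.append(acc)
        out.append(row)
    return out


def relu(fmap):
    """Zero out negative activations in a feature map."""
    return [[max(0.0, v) for v in row] for row in fmap]


def global_max_pool(fmap):
    """Reduce a feature map to its single strongest response."""
    return max(max(row) for row in fmap)


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(image, kernels, weights, biases):
    """Conv -> ReLU -> global max pool per kernel, then a linear layer and softmax."""
    features = [global_max_pool(relu(conv2d_valid(image, k))) for k in kernels]
    logits = [
        sum(w * f for w, f in zip(row, features)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def make_random_model(num_kernels=4, ksize=3, seed=0):
    """Build untrained parameters; a real model would learn these from labeled frames."""
    rng = random.Random(seed)
    kernels = [
        [[rng.uniform(-1, 1) for _ in range(ksize)] for _ in range(ksize)]
        for _ in range(num_kernels)
    ]
    weights = [[rng.uniform(-1, 1) for _ in range(num_kernels)] for _ in CLASSES]
    biases = [0.0 for _ in CLASSES]
    return kernels, weights, biases


# A synthetic 16x16 "frame" stands in for a real video frame.
frame_rng = random.Random(1)
frame = [[frame_rng.random() for _ in range(16)] for _ in range(16)]

kernels, weights, biases = make_random_model()
probs = forward(frame, kernels, weights, biases)
predicted = CLASSES[probs.index(max(probs))]
```

Because the weights are random, `predicted` carries no meaning here. The point is only that the classifier operates directly on pixel values, so no optical character recognition step is needed before classification.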

Keywords

deep learning, convolutional neural networks, program syntax
