Partial Credit Grading of DFAs: Automation vs Human Graders.

SIGCSE (2), 2023

Abstract
We examined the efficacy of automatic partial-credit approaches for assignments that ask students to construct a Deterministic Finite Automaton (DFA) for a given language. We chose two DFA problems and generated a representative sample of 10 benchmark submissions for each. Next, to establish an accurate baseline for human grading, we asked professors at our university to share their grader guides with us. We found that the grader guides, at least within our institution, were very consistent but also quite problem-specific and reliant on human understanding, and hence unlikely to lead to an automated process applicable to all DFA problems. We generated a "consensus grader guide" and graded each benchmark submission, obtaining a baseline human partial-credit score. Then, we assessed the submissions using three techniques proposed by Alur et al.: the Solution Syntactic Difference (SSD) score corresponds to the number of changes that must be made to the submitted DFA; the Problem Syntactic Difference (PSyD) score is based on converting each DFA into Monadic Second-Order (MSO) logic and examining the number of necessary changes; and the Problem Semantic Difference (PSeD) score is the limit of the ratio of incorrect strings to correct strings. The final score is the maximum of these three scores. In general, the results closely matched the consensus grades, although PSeD produced some peculiarities. Additionally, for each problem, one submission included two separate types of mistakes; these submissions received automatic grades much lower than the consensus grades.
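To make the scoring idea concrete, the following is a minimal, hypothetical Python sketch of a PSeD-style measure: it estimates how often a student's DFA disagrees with a reference DFA on strings up to a bounded length, then combines it with the other two scores by taking the maximum, as the abstract describes. The DFA representation, function names, the 0-1 scale, and the bounded-length approximation of the density limit are assumptions for illustration only; they are not taken from this paper or from Alur et al.'s implementation.

# Hypothetical sketch of a PSeD-style score (not the paper's code).
from itertools import product

def accepts(dfa, string):
    # Run a DFA given as a dict with 'start', 'accept', and 'delta' keys.
    state = dfa["start"]
    for symbol in string:
        state = dfa["delta"][(state, symbol)]
    return state in dfa["accept"]

def semantic_difference_score(student, reference, alphabet, max_len=8):
    # Fraction of strings (up to max_len) on which the two DFAs agree,
    # used here as a bounded-length stand-in for the density limit in PSeD.
    total = agree = 0
    for n in range(max_len + 1):
        for chars in product(alphabet, repeat=n):
            s = "".join(chars)
            total += 1
            agree += accepts(student, s) == accepts(reference, s)
    return agree / total

# Example: the reference accepts strings over {a, b} with an even number of
# 'a's; the student submission forgets that the empty string is accepted.
reference = {
    "start": "even", "accept": {"even"},
    "delta": {("even", "a"): "odd", ("odd", "a"): "even",
              ("even", "b"): "even", ("odd", "b"): "odd"},
}
student = {
    "start": "q0", "accept": {"even"},
    "delta": {("q0", "a"): "odd", ("q0", "b"): "even",
              ("even", "a"): "odd", ("odd", "a"): "even",
              ("even", "b"): "even", ("odd", "b"): "odd"},
}

psed = semantic_difference_score(student, reference, alphabet="ab")
# The final grade is the maximum of the three technique scores; the SSD and
# PSyD values below are placeholders showing only the combination step.
ssd, psyd = 0.7, 0.6
final = max(ssd, psyd, psed)
print(f"PSeD approx {psed:.3f}, final {final:.3f}")

In this sketch a near-correct submission scores close to 1.0 under PSeD even if its structure differs from the reference, which mirrors why taking the maximum over the three techniques can reward submissions that any one measure would penalize.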