Assessing AI capabilities with education tests

Mila Staneva, Abel Baret, Ángel Aso-Mollar, Joseph A. Blass, Salvador Carrión Ponz, Vincent Conitzer, Ulises Cortés, Pradeep Dasigi, Alderico Rodrigues de Paula, Carlos Galindo, Janice D. Gobert, Jordi Gonzàlez, Fredrik Heintz, James A. Hendler, Daniel Hendrycks, Lawrence Hunter, Juan Izquierdo-Domenech, Maria Juárez, Adriana Frias, Aviv Keren, Rik Koncel-Kedziorski, David Leake, Bao Sheng Loe, Fernando Martínez-Plumed, Aqueasha Martin-Hammond, Cynthia Matuszek, A. Gascon, J.A. Moreno, Constantine Nakos, Taylor Olson, Carolyn Penstein Rosé, Armen Sarvazyan, Brian Scassellati, Wout Schellaert, Claes Strannegård, Neşet Özkan Tan, Tadahiro Taniguchi, Karina Vold, Michael Wooldridge

Educational research and innovation (2023)

Abstract

This chapter introduces three exploratory studies that assessed the capabilities of artificial intelligence (AI) through standardised education tests designed for humans. The first two studies, conducted in 2016 and 2021/22, asked experts to evaluate AI's performance on the literacy and numeracy tests of the OECD's Survey of Adult Skills (PIAAC). The third study collected expert judgements of whether AI can solve science questions from the OECD's Programme for International Student Assessment (PISA). The studies aimed to refine the assessment framework for eliciting expert knowledge on AI using established educational assessments. They explored different test formats, response methodologies and rating instructions, along with two distinct assessment approaches. A "behavioural approach" used in the PIAAC studies emphasised smaller expert groups engaging in discussions, and a "mathematical approach" adopted in the PISA study relied more heavily on quantitative data from a larger expert pool. This chapter presents the results of the studies and discusses the advantages and disadvantages of their methodological approaches.
