Rethink Reporting of Evaluation Results in AI
Ryan Burnell, Wout Schellaert, John Burden, Tomer Ullman, Fernando Martínez-Plumed, Joshua B. Tenenbaum, Danaja Rutar, Lucy G. Cheke, Jascha Sohl-Dickstein, Melanie Mitchell, Douwe Kiela, Murray Shanahan, Ellen M. Voorhees, Anthony G. Cohn, Joel Z. Leibo, José Hernández-Orallo
Science (2023)
Abstract
Aggregate metrics and lack of access to results limit understanding.