Predicting the replicability of social and behavioural science claims from the COVID-19 Preprint Replication Project with structured expert and novice groups

Alexandru Marcoci, David Peter Wilkinson, Anna Abatayo, Ernest Baskin, Henk Berkman, Erin Michelle Buchanan, Sara Capitán, Tabaré Capitán, Ginny Chan, Kent Jason Go Cheng, Tom Coupe, Sarah Dryhurst, Jianhua Duan, John Edlund, Timothy M. Errington, Anna Fedor, Fiona Fidler, James Field, Nicholas William Fox, Hannah Fraser, Alexandra L.J. Freeman, Anca Hanea, Felix Holzmeister, Sanghyun Hong, Raquel Huggins, Nick Huntington-Klein, Magnus Johannesson, Angela Jones, Hansika Kapoor, John R. Kerr, Melissa Kline Struhl, Marta Kolczynska, Yang Liu, Zachary Loomas, Brianna Luis, Esteban Méndez, Olivia Miske, Carolin Nast, Brian A. Nosek, Elan Simon Parsons, Thomas Pfeiffer, W. Robert Reed, Jon Roozenbeek, Alexa R. Schlyfestone, Claudia R. Schneider, Andrew Soh, Anirudh Tagat, Melba Tutor, Andrew Tyner, Karolina Urbanska, Sander van der Linden, Ans Vercammen, Bonnie Wintle

Crossref (2023)

Abstract
Replication is an important “credibility control” mechanism for clarifying the reliability of published findings. However, replication is costly, and it is infeasible to replicate everything. Accurate, fast, lower-cost alternatives, such as eliciting predictions from experts or novices, could accelerate credibility assessment and improve the allocation of replication resources toward important and uncertain findings. We elicited judgments from experts and novices on 100 claims from preprints about an emerging area of research (the COVID-19 pandemic) using a new interactive structured elicitation protocol, and we conducted 35 new replications. Participants’ average estimates were similar to the observed replication rate of 60%. After interacting with their peers, novices updated both their estimates and their confidence in their judgments significantly more than experts did, and their accuracy improved more between elicitation rounds. Experts’ average accuracy was 0.54 (95% CI: [0.454, 0.628]) after interaction, and they correctly classified 55% of claims; novices’ average accuracy was 0.55 (95% CI: [0.455, 0.628]), correctly classifying 61% of claims. The difference in accuracy between experts and novices was not significant, and their judgments on the full set of claims were strongly correlated (r = .48). These results are consistent with prior investigations eliciting predictions about the replicability of published findings in established areas of research, and they suggest that expertise may not be required for credibility assessment of some research findings.