The WMDP Benchmark: Measuring and Reducing Malicious Use with Unlearning

Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, Alice Gatti, Justin Li, Ann-Kathrin Dombrowski, Shashwat Goel, Gabriel Mukobi, Nathan Helm-Burger, Rassin Lababidi, Lennart Justen, Andrew Liu, Michael Chen, Isabelle Barrass, Oliver Zhang, Xiaoyuan Zhu, Rishub Tamirisa, Bhrugu Bharathi, Ariel Herbert-Voss, Cort Breuer, Andy Zou, Mantas Mazeika, Zifan Wang, Palash Oswal, Weiran Lin, Adam Hunt, Justin Tienken-Harder, Kevin Shih, Kemper Talley, John Guan, Ian Steneker, David Campbell, Brad Jokubaitis, Steven Basart, Stephen Fitz, Ponnurangam Kumaraguru, Kallol Karmakar, Uday Tupakula, Vijay Varadharajan, Yan Shoshitaishvili, Jimmy Ba, Kevin Esvelt, Alexandr Wang, Dan Hendrycks

ICML (2024)

Keywords
Intrusion Detection