Dolma: an Open Corpus of Three Trillion Tokens for Language Model Pretraining Research
Luca Soldaini, Rodney Kinney, Akshita Bhagia, Dustin Schwenk, David Atkinson, Russell Authur, Ben Bogin, Khyathi Chandu, Jennifer Dumas, Yanai Elazar, Valentin Hofmann, Ananya Harsh Jha, Sachin Kumar, Li Lucy, Xinxi Lyu, Nathan Lambert, Ian Magnusson, Jacob Morrison, Niklas Muennighoff, Aakanksha Naik, Crystal Nam, Matthew E. Peters, Abhilasha Ravichander, Kyle Richardson, Zejiang Shen, Emma Strubell, Nishant Subramani, Oyvind Tafjord, Pete Walsh, Luke Zettlemoyer, Noah A. Smith, Hannaneh Hajishirzi, Iz Beltagy, Dirk Groeneveld, Jesse Dodge, Kyle Lo
ACL (1) (2024)