BigDocs: an Open and Permissively-Licensed Dataset for Training Multimodal Models on Document and Code Tasks
Juan Rodriguez, Xiangru Jian, Siba Smarak Panigrahi, Tianyu Zhang, Aarash Feizi, Abhay Puri, Akshay Kalkunte, François Savard, Ahmed Masry, Shravan Nayak, Rabiul Awal, Mahsa Massoud, Amirhossein Abaskohi, Zichao Li, Suyuchen Wang, Pierre-André Noël, Mats Leon Richter, Saverio Vadacchino, Shubham Agarwal, Sanket Biswas, Sara Shanian, Ying Zhang, Noah Bolger, Kurt MacDonald, Simon Fauvel, Sathwik Tejaswi, Srinivas Sunkara, Joao Monteiro, Krishnamurthy DJ Dvijotham, Torsten Scholak, Nicolas Chapados, Sepideh Kharagani, Sean Hughes, M. Özsu, Siva Reddy, Marco Pedersoli, Yoshua Bengio, Christopher Pal, Issam Laradji, Spandana Gella, Perouz Taslakian, David Vazquez, Sai Rajeswar. arXiv (2024)