USC-DCT: A Collection of Diverse Classification Tasks

Adam M. Jones, Gozde Sahin, Zachary W. Murdock, Yunhao Ge, Ao Xu, Yuecheng Li, Di Wu, Shuo Ni, Po-Hsuan Huang, Kiran Lekkala, Laurent Itti

Data (2023)

Abstract
Machine learning is a crucial tool for both academic and real-world applications. Classification problems are often the preferred showcase in this space, which has led to a wide variety of datasets being collected and used for a myriad of applications. Unfortunately, there is very little standardization in how these datasets are collected, processed, and disseminated. As new learning paradigms such as lifelong learning and meta-learning become more popular, the demand for merging tasks to evaluate algorithms at scale has also increased. This paper provides a methodology for processing and cleaning datasets that can be applied to existing or new classification tasks, and implements these practices in a collection of diverse classification tasks called USC-DCT. Constructed from 107 classification tasks collected from the internet, this collection provides a transparent and standardized pipeline that can be useful for many different applications and frameworks. Although it currently contains 107 tasks, USC-DCT is designed to accommodate future growth. Additional discussion explains applications in machine learning paradigms such as transfer, lifelong, and meta-learning, describes how revisions to the collection will be handled, and offers further tips for curating and using classification tasks at this scale.

Dataset: https://github.com/iLab-USC/USC-DCT
Dataset License: CC-BY-NC
Keywords
machine learning,data sharing,classification,computer vision,visual classification,dataset collection,dataset organization,data cleaning