Towards A Platform and Benchmark Suite for Model Training on Dynamic Datasets

EuroMLSys '23: Proceedings of the 3rd Workshop on Machine Learning and Systems (2023)

Abstract
Machine learning (ML) is often applied in use cases where training data evolves and/or grows over time. Training must incorporate these data changes to maintain high model quality, but doing so is often challenging and expensive because datasets and models are large. In contrast, ML researchers often train and evaluate ML models on static datasets or under artificial assumptions about data dynamics. This gap between research and practice is largely due to (i) the absence of an open-source platform that manages dynamic datasets at scale and supports pluggable policies for when and what data to train on, and (ii) the lack of representative open-source benchmarks for ML training on dynamic datasets. To address this gap, we propose to design a platform that enables ML researchers and practitioners to explore training and data selection policies while relieving them of the burden of managing large dynamic datasets and orchestrating recurring training jobs. We also propose to build an accompanying benchmark suite that integrates public dynamic datasets and ML models from a variety of representative use cases.