Asynch-SGBDT: Train Stochastic Gradient Boosting Decision Trees in an Asynchronous Parallel Manner

IPDPS(2023)

Abstract
Gradient Boosting Decision Tree (GBDT) is a computationally expensive machine learning model. Current parallel GBDT algorithms generally follow a synchronous, fork-join parallel design, as in MapReduce, and the fork-join pattern incurs considerable synchronization overhead. This raises the question of whether synchronization is necessary for GBDT training and whether an asynchronous training scheme can be efficient. In this paper, we address this question by proposing an asynchronous algorithm. We construct, by sampling, a stochastic optimization problem that shares the same solution as the original GBDT training problem, and we train gradient-step GBDT with asynchronous parallel SGD. We name our algorithm asynch-SGBDT. Our theoretical and experimental results indicate that, compared with the serial GBDT training process, asynch-SGBDT does not slow down per-epoch convergence when the dataset's sample diversity is high and gradient-step training is used; the sample diversity of current high-dimensional sparse datasets is usually high. We conduct experiments on a 32-node cluster using four different datasets. With single-worker LightGBM as the baseline, LightGBM (the state-of-the-art synchronous parallel implementation) achieves a 5x-7x speedup on 32 workers, while asynch-SGBDT increases the speedup to 11x-15x on 32 workers.
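
The following is a minimal, hypothetical sketch of the idea described in the abstract: workers sample mini-batches, fit a small tree to the sampled gradient, and append it to a shared ensemble without any synchronization barrier. The function names, hyperparameters, and the thread-based lock-free update scheme are illustrative assumptions, not the authors' reference implementation.

    # Illustrative sketch of asynchronous gradient-step GBDT training
    # (asynch-SGBDT style). Assumed, not the paper's actual code.
    import threading
    import numpy as np
    from sklearn.tree import DecisionTreeRegressor

    def asynch_sgbdt(X, y, n_rounds=100, n_workers=4, sample_rate=0.3, lr=0.1):
        """Squared-loss GBDT where workers append trees asynchronously."""
        ensemble = []                    # shared list of (lr, tree)
        prediction = np.zeros(len(y))    # shared prediction; stale reads are tolerated

        def worker():
            rng = np.random.default_rng()
            for _ in range(n_rounds // n_workers):
                # Sampling builds a stochastic problem whose optimum matches
                # the original GBDT objective (per the abstract).
                idx = rng.choice(len(y), size=int(sample_rate * len(y)), replace=False)
                # Negative gradient of squared loss at the possibly stale prediction.
                residual = y[idx] - prediction[idx]
                tree = DecisionTreeRegressor(max_depth=4).fit(X[idx], residual)
                # Asynchronous update: no barrier between workers, so other
                # workers may read a prediction that lags behind the ensemble.
                ensemble.append((lr, tree))
                prediction += lr * tree.predict(X)

        threads = [threading.Thread(target=worker) for _ in range(n_workers)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        return ensemble

The key design point mirrored here is that each worker proceeds on whatever prediction state it observes, rather than waiting at a fork-join barrier after every tree; the abstract's claim is that, with high sample diversity, this staleness does not slow per-epoch convergence.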
Keywords
asynchronous parallel, GBDT