Naive Parallelization of Coordinate Descent Methods and an Application on Multi-core L1-regularized Classification.

CIKM (2018)

Cited by 12 | Viewed 165
Abstract
It is well known that a direct parallelization of sequential optimization methods (e.g., coordinate descent and stochastic gradient methods) is often not effective. The reason is that the number of operations at each iteration may be too small for parallelization to pay off. We point out that this common understanding may not hold if the algorithm sequentially accesses the data in a feature-wise manner. For almost all real-world sparse data sets we have examined, some features are much denser than others, so a direct parallelization of the loops in a sequential method can yield excellent speedup. This approach has the advantage of retaining all convergence results because the algorithm is not changed at all. We apply this idea to coordinate descent (CD) methods, which are an effective single-thread technique for L1-regularized classification. Further, an investigation of the shrinking technique commonly used to remove some features during training shows that this technique helps the parallelization of CD methods. Experiments indicate that the naive parallelization achieves better speedup than existing methods that laboriously modify the algorithm to achieve parallelism. Though a bit ironic, we conclude that the naive parallelization of the CD method is a highly competitive and robust multi-core implementation for L1-regularized classification.
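The feature-wise parallelization described above can be illustrated with a minimal sketch (not the authors' implementation): one coordinate update of a CD method for L1-regularized L2-loss SVM, where the loops over the nonzeros of a feature are parallelized with OpenMP. The column-major (CSC) arrays col_ptr, row_idx, vals and the names n, l, C, w, b are hypothetical placeholders, and the line search used in practical CD solvers is omitted.

    /* Sketch only: one CD update for L1-regularized L2-loss SVM.
     * b[i] = 1 - y[i] * w^T x[i] is maintained across updates. */
    #include <omp.h>

    double update_coordinate(int j, const int *col_ptr, const int *row_idx,
                             const double *vals, const double *y,
                             double *w, double *b, double C)
    {
        double g = 0.0, h = 1e-12;    /* first- and second-derivative terms */

        /* Inner loop over the nonzeros of feature j: for dense features this
         * loop is long enough for a naive parallelization to pay off. */
        #pragma omp parallel for reduction(+:g,h)
        for (int k = col_ptr[j]; k < col_ptr[j + 1]; k++) {
            int i = row_idx[k];
            if (b[i] > 0) {
                g += -2.0 * C * y[i] * vals[k] * b[i];
                h +=  2.0 * C * vals[k] * vals[k];
            }
        }

        /* Newton direction with soft-thresholding for the L1 term. */
        double d;
        if (g + 1.0 <= h * w[j])
            d = -(g + 1.0) / h;
        else if (g - 1.0 >= h * w[j])
            d = -(g - 1.0) / h;
        else
            d = -w[j];

        /* Update the maintained vector b, again in parallel over the
         * nonzeros of feature j (row indices within a column are distinct). */
        #pragma omp parallel for
        for (int k = col_ptr[j]; k < col_ptr[j + 1]; k++) {
            int i = row_idx[k];
            b[i] -= d * y[i] * vals[k];
        }

        w[j] += d;
        return d;
    }

The parallel work per coordinate is proportional to the number of nonzeros of that feature, so the dense features highlighted in the abstract dominate the runtime and amortize the threading overhead; shrinking tends to remove sparse features first, which further concentrates work in loops worth parallelizing.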
Keywords
Parallelization, Coordinate Descent Methods, Multi-core, L1-regularized Classification