Efficient and Programmable Machine Learning on Distributed Shared Memory via Static Analysis

semanticscholar (2018)

Abstract
Distributed shared memory (DSM) offers superior performance for applications that perform fine-grain reads and writes to in-memory variables, such as iterative machine learning (ML) training, but it presents a great challenge for application developers because of data dependencies over shared mutable state. Iterative ML training is therefore often parallelized by replicating the same computation on different data partitions (a.k.a. data parallelism), ignoring data dependencies. Since ML training can often tolerate bounded error [7], such parallelization may still produce a working solution, at the cost of additional computation. In many cases, preserving data dependencies greatly reduces the computation needed to achieve the same model quality without harming computation throughput. In this paper, we show that this opportunity can be exploited with minimal programmer effort via static analysis. We present Orion, a system that statically parallelizes serial for-loop nests that read and write distributed shared memory, and that schedules the computation on a distributed cluster while preserving fine-grained data dependencies. With programmer permission, Orion may ignore certain dependences, potentially falling back to data parallelism, when preserving all dependencies would result in a serial execution. We show that an ML training program parallelized by Orion can achieve a 3.5× speedup over a data-parallel implementation based on parameter servers, owing to preserved data dependencies, while enjoying a much more usable programming model.
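To make the idea of dependence-preserving parallelization concrete, the sketch below shows a classic example in the same spirit: SGD for matrix factorization, where an update for rating R[i, j] reads and writes factor rows W[i] and H[j], so two updates conflict only if they share a row or a column. Blocking rows and columns and sweeping "diagonals" of the resulting tile grid lets several workers run without violating any read/write dependency of the serial loop nest. This is an illustrative, hedged sketch of the scheduling concept only; it is not Orion's API or implementation, and all names (sgd_block, P, the thread pool) are hypothetical choices for the example.

```python
# Illustrative sketch (not the Orion API): dependence-preserving parallel SGD
# for matrix factorization. Tiles scheduled in the same "stratum" touch
# disjoint rows of W and disjoint rows of H, so running them concurrently
# preserves every data dependency of the serial for-loop nest.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

P = 4                                   # blocks per dimension / concurrent workers
n_rows, n_cols, rank = 400, 400, 8
lr, reg = 0.01, 0.05

rng = np.random.default_rng(0)
R = rng.random((n_rows, n_cols))                  # toy dense "rating" matrix
W = 0.1 * rng.standard_normal((n_rows, rank))     # shared mutable factor state
H = 0.1 * rng.standard_normal((n_cols, rank))

row_blocks = np.array_split(np.arange(n_rows), P)
col_blocks = np.array_split(np.arange(n_cols), P)

def sgd_block(rb, cb):
    """Serial SGD over one (row-block, column-block) tile of R."""
    for i in row_blocks[rb]:
        for j in col_blocks[cb]:
            wi = W[i].copy()                       # read old value before updating
            err = R[i, j] - wi @ H[j]
            W[i] += lr * (err * H[j] - reg * wi)
            H[j] += lr * (err * wi - reg * H[j])

for epoch in range(3):
    # Stratum s pairs row block b with column block (b + s) % P; the P tiles
    # in a stratum are mutually independent, so they may run in parallel.
    for s in range(P):
        with ThreadPoolExecutor(max_workers=P) as pool:
            list(pool.map(lambda b: sgd_block(b, (b + s) % P), range(P)))
    print(f"epoch {epoch}: mse={np.mean((R - W @ H.T) ** 2):.4f}")
```

In a real DSM setting, the point of a system like Orion is that this block-diagonal schedule does not have to be hand-written: it can be derived from the serial loop nest by statically analyzing which shared elements each iteration reads and writes.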