A Class of Fast and Accurate Multi-layer Block Summation and Dot Product Algorithms.

NPC (2021)

Abstract
Basic recursive summation and the common dot product algorithm have backward error bounds that grow linearly with the vector dimension. Blanchard [1] proposed a class of fast and accurate summation and dot product algorithms, called FABsum and FABdot respectively, which trade off accuracy against speed through the block size. Castaldo [2] proposed multi-layer block summation and dot product algorithms, called SuperBlocksum and SuperBlockdot, which increase accuracy while adding almost no extra computation. We combine the idea of [1] with the multi-layer block structure to propose SuperFABsum (for "super fast and accurate block summation") and SuperFABdot (for "super fast and accurate block dot product"). Each algorithm has two variants: SuperFAB(within) and SuperFAB(outside). Our algorithms further improve accuracy and speed compared with FAB and SuperBlock. We conducted accuracy and speed tests on the high-performance FT2000+ processor. Experimental results show that SuperFABdot(within) is more accurate than FABdot and SuperBlockdot, and that, compared with FABdot, SuperFABdot(outside) achieves up to a 1.2× speedup while delivering similar accuracy.
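To make the blocking idea concrete, the following is a minimal C sketch, not the authors' implementation: `recursive_sum` is the fast layer, `kahan_sum` stands in for the accurate layer (an assumption; the paper's accurate summation may differ), and `fabsum`/`superfabsum` illustrate the single-layer and two-layer block structures. The `accurate_within` flag is a hypothetical rendering of the within/outside distinction inferred from the abstract.

```c
#include <stdlib.h>

/* Plain recursive summation: fast, but its backward error bound
 * grows linearly with n. */
static double recursive_sum(const double *x, size_t n) {
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Kahan compensated summation, used here as the "accurate" layer:
 * its error bound is essentially independent of n. */
static double kahan_sum(const double *x, size_t n) {
    double s = 0.0, c = 0.0;
    for (size_t i = 0; i < n; i++) {
        double y = x[i] - c;
        double t = s + y;
        c = (t - s) - y;   /* recover the rounding error of s + y */
        s = t;
    }
    return s;
}

/* FABsum-style blocked summation: sum each block of size b with the
 * fast method, then combine the block sums with the accurate one, so
 * the error grows with the block size rather than with n. */
static double fabsum(const double *x, size_t n, size_t b) {
    size_t nblocks = (n + b - 1) / b;
    double *block_sums = malloc(nblocks * sizeof *block_sums);
    for (size_t k = 0; k < nblocks; k++) {
        size_t len = (k + 1) * b <= n ? b : n - k * b;
        block_sums[k] = recursive_sum(x + k * b, len);
    }
    double s = kahan_sum(block_sums, nblocks);
    free(block_sums);
    return s;
}

/* Multi-layer sketch in the spirit of SuperFABsum: blocks are grouped
 * into superblocks of B blocks each.  accurate_within is a hypothetical
 * switch: combine block sums inside each superblock accurately
 * ("within") or only at the outermost layer ("outside"). */
static double superfabsum(const double *x, size_t n, size_t b, size_t B,
                          int accurate_within) {
    size_t nblocks = (n + b - 1) / b;
    size_t nsuper  = (nblocks + B - 1) / B;
    double *super_sums = malloc(nsuper * sizeof *super_sums);
    double *block_sums = malloc(B * sizeof *block_sums);
    for (size_t s = 0; s < nsuper; s++) {
        size_t nb = (s + 1) * B <= nblocks ? B : nblocks - s * B;
        for (size_t k = 0; k < nb; k++) {
            size_t i0  = (s * B + k) * b;
            size_t len = i0 + b <= n ? b : n - i0;
            block_sums[k] = recursive_sum(x + i0, len);
        }
        super_sums[s] = accurate_within ? kahan_sum(block_sums, nb)
                                        : recursive_sum(block_sums, nb);
    }
    double result = kahan_sum(super_sums, nsuper);
    free(block_sums);
    free(super_sums);
    return result;
}
```

In this reading, the "outside" variant spends the compensated summation only on the (few) superblock sums, which is why it can be faster than FABsum at similar accuracy, while the "within" variant applies it at more layers for tighter error.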