Towards a SYCL API for Approximate Computing

Lorenzo Carpentieri, Biagio Cosenza

IWOCL '23: Proceedings of the 2023 International Workshop on OpenCL (2023)

Abstract
Approximate computing is a well-known method [7] for achieving higher performance or lower energy consumption in exchange for a loss of output accuracy. Many applications, such as image processing and neural networks, tolerate a certain amount of error and can therefore gain significantly in execution time and energy consumption. The most advanced software approximation techniques are mixed precision, which uses a lower-precision data representation for integer and floating-point variables [1, 4]; perforation, which skips instruction blocks in a program, iterations in a loop, or data in buffers, assuming that nearby data have similar values [2, 5, 6, 8]; and relaxed synchronization, which removes synchronization points, one of the major bottlenecks in parallel applications [3, 9]. These approaches differ both in the performance they achieve and in the error they produce. Perforation and synchronization elision usually deliver higher performance than mixed precision but introduce larger errors; synchronization elision in particular introduces non-deterministic errors that are complex to handle.
The SYCL heterogeneous programming model, often used to develop portable HPC applications, provides support for approximate computing through a set of built-in functions and data types that can be used to perform approximate operations, such as half-precision floating-point reductions and bit-level operations. In this technical talk, we present SYprox, a SYCL-based API that supports a broad set of approximation techniques in modern C++. SYprox introduces semantics that extend SYCL’s buffers and accessors to provide a high-level, easy-to-use programming API. It supports data perforation and elision patterns for efficient approximation, as well as signal reconstruction algorithms for error mitigation.
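Mixed precision is the easiest of these techniques to try in isolation. The minimal, generic C++ sketch below is not part of SYprox or SYCL; `reduce_sum` and `relative_error` are hypothetical helper names. It runs the same reduction with a `double` and with a `float` accumulator, so the accuracy lost to the lower-precision representation can be measured directly.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Hypothetical helper: the same reduction evaluated in a caller-chosen
// precision T. Each input is rounded to T and accumulated in T, so
// reduce_sum<float> models a lower-precision data representation.
template <typename T>
T reduce_sum(const std::vector<double>& xs) {
    T acc = T(0);
    for (double x : xs) {
        acc += static_cast<T>(x);
    }
    return acc;
}

// Hypothetical helper: relative error of an approximate result
// with respect to a non-zero reference value.
inline double relative_error(double approx, double reference) {
    assert(reference != 0.0);
    return std::fabs(approx - reference) / std::fabs(reference);
}
```

Summing one million copies of `0.1` with `reduce_sum<double>` stays within `1e-3` of the exact `100000`, while `reduce_sum<float>` drifts by an error on the order of 1%. In a real GPU kernel the benefit would come from storing and moving data in the narrower type, which this CPU sketch does not model.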
Figure 1 (a) depicts the accurate execution of an application, while Figure 1 (b) shows the approximation process: an input buffer is perforated according to the chosen scheme, and the perforated data can be approximated before or after computation using input or output reconstruction, respectively. A code snippet contrasts the accurate version of a SYCL program with our proposed approximate approach using SYprox.
Figure 2 shows a visual representation of the perforation schemes on 1D and 2D buffers: gray elements are perforated, whereas blue elements are computed. Schemes (a) and (b) apply to 2D buffers and compute, respectively, a row and a column of results. Scheme (c) also applies to 2D buffers and perforates data following a checkerboard layout. Finally, scheme (d) works on 1D buffers and perforates data according to a user-defined skip factor.
Because perforation introduces errors into the final output, the library also provides two reconstruction techniques to mitigate them: output reconstruction and input reconstruction. Output reconstruction approximates the perforated data by interpolating the computed output. Input reconstruction, in contrast, approximates the perforated data before computation: the selected perforation scheme determines which data are not loaded into local memory, and the skipped data are then approximated directly in local memory by interpolation. This approach combines local-memory optimization with perforation, reducing the number of global memory accesses, which are a major bottleneck in GPU applications. Loading data into local memory requires a synchronization point to ensure that all threads in a work-group have the same view of local memory. To reduce the time lost to synchronization, SYprox provides a synchronization elision mechanism that controls the number of synchronization points. Both input and output reconstruction are based on data interpolation.
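The difference between output and input reconstruction can be sketched on a 1D buffer perforated with the skip-factor scheme (d). This is plain C++, not the SYprox API: `perforated_1d`, `fill_perforated`, `output_reconstruction` and `input_reconstruction` are hypothetical names, we assume that with skip factor `k` every element whose index satisfies `i % k == k - 1` is perforated, the kernel is reduced to a pointwise function `f`, and the `tile` vector merely stands in for local memory, so the synchronization a real work-group would need is not modeled.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// Hypothetical skip-factor predicate for scheme (d): we ASSUME that with
// skip factor k, the element at index i is perforated when i % k == k - 1.
inline bool perforated_1d(std::size_t i, std::size_t skip) {
    assert(skip >= 2);
    return i % skip == skip - 1;
}

// Rebuild perforated slots in place. Under the assumed scheme the left
// neighbour i - 1 always exists and is kept; the right one may be missing.
void fill_perforated(std::vector<double>& v, std::size_t skip) {
    for (std::size_t i = 0; i < v.size(); ++i) {
        if (!perforated_1d(i, skip)) continue;
        if (i + 1 < v.size()) {
            v[i] = 0.5 * (v[i - 1] + v[i + 1]);  // linear interpolation
        } else {
            v[i] = v[i - 1];                     // nearest neighbour at the border
        }
    }
}

// Output reconstruction: compute only kept outputs, then interpolate the rest.
std::vector<double> output_reconstruction(const std::vector<double>& in,
                                          const std::function<double(double)>& f,
                                          std::size_t skip) {
    std::vector<double> out(in.size(), 0.0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (!perforated_1d(i, skip)) out[i] = f(in[i]);  // perforated work is skipped
    }
    fill_perforated(out, skip);
    return out;
}

// Input reconstruction: "load" only kept inputs into a tile (stand-in for
// local memory), interpolate the skipped inputs, then compute every output.
// In a real kernel, work-group barriers would separate these phases.
std::vector<double> input_reconstruction(const std::vector<double>& in,
                                         const std::function<double(double)>& f,
                                         std::size_t skip) {
    std::vector<double> tile(in.size(), 0.0);
    for (std::size_t i = 0; i < in.size(); ++i) {
        if (!perforated_1d(i, skip)) tile[i] = in[i];  // perforated loads are skipped
    }
    fill_perforated(tile, skip);
    std::vector<double> out(tile.size());
    for (std::size_t i = 0; i < tile.size(); ++i) out[i] = f(tile[i]);
    return out;
}
```

With a linear `f` such as `2 * x + 1` on the input `0, 1, ..., 9` and skip factor 3, both variants reproduce the accurate result exactly. With a nonlinear `f` such as `x * x`, output reconstruction fills index 2 with `(1 + 9) / 2 = 5`, whereas input reconstruction first rebuilds the input `(1 + 3) / 2 = 2` and then computes the exact `4`. This illustrates how the same perforation scheme can yield different errors depending on where reconstruction happens.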
Figure 3 shows data reconstruction using three types of interpolation. Basic interpolation (b) requires each element to be reconstructed to have adjacent elements on both sides. Stencil interpolation (c) requires adjacent elements in all four directions (top, bottom, left, right). When these requirements are not met, we fall back to nearest-neighbor interpolation (a), which approximates an element with the value of its nearest element. Since the effectiveness of the reconstruction techniques depends on the perforation strategy adopted and on the input data distribution, SYprox also provides a simple way to implement an ad-hoc perforation strategy that best fits the characteristics of the given input.
In this talk, we present a preliminary performance and error evaluation comparing the baseline implementations of three applications with their approximate versions. All applications achieve a speedup of more than 2x over their accurate versions. The results also show that the error introduced by approximation depends strongly on how the perforation strategy and the reconstruction technique are combined; nevertheless, the error stays below 10% for all applications.
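The three interpolation rules of Figure 3 can be sketched on a small 2D grid. Again, this is a hypothetical, framework-free C++ illustration rather than SYprox code. We assume that schemes (a), (b) and (c) perforate every other row, every other column, and cells with odd `r + c`; that basic interpolation may use either the left/right or the top/bottom pair; and that the nearest-neighbor fallback picks the kept element with the smallest Manhattan distance, breaking ties by scan order. `make_mask`, `nearest` and `reconstruct` are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

using Grid = std::vector<std::vector<double>>;
using Mask = std::vector<std::vector<bool>>;  // true = perforated

// Hypothetical 2D schemes; we ASSUME (a) skips every other row,
// (b) every other column, and (c) cells where r + c is odd.
enum class Scheme { Rows, Columns, Checkerboard };
enum class Interp { Basic, Stencil };

Mask make_mask(std::size_t rows, std::size_t cols, Scheme s) {
    Mask m(rows, std::vector<bool>(cols, false));
    for (std::size_t r = 0; r < rows; ++r)
        for (std::size_t c = 0; c < cols; ++c)
            m[r][c] = (s == Scheme::Rows)      ? (r % 2 == 1)
                    : (s == Scheme::Columns)   ? (c % 2 == 1)
                                               : ((r + c) % 2 == 1);
    return m;
}

// Nearest-neighbour fallback (a): value of the kept cell with the smallest
// Manhattan distance to (r, c); ties go to the first cell in scan order.
double nearest(const Grid& g, const Mask& m, long r, long c) {
    long best_d = -1;
    double best = 0.0;
    for (long i = 0; i < long(g.size()); ++i)
        for (long j = 0; j < long(g[i].size()); ++j) {
            if (m[i][j]) continue;
            const long d = std::labs(i - r) + std::labs(j - c);
            if (best_d < 0 || d < best_d) { best_d = d; best = g[i][j]; }
        }
    assert(best_d >= 0 && "at least one element must be kept");
    return best;
}

// Rebuild every perforated cell with the requested rule, falling back to
// nearest neighbour when the required neighbours are missing or perforated.
Grid reconstruct(const Grid& src, const Mask& m, Interp kind) {
    assert(!src.empty());
    const long R = long(src.size()), C = long(src[0].size());
    auto kept = [&](long r, long c) {
        return r >= 0 && r < R && c >= 0 && c < C && !m[r][c];
    };
    Grid out = src;  // kept values are copied unchanged
    for (long r = 0; r < R; ++r)
        for (long c = 0; c < C; ++c) {
            if (!m[r][c]) continue;
            const bool lr = kept(r, c - 1) && kept(r, c + 1);
            const bool tb = kept(r - 1, c) && kept(r + 1, c);
            if (kind == Interp::Stencil && lr && tb)   // (c): all four sides
                out[r][c] = (src[r][c - 1] + src[r][c + 1] + src[r - 1][c] + src[r + 1][c]) / 4.0;
            else if (kind == Interp::Basic && lr)      // (b): left and right
                out[r][c] = (src[r][c - 1] + src[r][c + 1]) / 2.0;
            else if (kind == Interp::Basic && tb)      // (b): top and bottom
                out[r][c] = (src[r - 1][c] + src[r + 1][c]) / 2.0;
            else                                       // (a): fallback
                out[r][c] = nearest(src, m, r, c);
        }
    return out;
}
```

On a 4x4 grid whose values grow linearly (`g[r][c] = 10 * r + c`), the checkerboard mask with stencil interpolation rebuilds interior cells exactly, for example `(11 + 13 + 2 + 22) / 4 = 12` at row 1, column 2. Border cells lack one of the four neighbors and fall back to nearest neighbor (row 0, column 1 takes the value `0` of its left neighbor), which is where the error concentrates in this example. This is consistent with the observation that the error depends on how perforation and reconstruction are combined.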
Keywords
sycl api, approximate computing