Application and system support for reconfigurable coprocessors in multicore devices

Application and system support for reconfigurable coprocessors in multicore devices (2012)

Abstract

Embedded multicore devices often require high performance with minimal power consumption; many systems use dedicated hardware units to meet these constraints. However, embedded systems have also become increasingly multi-purpose and must be able to execute a wide range of applications, some of which might not yet be known at design time. It is therefore difficult to choose an appropriate mix of dedicated hardware that meets a device's size, cost, and capability constraints. A reconfigurable hardware (RH) coprocessor is a potential solution, as it is highly effective at accelerating a variety of different tasks (which need not be known in advance), and does so using less energy than general-purpose processors. In this thesis, I propose a reconfigurable computing system-on-chip that combines general-purpose processor core(s) with a reconfigurable coprocessor. Applications executing on this system use the RH to accelerate commonly executed functions. I first describe the communication model used between the processor(s) and the RH coprocessor. I then describe the programming interface applications use to access the RH, and show that my model allows applications to securely access the RH coprocessor without requiring operating system intervention, greatly reducing the overhead of using the coprocessor. Because of this, my RH coprocessor can even accelerate tasks (or kernels of an application) whose software execution time is measured in hundreds of cycles. After establishing the platform, I examine how the proposed system performs and propose extensions that further improve its performance. I demonstrate that, when using my coprocessor memory interface, workloads executing across eight processor cores and the shared RH fabric perform ∼95% as well as they would on an idealized system where the coprocessor has zero-cycle access to shared memory.
Additionally, I examine the impact hybrid RH/software applications have on software-only applications, and propose a mechanism that prevents streaming RH applications from polluting shared levels of the system's cache; this simple modification improved the performance of software-only applications by up to 32%. I also examine the behavior of software-only applications coscheduled alongside hybrid RH/software applications on simultaneous multithreaded processors, showing that they perform up to ∼95% as fast as they do when a multicore system executes the two applications. This is much faster than two software applications can run when coscheduled together, but not as fast as a multicore machine, because the hybrid application still requires CPU resources to execute, slowing down the coscheduled software-only application.
Finally, I examine methods that allow multicore RH systems to better utilize RH resources, allowing systems with limited RH resources to perform nearly as well as systems containing more RH resources. I first show that hybrid applications that call the same RH kernel can better utilize the RH by sharing the configured resources. On eight-processor systems executing eight copies of the same application, workloads that shared configured RH kernels performed 97.4% as well as those that did not, even though the shared configurations required only about one-eighth of the RH resources. I also examine a modified RH kernel scheduling algorithm that periodically determines which RH kernels should be loaded on the RH at any given time, and that can better select which RH kernels should be configured on multicore systems. I show that this new scheduler always performs as well as, or better than, the previous scheduler, and in extreme cases can result in RH allocations that improve system performance by over 2×.
In this thesis, I examine many of the design choices involved in creating a multicore RH computing system, and examine how a modern operating system should present the RH resources to user applications. I then demonstrate that such a system provides the performance required by next-generation computing applications, while providing the programmability and flexibility to accelerate many different application domains, and can even offer performance improvements to applications not considered when the chip was first fabricated. With such a system, embedded systems manufacturers can make faster, more capable products that consume less energy. Additionally, the hardware in these new devices will be able to adapt to new applications that are created after the device has shipped, allowing all applications to be accelerated by the processor, and not just the applications that the processor was optimized for.
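The abstract does not spell out how the RH kernel scheduler works, so the following Python sketch is only an illustration of the two ideas it mentions: periodically choosing which kernels to configure, and letting processes that call the same kernel share one configured copy. The `Kernel` fields, the greedy benefit-per-area ranking, and all numbers are assumptions made for this example and are not taken from the thesis.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Kernel:
    name: str
    area: int          # hypothetical RH resource units needed when configured
    cycles_saved: int  # hypothetical cycles saved per call versus software


def select_kernels(kernels, calls, capacity):
    """Pick kernels for the next scheduling period (illustrative greedy rule).

    `calls` maps (process id, kernel name) to the number of calls seen in the
    last period. Calls are pooled per kernel, so a single configured copy is
    assumed to serve every process that uses that kernel.
    """
    pooled = Counter()
    for (_pid, name), count in calls.items():
        pooled[name] += count

    def density(k):
        # Estimated cycles saved per unit of RH area.
        return pooled[k.name] * k.cycles_saved / k.area

    chosen, used = [], 0
    for k in sorted(kernels, key=density, reverse=True):
        if pooled[k.name] > 0 and used + k.area <= capacity:
            chosen.append(k.name)
            used += k.area
    return chosen, used


def area_without_sharing(kernels, calls):
    """RH area needed if every process configured its own copy of each kernel."""
    by_name = {k.name: k for k in kernels}
    return sum(by_name[name].area for (_pid, name), c in calls.items() if c > 0)


kernels = [Kernel("fir", 40, 300), Kernel("aes", 60, 500), Kernel("fft", 80, 900)]
# Eight processes all call "fir"; process 0 also calls "fft" occasionally.
calls = {(pid, "fir"): 100 for pid in range(8)}
calls[(0, "fft")] = 50

chosen, used = select_kernels(kernels, calls, capacity=100)
print(chosen, used, area_without_sharing(kernels, calls))
```

In this toy workload the eight processes share one configured `fir` kernel, which fits within the available RH area, whereas giving each process a private copy would need several times that area. A real scheduler would also have to account for reconfiguration cost and contention for a shared kernel, which this sketch ignores.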

Keywords

RH resource, modified RH kernel scheduling, RH coprocessor, system support, RH fabric, hybrid RH, multicore device, multicore RH computing system, RH kernel, RH application, limited RH resource, RH allocation, reconfigurable coprocessors
