Fast and Correct Load-Link/Store-Conditional Instruction Handling in DBT Systems

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems(2020)

引用 7|浏览54
暂无评分
摘要
Dynamic binary translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting, e.g., ×86 host systems, LL/SC guest instructions are typically emulated using atomic compare-and-swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this article, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: 1) a purely software-based scheme, which uses the DBT system's page translation cache for correctly selecting between fast, but unsynchronized, and slow, but fully synchronized memory accesses and 2) a hardware-accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare ARC nSIM DBT system, and we evaluate our implementations against full applications, and targeted microbenchmarks. We demonstrate that our novel schemes are not only correct but also deliver competitive performance on-par or better than the widely used, but broken CAS scheme.
更多
查看译文
关键词
Parallel architectures,platform virtualization
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要