Abstract / Description of output
Dynamic Binary Translation (DBT) requires the implementation of load-link/store-conditional (LL/SC) primitives for guest systems that rely on this form of synchronization. When targeting e.g. x86 host systems, LL/SC guest instructions are typically emulated using atomic Compare-and-Swap (CAS) instructions on the host. Whilst this direct mapping is efficient, this approach is problematic due to subtle differences between LL/SC and CAS semantics. In this paper, we demonstrate that this is a real problem, and we provide code examples that fail to execute correctly on QEMU and a commercial DBT system, which both use the CAS approach to LL/SC emulation. We then develop two novel and provably correct LL/SC emulation schemes: (1) A purely software based scheme, which uses the DBT system’s page translation cache for correctly selecting between fast, but unsynchronized, and slow, but fully synchronized memory accesses, and (2) a hardware accelerated scheme that leverages hardware transactional memory (HTM) provided by the host. We have implemented these two schemes in the Synopsys DesignWare® ARC® nSIM DBT system, and we evaluate our implementations against full applications, and targeted micro-benchmarks. We demonstrate that our novel schemes are not only correct, but also deliver competitive performance on-par or better than the widely used, but broken CAS scheme.
Original language | English |
---|---|
Pages (from-to) | 3544 - 3554 |
Number of pages | 11 |
Journal | IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems |
Volume | 39 |
Issue number | 11 |
Early online date | 2 Oct 2020 |
DOIs | |
Publication status | Published - 1 Nov 2020 |
Event | 2020 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems - Duration: 20 Sept 2020 → 25 Sept 2020 http://esweek.hosting2.acm.org/cases/ |
Keywords / Materials (for Non-textual outputs)
- Parallel architectures
- platform virtualization