Edinburgh Research Explorer

Generalized just-in-time trace compilation using a parallel task farm in a dynamic binary translator

Research output: Chapter in Book/Report/Conference proceedingConference contribution

Related Edinburgh Organisations

Open Access permissions

Open

Documents

http://dl.acm.org/citation.cfm?doid=1993498.1993508
Original languageEnglish
Title of host publicationProceedings of the 32nd ACM SIGPLAN conference on Programming Language Design and Implementation (PLDI '11)
Place of PublicationNew York, NY, USA
PublisherACM
Pages74-85
Number of pages12
ISBN (Print)978-1-4503-0663-8
DOIs
StatePublished - 2011

Abstract

Dynamic Binary Translation (DBT) is the key technology behind cross-platform virtualization and allows software compiled for one Instruction Set Architecture (ISA) to be executed on a processor supporting a different ISA. Under the hood, DBT is typically implemented using Just-In-Time (JIT) compilation of frequently executed program regions, also called traces. The main challenge is translating frequently executed program regions as fast as possible into highly efficient native code. As time for JIT compilation adds to the overall execution time, the JIT compiler is often decoupled and operates in a separate thread independent from the main simulation loop to reduce the overhead of JIT compilation. In this paper we present two innovative contributions. The first contribution is a generalized trace compilation approach that considers all frequently executed paths in a program for JIT compilation, as opposed to previous approaches where trace compilation is restricted to paths through loops. The second contribution reduces JIT compilation cost by compiling several hot traces in a concurrent task farm. Altogether we combine generalized light-weight tracing, large translation units, parallel JIT compilation and dynamic work scheduling to ensure timely and efficient processing of hot traces. We have evaluated our industry-strength, LLVM-based parallel DBT implementing the ARCompact ISA against three benchmark suites (EEMBC, BioPerf and SPEC CPU2006) and demonstrate speedups of up to 2.08 on a standard quad-core Intel Xeon machine. Across short- and long-running benchmarks our scheme is robust and never results in a slowdown. In fact, using four processors total execution time can be reduced by on average 11.5% over state-of-the-art decoupled, parallel (or asynchronous) JIT compilation.

    Research areas

  • dynamic binary translation, dynamic work scheduling, just-in-time compilation, parallelization, task farm

Download statistics

No data available

ID: 695305