A large percentage of computed results have fewer significant bits than the full width of a register. We exploit this fact to pack multiple results into a single physical register, reducing pressure on the register file in a superscalar processor. We evaluate two schemes for dynamically packing multiple "narrow-width" results into partitions within a single register. The first scheme is conservative and allocates a full-width register for every computed result. If the result turns out to be narrow, it is reallocated to partitions within a shared register, freeing the full-width register. The second scheme allocates register partitions based on a prediction of the result's width. It reallocates partitions when the actual result is wider than predicted and frees the surplus partitions when the actual result is narrower than predicted. A detailed evaluation of our schemes shows that average IPC gains of up to 15% can be realized across the SPEC 2000 benchmarks on a somewhat register-constrained datapath.
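The two allocation policies described above can be illustrated with a small software model. This is a minimal sketch, not the paper's hardware design: it assumes 64-bit physical registers split into four 16-bit partitions, first-fit placement of contiguous partitions, and a "stall" modeled as returning `None`. The names `PartitionedRegFile`, `conservative_writeback`, and `predictive_writeback` are hypothetical.

```python
# Hypothetical model: 64-bit registers, each split into four 16-bit partitions.
# Partition size and first-fit placement are illustrative assumptions.
REG_BITS = 64
PART_BITS = 16
PARTS_PER_REG = REG_BITS // PART_BITS


def significant_bits(value: int) -> int:
    """Bits needed to hold `value` as a signed two's-complement integer."""
    if value < 0:
        value = ~value  # -value - 1: the magnitude bits that are not sign copies
    return value.bit_length() + 1  # +1 for the sign bit


def partitions_needed(value: int) -> int:
    """Number of 16-bit partitions the result occupies (ceiling division)."""
    return -(-significant_bits(value) // PART_BITS)


class PartitionedRegFile:
    def __init__(self, num_regs: int):
        # free[r] holds the indices of the free partitions in physical register r.
        self.free = [set(range(PARTS_PER_REG)) for _ in range(num_regs)]

    def alloc(self, n: int):
        """First-fit: n contiguous partitions in one register -> (reg, start, n) or None."""
        for reg, free in enumerate(self.free):
            for start in range(PARTS_PER_REG - n + 1):
                span = set(range(start, start + n))
                if span <= free:
                    free -= span  # in-place update of this register's free set
                    return (reg, start, n)
        return None

    def release(self, slot):
        reg, start, n = slot
        self.free[reg] |= set(range(start, start + n))

    def free_partitions(self) -> int:
        return sum(len(f) for f in self.free)


def conservative_writeback(rf, value):
    """Scheme 1 sketch: allocate a full register, repack the result if it is narrow."""
    full = rf.alloc(PARTS_PER_REG)
    if full is None:
        return None  # no full-width register free: stall
    n = partitions_needed(value)
    if n < PARTS_PER_REG:
        narrow = rf.alloc(n)  # partitions in a shared register
        if narrow is not None:
            rf.release(full)  # the full-width register becomes free again
            return narrow
    return full


def predictive_writeback(rf, predicted_parts, value):
    """Scheme 2 sketch: allocate the predicted width, fix up on a misprediction."""
    slot = rf.alloc(predicted_parts)
    if slot is None:
        return None
    actual = partitions_needed(value)
    if actual > predicted_parts:
        # Under-prediction: reallocate a larger span (None models a stall).
        rf.release(slot)
        return rf.alloc(actual)
    if actual < predicted_parts:
        # Over-prediction: free the unused trailing partitions.
        reg, start, _ = slot
        rf.release((reg, start + actual, predicted_parts - actual))
        return (reg, start, actual)
    return slot


rf = PartitionedRegFile(num_regs=4)
a = conservative_writeback(rf, 42)        # narrow: repacked into one partition
b = conservative_writeback(rf, 1 << 60)   # wide: keeps a full register
c = predictive_writeback(rf, 1, -5)       # width predicted correctly
d = predictive_writeback(rf, 1, 100000)   # under-predicted: needs two partitions
e = predictive_writeback(rf, 4, 7)        # over-predicted: three partitions freed
print(a, b, c, d, e, rf.free_partitions())
```

First-fit placement makes narrow results share partially occupied registers, which is the packing effect the schemes rely on. A hardware allocator would use a different placement policy and would also have to handle partition free lists and rename-map updates, which this sketch omits.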
| Field | Value |
|---|---|
| Title of host publication | 37th International Symposium on Microarchitecture (MICRO-37 2004) |
| Publisher | Institute of Electrical and Electronics Engineers (IEEE) |
| Number of pages | 12 |
| Publication status | Published - 1 Dec 2004 |