Exploiting replicated checkpoints for soft error detection and correction

Fahrettin Koc, Kenan Bozdas, Burak Karsli, Oguz Ergin

Research output: Chapter in Book/Report/Conference proceeding (Conference contribution)

Abstract

Register renaming is a widely used technique for removing false dependencies in contemporary superscalar microprocessors. A register alias table (RAT) holds the current locations of the values that correspond to the architectural registers. Some recently designed processors take a copy of the rename table at each branch instruction in order to recover its contents when a misspeculation occurs. In this paper, we first investigate the vulnerability of the RAT to transient errors. We then analyze the vulnerability of RAT checkpoints and propose two techniques for soft error detection and correction that use the redundant copies of entries whose content is the same as in the previous and/or next checkpoint. Simulation results for the SPEC 2006 benchmarks reveal that, on average, RAT vulnerability is 25% and checkpoint vulnerability is 6%. The results also reveal that redundancy exists in consecutive checkpoint copies and can be used for error detection and correction. We propose techniques that exploit this redundancy and show that faults in 41% of all checkpoints and 44% of rolled-back checkpoints can be detected, and that errors in 33% of the rolled-back checkpoints can be corrected. Since we exploit already available storage, the proposed error detection and correction techniques can be implemented with minimal hardware overhead.
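
The abstract does not give implementation details, so the following Python sketch only illustrates the general idea of using unchanged entries in neighboring checkpoints as redundant copies. Everything in it is an assumption made for illustration: the per-entry "unchanged since previous checkpoint" flag, the function names, the single-fault model, and the majority-vote correction are hypothetical and are not taken from the paper.

```python
from typing import List, Optional, Tuple

# A checkpoint maps architectural register index -> physical register tag.
Checkpoint = List[int]


def unchanged_flags(checkpoints: List[Checkpoint]) -> List[List[bool]]:
    """flags[k][i] is True when entry i of checkpoint k was copied unchanged
    from checkpoint k-1. Hypothetical per-entry bit, recorded fault-free."""
    flags = [[False] * len(checkpoints[0])]
    for prev, curr in zip(checkpoints, checkpoints[1:]):
        flags.append([p == c for p, c in zip(prev, curr)])
    return flags


def check_entry(value: int, prev_copy: Optional[int],
                next_copy: Optional[int]) -> Tuple[str, int]:
    """Compare one entry against its available redundant copies."""
    copies = [c for c in (prev_copy, next_copy) if c is not None]
    if not copies:
        return "unprotected", value      # no redundant copy exists
    if all(c == value for c in copies):
        return "ok", value               # all copies agree
    if len(copies) == 2 and copies[0] == copies[1]:
        return "corrected", copies[0]    # two neighbors outvote the entry
    return "detected", value             # mismatch, but no majority


def check_checkpoint(checkpoints: List[Checkpoint],
                     flags: List[List[bool]], k: int) -> List[Tuple[str, int]]:
    """Check every entry of checkpoint k against checkpoints k-1 and k+1."""
    results = []
    for i, value in enumerate(checkpoints[k]):
        has_prev = k > 0 and flags[k][i]
        has_next = k + 1 < len(checkpoints) and flags[k + 1][i]
        prev_copy = checkpoints[k - 1][i] if has_prev else None
        next_copy = checkpoints[k + 1][i] if has_next else None
        results.append(check_entry(value, prev_copy, next_copy))
    return results


# Three fault-free checkpoints of a 4-entry rename table (made-up values).
clean = [[10, 11, 12, 13], [10, 20, 12, 13], [10, 20, 30, 13]]
flags = unchanged_flags(clean)

# Inject single-bit flips into two entries of the middle checkpoint.
faulty = [list(cp) for cp in clean]
faulty[1][0] ^= 0b100  # entry 0 has copies in both neighbors
faulty[1][1] ^= 0b001  # entry 1 has a copy only in the next checkpoint

results = check_checkpoint(faulty, flags, 1)
print(results)
```

In this toy model, an entry that is replicated in both neighboring checkpoints can be corrected by majority vote, while an entry replicated in only one neighbor can be flagged as faulty but not repaired. This is one plausible reason why a scheme of this kind would detect more faults than it can correct.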
Original language: English
Title of host publication: Design, Automation and Test in Europe, DATE 13, Grenoble, France, March 18-22, 2013
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Pages: 1494-1497
Number of pages: 4
ISBN (Print): 978-1-4673-5071-6
Publication status: Published - 2013
