This project investigated new ways to design microprocessors, focusing on automating the design process so that many possible designs can be evaluated automatically before selecting the one that best meets the needs of the application. The project tackled both software and hardware aspects of the design process, and in doing so developed a new microprocessor (EnCore) and a new high-speed microprocessor simulator (ArcSim). Both of these new technologies were transferred to industry during, or shortly after completion of, the project.
Embedded computers are used in a wide range of applications, from automotive control to portable media and communications devices. They are increasingly called upon to perform computationally intensive tasks which a few years ago would have been considered "supercomputing". Many embedded systems are portable or battery-powered and must therefore rely on a limited energy source. Designers of embedded systems are hence called upon to optimise the hardware and software so that the system operates at high speed whilst consuming very little power.
In this project we sought to automate the design and optimisation of embedded systems. This involved research in a number of complementary areas including: new ways of extending the design of a microprocessor to add application-specific instructions; high-speed simulation and modelling techniques to enable performance estimation for automated design decision-making; new compilation techniques to map software onto our new configurable and extendable processors; and the application of machine-learning based techniques in both design-space exploration and high-speed modelling and simulation.
This project developed new algorithms for synthesizing the hardware of an extended processor in two ways: first, by selecting and mapping new application-specific instructions to a Configurable Flow Accelerator; and second, by exposing the vast design space created during the critical process of hardware resource sharing. The project also developed novel learning-based approaches to navigating through the design space of possible hardware solutions. This learning-based approach allowed the hardware design tools to build a statistical model of how best to merge the logical implementations of the many fragments of new hardware created when new instructions are added to a processor. By learning about the design space through the creation of a statistical model, the search for the best solution was shown to be reduced by a factor of 200. This reduces the potential design cost from several weeks to just a few minutes of compute time.
One of the stated goals of the project was to create a silicon implementation of a microprocessor embodying the new ideas developed during the project. To that end we developed a low-power embedded microprocessor called EnCore, and produced an initial silicon test chip in November 2008. This used the UMC 130nm high-speed CMOS process, which we accessed via Europractice.
The first chip, codenamed Calton, was fully functional, occupied less than 1 mm² of silicon, ran at up to 375 MHz and consumed 97 μW/MHz of power. We followed that with a second device a few months later, to test a revised hardware design flow. This used the same 130nm silicon process, and was again fully functional. Towards the end of the project we designed and fabricated a 90nm chip, codenamed Castle, which contained a more powerful EnCore processor combined with a synthetic Configurable Flow Accelerator. This device was again fully functional, ran at up to 600 MHz, occupied around 1.7 mm² of silicon, and consumed 125 μW/MHz of power. The Configurable Flow Accelerator contained a configurable data-path with 4 multipliers, 4 shifters and 4 adders, all of which could be combined into a single powerful instruction. We researched a range of compilation techniques for mapping application code to this configurable processor, and were able to run code generated by our own compiler on the EnCore processor.
This research project also innovated significantly in the use of statistical machine learning to create accurate performance models for embedded microprocessors. This was coupled with the development of a very high-speed full-system simulator, based on parallel just-in-time (JIT) dynamic binary translation. This allows designers to predict the performance of software running on hardware that may not yet exist, and even to try various alternative hardware configurations before committing to the design.
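As a rough aid to reading the per-MHz power figures quoted for the Calton and Castle chips, the short sketch below converts them into an estimated power draw at each chip's maximum clock frequency. It assumes that power scales linearly with clock frequency and ignores static leakage; that assumption, and the names CHIPS and peak_power_mw, are illustrative and do not come from the project itself.

```python
# Rough power estimate from the per-MHz figures quoted in the text.
# Assumption (not from the project): power scales linearly with clock
# frequency, so power (mW) = (uW/MHz) * MHz / 1000. Leakage is ignored.

CHIPS = {
    # name: (power in uW per MHz, maximum clock in MHz), as quoted in the text
    "Calton (130nm)": (97.0, 375.0),
    "Castle (90nm)": (125.0, 600.0),
}


def peak_power_mw(uw_per_mhz: float, mhz: float) -> float:
    """Return the estimated power in milliwatts at the given clock frequency."""
    return uw_per_mhz * mhz / 1000.0


for name, (uw_per_mhz, mhz) in CHIPS.items():
    print(f"{name}: {peak_power_mw(uw_per_mhz, mhz):.1f} mW at {mhz:.0f} MHz")
```

Under these assumptions Calton draws roughly 36 mW at 375 MHz and Castle roughly 75 mW at 600 MHz.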