Abstract
Machine learning algorithms are compute- and memory-intensive, which makes their execution at the edge on resource-constrained embedded systems challenging. Data quantization, i.e. data bit-width reduction, directly reduces the memory bandwidth requirement. To best exploit this bit-width reduction, a prevailing approach relies on tailored hardware accelerators. Another approach relies on general-purpose compute units with Single Instruction Multiple Data (SIMD) support for reduced-precision data, as in ARM Cortex-M [1] or RISC-V based RI5CY [2] processors. However, such processors handle only a few predefined bit-widths, e.g. only 8-bit and 16-bit for the ARM SIMD. This paper proposes a flexible Multiply-and-Accumulate (MAC) unit architecture allowing asymmetric multiplication for operand sizes in powers of 2, up to 32 bits. Synthesis of this architecture in 28nm FD-SOI technology shows reductions of 10% in area and 25% in dynamic power compared to the RI5CY MAC unit. In terms of energy efficiency, improvements of up to 50% are achieved.
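To make the idea of asymmetric multiplication on packed operands more concrete, the following is a minimal behavioural sketch in Python. It is not the architecture from the paper: the function names, the signed two's-complement lane encoding, the lower width bound of 2 bits, and the rule that the lane count is set by the wider operand are all assumptions made for illustration only.

```python
WORD_BITS = 32
# Assumed set of supported operand widths: powers of 2 up to 32 bits.
ALLOWED_WIDTHS = (2, 4, 8, 16, 32)


def pack_signed(values, width):
    """Pack signed integers into one word, lane 0 in the least significant bits."""
    mask = (1 << width) - 1
    word = 0
    for i, v in enumerate(values):
        word |= (v & mask) << (i * width)
    return word


def unpack_signed(word, width, count):
    """Split the low count*width bits of word into signed two's-complement lanes."""
    mask = (1 << width) - 1
    lanes = []
    for i in range(count):
        raw = (word >> (i * width)) & mask
        if raw >> (width - 1):  # top bit set: sign-extend
            raw -= 1 << width
        lanes.append(raw)
    return lanes


def asymmetric_mac(acc, a_word, b_word, a_bits, b_bits):
    """Hypothetical model: acc + sum of lane-wise products of two packed words.

    The two operands may use different widths (asymmetric). The number of
    lanes is assumed to be limited by the wider operand.
    """
    if a_bits not in ALLOWED_WIDTHS or b_bits not in ALLOWED_WIDTHS:
        raise ValueError("operand widths must be powers of 2 between 2 and 32")
    lanes = WORD_BITS // max(a_bits, b_bits)
    a = unpack_signed(a_word, a_bits, lanes)
    b = unpack_signed(b_word, b_bits, lanes)
    return acc + sum(x * y for x, y in zip(a, b))


# Example: four 8-bit activations multiplied by four 2-bit weights.
activations = pack_signed([3, -2, 7, -8], 8)
weights = pack_signed([1, -1, -2, 1], 2)
result = asymmetric_mac(10, activations, weights, 8, 2)  # 10 + (3 + 2 - 14 - 8) = -7
```

In this sketch the narrower operand leaves part of its word unused; a real datapath could choose a different packing, which the abstract does not specify.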
Original language | English |
---|---|
Title of host publication | 2021 19th IEEE International New Circuits and Systems Conference (NEWCAS) |
Publisher | Institute of Electrical and Electronics Engineers |
Number of pages | 4 |
ISBN (Electronic) | 978-1-6654-2429-5 |
ISBN (Print) | 978-1-6654-2430-1 |
Publication status | Published - 25 Jun 2021 |
Event | 19th IEEE Interregional New Circuits and Systems Conference, 2021 (Online). Duration: 13 Jun 2021 → 16 Jun 2021. Conference number: 19. https://newcas2021.univ-tln.fr/ |
Conference
Conference | 19th IEEE Interregional New Circuits and Systems Conference, 2021 |
---|---|
Abbreviated title | NEWCAS 2021 |
Period | 13/06/21 → 16/06/21 |
Keywords / Materials (for Non-textual outputs)
- Multiply-and-Accumulate units
- MAC
- Machine Learning
- Edge-Computing
- Quantization