TY - GEN
T1 - PASS
T2 - 33rd International Conference on Field-Programmable Logic and Applications
AU - Montgomerie-Corcoran, Alexander
AU - Yu, Zhewen
AU - Cheng, Jianyi
AU - Bouganis, Christos Savvas
PY - 2023/11/2
Y1 - 2023/11/2
N2 - With the ever-growing popularity of Artificial Intelligence, there is an increasing demand for more performant and efficient underlying hardware. Convolutional Neural Networks (CNNs) are a workload of particular importance, achieving high accuracy in computer vision applications. Inside CNNs, a significant number of the post-activation values are zero, resulting in many redundant computations. Recent works have explored this post-activation sparsity on instruction-based CNN accelerators but not on streaming CNN accelerators, despite the fact that streaming architectures are considered the leading design methodology in terms of performance. In this paper, we highlight the challenges associated with exploiting post-activation sparsity for performance gains in streaming CNN accelerators, and demonstrate our approach to addressing them. Using a set of modern CNN benchmarks, our streaming sparse accelerators achieve 1.41x to 1.93x the efficiency (GOP/s/DSP) of state-of-the-art instruction-based sparse accelerators.
UR - https://www.scopus.com/pages/publications/85178522422
U2 - 10.1109/FPL60245.2023.00049
DO - 10.1109/FPL60245.2023.00049
M3 - Conference contribution
AN - SCOPUS:85178522422
SN - 9798350341522
T3 - Proceedings of the International Conference on Field-Programmable Logic and Applications
SP - 288
EP - 293
BT - 2023 33rd International Conference on Field-Programmable Logic and Applications
A2 - Sourdis, Ioannis
A2 - Mentens, Nele
A2 - Sousa, Leonel
A2 - Trancoso, Pedro
PB - Institute of Electrical and Electronics Engineers
Y2 - 4 September 2023 through 8 September 2023
ER -