A Selective Quantization Tuner for ONNX Models

Nick Louloudakis, Ajitha Rajan

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract

Quantization reduces the precision of deep neural networks to lower model size and computational demands, but often at the expense of accuracy. Fully quantized models can suffer significant accuracy degradation, and resource-constrained hardware accelerators may not support all quantized operations. A common workaround is selective quantization, where only some layers are quantized while others remain at full precision. However, determining the optimal balance between accuracy and efficiency is challenging. To this end, we propose SeQTO, a framework that enables selective quantization, deployment, and execution of ONNX models on diverse CPU and GPU devices, combined with profiling and multi-objective optimization. SeQTO generates selectively quantized models, deploys them across hardware accelerators, evaluates them on metrics such as accuracy and size, applies Pareto front-based objective minimization to identify optimal candidates, and visualizes the results. We evaluated SeQTO on four ONNX models under two quantization settings across CPU and GPU devices. Our results show that SeQTO effectively identifies high-quality selectively quantized models, achieving up to 54.14% lower accuracy loss while maintaining up to 98.18% of the size reduction compared to fully quantized models.
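The Pareto front-based selection mentioned in the abstract can be illustrated with a minimal sketch. This is not SeQTO's implementation: the `Candidate` type, the function names, and all numeric values below are hypothetical and chosen only for illustration. It assumes two objectives to minimize, accuracy loss and model size, and keeps every candidate that no other candidate beats on both.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical selectively quantized model variant."""
    name: str
    accuracy_loss: float  # percentage points lost vs. full precision; lower is better
    size_mb: float        # model size in MB; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """True if `a` is no worse than `b` on both objectives and strictly better on one."""
    no_worse = a.accuracy_loss <= b.accuracy_loss and a.size_mb <= b.size_mb
    strictly_better = a.accuracy_loss < b.accuracy_loss or a.size_mb < b.size_mb
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep the candidates that no other candidate dominates (input order preserved)."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Illustrative values only; these are not results from the paper.
candidates = [
    Candidate("full_precision", 0.0, 100.0),
    Candidate("fully_quantized", 8.0, 25.0),
    Candidate("selective_a", 3.0, 30.0),
    Candidate("selective_b", 5.0, 40.0),  # worse than selective_a on both objectives
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy data, `selective_b` is dropped because `selective_a` has both lower accuracy loss and smaller size; the remaining candidates represent different trade-offs, and choosing among them is left to the user.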
Original language: English
Title of host publication: Proceedings of the 48th International Conference on Software Engineering
Publisher: Association for Computing Machinery
Number of pages: 5
Publication status: Accepted/In press - 1 Dec 2025
Event: The 48th IEEE/ACM International Conference on Software Engineering - Rio de Janeiro, Brazil
Duration: 12 Apr 2026 to 18 Apr 2026
Conference number: 48
https://conf.researchr.org/home/icse-2026

Publication series

Name: IEEE/ACM International Conference on Software Engineering
Publisher: ACM
ISSN (Print): 0270-5257
ISSN (Electronic): 1558-1225

Conference

Conference: The 48th IEEE/ACM International Conference on Software Engineering
Abbreviated title: ICSE 2026
Country/Territory: Brazil
City: Rio de Janeiro
Period: 12/04/26 to 18/04/26
