Abstract
Quantization reduces the precision of deep neural networks to lower model size and computational demands, but often at the expense of accuracy. Fully quantized models can suffer significant accuracy degradation, and resource-constrained hardware accelerators may not support all quantized operations. A common workaround is selective quantization, where only some layers are quantized while the others remain at full precision. However, determining the optimal balance between accuracy and efficiency is challenging. To this end, we propose SeQTO, a framework that enables selective quantization, deployment, and execution of ONNX models on diverse CPU and GPU devices, combined with profiling and multi-objective optimization. SeQTO generates selectively quantized models, deploys them across hardware accelerators, evaluates them on metrics such as accuracy and size, applies Pareto front-based objective minimization to identify optimal candidates, and visualizes the results. We evaluated SeQTO on four ONNX models under two quantization settings across CPU and GPU devices. Our results show that SeQTO effectively identifies high-quality selectively quantized models, achieving up to 54.14% lower accuracy loss while maintaining up to 98.18% of the size reduction of fully quantized models.
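The Pareto front-based selection mentioned in the abstract can be illustrated with a minimal sketch. This is not SeQTO's implementation: the `Candidate` type, the `dominates` and `pareto_front` helpers, and all numeric values below are hypothetical, chosen only to show how non-dominated candidates might be picked when both accuracy loss and model size are minimized.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    """A hypothetical selectively quantized model and its measured objectives."""
    name: str
    accuracy_loss: float  # percentage points lost vs. full precision; lower is better
    size_mb: float        # model size in MB; lower is better


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if a is no worse than b on both objectives and strictly better on one."""
    no_worse = a.accuracy_loss <= b.accuracy_loss and a.size_mb <= b.size_mb
    strictly_better = a.accuracy_loss < b.accuracy_loss or a.size_mb < b.size_mb
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep only the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]


# Made-up measurements for illustration only; they are not results from the paper.
candidates = [
    Candidate("full_precision", accuracy_loss=0.0, size_mb=100.0),
    Candidate("fully_quantized", accuracy_loss=5.0, size_mb=25.0),
    Candidate("selective_a", accuracy_loss=2.0, size_mb=30.0),
    Candidate("selective_b", accuracy_loss=3.0, size_mb=40.0),  # dominated by selective_a
]

front = pareto_front(candidates)
print([c.name for c in front])
```

In this toy data, `selective_b` is dropped because `selective_a` is both smaller and more accurate. The remaining candidates each represent a different trade-off, and choosing among them is left to the user.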
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 48th International Conference on Software Engineering |
| Publisher | Association for Computing Machinery |
| Number of pages | 5 |
| Publication status | Accepted/In press - 1 Dec 2025 |
| Event | The 48th IEEE/ACM International Conference on Software Engineering, Rio de Janeiro, Brazil. Duration: 12 Apr 2026 → 18 Apr 2026. Conference number: 48. https://conf.researchr.org/home/icse-2026 |
Publication series
| Name | IEEE/ACM International Conference on Software Engineering |
|---|---|
| Publisher | ACM |
| ISSN (Print) | 0270-5257 |
| ISSN (Electronic) | 1558-1225 |
Conference
| Conference | The 48th IEEE/ACM International Conference on Software Engineering |
|---|---|
| Abbreviated title | ICSE 2026 |
| Country/Territory | Brazil |
| City | Rio de Janeiro |
| Period | 12/04/26 → 18/04/26 |
| Internet address | https://conf.researchr.org/home/icse-2026 |
Fingerprint
Dive into the research topics of 'A Selective Quantization Tuner for ONNX Models'. Together they form a unique fingerprint.