Toward Improved Deep Learning-based Vulnerability Detection

Adriana Sejfia, Saad Shafiq, Satyaki Das, Nenad Medvidović

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution

Abstract / Description of output

Deep learning (DL) has been a common thread across several recent techniques for vulnerability detection. The rise of large, publicly available datasets of vulnerabilities has fueled the learning process underpinning these techniques. While these datasets help the DL-based vulnerability detectors, they also constrain these detectors’ predictive abilities. Vulnerabilities in these datasets have to be represented in a certain way, e.g., code lines, functions, or program slices within which the vulnerabilities exist. We refer to this representation as a base unit. The detectors learn how base units can be vulnerable and then predict whether other base units are vulnerable. We have hypothesized that this focus on individual base units harms the ability of the detectors to properly detect those vulnerabilities that span multiple base units (or MBU vulnerabilities). For vulnerabilities such as these, a correct detection occurs when all comprising base units are detected as vulnerable. Verifying how existing techniques perform in detecting all parts of a vulnerability is important to establish their effectiveness for other downstream tasks. To evaluate our hypothesis, we conducted a study focusing on three prominent DL-based detectors: ReVeal, DeepWukong, and LineVul. Our study shows that all three detectors contain MBU vulnerabilities in their respective datasets. Further, we observed significant accuracy drops when detecting these types of vulnerabilities. We present our study and a framework that can be used to help DL-based detectors toward the proper inclusion of MBU vulnerabilities.
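
The all-units criterion described in the abstract can be illustrated with a small sketch. This is not the authors' framework or evaluation code; the `Vulnerability` class, the function names, and the toy identifiers below are hypothetical, and the data is made up purely for illustration. It assumes a detector outputs a set of base units it flags as vulnerable, and it counts a vulnerability as correctly detected only when every one of its base units is in that set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Vulnerability:
    """A vulnerability and the base units (e.g., functions) it spans. Hypothetical type."""
    vuln_id: str
    base_units: frozenset[str]

    @property
    def is_mbu(self) -> bool:
        # A multiple-base-unit (MBU) vulnerability spans more than one base unit.
        return len(self.base_units) > 1


def fully_detected(vuln: Vulnerability, flagged: set[str]) -> bool:
    # Correct detection requires every comprising base unit to be flagged.
    return vuln.base_units <= flagged


def detection_rates(vulns: list[Vulnerability], flagged: set[str]) -> dict[str, float | None]:
    # Report the fraction of fully detected vulnerabilities, split by kind.
    groups: dict[str, list[bool]] = {"single": [], "mbu": []}
    for v in vulns:
        groups["mbu" if v.is_mbu else "single"].append(fully_detected(v, flagged))
    return {k: (sum(hits) / len(hits) if hits else None) for k, hits in groups.items()}


# Toy data (assumed, not from the study).
vulns = [
    Vulnerability("V1", frozenset({"f1"})),
    Vulnerability("V2", frozenset({"f2", "f3"})),
    Vulnerability("V3", frozenset({"f4", "f5", "f6"})),
]
flagged = {"f1", "f2", "f4", "f5", "f6"}  # detector missed f3

rates = detection_rates(vulns, flagged)
print(rates)
```

In this toy run, V2 counts as a miss even though one of its two base units was flagged, which is the gap a per-base-unit accuracy figure would hide.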

Original language: English
Title of host publication: ICSE 2024 - Proceedings of the 46th IEEE/ACM International Conference on Software Engineering
Publisher: IEEE Computer Society Press
Pages: 1-12
ISBN (Electronic): 9798400702174
Publication status: Published - 6 Feb 2024
Event: 46th International Conference on Software Engineering - Lisbon, Portugal
Duration: 14 Apr 2024 to 20 Apr 2024
Conference number: 46
https://conf.researchr.org/home/icse-2024

Publication series

Name: Proceedings - International Conference on Software Engineering
ISSN (Print): 0270-5257

Conference

Conference: 46th International Conference on Software Engineering
Abbreviated title: ICSE 2024
Country/Territory: Portugal
City: Lisbon
Period: 14/04/24 to 20/04/24
