Provenance expressiveness benchmarking on non-deterministic executions

Sheung Chi Chan, James Cheney, Pramod Bhatotia

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

Data provenance is a form of metadata that records inputs and processes, providing a historical record of data and information about its origin. Because of the rich information it provides, provenance is increasingly used as a foundation for security analysis and forensic auditing. These applications require high-quality provenance. Earlier work proposed a provenance expressiveness benchmarking approach that automatically identifies and compares the provenance generated by different provenance systems. However, that work was limited to benchmarking deterministic activities, whereas all real-world systems involve non-determinism, for example through concurrency and multiprocessing. Benchmarking non-deterministic events is challenging because the process owner has no control over the interleaving between processes or the execution order of system calls coming from different processes, leading to a rapid growth in the number of possible schedules that need to be observed. To cover these cases and provide comprehensive automated expressiveness benchmarking for real-world examples, we propose an extension to the automated provenance benchmarking tool, ProvMark, to handle non-determinism.
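
The growth in the number of schedules mentioned in the abstract can be illustrated with a small counting sketch. It is not taken from the paper or from ProvMark. It assumes a simplified model in which each process issues a fixed sequence of system calls and any interleaving that preserves each process's own order is a possible schedule. The function name count_interleavings is hypothetical.

```python
from math import factorial


def count_interleavings(calls_per_process):
    """Count orderings of all calls that keep each process's own call order.

    calls_per_process: list of call counts, one entry per process.
    Under the simplified model assumed here, the count is the multinomial
    coefficient (n1 + ... + nk)! / (n1! * ... * nk!).
    """
    total = factorial(sum(calls_per_process))
    for n in calls_per_process:
        # Each division is exact because the result is a multinomial coefficient.
        total //= factorial(n)
    return total


if __name__ == "__main__":
    # Two processes, each issuing n system calls.
    for n in (1, 2, 5, 10):
        print(n, count_interleavings([n, n]))
```

Under this model, two processes with 5 calls each admit 252 schedules, and with 10 calls each admit 184756, which is why exhaustively observing every schedule quickly becomes impractical.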
Original language: English
Title of host publication: 13th International Workshop on Theory and Practice of Provenance
Publisher: USENIX Association
Publication status: Published - 16 Jul 2021
Event: 13th International Workshop on Theory and Practice of Provenance - Virtual Workshop
Duration: 19 Jul 2021 to 20 Jul 2021
https://iitdbgroup.github.io/ProvenanceWeek2021/tapp.html

Workshop

Workshop: 13th International Workshop on Theory and Practice of Provenance
Abbreviated title: TaPP 2021
Period: 19/07/21 to 20/07/21
