Adversarial augmentation training makes action recognition models more robust to realistic video distribution shifts

Kiyoon Kim, Shreyank Narayana Gowda, Panagiotis Eustratiadis, Antreas Antoniou, Robert B Fisher

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract / Description of output

Despite recent advances in video action recognition achieving strong performance on existing benchmarks, these models often lack robustness when faced with natural distribution shifts between training and test data. We propose two novel evaluation methods to assess model resilience to such distribution disparity. The first method uses two datasets collected from different sources: one for training and validation, and the other for testing. More precisely, we created dataset splits that use HMDB-51 or UCF-101 for training and Kinetics-400 for testing, restricted to the subset of classes that overlap between the train and test datasets. The second method extracts the feature mean of each class from the target evaluation dataset's training data (i.e. the class prototype), and scores each test video by the cosine similarity between its features and the prototype of each target class. This procedure does not alter model weights using the target dataset, and it does not require aligning overlapping classes of two different datasets. It is therefore an efficient way to test model robustness to distribution shifts without prior knowledge of the target distribution. We address the robustness problem by adversarial augmentation training – generating augmented views of videos that are "hard" for the classification model by applying gradient ascent on the augmentation parameters – as well as "curriculum" scheduling of the strength of the video augmentations. We experimentally demonstrate the superior performance of the proposed adversarial augmentation approach over baselines across three state-of-the-art action recognition models: TSM, Video Swin Transformer, and Uniformer. Our curated datasets and source code are publicly available. The presented work provides critical insight into model robustness to distribution shifts and presents effective techniques to enhance video action recognition performance in real-world deployment.
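The class-prototype evaluation described in the abstract can be illustrated with a minimal sketch. This is not the authors' released code: the function names (`class_prototypes`, `cosine_similarity`, `predict`) and the toy two-dimensional features are assumptions for illustration, and the abstract does not say whether features are normalised before averaging, so the sketch averages raw features.

```python
import math
from collections import defaultdict


def class_prototypes(features, labels):
    """Return the mean feature vector per class (the class prototype).

    `features` would come from a frozen model applied to the target
    dataset's training split; no model weights are updated here.
    """
    sums = {}
    counts = defaultdict(int)
    for feat, label in zip(features, labels):
        if label not in sums:
            sums[label] = [0.0] * len(feat)
        sums[label] = [s + x for s, x in zip(sums[label], feat)]
        counts[label] += 1
    return {
        label: [s / counts[label] for s in total]
        for label, total in sums.items()
    }


def cosine_similarity(a, b):
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


def predict(feature, prototypes):
    """Score a test feature against every prototype; return (best label, scores)."""
    scores = {
        label: cosine_similarity(feature, proto)
        for label, proto in prototypes.items()
    }
    return max(scores, key=scores.get), scores


# Toy features standing in for video embeddings (hypothetical values).
train_features = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.2, 0.8]]
train_labels = ["run", "run", "jump", "jump"]
prototypes = class_prototypes(train_features, train_labels)

label, scores = predict([0.8, 0.3], prototypes)
print(label)
```

Because only feature means and similarity scores are computed, the evaluation needs no fine-tuning on the target dataset and no class alignment between source and target label sets, which matches the efficiency argument made in the abstract.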
Original language: English
Title of host publication: Pattern Recognition and Artificial Intelligence
Publisher: Springer, Cham
Publication status: Accepted/In press - 3 Apr 2024
Event: The 4th International Conference on Pattern Recognition and Artificial Intelligence - ICC Jeju, Jeju Island, Korea, Republic of
Duration: 3 Jul 2024 to 6 Jul 2024
https://brain.korea.ac.kr/icprai2024/

Publication series

Name: Lecture Notes in Computer Science
Publisher: Springer, Cham
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: The 4th International Conference on Pattern Recognition and Artificial Intelligence
Abbreviated title: ICPRAI 2024
Country/Territory: Korea, Republic of
City: Jeju Island
Period: 3/07/24 to 6/07/24
Keywords / Materials (for Non-textual outputs)

  • action recognition
  • distribution shifts
  • adversarial training
  • data augmentation
