Is modularity transferable? A case study through the lens of knowledge distillation

Mateusz Klimaszewski, Piotr Andruszkiewicz, Alexandra Birch

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

Abstract / Description of output

The rise of Modular Deep Learning showcases its potential in various Natural Language Processing applications. Parameter-efficient fine-tuning (PEFT) modularity has been shown to work for various use cases, from domain adaptation to multilingual setups. However, all this work covers the case where the modular components are trained and deployed within a single Pre-trained Language Model (PLM). This model-specific setup constitutes a substantial limitation on the very modularity that modular architectures are trying to achieve. We ask whether current modular approaches are transferable between models and whether we can transfer modules from more robust and larger PLMs to smaller ones. In this work, we aim to fill this gap through the lens of Knowledge Distillation, commonly used for model compression, and present an extremely straightforward approach to transferring pre-trained, task-specific PEFT modules between same-family PLMs. Moreover, we propose a method that allows the transfer of modules between incompatible PLMs without any change in inference complexity. Experiments on Named Entity Recognition, Natural Language Inference, and Paraphrase Identification tasks over multiple languages and PEFT methods showcase the initial potential of transferable modularity.
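The record itself contains no implementation details, but as a rough illustration of the setting the abstract describes, the following is a minimal, hypothetical sketch of moving a trained LoRA-style PEFT module between two same-family PLMs by copying its weights. The names `LoRALinear` and `transfer_lora` are illustrative, not from the paper, and the sketch omits both the Knowledge Distillation component and the incompatible-PLM case, which would need an extra alignment step.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen linear layer augmented with a trainable low-rank (LoRA) update."""
    def __init__(self, base: nn.Linear, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)  # only the adapter tensors are trainable
        self.lora_a = nn.Parameter(torch.randn(rank, base.in_features) * 0.02)
        self.lora_b = nn.Parameter(torch.zeros(base.out_features, rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Base projection plus the low-rank adapter update.
        return self.base(x) + x @ self.lora_a.T @ self.lora_b.T

def transfer_lora(src: nn.Module, dst: nn.Module) -> None:
    """Copy only the LoRA tensors from `src` into `dst`.

    Assumes the two models share module names and hidden dimensions
    (the "same-family" case); transfer between incompatible PLMs
    would require projecting the adapter to the target dimensions.
    """
    lora_state = {k: v for k, v in src.state_dict().items() if "lora_" in k}
    dst.load_state_dict(lora_state, strict=False)  # leaves base weights untouched

# Toy usage: two "same-family" encoders with identical shapes.
src_model = nn.Sequential(LoRALinear(nn.Linear(768, 768)))
dst_model = nn.Sequential(LoRALinear(nn.Linear(768, 768)))
transfer_lora(src_model, dst_model)
```

Because only tensors whose names contain "lora_" are copied, the target model keeps its own pre-trained backbone and inference cost is unchanged, matching the abstract's claim that transfer adds no inference complexity.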
Original language: English
Title of host publication: Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Editors: Nicoletta Calzolari, Min-Yen Kan, Veronique Hoste, Alessandro Lenci, Sakriani Sakti, Nianwen Xue
Publisher: ELRA and ICCL
Pages: 9352–9360
Number of pages: 9
ISBN (Electronic): 9782493814104
Publication status: Published - 25 May 2024
Event: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation - Lingotto Conference Centre, Torino, Italy
Duration: 20 May 2024 – 25 May 2024
https://lrec-coling-2024.org/

Publication series

Name: Proceedings of the Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Publisher: ELRA and ICCL
ISSN (Print): 2522-2686
ISSN (Electronic): 2951-2093

Conference

Conference: The 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation
Abbreviated title: LREC-COLING 2024
Country/Territory: Italy
City: Torino
Period: 20/05/24 – 25/05/24
Internet address: https://lrec-coling-2024.org/
