Abstract
Foundation models are large models pre-trained on tremendous amounts of data, and they can typically be adapted to diverse downstream tasks with minimal effort. However, because foundation models are usually pre-trained on images or text sourced from the Internet, their performance in specialized domains such as plant phenotyping comes into question. In addition, fully fine-tuning foundation models is time-consuming and requires high computational power. This paper investigates the efficient adaptation of foundation models to plant phenotyping settings and tasks. We perform extensive experiments fine-tuning three foundation models, MAE, DINO, and DINOv2, on three essential plant phenotyping tasks: leaf counting, instance segmentation, and disease classification. In particular, the pre-trained backbones are kept frozen, while two distinct fine-tuning methods are evaluated: adapter tuning (using LoRA) and decoder tuning. The experimental results show that a foundation model can be efficiently adapted to multiple plant phenotyping tasks, yielding performance comparable to state-of-the-art (SoTA) models specifically designed or trained for each task. Despite exhibiting strong transferability across tasks, the fine-tuned foundation models perform slightly worse than the SoTA task-specific models in some scenarios, which warrants further investigation.
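The adapter-tuning approach described in the abstract keeps the pre-trained backbone frozen and trains only small low-rank (LoRA) updates. A minimal sketch of this idea in NumPy is shown below; the class name, dimensions, and hyperparameters are illustrative assumptions, not details from the paper.

```python
import numpy as np

class LoRALinear:
    """Sketch of a LoRA-adapted linear layer: a frozen pre-trained
    weight W plus a trainable low-rank update B @ A of rank r.
    Only A and B would be updated during fine-tuning; W stays frozen,
    mirroring the frozen-backbone setup described in the abstract."""

    def __init__(self, d_in, d_out, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        # Frozen pre-trained weight (stand-in for a backbone layer).
        self.W = rng.standard_normal((d_out, d_in)) / np.sqrt(d_in)
        # Trainable low-rank factors; B is zero-initialised so the
        # adapter starts as a no-op, a common LoRA convention.
        self.A = rng.standard_normal((r, d_in)) * 0.01
        self.B = np.zeros((d_out, r))
        self.scale = alpha / r

    def __call__(self, x):
        # Frozen path plus scaled low-rank adapter path.
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(d_in=16, d_out=8)
x = np.ones((2, 16))
y = layer(x)
# With B zero-initialised, the adapter contributes nothing at the start
# of training, so the output equals the frozen layer's output.
assert np.allclose(y, x @ layer.W.T)
```

The trainable parameter count here is r * (d_in + d_out) per layer rather than d_in * d_out, which is what makes adapter tuning far cheaper than full fine-tuning.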
| Original language | English |
|---|---|
| Pages | 604-613 |
| Publication status | Published - 25 Dec 2023 |
| Event | 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), Paris, France, 2 Oct 2023 → 6 Oct 2023 |