Genotype-environment association (GEA) approaches are commonly used to identify potential loci under selection. Although missing data are pervasive in genomic datasets and large sample sizes are often difficult to obtain, little is known about their impact on the power to identify adaptive loci. We used genomic datasets simulated under varying selection strengths, dispersal abilities, and landscape configurations to assess the impact of missing data and sample size on the performance of two GEA approaches: distance-based redundancy analysis (dbRDA) and latent factor mixed models (LFMM). Increasing missing data levels up to 40% did not impact the performance of either method with a large sample size (500 individuals). Although true positive rates (TPRs) were high using LFMM, false positive rates (FPRs) were also high. TPRs decreased with increasing missing data when sample size was reduced to 100, particularly with moderate selection and dispersal. TPRs were low with only 30 individuals, regardless of missing data. Overall, dbRDA performed better than LFMM, showing higher TPRs and low FPRs with missing data and reduced sample size. A strong isolation-by-distance (IBD) signal with larger sample sizes appears to be linked to elevated FPRs with LFMM. By contrast, IBD did not influence dbRDA. Our study suggests that LFMM should not be used when IBD is strong. Our results indicate that relatively high levels of missing data have limited impact on GEA performance, whereas larger sample sizes increase power. However, increasing sample size can substantially increase FPRs when patterns of neutral population structure do not match the method's underlying assumptions.
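To make the performance metrics concrete, the following minimal sketch shows one way TPR and FPR could be computed for a set of detected loci, and how a given level of missing data could be imposed on a genotype matrix. It is illustrative only: the function names (`detection_rates`, `mask_genotypes`), the toy locus indices, and the assumption that genotypes go missing completely at random are hypothetical and are not taken from the study.

```python
import numpy as np


def detection_rates(detected, adaptive, n_loci):
    """Return (TPR, FPR) for a set of detected loci.

    TPR = detected adaptive loci / all adaptive loci
    FPR = detected neutral loci / all neutral loci
    """
    detected = set(detected)
    adaptive = set(adaptive)
    neutral_count = n_loci - len(adaptive)
    true_pos = len(detected & adaptive)
    false_pos = len(detected - adaptive)
    tpr = true_pos / len(adaptive) if adaptive else float("nan")
    fpr = false_pos / neutral_count if neutral_count else float("nan")
    return tpr, fpr


def mask_genotypes(genotypes, missing_fraction, rng):
    """Return a float copy with a random fraction of entries set to NaN.

    Assumption: genotypes are missing completely at random.
    """
    masked = genotypes.astype(float)  # astype returns a new array
    mask = rng.random(genotypes.shape) < missing_fraction
    masked[mask] = np.nan
    return masked


# Hypothetical toy case: 100 loci, the first 10 under selection,
# and a GEA method that flags loci 0, 1, 2 and 50.
tpr, fpr = detection_rates(detected=[0, 1, 2, 50], adaptive=range(10), n_loci=100)

# Hypothetical genotype matrix: 500 individuals x 100 loci, coded 0/1/2,
# with 40% of entries masked.
rng = np.random.default_rng(0)
genotypes = rng.integers(0, 3, size=(500, 100))
masked = mask_genotypes(genotypes, 0.4, rng)
observed_missing = np.isnan(masked).mean()
```

In this toy case three of the ten adaptive loci are recovered and one of the ninety neutral loci is flagged, so a method can combine a high TPR with a high FPR if it flags many loci indiscriminately.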
|Conference|US-IALE 2017 Annual Meeting, People, Places, Patterns: Linking Landscape Heterogeneity and Socio-Environmental Systems|
|Period|9/04/17 → 13/04/17|