The longitudinal study of populations is a core tool for understanding ecological and evolutionary processes. Long-term studies typically collect samples repeatedly over individual lifetimes and across generations. In the laboratory, these samples are then analysed over time in batches (e.g. qPCR plates) and clusters (i.e. groups of batches). However, these analyses are constrained by cross-classified data structures that arise biologically or through experimental design. Separating biological variation from confounding among-batch and among-cluster variation is crucial, yet often ignored. The two commonly used approaches to structuring samples for analysis, sequential and randomized allocation, generate bias because the time of collection is not independent of the batch and cluster in which samples are analysed. We propose a new sample structuring strategy, called slicing, designed to separate confounding among-batch and among-cluster variation from biological variation. Through simulations, we tested the statistical power and precision of this novel approach to detect within-individual, between-individual, year and cohort effects. Our slicing approach, whereby recently and previously collected samples are sequentially analysed together in clusters, bridges clusters and thereby enables the statistical separation of collection time and cluster effects; we provide a case study of its application. Our simulations show that, with a reasonable slicing width and angle, slicing samples across batches yields similar precision and similar or greater statistical power to detect year, cohort, within- and between-individual effects compared with strategies that aggregate longitudinal samples or use randomized allocation. While the best approach to analysing long-term datasets depends on the structure of the data and the questions of interest, it is vital to account for confounding among-batch and among-cluster variation.
Our slicing approach is simple to apply and creates the necessary statistical independence of batch and cluster from environmental or biological variables of interest. Crucially, it allows sequential analysis of samples and flexible inclusion of current data in later analyses without completely confounding the analysis. Our approach maximizes the scientific value of every sample, as each will optimally contribute to unbiased statistical inference from the data. Slicing thereby maximizes the power of growing biobanks to address important ecological, epidemiological and evolutionary questions.
- long-term studies
- mixed models
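
The contrast between sequential and sliced allocation described in the abstract can be illustrated with a minimal Python sketch. It is an illustration under stated assumptions, not the allocation scheme used in the study: the function names, the toy data and the `width` parameter (here taken to mean the number of consecutive clusters over which one collection year is spread) are all hypothetical, and slicing angle is not modelled. The sketch only shows why analysing recent and earlier samples together bridges clusters, whereas analysing each year in its own cluster confounds year with cluster completely.

```python
from collections import defaultdict


def sequential_allocation(sample_years):
    """Hypothetical sequential scheme: each collection year gets its own cluster.

    Returns a list of (collection_year, cluster) pairs.
    """
    return [(year, year) for year in sample_years]


def sliced_allocation(sample_years, width=3):
    """Hypothetical sliced scheme (assumption, not the authors' definition).

    Samples from one collection year are spread round-robin over `width`
    consecutive clusters, so each cluster combines samples from the current
    year with samples held back from earlier years.
    """
    seen = defaultdict(int)  # number of samples already allocated per year
    allocation = []
    for year in sample_years:
        offset = seen[year] % width
        seen[year] += 1
        allocation.append((year, year + offset))
    return allocation


def years_per_cluster(allocation):
    """Map each cluster to the set of collection years it contains."""
    table = defaultdict(set)
    for year, cluster in allocation:
        table[cluster].add(year)
    return dict(table)


# Toy data: six samples collected in each of six years (invented for illustration).
sample_years = [year for year in range(2000, 2006) for _ in range(6)]

sequential = years_per_cluster(sequential_allocation(sample_years))
sliced = years_per_cluster(sliced_allocation(sample_years, width=3))

for cluster in sorted(sliced):
    print(cluster, sorted(sequential.get(cluster, [])), sorted(sliced[cluster]))
```

In this toy layout, every sequential cluster contains a single collection year, so year and cluster effects cannot be separated statistically. Most sliced clusters contain several collection years, and each year appears in several clusters, which is the overlap a cross-classified mixed model needs to estimate the two effects separately. A cluster with a given label only ever contains samples collected in or before that year, so the laboratory work can still proceed sequentially.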