Data from: Stepwise evolution of a butterfly supergene via duplication and inversion

Dataset

Description

Genome_assemblies.tgz (7 files)

MB18102_MAT.fasta.gz - Genome assembly of maternal haplotype of individual MB18102, made with Canu.
MB18102_PAT.fasta.gz - Genome assembly of paternal haplotype of individual MB18102, made with Canu.
SB211_MAT.fasta.gz - Genome assembly of maternal haplotypes from brood SB211, made with Canu.
SB211_PAT.fasta.gz - Genome assembly of paternal haplotypes from brood SB211, made with hifiasm.
SB211_MAT.hifiasm.fasta.gz - Alternative genome assembly of maternal haplotypes from brood SB211, made with hifiasm.
SB211_PAT.canu.fasta.gz - Alternative genome assembly of paternal haplotypes from brood SB211, made with Canu.
Dchry2.haplotigs.fasta.gz - Purged haplotigs from Danaus chrysippus assembly Dchry2.2 (Singh et al. 2022).

Genome_annotations.tgz (4 files)

MB18102_MAT.tidy.gff3 - Genome annotation for MB18102MAT maternal assembly
MB18102_PAT.tidy.gff3 - Genome annotation for MB18102PAT paternal assembly
SB211_MAT.tidy.gff3 - Genome annotation for SB211MAT maternal assembly
SB211_PAT.tidy.gff3 - Genome annotation for SB211PAT paternal assembly
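
As a quick sanity check on the annotation files, the number of features of each type can be tallied. This is a minimal sketch that assumes only the standard nine-column, tab-delimited GFF3 layout; the feature types actually present in these files are not stated above, and the function name is hypothetical.

```python
from collections import Counter


def count_feature_types(lines):
    """Tally column 3 (feature type) over the lines of a GFF3 file."""
    counts = Counter()
    for line in lines:
        # An optional ##FASTA directive marks the end of the feature table.
        if line.startswith("##FASTA"):
            break
        # Skip blank lines, directives and comments.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\r\n").split("\t")
        # Standard GFF3 feature lines have exactly nine columns.
        if len(fields) != 9:
            continue
        counts[fields[2]] += 1
    return counts


# Example usage (path assumes the archive has been extracted):
# with open("MB18102_MAT.tidy.gff3") as handle:
#     print(count_feature_types(handle))
```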



Repeat_data.tgz (7 files)

Lepidoptera_and_danaus_chrysippus2.2.repeatmasker - Repeat library for Dchry2.2 assembly
Dchry2.2.chr15.TE_50kb - Repeat content in 50 kb windows for Dchry2.2 assembly chr15
MB18102MAT.chr15.TE_50kb - Repeat content in 50 kb windows for MB18102MAT maternal assembly chr15
MB18102PAT.chr15.TE_50kb - Repeat content in 50 kb windows for MB18102PAT paternal assembly chr15
SB211PAT.chr15.TE_50kb - Repeat content in 50 kb windows for SB211PAT paternal assembly chr15
SB211MAT.chr15.TE_50kb - Repeat content in 50 kb windows for SB211MAT maternal assembly chr15
DplexMex.chr15.TE_50kb - Repeat content in 50 kb windows for D. plexippus DplexMex assembly chr15



minimap2_alignments.tgz (13 files)

DplexMex_Dchry2.2_minimap2.asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=Dchry2.2
DplexMex_Dchry2hap.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=Dchry2.2HAP haplotigs
DplexMex_MB18102MAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=MB18102MAT
DplexMex_MB18102PAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=MB18102PAT
DplexMex_SB211MAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=SB211MAT
DplexMex_SB211PAT.mm2asm20.paf.gz - minimap2 alignment: reference=DplexMex, query=SB211PAT
Dchry2.2_MB18102MAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2, query=MB18102MAT
Dchry2.2_MB18102PAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2, query=MB18102PAT
Dchry2HAP_MB18102MAT_mm2asm10.paf.gz - minimap2 alignment: reference=Dchry2.2HAP haplotigs, query=MB18102MAT
MB18102MAT_SB211PAT.mm2asm10.paf.gz - minimap2 alignment: reference=MB18102MAT, query=SB211PAT
SB211PAT_Dchry2.2_mm2asm10.paf.gz - minimap2 alignment: reference=SB211PAT, query=Dchry2.2
SB211MAT_hifiasm_vs_canu.mm2asm10.paf.gz - minimap2 alignment: reference=SB211MAT alternative hifiasm assembly, query=SB211MAT canu assembly
SB211PAT_hifiasm_vs_canu.mm2asm10.paf.gz - minimap2 alignment: reference=SB211PAT hifiasm assembly, query=SB211PAT alternative canu assembly
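
The alignment files are in PAF format, so the reference in each description corresponds to the PAF target columns and the query to the PAF query columns. The sketch below reads the 12 mandatory PAF columns and picks out reverse-strand blocks, which is one simple way to look for candidate inversions. The helper names and the 10 kb length threshold are illustrative assumptions, not part of the dataset or of the authors' analysis.

```python
import gzip
from typing import Iterable, Iterator, List, NamedTuple


class PafRecord(NamedTuple):
    query: str
    query_len: int
    query_start: int
    query_end: int
    strand: str
    target: str
    target_len: int
    target_start: int
    target_end: int
    matches: int
    block_len: int
    mapq: int


def parse_paf_line(line: str) -> PafRecord:
    """Parse the 12 mandatory PAF columns; optional tags are ignored."""
    f = line.rstrip("\r\n").split("\t")
    return PafRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]),
        int(f[9]), int(f[10]), int(f[11]),
    )


def read_paf(path: str) -> Iterator[PafRecord]:
    """Yield one record per non-empty line of a gzip-compressed PAF file."""
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.strip():
                yield parse_paf_line(line)


def inverted_blocks(records: Iterable[PafRecord],
                    min_block_len: int = 10_000) -> List[PafRecord]:
    """Keep reverse-strand alignment blocks at least min_block_len long."""
    return [r for r in records
            if r.strand == "-" and r.block_len >= min_block_len]


# Example usage (path assumes the archive has been extracted):
# hits = inverted_blocks(read_paf("DplexMex_MB18102MAT.mm2asm20.paf.gz"))
```

Reverse-strand blocks can also arise from arbitrary contig orientation in an assembly, so hits from a filter like this are only a starting point for inspection.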



Regions_coordinates.tgz (7 files)

BC_regions_coordinates_DplexMex_Dplex4.xlsx - Regions 1-4 coordinates in D. plexippus assemblies DplexMex and Dplex4
DplexMex_region_coordinates.tsv - Regions 1-4 coordinates in DplexMex assembly
MB18102MAT_region_coordinates.tsv - Regions 1-4 coordinates in MB18102MAT assembly
MB18102PAT_region_coordinates.tsv - Regions 1-4 coordinates in MB18102PAT assembly
Dchry2.2_region_coordinates.tsv - Regions 1-4 coordinates in Dchry2.2 assembly
SB211MAT_region_coordinates.tsv - Regions 1-4 coordinates in SB211MAT assembly
SB211PAT_region_coordinates.tsv - Regions 1-4 coordinates in SB211PAT assembly



VCF_and_geno_files.tgz (14 files)

dan17.BT.DP5GQ20.CDS.vcf.gz - VCF for CDS sites only for 16 Danaus samples and outgroup Tirumala formosa aligned to the Danaus plexippus Dplex4 assembly.
dan17.BT.DP5GQ20.4Dsites.geno.gz - Genotypes file for 4-fold degenerate sites only for 16 Danaus samples and outgroup Tirumala formosa aligned to the Danaus plexippus Dplex4 assembly.
chry10.BT.Dchry2.2.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the Dchry2.2 assembly, chr15 only.
chry10.BT.Dchry2HAP.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the Dchry2.2HAP haplotig assembly, chr15 only.
chry10.BT.MB18102MAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the MB18102MAT maternal assembly, chr15 only.
chry10.BT.MB18102PAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the MB18102PAT paternal assembly, chr15 only.
chry10.BT.SB211MAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the SB211MAT maternal assembly, chr15 only.
chry10.BT.SB211PAT.DP8GQ20.chr15.vcf.gz - VCF for 10 Danaus chrysippus samples aligned to the SB211PAT paternal assembly, chr15 only.
chry10.BT.Dchry2.2.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the Dchry2.2 assembly, chr15 only.
chry10.BT.Dchry2HAP.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the Dchry2.2HAP haplotig assembly, chr15 only.
chry10.BT.MB18102MAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the MB18102MAT maternal assembly, chr15 only.
chry10.BT.MB18102PAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the MB18102PAT paternal assembly, chr15 only.
chry10.BT.SB211MAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the SB211MAT maternal assembly, chr15 only.
chry10.BT.SB211PAT.DP8GQ20.chr15.geno.gz - Genotypes file for 10 Danaus chrysippus samples aligned to the SB211PAT paternal assembly, chr15 only.
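
The sample counts given for the VCF files (10 for the chry10 files, 17 for dan17) can be checked against the VCF header. This is a minimal sketch relying only on the standard VCF layout, in which sample names follow the nine fixed columns on the #CHROM line. The function names are hypothetical, and the sample IDs used in these files are not listed above.

```python
import gzip


def sample_names_from_header(lines):
    """Return the sample names found on the #CHROM header line."""
    for line in lines:
        if line.startswith("#CHROM"):
            # Columns 1-9 are fixed fields; samples start at column 10.
            return line.rstrip("\r\n").split("\t")[9:]
        if not line.startswith("#"):
            # Reached the data records without seeing a header line.
            break
    raise ValueError("no #CHROM header line found")


def vcf_sample_names(path):
    """Read the sample names from a gzip-compressed VCF file."""
    with gzip.open(path, "rt") as handle:
        return sample_names_from_header(handle)


# Example usage (path assumes the archive has been extracted):
# names = vcf_sample_names("chry10.BT.Dchry2.2.DP8GQ20.chr15.vcf.gz")
# print(len(names), names)
```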



all_gene_alignments.tgz (5954 files) - sequence alignments for each of 5954 genes (Dplex4 assembly) for 16 Danaus samples and an outgroup Tirumala formosa sample.

BC_Region_alignments.tgz (4 files)

Region1.1_concat.fasta - Concatenated alignment for genes in Region 1.1
Region1.2_concat.fasta - Concatenated alignment for genes in Region 1.2
Region2_concat.fasta - Concatenated alignment for genes in Region 2
Region4_concat.fasta - Concatenated alignment for genes in Region 4



Diversity_and_divergence_data.tgz (7 files)

chry10.BT.Dchry2.2.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the Dchry2.2 assembly chr15
chry10.BT.Dchry2HAP.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the Dchry2HAP haplotig assembly chr15
chry10.BT.MB18102MAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the MB18102MAT assembly chr15
chry10.BT.MB18102PAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the MB18102PAT assembly chr15
chry10.BT.SB211MAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the SB211MAT assembly chr15
chry10.BT.SB211PAT.DP8GQ20.chr15.divStats.w25ksitesMin10k.csv.gz - diversity and divergence measures for 25kb windows for 10 D. chrysippus samples aligned to the SB211PAT assembly chr15
dan17.BT.DP5GQ20.4Dsites.divStats.geneWindows.csv.gz - divergence measures for each gene for 16 Danaus samples and one outgroup Tirumala formosa aligned to the Dplex4 assembly



Read_depth_data.tgz (7 files)

chry10.BT.Dchry2.2.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the Dchry2.2 assembly
chry10.BT.Dchry2HAP.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the Dchry2.2HAP haplotig assembly
chry10.BT.MB18102MAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the MB18102MAT assembly
chry10.BT.MB18102PAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the MB18102PAT assembly
chry10.BT.SB211MAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the SB211MAT assembly
chry10.BT.SB211PAT.DPstats.w50.csv - Read depth statistics for 50kb windows for 10 D. chrysippus samples aligned to the SB211PAT assembly
dan17.Dplex4.chr7.BT.CDS.allshared.dpstats.geneWindows.csv - Read depth statistics for each gene in Dplex4 chr7 (=chr15 in D. chrysippus) for 16 Danaus samples and an outgroup Tirumala formosa sample

IBSrelate_results_NewTrios140202.txt - Relatedness measures for SM18W01, SM18S10 and MB18102

Abstract

Supergenes maintain adaptive clusters of alleles in the face of genetic mixing. Although usually attributed to inversions, supergenes can be complex, and reconstructing the precise processes that led to recombination suppression and their timing is challenging. We investigated the origin of the BC supergene, which controls variation in warning colouration in the African Monarch butterfly, Danaus chrysippus. By generating chromosome-scale assemblies for all three alleles, we identified multiple structural differences. Most strikingly, we find that a region of >1 million bp underwent several segmental duplications at least 7.5 million years ago. The resulting duplicated fragments appear to have triggered four inversions in surrounding parts of the chromosome, resulting in stepwise growth of the region of suppressed recombination. Phylogenies for the inversions are incongruent with the species tree, and suggest that structural polymorphisms have persisted for at least 4.1 million years. In addition to the role of duplications in triggering inversions, our results suggest a previously undescribed mechanism of recombination suppression through independent losses of divergent duplicated tracts. Overall, our findings add support for a stepwise model of supergene evolution involving a variety of structural changes.
Date made available: 3 May 2022
Publisher: Dryad