Missense variants are changes in protein-coding sequences that substitute the amino acid at a specific site. They can be deleterious if the site is required for function, but are likely to be tolerated at other sites. Consequently, missense variation within a healthy population mirrors the effects of negative selection on protein structure and function. Advances in high-throughput sequencing have dramatically increased the sample size of human variation data, allowing selective pressures to be interpreted across whole populations. In this study, we developed a convenient framework for mapping missense variants onto solved or modelled protein structures and applied it to characterize the ARID family of gene regulators. ARID family members are implicated in multiple cancer types, developmental disorders, and immunological diseases, but their mechanistic roles are still incompletely understood. By combining this approach with phylogenetic and structural analyses, we identified sites in the ARID proteins that are important for protein-protein interactions, histone code recognition, and DNA binding. We propose that missense variants can serve as a valuable tool for complementing experimental findings and formulating mechanistic hypotheses.
Cook, Atlanta; Deak, Gauri. (2021). The use of missense variations in functional annotation of the human ARID family of DNA binding proteins, [dataset]. University of Edinburgh. School of Biological Sciences. Wellcome Trust Centre for Cell Biology. https://doi.org/10.7488/ds/3190.