214T Poster - Population Genetics
Thursday June 09, 8:30 PM - 9:15 PM

Human populations exhibit correlated abundances and variation of tandem repeat content.


Authors:
Iskander Said; Andrew Clark; Daniel Barbash

Affiliation: Cornell University

Keywords:
Molecular Evolution

Human genome sequence data, by virtue of the telomere-to-telomere assembly, exceptionally high-quality annotation, and massive sample size, provide an excellent opportunity to investigate the population genetic processes acting on repetitive DNA. Some repeats are integral to cellular and organismal function, such as meiotic segregation and genome regulation, and are also implicated in complex evolutionary processes, such as speciation and meiotic drive. In humans, population-scale analysis of short tandem repeat polymorphisms (microsatellites) in euchromatin has found high levels of diversity in repeat content and extensive population stratification. Some of this polymorphism has functional consequences and has been implicated in disease etiology. To extend these results to a genome-wide scale, including long tandem arrays that cannot be assembled from short sequence reads, we employ the method k-seek, which directly queries unaligned fastq files to discover and quantify tandem repeats composed of repeating monomers 1-20 bp long, without requiring genomic assemblies. We have mined a set of 2,504 high-coverage human genomes from the 1000 Genomes Project to analyze the inter- and intra-population variation of human tandem repeats. We have found over 16,000 distinct tandem repeats, whose expansions and contractions can account for over 10 Mbp of difference in genome size among individuals, with the Y chromosome accounting for a substantial portion of this difference in males. We see high levels of inter-population divergence, consistent with a high rate of copy number change. As we have seen in other organisms, there is a striking pattern of correlation in abundances among groups of repeats, the cause of which remains a mystery. An exciting future opportunity, which will rely on extensive telomere-to-telomere assemblies, will be to consider satellite repeat changes localized to specific genomic loci.
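
To illustrate the general idea of quantifying tandem repeats directly from unaligned reads, the following is a minimal Python sketch. It is not the k-seek implementation. The function names, the 50 bp minimum span, the greedy longest-run rule, and the choice to merge rotations and reverse complements of a monomer under one canonical key are all assumptions made for illustration. FASTQ parsing is omitted; the functions take read sequences as plain strings.

```python
from collections import Counter

# Hypothetical helper names; none of this is taken from k-seek itself.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical_monomer(monomer):
    """Smallest string among all rotations of the monomer and of its
    reverse complement, so both strands and phases share one key."""
    revcomp = monomer.translate(COMPLEMENT)[::-1]
    rotations = []
    for seq in (monomer, revcomp):
        rotations.extend(seq[i:] + seq[:i] for i in range(len(seq)))
    return min(rotations)


def is_primitive(monomer):
    """True if the monomer is not a repetition of a shorter unit
    (for example, 'ATAT' is not primitive because it is 'AT' twice)."""
    n = len(monomer)
    return not any(
        n % d == 0 and monomer[:d] * (n // d) == monomer
        for d in range(1, n)
    )


def tandem_repeats_in_read(read, max_unit=20, min_span=50):
    """Count monomer copies in perfect tandem runs within one read.

    Assumptions: only exact repeats are detected, and a run must span at
    least min_span bases. At each position the longest qualifying run wins.
    """
    counts = Counter()
    n = len(read)
    i = 0
    while i < n:
        best = None  # (unit, span) of the longest run starting at i
        for k in range(1, max_unit + 1):
            if i + 2 * k > n:
                break
            unit = read[i:i + k]
            if "N" in unit or not is_primitive(unit):
                continue
            j = i + k
            while read[j:j + k] == unit:
                j += k
            span = j - i
            if span >= min_span and (best is None or span > best[1]):
                best = (unit, span)
        if best is None:
            i += 1
        else:
            unit, span = best
            counts[canonical_monomer(unit)] += span // len(unit)
            i += span
    return counts


def count_repeats(reads):
    """Sum per-read monomer copy counts over an iterable of read strings."""
    total = Counter()
    for read in reads:
        total.update(tandem_repeats_in_read(read))
    return total
```

In an analysis like the one described above, per-sample totals of this kind would typically be normalized for sequencing depth before comparing individuals or populations; that step is not shown here.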