Genetic variation in recalcitrant repetitive genomic regions in Drosophila melanogaster
Authors: Harsh G. Shukla; Mahul Chakraborty; J.J. Emerson
Affiliation: University of California, Irvine
Keywords: c. chromosome structural variation; n. other (Heterochromatin and Repeats)
About one third of the Drosophila melanogaster genome is heterochromatic and consists of repetitive sequences such as satellites, transposable elements (TEs), ribosomal DNA, and occasional single-copy sequence. Despite its role in chromosome segregation, nuclear organization, and gene expression, much of the highly repetitive heterochromatin has been recalcitrant to assembly. This limitation has impeded delineation of genetic variation, evolution, and function of this crucial genomic region. To resolve the sequence of the repetitive genomic regions and map the genetic variation within them, we de novo assembled the genomes of two isogenic strains, the reference strain ISO1 and A4, using Pacific Biosciences highly accurate (HiFi) long reads and compared their assemblies. The euchromatic arms are gapless in our assemblies, providing a complete map of genetic variation within them. We incorporated ~8 Mb of new heterochromatin sequences into the chromosome arm scaffolds, including ~3.5 Mb of X pericentric heterochromatin containing rDNA. We also assembled ~15 Mb of the Y chromosome from the two strains, unveiling the first detailed map of genetic variation for this highly repetitive chromosome. We show that despite being prone to structural mutations, the repetitive regions of the D. melanogaster genome exhibit contrasting patterns of copy number variation across different gene arrays. For example, the size of the Histone cluster is similar (~560 kbp) between A4 and ISO1, whereas the X-linked Stellate (Ste) gene cluster shows striking variation between the two strains. The Histone cluster in both strains consists of ~110 copies, whereas A4 and ISO1 carry 192 and 11 tandem copies of Ste in the X euchromatin, respectively.
The differing degrees of structural variation in these two gene clusters likely arise because the Histone copy numbers are evolving under stabilizing or purifying selection, whereas the Ste copy numbers are shaped by an evolutionary arms race between X-linked Ste and their Y-linked suppressors Su(Ste). Furthermore, complete resolution of tandem arrays like the Histone cluster at the nucleotide level offers an avenue for determining the relative roles of birth-and-death versus concerted processes in the evolution of such clusters. Our results not only provide a detailed map of molecular genetic variation within the hitherto unassembled repetitive genomic regions, but also lay the foundation for comparative and functional genomics of complete D. melanogaster genomes.