544C Poster - 07. Chromatin, epigenetics and genomics
Saturday April 09, 1:30 PM - 3:30 PM

Towards telomere-to-telomere genome assemblies of Drosophila melanogaster


Authors:
Nicolas Altemose 1; Susan E. Celniker 2; Mahul Chakraborty 3; Cécile Courret 4; J.J. Emerson 3; Gary H. Karpen 1,2; Bernard Y. Kim 5; Charles H. Langley 6; Sasha Langley 1; Amanda M. Larracuente 4; Barbara Mellone 7; Karen H. Miga 8; Danny E. Miller 9,10; Rachel J. O’Neill 7; Adam M. Phillippy 11; Brandon D. Pickett 11; Harsh G. Shukla 3; The Drosophila Telomere-to-Telomere Consortium

Affiliations:
1) University of California, Berkeley, Berkeley, CA; 2) Lawrence Berkeley National Lab, Berkeley, CA; 3) University of California, Irvine, Irvine, CA; 4) University of Rochester, Rochester, NY; 5) Stanford University, Stanford, CA; 6) University of California, Davis, Davis, CA; 7) University of Connecticut, Storrs, CT; 8) University of California, Santa Cruz, CA; 9) University of Washington, Seattle, WA; 10) Seattle Children's Hospital, Seattle, WA; 11) National Human Genome Research Institute, NIH, Bethesda, MD

Keywords:
e. heterochromatin; q. other (Telomere to telomere sequencing)

The goal of genome sequencing and assembly is to capture all sequence features that play critical roles in organisms and to join them together as parts of complete, gapless chromosome reference assemblies. However, in Drosophila melanogaster, even the most complete assemblies are at least 15-20% smaller than the known genome size. The segments missing from assemblies are almost entirely composed of repetitive sequences, primarily transposable elements (TEs) and satellite repeats concentrated in pericentromeric heterochromatin. These missing regions not only contain essential genes but also harbor other elements that play crucial roles in genome stability, chromosome segregation, protein translation, and TE repression. To recover these important regions, we are building telomere-to-telomere assemblies of multiple D. melanogaster strains, including the genome reference strain iso-1. Our goal is to produce complete, extremely accurate (fewer than 1 error per megabase) assemblies of all three autosomes and the X and Y chromosomes. The human Telomere-to-Telomere (T2T) Consortium has pioneered the completion of virtually gapless chromosome assemblies that span large repetitive arrays, including satellites, scrambled transposable elements, and ribosomal DNAs. Its approaches leverage ultra-long sequencing reads to untangle assembly graphs derived from highly accurate long sequence reads. By applying these approaches to additional strains, we can study variation in chromosome structures that have previously resisted scrutiny, such as centromeres. This open, collaborative initiative aims to produce a gapless assembly of D. melanogaster, outline best practices for extending this approach to other strains and species, and make data releases and methodologies publicly accessible.