149W Poster - Evolutionary Genetics
Wednesday June 08, 9:15 PM - 10:00 PM

Treenome Browser: concurrent phylogeny-aware visualization of millions of genomes


Authors:
Alexander Kramer; Russell Corbett-Detig

Affiliation: University of California, Santa Cruz

Keywords:
Phylogenetics, Macroevolution, and Biogeography

The ongoing pandemic led to an unprecedented global sequencing effort that has yielded over seven million complete SARS-CoV-2 genomes. A dataset of this size presents major challenges for data exploration and visualization. By taking advantage of the evolutionary redundancy among sequences, the mutation-annotated tree (MAT) data format [1] uses phylogenetic compression to enable efficient storage and traversal of such large datasets. A MAT encodes the complete genetic variation of its component sequences, storing the same information as a multiple sequence alignment or VCF file in a fraction of the space. This reduction in size, together with the phylogenetic structure, makes visualization of very large trees and their underlying genome sequences feasible. We developed Treenome Browser, a genome browser for viewing millions of genomes in their phylogenetic context. Treenome Browser uses the MAT format to display the amino acid mutations present in each genome alongside a tree of the full phylogeny, and it remains performant on trees of over seven million genomes. The MAT is traversed at run-time to quickly reconstruct and display the mutations each sample has accumulated relative to the root of the tree. Interactive exploration of the tree (provided by Taxonium [2]) reveals mutation patterns in Treenome Browser at the level of clades and of individual genomes, allowing, for example, visual identification of the defining mutations of SARS-CoV-2 strains. Treenome Browser has potential applications in phylogenetically informed primer design and in variant monitoring of SARS-CoV-2. It enables data visualization at unprecedented scale for SARS-CoV-2 and for similarly large datasets in the future.
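The run-time traversal described above can be illustrated with a minimal Python sketch. It is not the Treenome Browser or UShER implementation: the `Node` class, the `leaf_mutations` function, the 0-based positions, and the toy four-base root sequence are all hypothetical, and nucleotide substitutions stand in for the amino acid mutations the browser displays. The sketch assumes each node stores only the mutations on its incoming branch, and it accumulates them from the root down to every sample.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    """Hypothetical tree node for illustration only."""
    name: str
    # Mutations on the branch leading to this node: 0-based position -> new allele.
    mutations: Dict[int, str] = field(default_factory=dict)
    children: List["Node"] = field(default_factory=list)


def leaf_mutations(root: Node, root_seq: str) -> Dict[str, Dict[int, str]]:
    """Return, for each leaf, its differences from root_seq.

    Uses an iterative depth-first traversal so that very deep trees do not
    hit Python's recursion limit.
    """
    result: Dict[str, Dict[int, str]] = {}
    stack: List[Tuple[Node, Dict[int, str]]] = [(root, {})]
    while stack:
        node, inherited = stack.pop()
        # Copy so that sibling subtrees do not share state. Copying at every
        # node keeps the sketch short; it is not tuned for millions of samples.
        acc = dict(inherited)
        for pos, allele in node.mutations.items():
            if allele == root_seq[pos]:
                acc.pop(pos, None)   # reversion to the root allele
            else:
                acc[pos] = allele
        if not node.children:
            result[node.name] = acc
        for child in node.children:
            stack.append((child, acc))
    return result


# Toy tree: two samples share an internal branch, and one of them reverts.
root_seq = "ACGT"
tree = Node("root", {}, [
    Node("A", {1: "T"}, [
        Node("s1", {3: "A"}),
        Node("s2", {1: "C"}),   # reverts position 1 to the root allele
    ]),
    Node("s3", {0: "G"}),
])
samples = leaf_mutations(tree, root_seq)
```

In this toy tree, `s1` carries both the mutation it shares with `s2`'s ancestor and its own mutation, while `s2` ends up identical to the root because its branch reverts the shared change. Storing the shared mutation once on the internal branch, rather than once per sample, is the redundancy that the phylogenetic compression exploits.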

1. Turakhia, Y. et al. Ultrafast Sample placement on Existing tRees (UShER) enables real-time phylogenetics for the SARS-CoV-2 pandemic. Nat. Genet. 53, 809–816 (2021).
2. https://github.com/theosanderson/taxonium