145W Poster - Evolutionary Genetics
Wednesday June 08, 9:15 PM - 10:00 PM

Understanding the spread of SARS-CoV-2 clusters through an integrated pipeline using UShER, Cluster Tracker and StrainHub


Authors:
Adriano de Bernardi Schneider 1; Colby T Ford 2,3; Jakob McBroome 1; Jennifer Martin 1; Daniel Janies 2; Yatish Turakhia 4; Russell Corbett-Detig 1

Affiliations:
1) Biomolecular Engineering and Genomics Institute, University of California, Santa Cruz, USA; 2) Department of Bioinformatics and Genomics, University of North Carolina at Charlotte, Charlotte, NC, USA; 3) School of Data Science, University of North Carolina at Charlotte, Charlotte, NC, USA; 4) Electrical and Computer Engineering, University of California, San Diego, San Diego, USA

Keywords:
Phylogenetics, Macroevolution, and Biogeography

The scientific community's response to the SARS-CoV-2 pandemic has produced an unprecedented amount of genomic data that must be processed and analyzed quickly to have a public health impact. These circumstances call for a change in how genomic data are evaluated, so that epidemiologists and public health officials can make effective policy changes. To this end, we built a pipeline that brings together three tools currently available and supported by our research groups, and used it to evaluate selected SARS-CoV-2 clusters to understand their spread in the United States. Our pipeline consists of three applications: UShER, a program for rapid, accurate placement of viral genomic samples onto existing phylogenies, which allows very large datasets to be evaluated in a timely manner; Cluster Tracker, a program that automatically identifies and highlights groups of closely related SARS-CoV-2 infections resulting from inter-regional transmission across the United States, using a phylogenetically informed summary heuristic; and StrainHub, a web-based application that generates transmission networks based on character state changes in metadata mapped to a phylogeny. StrainHub allows the user to visualize these networks as a graph or on a map, and to calculate centrality metrics that provide insight into the behavior of network nodes (i.e., source, sink, or hub behavior of traits). Using these three tools, we created a workflow that allows the user to identify the genomic sequences of interest with Cluster Tracker, select and extract the sequences and metadata from the underlying dataset using UShER’s matUtils tools through a Snakemake workflow, and evaluate the cluster structure and behavior using StrainHub. To demonstrate our tools' capabilities, we used StrainHub to evaluate four large and diverse SARS-CoV-2 nodes identified in Cluster Tracker.
This pipeline offers genetics researchers, epidemiologists, and public health officials the tools needed to rapidly reduce the impact of SARS-CoV-2 in our communities. Additionally, the pipeline can be extended to other pathogens, broadening its reach to more research groups and the wider scientific community.
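
As an illustration of the idea behind a transmission network built from character state changes on a phylogeny, the following minimal Python sketch counts metadata state changes along the branches of a toy tree and derives a simple source/sink score per state. It is not StrainHub's implementation: the function names, the toy tree, the location labels, and the score itself (outgoing transitions divided by all transitions touching a state) are assumptions chosen for illustration, and the states of internal nodes are assumed to have been reconstructed beforehand.

```python
from collections import Counter


def transition_counts(edges, states):
    """Count state changes along parent -> child edges of a tree.

    edges  : iterable of (parent_id, child_id) pairs (hypothetical format)
    states : dict mapping every node id to its metadata state
    """
    counts = Counter()
    for parent, child in edges:
        src, dst = states[parent], states[child]
        if src != dst:  # only edges where the state changes form a network link
            counts[(src, dst)] += 1
    return counts


def source_ratio(counts):
    """Illustrative score: outgoing / (outgoing + incoming) transitions.

    Values near 1 suggest source-like behavior, near 0 sink-like behavior,
    and near 0.5 hub-like behavior. This is an assumed, simplified metric.
    """
    out_deg, in_deg = Counter(), Counter()
    for (src, dst), n in counts.items():
        out_deg[src] += n
        in_deg[dst] += n
    nodes = set(out_deg) | set(in_deg)
    return {s: out_deg[s] / (out_deg[s] + in_deg[s]) for s in nodes}


# Toy tree with made-up location labels on every node.
edges = [("root", "n1"), ("root", "n2"),
         ("n1", "a"), ("n1", "b"),
         ("n2", "c"), ("n2", "d")]
states = {"root": "CA", "n1": "CA", "n2": "NY",
          "a": "CA", "b": "TX", "c": "NY", "d": "CA"}

counts = transition_counts(edges, states)
ratios = source_ratio(counts)
for state in sorted(ratios):
    print(state, round(ratios[state], 2))
```

In this toy tree, a state that only receives transitions scores 0, while a state that sends and receives equally often scores 0.5. Real analyses would read the tree and metadata from files extracted upstream in the pipeline.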