Insights into D. melanogaster and D. simulans transcriptome evolution and complexity using transcript distance (TranD)
Authors: Adalena Nanni 1,2; James Titus-McQuillan 3; Oleksandr Moskalenko 4; Francisco Pardo-Palacios 5; Sarah Signor 6; Srna Vlaho 7; Zihao Liu 1,2; Ana Conesa 2,5,8; Rebekah Rogers 3; Lauren McIntyre 1,2
Affiliations: 1) Department of Molecular Genetics and Microbiology, University of Florida, Gainesville, FL; 2) University of Florida Genetics Institute, University of Florida, Gainesville, FL; 3) University of North Carolina Department of Bioinformatics, Charlotte, NC; 4) University of Florida Research Computing, University of Florida, Gainesville, FL; 5) Dept. of Applied Statistics and Operational Research, and Quality, Polytechnical University of Valencia, Spain; 6) Department of Biological Sciences, North Dakota State University, Fargo, ND; 7) Department of Biological Sciences, University of Southern California, Los Angeles, CA; 8) Institute for Integrative Systems Biology, Spanish National Research Council (CSIC), Paterna, Spain
Keywords: t. bioinformatic and genome tools; g. alternative splicing
Alternative splicing is an important driver of phenotypic diversity in higher eukaryotes. Understanding how alternative splicing and variation in transcript structure diverge across species can provide insights into phenotypic divergence and speciation. Long-read sequencing of mRNA provides an opportunity to observe transcript structure directly. We present metrics of complexity and nucleotide-level descriptions of structural phenotypes that can be calculated within an individual transcriptome or compared across transcriptomes. Using these metrics, we show how patterns of transcriptome complexity can be compared across species without depending on the identification of orthologs. We further demonstrate that distance metrics can be used to compare the transcriptomes of the closely related species D. melanogaster and D. simulans and to identify novel exons, which we validate. We implement our metrics in the PyPI package TranD and in Galaxy (www.Galaxyproject.org), the open-source, bioinformatics-for-everyone platform. These implementations will empower a wide range of researchers to quickly identify minimum-distance transcripts between species, interesting structural variants within species, and genome complexity, enabling a deeper understanding of splicing mechanisms and transcript evolution.
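To make the idea of a nucleotide-level distance between two transcripts concrete, the following is a minimal illustrative sketch in Python. It is not TranD's implementation or API: the function names (`covered_positions`, `nucleotide_distance`, `closest_transcript`), the representation of a transcript as a list of half-open exon intervals, and the choice of "proportion of nucleotides not shared" as the distance are all assumptions made for illustration only.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a transcript is a list of exons, each a
# half-open (start, end) interval in a shared genomic coordinate system.
Exon = Tuple[int, int]
Transcript = List[Exon]


def covered_positions(exons: Transcript) -> Set[int]:
    """Return the set of genomic positions covered by the exons."""
    positions: Set[int] = set()
    for start, end in exons:
        positions.update(range(start, end))
    return positions


def nucleotide_distance(t1: Transcript, t2: Transcript) -> Dict[str, float]:
    """Count shared and transcript-specific exonic nucleotides.

    'proportion_different' is an illustrative distance: the fraction of
    the union of exonic nucleotides found in only one of the transcripts.
    """
    a = covered_positions(t1)
    b = covered_positions(t2)
    shared = len(a & b)
    only_first = len(a - b)
    only_second = len(b - a)
    total = shared + only_first + only_second
    return {
        "shared": shared,
        "only_first": only_first,
        "only_second": only_second,
        "proportion_different": (only_first + only_second) / total if total else 0.0,
    }


def closest_transcript(query: Transcript, candidates: Dict[str, Transcript]) -> str:
    """Return the name of the candidate with the smallest illustrative distance."""
    return min(
        candidates,
        key=lambda name: nucleotide_distance(query, candidates[name])["proportion_different"],
    )


# Toy example: the second transcript shortens one exon and adds a new one.
tx_a = [(100, 200), (300, 400)]
tx_b = [(100, 200), (300, 350), (500, 550)]
result = nucleotide_distance(tx_a, tx_b)
best = closest_transcript(tx_a, {"tx_b": tx_b, "tx_identical": list(tx_a)})
```

In this toy case the two transcripts share 150 exonic nucleotides and each has 50 of its own, so the illustrative distance is 0.4. Expanding intervals into sets of positions keeps the sketch short; interval arithmetic would be preferable for real annotations, and a cross-species comparison would additionally require both transcriptomes to be expressed in a common coordinate system.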