169V Poster Online - Virtual Posters
Tuesday June 07, 11:00 AM - 3:00 PM

Mixing genome annotation methods in a comparative analysis inflates the apparent number of lineage-specific genes


Authors:
Caroline Weisman 1; Sean Eddy 2; Andrew Murray 2

Affiliations:
1) Princeton University; 2) Harvard University

Keywords:
Comparative genomics & genome evolution

Comparing the genomes of different species regularly reveals genes that seem to be unique to one or a few related species. These “lineage-specific” genes are often thought to represent genetic novelty, with many potentially interesting consequences; for example, they have been proposed to underlie species- or taxon-specific evolutionary innovations. The comparative analyses from which these genes emerge often use genome sequences in which genes have been inferred, or annotated, using a mixture of different methods. Using different methods to annotate different species increases the risk that orthologous DNA sequences of the same coding status (both genic or both non-genic) are annotated as a gene in one species but not in the other, so that the gene merely appears to be lineage-specific. Here, we quantitatively evaluate the impact of this effect, which we term “annotation heterogeneity,” in four case studies. We find that annotation heterogeneity consistently, and often substantially, increases the apparent number of lineage-specific genes, suggesting that it may be a substantial source of artifact.
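
The effect described above can be illustrated with a minimal sketch. This is not the authors' pipeline: the loci, the two annotation methods ("method1", "method2"), and all gene calls below are hypothetical toy values, chosen only to show how annotating two species with different methods can raise the count of apparently lineage-specific genes compared with annotating both using the same method.

```python
def lineage_specific(annot_a, annot_b):
    """Return loci annotated as genes in species A but not in species B."""
    return {
        locus
        for locus, is_gene in annot_a.items()
        if is_gene and not annot_b.get(locus, False)
    }


# Hypothetical gene calls at four orthologous loci (True = annotated as a gene).
# Keys are (species, annotation method); all values are invented for illustration.
calls = {
    ("A", "method1"): {"loc1": True, "loc2": True, "loc3": False, "loc4": True},
    ("B", "method1"): {"loc1": True, "loc2": True, "loc3": False, "loc4": False},
    ("B", "method2"): {"loc1": True, "loc2": False, "loc3": False, "loc4": False},
}

# Uniform annotation: both species annotated with method1.
uniform = lineage_specific(calls[("A", "method1")], calls[("B", "method1")])

# Heterogeneous annotation: species A with method1, species B with method2.
mixed = lineage_specific(calls[("A", "method1")], calls[("B", "method2")])

print(sorted(uniform))  # ['loc4']
print(sorted(mixed))    # ['loc2', 'loc4']
```

In this toy case, loc2 is called a gene in both species under method1, but appears specific to species A once species B is annotated with method2; the extra apparent lineage-specific gene reflects the annotation mismatch rather than the underlying sequences.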