302A Poster - 03. Evolution
Thursday April 07, 2:00 PM - 4:00 PM

Identification of Three Novel Paralogs of CG3795


Authors:
Jaquelyn Hester 1; Amanda Moy 2; Jayda Cavanaugh 3; Kaitlyn Schoonover 4; Kelly Alvarado 5; Stanley Guan 6; Evan Merkhofer 3; Gerard McNeil 6; Howard Granok 5; Martin Burg 2; Michael Foulk 4; Christopher Ellison 1; Wilson Leung 5; Cindy Arrigo 7

Affiliations:
1) Rutgers University - New Brunswick, New Brunswick, NJ; 2) Grand Valley State University, Allendale Charter Township, MI; 3) Mount Saint Mary College, Newburgh, NY; 4) Mercyhurst University, Erie, PA; 5) Washington University in St. Louis, St. Louis County, MO; 6) CUNY York College, Jamaica, NY; 7) New Jersey City University, Jersey City, NJ

Keywords:
d. evolution of gene expression; e. heterochromatin

The CG3795 gene is located on the Muller A Element (X chromosome) in Drosophila melanogaster. The FlyBase Gene Summary indicates that this gene is involved in the breakdown of proteins (proteolysis) and exhibits serine-type endopeptidase activity. As part of an investigation into the expansion of the Drosophila ananassae Muller F Element (~19.1 Mb compared to ~1.3 Mb in D. melanogaster), we identified four features within the D. ananassae Oct. 2018 (AGI/DanaRS2) assembly that show significant sequence similarity to the D. melanogaster CG3795-PA protein. Two of the features were located on scaffold QMES02000001 (tentatively assigned to the Muller E Element based on synteny analysis), and two were located on scaffold QMES02000178 (tentatively assigned to the Muller D Element). Upon further analysis, one of the features on scaffold QMES02000178 was assigned as the putative ortholog based on the following criteria: highest protein alignment coverage (i.e., subject coverage), low E-value, and high percent identity. The other three features were determined to be novel paralogs of CG3795 because the RNA-Seq data show that these regions are actively transcribed in D. ananassae. In addition, the BLASTX alignments of the D. ananassae genomic region surrounding each feature against the D. melanogaster CG3795-PA protein did not show any in-frame stop codons or frame shifts, supporting the hypothesis that these features are protein-coding genes rather than pseudogenes. The RNA-Seq data in D. melanogaster suggest that CG3795 is a male-specific gene, with its highest expression levels in the testis (modENCODE Tissue Specific RNA-Seq data and Developmental RNA-Seq data). Consistent with male-specific expression, the modENCODE RNA-Seq data indicate that CG3795 is not expressed in embryos or in adult females. The same male-specific expression pattern is also observed in the D. ananassae RNA-Seq data. Comparative annotation of species closely related to D. ananassae shows that the CG3795 ortholog and the three CG3795 paralogs are also present in D. bipectinata. Given the number of novel paralogs of CG3795 found in both D. ananassae and D. bipectinata, we expect neofunctionalization of these male-specific genes in these two species. Future investigations will try to identify the CG3795 orthologs and paralogs in other Drosophila species in order to better understand the evolution of this male-specific gene.
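
The ortholog assignment criteria named above (highest subject coverage, low E-value, high percent identity) can be illustrated with a minimal Python sketch. The abstract does not state how the three criteria were combined, so the lexicographic ordering used here (coverage first, then E-value, then percent identity) is an assumption. The `Hit` class, the function names, the feature and scaffold labels, and all numeric values are hypothetical placeholders for illustration; they are not results from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One candidate feature aligned against the query protein (hypothetical record)."""
    name: str
    scaffold: str
    subject_coverage: float   # fraction of the query protein covered by the alignment (0 to 1)
    evalue: float             # alignment E-value (lower is better)
    percent_identity: float   # percent identity of the alignment (higher is better)


def rank_hits(hits):
    """Order candidates best-first.

    Assumed ordering (not stated in the abstract): highest subject coverage,
    then lowest E-value, then highest percent identity.
    """
    return sorted(
        hits,
        key=lambda h: (-h.subject_coverage, h.evalue, -h.percent_identity),
    )


def pick_putative_ortholog(hits):
    """Return the top-ranked candidate; the rest would be treated as candidate paralogs."""
    if not hits:
        raise ValueError("no candidate features supplied")
    return rank_hits(hits)[0]


# Placeholder values only; these numbers are invented to exercise the code.
EXAMPLE_HITS = [
    Hit("feature_a", "scaffold_1", 0.80, 1e-50, 60.0),
    Hit("feature_b", "scaffold_2", 0.95, 1e-80, 70.0),
    Hit("feature_c", "scaffold_2", 0.95, 1e-60, 75.0),
]

if __name__ == "__main__":
    best = pick_putative_ortholog(EXAMPLE_HITS)
    print(best.name, best.scaffold)
```

A ranking like this only narrows the candidates. Supporting evidence such as synteny, RNA-Seq coverage, and the absence of in-frame stop codons or frame shifts would still need to be checked for each feature.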