De novo discovery of motifs enriched in promoters of D. ananassae F Element genes
Authors: Annabelle Laughlin 1; Wilson Leung 1; Chris Shaffer 1; Cindy Arrigo 2; Genomics Education Partnership
Affiliations: 1) Washington University in St. Louis, St. Louis, MO; 2) New Jersey City University, Jersey City, NJ
Keywords: a. core promoters and general transcription factors; e. heterochromatin
The Drosophila melanogaster Muller F Element exhibits mostly heterochromatic characteristics (e.g., high repeat density, low recombination rates), but the distal ~1.3 Mb region contains ~80 protein-coding genes that show expression levels similar to those of euchromatic genes. Interestingly, this region has expanded to ~19.1 Mb in Drosophila ananassae due to an increase in repeat density (particularly retrotransposons). This project seeks to understand the regulatory mechanisms that allow the successful transcription of F Element genes in such repeat-rich domains by conducting a comparative analysis of D. ananassae, D. bipectinata, D. takahashii, and D. kikkawai (where the F Element has expanded to different degrees), focusing on transcription start sites (TSSs). The GEP annotation protocol uses experimental data (RAMPAGE, ATAC-Seq, RNA-Seq) and sequence similarity to other Drosophila species to define TSS positions and promoter-flanking regions. Over the past two summers, 97 unique TSSs in D. ananassae have been annotated. The TSS data were also used in coding region annotations. For example, the locations of the putative TSSs relative to the available start codons were used as evidence to support the hypothesis that the G isoform of the D. melanogaster Zyx gene does not exist in D. ananassae. In a preliminary analysis using 26 unique TSSs in the D. ananassae F Element scaffold QMES02000012, ~77% of the TSS positions and ~86% of the narrow promoter-flanking regions were defined based on RAMPAGE or ATAC-Seq data. In contrast, ~54% of the wide promoter-flanking regions were defined by manual analyses (e.g., BLAST searches, multiple sequence alignments). A de novo motif discovery analysis of the 50 RAMPAGE peaks in QMES02000012 identified three significant motifs. Analysis of these motifs using TomTom showed that the two most significant motifs have no similarity to known D. melanogaster motifs, while the third showed similarity to a putative Top2 motif abundant at heterochromatic TSSs in D. melanogaster. These motifs are being used to search the genomes of the other three species to determine whether they are enriched near the promoters of F Element genes. Comparative analysis of Drosophila F Elements could identify regulatory motifs and promoter architecture that are unique to the F Element and that facilitate expression in heterochromatic domains.