82T Poster - Evolutionary Genetics
Thursday June 09, 8:30 PM - 9:15 PM

Using the Eucalyptus polybractea genome improved genetic variant identification compared to using a pseudo-reference


Authors:
Swapan Chakrabarty 1; Teng Li 2; David Kainer 3; William J. Foley 4; Allen Rodrigo 2; Carsten Külheim 1

Affiliations:
1) College of Forest Resources and Environmental Science, Michigan Technological University, Houghton, Michigan 49931, USA; 2) School of Biological Sciences, The University of Auckland, Auckland 1142, New Zealand; 3) Center for BioEnergy Innovation, Bioscience Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831, USA; 4) Research School of Biology, The Australian National University, Canberra 2600, Australia

Keywords:
Comparative genomics & genome evolution

Eucalyptus polybractea, commonly known as blue mallee, is cultivated in south-eastern Australia for the production of eucalyptus oil, which is valued for its flavor and medicinal properties. These essential oils are dominated by monoterpenes such as 1,8-cineole and also contain sesquiterpenes and other volatile compounds. Genetic markers identified through genome-wide association studies (GWAS) could greatly enhance breeding programs aimed at improving oil yield and quality, and a high-quality reference genome sequence makes it possible to study the genetic architecture of these traits and identify candidate genes underlying high-quality foliar essential oils. In this study, we used a hybrid assembly of the E. polybractea genome built from both short- and long-read sequencing data. We generated 44 Gb of Illumina HiSeq short reads and 8 Gb of Nanopore long reads, representing approximately 83× and 15× genome coverage, respectively. After polishing, the hybrid assembly contained 24,864 scaffolds with a total length of 523 Mb (N50 = 40.3 kb; BUSCO genome completeness of 94.3%). Combining homology-based and de novo approaches, we predicted 35,385 protein-coding genes. We then tested whether the hybrid-assembled genome improved the detection of high-quality genetic variants compared with our previously published study, which used an in silico pseudo-reference genome: the E. grandis genome edited with alleles fixed in E. polybractea. For this comparison, we mapped cleaned reads from 480 E. polybractea samples to three reference sequences: (1) the E. grandis genome, (2) the E. polybractea pseudo-reference, and (3) the hybrid-assembled E. polybractea genome. Variants were called against each reference, GWAS were performed on each variant set, and the results were compared. The high-quality E. polybractea genome facilitated better read mapping and variant identification, which allowed us to identify candidate genes related to terpene production in E. polybractea.
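As a rough check on the assembly statistics, mean sequencing depth is the total number of sequenced bases divided by the genome size, and N50 is the scaffold length at which scaffolds of that length or longer together cover at least half of the total assembly length. The Python sketch below illustrates both calculations under stated assumptions: it uses the 523 Mb assembly length as a stand-in for genome size (the abstract does not say which genome-size estimate underlies the reported coverage), and it applies the N50 function to hypothetical scaffold lengths rather than the actual assembly. The function names are illustrative only.

```python
def coverage(total_bases: float, genome_size: float) -> float:
    """Mean sequencing depth: total sequenced bases divided by genome size."""
    return total_bases / genome_size


def n50(lengths: list[int]) -> int:
    """Smallest length L such that scaffolds of length >= L cover at least
    half of the total assembly length."""
    total = sum(lengths)
    running = 0
    # Walk scaffolds from longest to shortest, accumulating length.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("empty assembly")


# Assumption: the 523 Mb assembly length is used as a proxy for genome size.
GENOME_SIZE = 523e6
illumina_depth = coverage(44e9, GENOME_SIZE)  # 44 Gb of Illumina HiSeq reads
nanopore_depth = coverage(8e9, GENOME_SIZE)   # 8 Gb of Nanopore reads
print(f"Illumina: {illumina_depth:.1f}x, Nanopore: {nanopore_depth:.1f}x")

# Hypothetical scaffold lengths in kb, not data from this assembly.
toy_scaffolds = [50, 40, 30, 20, 10]
print(n50(toy_scaffolds))
```

With these inputs the sketch gives about 84× and 15×, in line with the reported approximately 83× and 15×; the small difference for the short reads would be consistent with a slightly larger genome-size estimate or rounding. For the toy scaffolds, the N50 is 40 kb, because the two longest scaffolds (90 kb) already exceed half of the 150 kb total.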