Deciphering the dark genome
Unravel the fundamental mechanisms governing gene expression, epigenetic reprogramming and animal development through RNA structure and single-molecule transcriptomics
Mammalian genomes are extensively transcribed into RNA. In humans, over 80% of the genome is transcribed, while less than 2% codes for proteins. Long noncoding RNAs (lncRNAs) make up most of the resulting myriad transcripts. They are processed transcripts with highly specific expression patterns during development and cellular differentiation. These diverse transcriptional products undergo more alternative splicing than protein-coding transcripts (mRNAs) and are recognized as important regulators of gene expression and epigenetic states.
Although deep RNA sequencing studies have identified large numbers of lncRNAs, characterizing their specific biological functions remains a daunting task. It requires costly laboratory experiments that are often limited in scope, and there is currently no solution for the systematic functional annotation and classification of lncRNAs. We have previously shown that at least 13% (and likely much more) of the human genome harbours evolutionarily conserved RNA secondary structures, a unifying feature of all functional noncoding RNAs. Most of these structures lie outside sequence-constrained regions, which suggests that purifying selection acts on the higher-order structure itself rather than only on the primary sequence.
We have also developed methods to cluster RNA structures into families. These families can be used to identify structural motifs associated with specific biological functions (e.g. protein binding) and to perform genome-wide homology screens. Such screens have shown that homologs of RNA structure families recur throughout the genome and are preferentially enriched within exonic sequences (relative to exon-intron boundaries).
We hypothesize that these RNA structural motifs serve as modular functional elements (akin to protein domains) that alternative splicing can selectively assemble into lncRNAs, and that these motifs ultimately dictate the molecular functions of the host transcript.
This NSERC-funded project seeks to systematically characterize the function of lncRNAs through the genome-wide annotation and biochemical validation of RNA structural motifs.
We are testing this hypothesis in the context of epigenetic reprogramming during neuronal differentiation. Drawing on our expertise in comparative genomics, machine learning, nanopore sequencing and molecular biology, we aim to:
Chart a genome-wide map of recurring structural RNA domains;
Resolve the transcriptomic and epigenomic landscape of cellular differentiation at single-molecule resolution;
Substantiate the function of lncRNAs as regulators of gene expression;
Evaluate the prevalence of RNA modifications in lncRNAs and their potential impact on RNA structure formation.