1 Computational and RNA Biology, Department of Biology, Faculty of Science, Københavns Universitet
2 Bio-Informatics Group, BRIC Research Groups, BRIC, Københavns Universitet
3 Centre for mRNP Biogenesis and Metabolism
4 University of Bergen
5 Cornell University
6 Computational and RNA Biology, Department of Biology, Faculty of Science, Københavns Universitet
7 University of Bergen
Mammalian genomes are pervasively transcribed, yielding a complex transcriptome with high variability in composition and cellular abundance. Although recent efforts have identified thousands of new long non-coding (lnc) RNAs and demonstrated a complex transcriptional repertoire produced by protein-coding (pc) genes, limited progress has been made in distinguishing functional RNA from spurious transcription events. This is partly because current RNA classification is typically based on technical rather than biochemical criteria. Here we devise a strategy to systematically categorize human RNAs by their sensitivity to the ribonucleolytic RNA exosome complex and by the nature of their transcription initiation. These measures are surprisingly effective at correctly classifying annotated transcripts, including lncRNAs of known function. The approach also identifies uncharacterized stable lncRNAs, hidden among a vast majority of unstable transcripts. The predictive power of the approach promises to streamline the functional analysis of known and novel RNAs.