Alkhnbashi, Omer S. (4); Costa, Fabrizio (4); Shah, Shiraz Ali (5); Garrett, Roger Antony (6); Saunders, Sita J. (4); Backofen, Rolf (4)
1 Functional Genomics, Department of Biology, Faculty of Science, Københavns Universitet; 2 University of Freiburg; 3 Department of Biology, Faculty of Science, Københavns Universitet; 4 University of Freiburg; 5 Department of Biology, Faculty of Science, Københavns Universitet; 6 Functional Genomics, Department of Biology, Faculty of Science, Københavns Universitet
Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, interspaced by stretches of variable-length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs.