Tsangaras, Kyriakos3; Wales, Nathan4; Sicheritz-Pontén, Thomas5; Rasmussen, Simon5; Michaux, Johan9; Ishida, Yasuko7; Morand, Serge8; Kampmann, Marie-Louise3; Gilbert, M. Thomas P.4; Greenwood, Alex D.3
1 Department of Systems Biology, Technical University of Denmark; 2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark; 3 Leibniz Institute for Zoo and Wildlife Research; 4 University of Copenhagen; 5 Department of Bio and Health Informatics, Technical University of Denmark; 6 University of Liege; 7 University of Illinois; 8 Université Montpellier II; 9 University of Liege
Solution hybridization capture methods use biotinylated oligonucleotides as baits to enrich homologous sequences from next-generation sequencing (NGS) libraries. Coupled with NGS, the method generates kilobases to gigabases of high-confidence consensus targeted sequence. However, in many experiments, a non-negligible fraction of the resulting sequence reads is not homologous to the bait. We demonstrate that during capture, the bait-hybridized library molecules iteratively add flanking library sequences, such that baits limited to targeting relatively short regions (e.g., a few hundred nucleotides) can result in enrichment across entire mitochondrial and bacterial genomes. Our findings suggest that some of the off-target sequences derived in capture experiments are non-randomly enriched, and that this capture of flanking sequences (CapFlank) will facilitate targeted enrichment of large contiguous sequences with minimal prior target sequence information.
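The iterative flanking effect described in the abstract can be illustrated with a toy model. The sketch below is not the authors' method or analysis; it assumes a linear reference, treats library molecules as half-open intervals, and assumes that any molecule overlapping an already captured molecule is captured in the next round. The names `overlaps`, `flank_capture`, and `covered_span` are hypothetical and exist only for this illustration.

```python
from typing import List, Set, Tuple

# Half-open interval [start, end) on a linear reference (an assumption of this toy model).
Interval = Tuple[int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """Return True if the two half-open intervals share at least one position."""
    return a[0] < b[1] and b[0] < a[1]


def flank_capture(fragments: List[Interval], bait: Interval, rounds: int) -> Set[Interval]:
    """Toy model of iterative flanking capture.

    Round 0: fragments overlapping the bait are captured.
    Each later round: fragments overlapping any captured fragment are captured too,
    i.e. captured molecules are assumed to act as secondary baits.
    """
    captured = {f for f in fragments if overlaps(f, bait)}
    for _ in range(rounds):
        new = {
            f for f in fragments
            if f not in captured and any(overlaps(f, c) for c in captured)
        }
        if not new:  # nothing left to recruit, stop early
            break
        captured |= new
    return captured


def covered_span(captured: Set[Interval]) -> Interval:
    """Leftmost start and rightmost end of the captured fragments (requires a non-empty set)."""
    return (min(s for s, _ in captured), max(e for _, e in captured))


# Hypothetical library: 100-nt fragments tiled every 50 nt across a 1000-nt reference.
fragments = [(s, s + 100) for s in range(0, 901, 50)]
bait = (450, 500)  # a short bait targeting 50 nt

for r in (0, 1, 20):
    print(r, covered_span(flank_capture(fragments, bait, r)))
```

In this toy setting the covered span grows by one fragment overlap per round on each side, so a bait that targets only 50 nt eventually recruits fragments across the whole reference. Real capture efficiency depends on hybridization kinetics and library complexity, which this sketch does not model.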