Tsangaras, Kyriakos3; Wales, Nathan4; Sicheritz-Pontén, Thomas1; Rasmussen, Simon1; Michaux, Johan5; Ishida, Yasuko6; Morand, Serge7; Kampmann, Marie-Louise3; Gilbert, M. Thomas P.4; Greenwood, Alex D.3
1 Department of Systems Biology, Technical University of Denmark; 2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark; 3 Leibniz Institute for Zoo and Wildlife Research; 4 University of Copenhagen; 5 University of Liège; 6 University of Illinois; 7 Université Montpellier II
Solution hybridization capture methods use biotinylated oligonucleotides as baits to enrich homologous sequences from next-generation sequencing (NGS) libraries. Coupled with NGS, the method generates kilobases to gigabases of high-confidence consensus targeted sequence. However, in many experiments, a non-negligible fraction of the resulting sequence reads is not homologous to the bait. We demonstrate that during capture, the bait-hybridized library molecules iteratively add flanking library sequences, such that baits targeting only relatively short regions (e.g. a few hundred nucleotides) can result in enrichment across entire mitochondrial and bacterial genomes. Our findings suggest that some of the off-target sequences obtained in capture experiments are non-randomly enriched, and that this capture of flanking sequences (CapFlank) will facilitate targeted enrichment of large contiguous sequences with minimal prior target sequence information.
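The iterative addition of flanking sequences described above can be illustrated with a toy model. The sketch below is not the authors' analysis: it assumes a linear genome, treats library molecules as half-open coordinate intervals, and assumes that a molecule is captured in a given round if it overlaps an already captured molecule by at least a hypothetical `min_overlap` positions. All function names and parameter values are illustrative.

```python
def overlaps(a, b, min_overlap=1):
    """Return True if half-open intervals a and b share >= min_overlap positions."""
    return min(a[1], b[1]) - max(a[0], b[0]) >= min_overlap


def iterative_capture(fragments, bait, rounds, min_overlap=20):
    """Toy model: round 0 captures fragments overlapping the bait; each later
    round captures fragments overlapping anything captured so far.
    Returns the list of captured sets, one entry per round that added fragments."""
    captured = {f for f in fragments if overlaps(f, bait, min_overlap)}
    history = [set(captured)]
    for _ in range(rounds):
        newly = {
            f for f in fragments
            if f not in captured
            and any(overlaps(f, c, min_overlap) for c in captured)
        }
        if not newly:  # nothing left that can hybridize to the captured pool
            break
        captured |= newly
        history.append(set(captured))
    return history


def covered_span(captured):
    """Outermost coordinates reached by the captured fragments, or None if empty."""
    if not captured:
        return None
    return (min(s for s, _ in captured), max(e for _, e in captured))


# Assumed example: a 1000 nt genome tiled by 100 nt fragments every 50 nt,
# with a 100 nt bait in the middle.
fragments = [(start, start + 100) for start in range(0, 901, 50)]
bait = (450, 550)
history = iterative_capture(fragments, bait, rounds=20)

for round_number, captured in enumerate(history):
    print(round_number, covered_span(captured))
```

In this simplified setting the covered span grows outward from the bait in each round until the whole genome is reached, which mirrors the qualitative claim that a short bait can enrich sequence far beyond its own coordinates. Real capture efficiency depends on hybridization kinetics and library composition, which this sketch does not model.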