Hutchins, Andrew Paul (2); Jauch, Ralf (3); Dyla, Mateusz (6); Miranda-Saavedra, Diego (5)
1 Department of Molecular Biology and Genetics - Structural Biology, Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
2 Key Laboratory of Regenerative Biology, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences
3 Genome Regulation Laboratory, South China Institute for Stem Cell Biology and Regenerative Medicine, Guangzhou Institutes of Biomedicine and Health, Chinese Academy of Sciences
4 Interdisciplinary Nanoscience Center - INANO-MBG, Gustav Wied 10, Interdisciplinary Nanoscience Center, Science and Technology, Aarhus University
5 Fibrosis Laboratories, Institute of Cellular Medicine, Newcastle University Medical School
6 Interdisciplinary Nanoscience Center - INANO-MBG, Gustav Wied 10, Interdisciplinary Nanoscience Center, Science and Technology, Aarhus University
glbase: a framework for combining, analyzing and displaying heterogeneous genomic and high-throughput sequencing data
Genomic datasets and the tools to analyze them have proliferated at an astonishing rate. However, such tools are often poorly integrated with each other: each program typically produces its own custom output in one of a variety of non-standard file formats. Here we present glbase, a framework that uses a flexible set of descriptors to quickly parse non-binary data files. glbase includes many functions to intersect two lists of data, including operations on genomic interval data, and supports efficient random access to huge genomic data files. Many glbase functions can produce graphical outputs, including scatter plots, heatmaps, boxplots and other common analytical displays of high-throughput data such as RNA-seq, ChIP-seq and microarray expression data. glbase is designed to rapidly bring biological data into a Python-based analytical environment to facilitate analysis and data processing. In summary, glbase is a flexible, multifunctional toolkit for combining and analyzing high-throughput data, especially next-generation sequencing and genome-wide data, and it has been instrumental in the analysis of complex datasets. glbase is freely available at http://bitbucket.org/oaxiom/glbase/.
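To make the two central ideas of the abstract concrete (descriptor-driven parsing of text files and intersection of genomic interval lists), the following is a minimal sketch in plain Python. It does not use or reproduce the glbase API: the descriptor layout `PEAK_FORMAT`, the helper names `parse_with_descriptor`, `intervals_overlap` and `intersect`, and the example records are all hypothetical and chosen for illustration only. Intervals are assumed to be half-open, `[start, end)`.

```python
import csv
import io

# Hypothetical descriptor: field name -> (column index, converter).
# Illustration only; this is NOT glbase's actual format specification.
PEAK_FORMAT = {
    "chrom": (0, str),
    "start": (1, int),
    "end": (2, int),
    "name": (3, str),
}


def parse_with_descriptor(handle, descriptor, delimiter="\t"):
    """Parse a delimited text stream into a list of dicts, guided by a descriptor."""
    rows = []
    for fields in csv.reader(handle, delimiter=delimiter):
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        rows.append(
            {key: convert(fields[col]) for key, (col, convert) in descriptor.items()}
        )
    return rows


def intervals_overlap(a, b):
    """True if two half-open intervals on the same chromosome share a base."""
    return (
        a["chrom"] == b["chrom"]
        and a["start"] < b["end"]
        and b["start"] < a["end"]
    )


def intersect(list_a, list_b):
    """Items of list_a overlapping at least one item of list_b (naive O(n*m) scan)."""
    return [a for a in list_a if any(intervals_overlap(a, b) for b in list_b)]


# Made-up example records standing in for two tool-specific output files.
peaks_text = "chr1\t100\t200\tpeakA\nchr1\t500\t600\tpeakB\nchr2\t100\t200\tpeakC\n"
genes_text = "chr1\t150\t400\tgene1\nchr2\t300\t400\tgene2\n"

peaks = parse_with_descriptor(io.StringIO(peaks_text), PEAK_FORMAT)
genes = parse_with_descriptor(io.StringIO(genes_text), PEAK_FORMAT)

shared = intersect(peaks, genes)
print([p["name"] for p in shared])  # prints ['peakA']
```

The naive pairwise scan is adequate for small lists. For the huge genomic files the abstract mentions, an indexed or sorted structure would be needed, which this sketch does not attempt.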