1 Department of Systems Biology, Technical University of Denmark
2 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
This thesis presents work carried out at the Center for Biological Sequence Analysis, Technical University of Denmark. It includes four articles describing large-scale data integration and methods for the prediction of drug side-effects.

Chapter 2 presents ChemProt, a novel disease chemical biology database. ChemProt integrates multiple chemical-protein annotation resources for disease-associated proteins and protein-protein interaction data. It was developed to assist in silico evaluation of environmental chemicals, natural products and approved drugs, and to aid the selection of new compounds based on their activity profiles against biological targets. The latest update of the ChemProt database includes a new visual interface, which enables easy navigation through the pharmacological space. Additionally, new search methods for chemical, protein, disease and side-effect data have been implemented.

Chapter 3 presents two articles that showcase the application of systems chemical biology approaches to understanding and modelling drug side-effect data. The first approach applies machine learning methods to cluster side-effects, drugs, proteins and clinical outcomes in networks. This work demonstrates the power of a strategy that combines clinical data mining with chemical biology to reduce the search space and aid the identification of novel drug actions.

The second article described in Chapter 3 outlines a high-confidence side-effect-drug interaction dataset. Based on the placebo-controlled studies from DailyMed, we estimated that only approximately 20% of the drug-side-effect associations are significant. Using the ChemProt database, we linked drugs to their biological targets and applied a scoring function to capture frequently encountered side-effect-protein associations. We then built a computational chemical biology model, which showed side-effect predictive capability for 55% of the 133 drugs in the SIDER database. Further validation was performed on withdrawn drugs stored in DrugBank, and many side-effects were confirmed through literature search. This work demonstrates the importance of using high-confidence drug-side-effect data in deciphering the effect of small molecules in humans.

In summary, this thesis presents computational systems chemical biology approaches that can help identify the clinical effects of small molecules through large-scale data integration. These approaches also pave the way for a variety of chemogenomics, polypharmacology and systems chemical biology studies.