1 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
2 Department of Systems Biology, Technical University of Denmark
3 Department of Bio and Health Informatics, Technical University of Denmark
The last decade saw an explosion in DNA sequencing, including the draft version of the human genome. Proteomics is now experiencing similar growth. Because proteins are the functional elements of living cells, high-throughput proteomics promises a better understanding of cellular functions and of the interactions between molecules, which are the essence of systems biology. Internet technologies are important in this respect: bioinformatics labs around the world generate staggering amounts of novel annotations, which increases the importance of online processing and distributed systems. One of the most important new data types in proteomics is protein-protein interactions. Interactions between the functional elements in the cell are a natural starting point for integrating protein annotations with the aim of gaining a systems view of the cell. Interaction data, however, are notoriously biased, error-prone and incomplete. They also require new approaches to data preparation, because established methods for sets of single sequences are often unsuitable for sets of sequence pairs. Careful analysis at both the sequence level and the integrated network level is therefore needed to benchmark these data before use. The networks that emerge when interaction data are integrated form a skeleton to which we can attach other annotation types. Then, using graph-theoretical methods, we can identify network structures and infer annotations across the links of physical interactions, thereby defining novel functional modules or, in the case of dysfunction, disease modules and disease genes.
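One simple way to infer annotations across the links of physical interactions is "guilt by association": an unannotated protein is assigned the annotation terms most common among its interaction partners. The sketch below illustrates this with a majority vote over direct neighbours. It is a minimal illustration under stated assumptions, not necessarily the method used in this work: the function name `predict_annotations`, the `top_k` parameter and the toy protein and term names are hypothetical, interactions are assumed to be undirected pairs, and annotations are assumed to be sets of terms per protein.

```python
from collections import Counter


def predict_annotations(edges, annotations, top_k=1):
    """Hypothetical sketch: majority-vote annotation transfer over interaction links.

    edges       -- iterable of (protein_a, protein_b) interaction pairs (assumed undirected)
    annotations -- dict mapping protein -> set of annotation terms (assumed format)
    top_k       -- number of highest-scoring terms to predict per protein
    """
    # Build an undirected adjacency list; self-interactions are ignored.
    neighbours = {}
    for a, b in edges:
        if a == b:
            continue
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    predictions = {}
    for protein, partners in neighbours.items():
        if protein in annotations:
            continue  # only predict for proteins that lack annotations
        # Each annotated partner casts one vote per term it carries.
        votes = Counter()
        for partner in partners:
            votes.update(annotations.get(partner, set()))
        if votes:
            # Rank by vote count, breaking ties alphabetically for determinism.
            ranked = sorted(votes.items(), key=lambda kv: (-kv[1], kv[0]))
            predictions[protein] = [term for term, _ in ranked[:top_k]]
    return predictions


# Toy example with hypothetical proteins A-E and made-up terms.
toy_edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("D", "E")]
toy_annotations = {"B": {"DNA repair"}, "C": {"DNA repair"}, "D": {"transport"}}
print(predict_annotations(toy_edges, toy_annotations))
```

In the toy network, protein A has two partners annotated with "DNA repair" and one with "transport", so it receives "DNA repair"; protein E only touches D and receives "transport". A plain neighbour vote like this inherits the biases and errors of the interaction data, which is one reason to benchmark the network before propagating annotations.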