1 Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark
2 Department of Systems Biology, Technical University of Denmark
3 Department of Bio and Health Informatics, Technical University of Denmark
Motivation: Proteins recognizing short peptide fragments play a central role in cellular signaling. As a result of high-throughput technologies, peptide-binding protein specificities can be studied using large peptide libraries at dramatically lower cost and time. Interpretation of such large peptide datasets, however, is a complex task, especially when the data contain multiple receptor binding motifs, and/or the motifs are found at different locations within distinct peptides.
Results: The algorithm presented in this article, based on Gibbs sampling, identifies multiple specificities in peptide data by performing two essential tasks simultaneously: alignment and clustering of peptide data. We apply the method to de-convolute binding motifs in a panel of peptide datasets with different degrees of complexity spanning from the simplest case of pre-aligned fixed-length peptides to cases of unaligned peptide datasets of variable length. Example applications described in this article include mixtures of binders to different MHC class I and class II alleles, distinct classes of ligands for SH3 domains and sub-specificities of the HLA-A*02:01 molecule.
Availability: The Gibbs clustering method is available online as a web server at http://www.cbs.dtu.dk/services/GibbsCluster.
Contact: firstname.lastname@example.org
Supplementary information: Supplementary data are available at Bioinformatics online.
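Illustrative sketch: the clustering idea summarized under Results can be illustrated for the simplest case mentioned there, pre-aligned fixed-length peptides. The Python below is a minimal sketch under stated assumptions, not the published GibbsCluster implementation. The function names (`log_odds_score`, `gibbs_cluster`), the uniform amino acid background, the pseudocount, and the temperature parameter are all hypothetical choices made for illustration, and the alignment step needed for variable-length peptides is omitted entirely.

```python
import math
import random
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_odds_score(peptide, members, pseudocount=1.0):
    """Score a peptide against a position-specific profile built from `members`.

    Assumption: uniform background frequencies and a flat pseudocount,
    chosen only to keep the sketch short.
    """
    background = 1.0 / len(AMINO_ACIDS)
    score = 0.0
    for pos, residue in enumerate(peptide):
        # Count residues observed at this position among the cluster members.
        counts = Counter(m[pos] for m in members)
        freq = (counts[residue] + pseudocount) / (
            len(members) + pseudocount * len(AMINO_ACIDS)
        )
        score += math.log(freq / background)
    return score


def gibbs_cluster(peptides, n_clusters=2, n_sweeps=50, temperature=1.0, seed=0):
    """Assign equal-length peptides to clusters by Gibbs-style resampling."""
    rng = random.Random(seed)
    # Start from a random assignment of peptides to clusters.
    labels = [rng.randrange(n_clusters) for _ in peptides]
    for _ in range(n_sweeps):
        for i, pep in enumerate(peptides):
            # Score peptide i against each cluster, leaving peptide i out.
            scores = []
            for k in range(n_clusters):
                members = [
                    p for j, p in enumerate(peptides) if labels[j] == k and j != i
                ]
                scores.append(log_odds_score(pep, members))
            # Resample the label with probability proportional to exp(score / T).
            top = max(scores)
            weights = [math.exp((s - top) / temperature) for s in scores]
            labels[i] = rng.choices(range(n_clusters), weights=weights)[0]
    return labels


if __name__ == "__main__":
    # Toy 9-mers invented for this example; not data from the article.
    toy = ["ALAKAAAAV", "ALAKAAAAL", "ALAKAAAAV", "KEYWDDDDF", "KEYWDDDDY", "KEYWDDDDF"]
    print(gibbs_cluster(toy, n_clusters=2))
```

Because the sampler is stochastic, the labels it returns depend on the seed and on the number of sweeps. A real analysis would also need a criterion for choosing the number of clusters, which this sketch leaves to the caller.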