Fast and Accurate Identification of Cross-Linked Peptides for the Structural Analysis of Large Protein Complexes and Elucidation of Interaction Networks. / Tahir, Salman; Bukowski-Wills, Jimi-Carlo; Rasmussen, Morten; Rappsilber, Juri
Salman Tahir; Jimi-Carlo Bukowski-Wills; Morten Rasmussen; Juri Rappsilber
Wellcome Trust Centre for Cell Biology, Edinburgh, United Kingdom

Novel Aspect: Our software efficiently and correctly identifies cross-links within large protein complexes, facilitating the construction of low-resolution 3D models and interaction networks.

Introduction
Chemical cross-linking of peptides coupled with mass spectrometry is emerging as a powerful method to investigate protein structure and protein-protein interactions. The methodology works well when applied to single proteins or small purified protein complexes. However, challenges arise when it is applied to more complex samples. One of the main problems is the combinatorial increase in the search space that occurs when all peptide-peptide combinations are considered in a database search. We have developed an algorithm that finds and validates cross-linked peptides in an efficient and scalable manner by adopting a number of principles, both biological and computational.

Methods
We make use of a high-accuracy library of over 1000 synthetic peptides to understand the fragmentation behaviour of cross-linked peptides. This allows us to pre-process spectra through de-isotoping, charge reduction and the removal of loss-of-water/ammonia peaks. Furthermore, using this information we are able to reduce the complexity of searching to essentially two successive searches of linear peptides, as opposed to analyzing every possible combination of peptides that could potentially cross-link.
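The difference between the two strategies can be illustrated with a small counting sketch. It is a simplified illustration, not part of the software described here: the function names and the peptide counts are hypothetical, and it ignores any filtering by precursor mass. It only shows that the number of peptide pairs grows quadratically with the number of peptides, whereas two successive linear searches grow linearly.

```python
from math import comb


def pairwise_candidates(n_peptides: int) -> int:
    """Count unordered peptide pairs, including a peptide linked to a copy of itself."""
    return comb(n_peptides, 2) + n_peptides


def two_pass_candidates(n_peptides: int) -> int:
    """Count candidates examined by two successive linear searches of the same database."""
    return 2 * n_peptides


if __name__ == "__main__":
    # Hypothetical database sizes, chosen only to show the trend.
    for n in (1_000, 10_000, 100_000):
        print(n, pairwise_candidates(n), two_pass_candidates(n))
```

For a hypothetical database of 1000 peptides this gives 500500 pairs against 2000 candidates for the two-pass approach.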
We achieve further speed-up through parallelization and data structures suited to the nature of the data we search.

Preliminary results
The complexity of searching for cross-linked peptides arises from analyzing every possible combination of peptides that could potentially cross-link and whose combined mass approximately matches one of the unexplained observed masses. As more proteins are considered, the number of potential peptide-peptide combinations quickly becomes infeasible to compute. We utilize a high-accuracy library of >1000 synthetic peptides to understand the fragmentation behaviour of cross-linked peptides. 92.4% of the most intense peaks from the annotated spectra of this library arise from a single fragmentation event. Moreover, we note that 92.5% of the top peaks belong to one of the two peptides comprising a cross-linked pair. Using this information, we are able to reduce the complexity of the cross-link search to that of a linear search.

A cross-linked pair contains a primary, more dominant peptide that fragments better than the secondary peptide. This leads to the observation that we can first search for the primary peptide without constraining the peptide mass. The second, less prominent peptide can then be found in an ordinary database search for a modified peptide using a simplified spectrum. The spectrum is simplified by removing all peaks that are accounted for by the fragmentation of the primary peptide.

This approach is highly sensitive and scales well, as revealed by searching our data of synthetic cross-links against a large sequence database. Currently, against a protein database of >1300 proteins, a spectrum is searched in 0.35 seconds, a vast improvement over the exhaustive method of combining every potential cross-link for each spectrum (60 hours). In fact, the search time is comparable to, if not better than, that of existing linear search engines. Furthermore, we auto-validate the results obtained.
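The two-pass idea described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not the algorithm as implemented in our software: the `Candidate` type, the peak-counting score, the tolerance and all masses are hypothetical, fragment lists are assumed to be precomputed, and the sketch ignores the fact that fragments containing the cross-link site carry the mass of the partner peptide.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """A database peptide with a precomputed mass and fragment list (hypothetical type)."""
    sequence: str
    mass: float
    fragments: tuple[float, ...]


def matched_peaks(peaks, fragments, tol):
    """Return the observed peaks that lie within tol of any theoretical fragment."""
    return [p for p in peaks if any(abs(p - f) <= tol for f in fragments)]


def two_pass_search(peaks, precursor_mass, linker_mass, database, tol=0.01):
    """Find a primary peptide, then its partner, in two linear passes over the database."""
    # Pass 1: score every candidate against the full spectrum, with no mass constraint.
    primary = max(database, key=lambda c: len(matched_peaks(peaks, c.fragments, tol)))

    # Simplify the spectrum by removing the peaks explained by the primary peptide.
    explained = set(matched_peaks(peaks, primary.fragments, tol))
    remaining = [p for p in peaks if p not in explained]

    # Pass 2: the partner must account for the mass left after primary and linker.
    target = precursor_mass - primary.mass - linker_mass
    partners = [c for c in database if abs(c.mass - target) <= tol]
    if not partners:
        return primary, None
    secondary = max(partners, key=lambda c: len(matched_peaks(remaining, c.fragments, tol)))
    return primary, secondary


# Toy data: all sequences, masses and peaks are invented for illustration.
DATABASE = [
    Candidate("PEPA", 500.0, (100.0, 200.0, 300.0)),
    Candidate("PEPB", 400.0, (150.0, 250.0)),
    Candidate("PEPC", 400.0, (175.0, 275.0)),
    Candidate("PEPD", 600.0, (110.0, 210.0)),
]
PEAKS = [100.0, 200.0, 300.0, 150.0]

if __name__ == "__main__":
    first, second = two_pass_search(PEAKS, 1038.0, 138.0, DATABASE)
    print(first.sequence, second.sequence if second else None)
```

Each pass visits every database entry once, so the work per spectrum grows linearly with database size rather than with the number of peptide pairs.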