Edwards, Alistair (4); Edwards, Gregory (2); Larsen, Martin Røssel (4); Cordwell, Stuart (3)
(1) Department of Biochemistry and Molecular Biology, Faculty of Science, SDU
(2) Port Jackson Bioinformatics
(3) University of Sydney
(4) Department of Biochemistry and Molecular Biology, Faculty of Science, SDU
Background: Extracting biological meaning from proteomic datasets that contain post-translational modifications is a central challenge of large-scale proteomics and systems biology. We report a new program, Report Sites, that precisely identifies the location and local chemical environment of a particular amino acid residue, or group of residues, within large-scale proteomic datasets, using peptide sequences characterized by mass spectrometry combined with protein sequence databases. The program is well suited to distilling regional and spatial information from post-translational modification datasets, in which patterns of sequence surrounding processed sites may reveal more about the functional and structural requirements of the modification and the biochemical processes that regulate it.
Results: We developed Report Sites using a test set of phosphoproteomic data from rat myocardium containing approximately eleven thousand unique phosphorylation sites. These sites were used to identify patterns associated with site location (spatial sequence information) within the context of the complete protein sequence, and with two selected aspects of the immediate physico-chemical environment (local pI and hydrophobicity). These values were then compared with the corresponding values extracted from the full database to allow comparison of phosphorylation trends.
Conclusions: Report Sites enabled physico-chemical aspects of protein phosphorylation to be deciphered in a test set of approximately eleven thousand phosphorylation sites. Basic properties of modified proteins, such as site location in the context of the complete protein, were also documented.
This program can be easily adapted to any post-translational modification (or, indeed, to any defined amino acid sequence), or expanded to include more descriptive factors (such as modification of binding domains or protein structure). This makes it a versatile tool with the potential to aid in revealing new aspects of post-translational modification distribution. The code is freely available from the authors upon request and is accessible online.
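As an illustration only, the kind of analysis described above (placing a modified residue within its parent protein and summarizing its immediate sequence environment) might be sketched as follows. This is not the Report Sites code. The function name `report_site`, the `flank` parameter, and the choice of the Kyte-Doolittle hydropathy scale are all assumptions made for this sketch, and local pI is omitted for brevity.

```python
from typing import NamedTuple

# Kyte-Doolittle hydropathy values (assumed scale; the abstract does not
# state which hydrophobicity scale Report Sites uses).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


class SiteReport(NamedTuple):
    protein_position: int     # 1-based position of the site in the protein
    relative_position: float  # position divided by protein length
    window: str               # residues flanking the site, site included
    mean_hydropathy: float    # average Kyte-Doolittle value over the window


def report_site(protein: str, peptide: str, site_in_peptide: int,
                flank: int = 7) -> SiteReport:
    """Map a site (1-based index within `peptide`) onto `protein`.

    Hypothetical helper: uses the first occurrence of the peptide and
    truncates the window at the protein termini.
    """
    start = protein.find(peptide)
    if start < 0:
        raise ValueError("peptide not found in protein sequence")
    idx = start + site_in_peptide - 1          # 0-based index in protein
    lo = max(0, idx - flank)
    hi = min(len(protein), idx + flank + 1)
    window = protein[lo:hi]
    mean_h = sum(KYTE_DOOLITTLE[aa] for aa in window) / len(window)
    return SiteReport(idx + 1, (idx + 1) / len(protein), window, mean_h)


if __name__ == "__main__":
    # Toy sequences, not taken from the rat myocardium dataset.
    print(report_site("AAAAASAAAA", "AASAA", site_in_peptide=3, flank=2))
```

Running the same function over every residue of the relevant type in the full database would give the background values against which site-level trends could be compared.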
Journal of Proteomics and Bioinformatics, 2012, Vol 5, Issue 4
Protein modification; Software; Phosphorylation; Distribution