1 Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
2 Unassigned (not placed), Central Administration, Aarhus University
3 Institute for Systems Biology
4 Department of Molecular Biology and Genetics - Protein science, Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
5 La Trobe University
6 Department of Molecular Biology and Genetics - Protein science, Department of Molecular Biology and Genetics, Science and Technology, Aarhus University
Novel Aspect
Using cloud computing in conjunction with the TPP provides an expedient, cost-effective, and scalable solution for MS/MS data analysis.

Introduction
The Trans-Proteomic Pipeline (TPP) is a mature and well-regarded open-source suite of tools for the analysis of large LC-MS/MS datasets. Over a thousand users have downloaded the TPP and installed it locally on their desktops. However, limited local computational resources make it difficult to complete searches in a timely manner, given the sheer number of spectra produced, the complexity of proteome sequence databases, and the number of potential sequence modifications. Compounding the problem is TPP's iProphet, which can significantly improve the confidence of identifications by combining the output of multiple search engines but in turn requires substantially more searches to be performed. We show that cloud computing products such as Amazon Web Services (AWS) provide a viable solution to meet this need.

Methods
Amazon Web Services provides an especially flexible platform for cloud computing applications. It offers virtualized servers that can execute custom virtual machine images, almost unlimited and secure file storage, and a queuing system that provides job control and communication across AWS. TPP has been enhanced to use these services in three ways: as a completely hosted solution through the TPP Web Application (TWA) interface, through new functionality in TPP's existing Petunia interface, or through a command-line tool called amztpp for advanced, high-capacity requirements. All three implementations simplify the communication, data marshalling, and resource allocation inherent in cloud computing.

Preliminary Data
To evaluate the cloud capabilities of TPP, two large LC-MS/MS datasets were analyzed on Amazon Web Services using the TPP command-line tool amztpp. The first was a canine dataset consisting of 982 runs from an LTQ Orbitrap, organized in 35 groups.
The second was a bovine dataset consisting of 1210 iTRAQ and non-iTRAQ runs from both a QSTAR XL and a QSTAR Elite. Both datasets are processed by 1) converting to mzML with msconvert, 2) identifying peptides with multiple search engines (X!Tandem, OMSSA, MyriMatch, and InsPecT), and then 3) using TPP's PeptideProphet, iProphet, and ProteinProphet to improve the peptide and protein identifications by combining the results of the multiple searches. The storage requirements, network bandwidth usage, computation time and performance, and cost of services will be measured and reported. For comparison, the identical analysis will be performed on a modest local compute cluster, and estimates will be made for a single local desktop system. Lastly, the benefit of running multiple search engines and combining their results will be assessed, to determine whether applying significantly more computational resources is merited by the improvement in identification and validation. We show that, with only an AWS account, TPP users can take advantage of the scalable and cost-effective cloud computing resources of Amazon Web Services as a purely hosted solution (TWA), through TPP's familiar Petunia web browser interface, or through the advanced and flexible command-line program amztpp. We show that these resources can reduce the cost and time of processing large datasets, making analysis more expedient than with local installations of TPP.
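To illustrate why combining search engines drives up the computational load, the sketch below enumerates the search jobs implied by the workflow above: one search per run per engine, using the run counts and engine names from this abstract. It then drains those jobs through a FIFO queue with a pool of workers, as a rough local stand-in for a cloud queuing service feeding virtual servers. This is only an illustration under stated assumptions. It uses Python's standard `queue` and `threading` modules rather than any AWS API. It does not describe how amztpp is actually implemented. `run_search`, `build_search_jobs`, and `process_jobs` are hypothetical helpers.

```python
import queue
import threading
from collections import Counter

# Run counts and search engines as stated in the abstract.
DATASETS = {"canine": 982, "bovine": 1210}
ENGINES = ["X!Tandem", "OMSSA", "MyriMatch", "InsPecT"]


def build_search_jobs(datasets, engines):
    """Return one (dataset, run_index, engine) job per run per search engine."""
    return [
        (name, run, engine)
        for name, n_runs in datasets.items()
        for run in range(n_runs)
        for engine in engines
    ]


def run_search(job):
    """Hypothetical placeholder: a real worker would fetch the run's mzML
    file, execute the named search engine, and store the results."""
    _dataset, _run, engine = job
    return engine


def process_jobs(jobs, n_workers=8):
    """Drain a FIFO job queue with a fixed pool of worker threads
    (a local stand-in for a cloud job queue and virtual servers)."""
    job_queue = queue.Queue()
    for job in jobs:
        job_queue.put(job)  # all jobs are queued before any worker starts

    completed = Counter()
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job = job_queue.get_nowait()
            except queue.Empty:
                return  # queue exhausted: this worker is done
            engine = run_search(job)
            with lock:
                completed[engine] += 1
            job_queue.task_done()

    threads = [threading.Thread(target=worker) for _ in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return completed


if __name__ == "__main__":
    jobs = build_search_jobs(DATASETS, ENGINES)
    print(len(jobs))  # (982 + 1210) runs x 4 engines = 8768 search jobs
    print(process_jobs(jobs))
```

Each added search engine multiplies the job count by the number of runs. Here the 2192 runs become 8768 independent searches. Because these searches do not depend on one another, they parallelize naturally across as many workers as the budget allows.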
60th ASMS Conference on Mass Spectrometry and Allied Topics, 2012