  • Protocol

Building high-quality assay libraries for targeted analysis of SWATH MS data

Abstract

Targeted proteomics by selected/multiple reaction monitoring (S/MRM) or, on a larger scale, by SWATH (sequential window acquisition of all theoretical spectra) MS (mass spectrometry) typically relies on spectral reference libraries for peptide identification. Quality and coverage of these libraries are therefore of crucial importance for the performance of the methods. Here we present a detailed protocol that has been successfully used to build high-quality, extensive reference libraries supporting targeted proteomics by SWATH MS. We describe each step of the process, including data acquisition by discovery proteomics, assertion of peptide-spectrum matches (PSMs), generation of consensus spectra and compilation of MS coordinates that uniquely define each targeted peptide. Crucial steps such as false discovery rate (FDR) control, retention time normalization and handling of post-translationally modified peptides are detailed. Finally, we show how to use the library to extract SWATH data with the open-source software Skyline. The protocol takes 2–3 d to complete, depending on the extent of the library and the computational resources available.
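The retention time normalization mentioned above builds on the iRT concept (Escher et al., ref. 41). Reference peptides with fixed iRT values are measured in each run, and a calibration maps observed retention times onto the iRT scale. The Python sketch below assumes a simple least-squares linear calibration. The function names `fit_irt_calibration` and `rt_to_irt` and all numeric values are hypothetical. They are not taken from the protocol, SpectraST or the iRT kit.

```python
from typing import Sequence, Tuple


def fit_irt_calibration(observed_rt: Sequence[float],
                        reference_irt: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of iRT = slope * RT + intercept from reference peptides.

    Hypothetical helper: assumes a linear relation between observed RT and iRT.
    """
    n = len(observed_rt)
    if n != len(reference_irt) or n < 2:
        raise ValueError("need at least two paired reference peptides")
    mean_rt = sum(observed_rt) / n
    mean_irt = sum(reference_irt) / n
    sxx = sum((x - mean_rt) ** 2 for x in observed_rt)
    sxy = sum((x - mean_rt) * (y - mean_irt)
              for x, y in zip(observed_rt, reference_irt))
    if sxx == 0:
        raise ValueError("observed retention times must not all be equal")
    slope = sxy / sxx
    intercept = mean_irt - slope * mean_rt
    return slope, intercept


def rt_to_irt(rt: float, slope: float, intercept: float) -> float:
    """Map an observed retention time (same units as the fit) onto the iRT scale."""
    return slope * rt + intercept


if __name__ == "__main__":
    # Hypothetical observed RTs (minutes) of reference peptides in one run,
    # paired with hypothetical iRT values for the same peptides.
    observed = [10.0, 20.0, 30.0, 40.0]
    irt = [-20.0, 0.0, 20.0, 40.0]
    slope, intercept = fit_irt_calibration(observed, irt)
    print(slope, intercept)                   # 2.0 -40.0
    print(rt_to_irt(25.0, slope, intercept))  # 10.0
```

A linear fit is the simplest choice and is usually adequate when the chromatographic gradient is roughly linear. Strongly non-linear gradients may call for a different calibration model.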

Figure 1: Workflow for SWATH assay library generation.
Figure 2: Splitting peptide identifications with distant elution times.
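Figure 2 concerns identifications of the same peptide that elute at distant times. The following is only an illustration of how such identifications could be separated. It sorts their (normalized) retention times and starts a new group wherever the gap to the previous value exceeds a tolerance. The function `split_by_elution_gap` and the `max_gap` value are hypothetical. This is not the logic of spectrast_cluster.py or of any other tool named in the protocol.

```python
from typing import List, Sequence


def split_by_elution_gap(retention_times: Sequence[float],
                         max_gap: float) -> List[List[float]]:
    """Group retention times of one peptide; start a new group at gaps > max_gap.

    Hypothetical helper for illustration only.
    """
    if max_gap < 0:
        raise ValueError("max_gap must be non-negative")
    groups: List[List[float]] = []
    for rt in sorted(retention_times):
        # Extend the current group if this RT is close to the previous one.
        if groups and rt - groups[-1][-1] <= max_gap:
            groups[-1].append(rt)
        else:
            groups.append([rt])
    return groups


if __name__ == "__main__":
    # Hypothetical normalized RTs for identifications of a single peptide.
    rts = [35.2, 34.8, 35.5, 78.1, 77.9]
    print(split_by_elution_gap(rts, max_gap=5.0))
    # [[34.8, 35.2, 35.5], [77.9, 78.1]]
```

Each value is compared only with its predecessor, so a slowly drifting series of retention times stays in one group. A stricter rule, such as limiting the total width of a group, would be one possible alternative.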

References

  1. Domon, B. & Aebersold, R. Options and considerations when selecting a quantitative proteomics strategy. Nat. Biotechnol. 28, 710–721 (2010).

  2. Picotti, P. & Aebersold, R. Selected reaction monitoring-based proteomics: workflows, potential, pitfalls and future directions. Nat. Methods 9, 555–566 (2012).

  3. Gillet, L.C. et al. Targeted data extraction of the MS/MS spectra generated by data-independent acquisition: a new concept for consistent and accurate proteome analysis. Mol. Cell. Proteomics 11, O111.016717 (2012).

  4. Venable, J.D., Dong, M.-Q., Wohlschlegel, J., Dillin, A. & Yates, J.R. III. Automated approach for quantitative analysis of complex peptide mixtures from tandem mass spectra. Nat. Methods 1, 39–45 (2004).

  5. Chapman, J.D., Goodlett, D.R. & Masselon, C.D. Multiplexed and data-independent tandem mass spectrometry for global proteome profiling. Mass Spectrom. Rev. 33, 452–470 (2014).

  6. Röst, H.L. et al. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat. Biotechnol. 32, 219–223 (2014).

  7. Bernhardt, O.M. et al. Spectronaut: a fast and efficient algorithm for MRM-like processing of data independent acquisition (SWATH-MS) data. F1000Posters Presented at the 60th American Society for Mass Spectrometry Conference, 20–24 May 2012 5, 1092 (2014).

  8. MacLean, B. et al. Skyline: an open source document editor for creating and analyzing targeted proteomics experiments. Bioinformatics 26, 966–968 (2010).

  9. Reiter, L. et al. mProphet: automated data processing and statistical validation for large-scale SRM experiments. Nat. Methods 8, 430–435 (2011).

  10. Zi, J. et al. Expansion of the ion library for mining SWATH-MS data through fractionation proteomics. Anal. Chem. 86, 7242–7246 (2014).

  11. Lam, H. et al. Building consensus spectral libraries for peptide identification in proteomics. Nat. Methods 5, 873–875 (2008).

  12. Hughes, M.A., Silva, J.C., Geromanos, S.J. & Townsend, C.A. Quantitative proteomic analysis of drug-induced changes in mycobacteria. J. Proteome Res. 5, 54–63 (2006).

  13. Frewen, B.E., Merrihew, G.E., Wu, C.C., Noble, W.S. & MacCoss, M.J. Analysis of peptide MS/MS spectra from large-scale proteomics experiments using spectrum libraries. Anal. Chem. 78, 5678–5684 (2006).

  14. Picotti, P. et al. A database of mass spectrometric assays for the yeast proteome. Nat. Methods 5, 913–914 (2008).

  15. Prakash, A. et al. Expediting the development of targeted SRM assays: using data from shotgun proteomics to automate method development. J. Proteome Res. 8, 2733–2739 (2009).

  16. Picotti, P. et al. High-throughput generation of selected reaction-monitoring assays for proteins and proteomes. Nat. Methods 7, 43–46 (2010).

  17. Collins, B.C. et al. Quantifying protein interaction dynamics by SWATH mass spectrometry: application to the 14-3-3 system. Nat. Methods 10, 1246–1253 (2013).

  18. Picotti, P. et al. A complete mass-spectrometric map of the yeast proteome applied to quantitative trait analysis. Nature 494, 266–270 (2013).

  19. Schubert, O.T. et al. The Mtb Proteome library: a resource of assays to quantify the complete proteome of Mycobacterium tuberculosis. Cell Host Microbe 13, 602–612 (2013).

  20. Karlsson, C., Malmström, L., Aebersold, R. & Malmström, J.A. Proteome-wide selected reaction monitoring assays for the human pathogen Streptococcus pyogenes. Nat. Commun. 3, 1301 (2012).

  21. Hüttenhain, R. et al. N-Glycoprotein SRMAtlas: a resource of mass-spectrometric assays for N-glycosites enabling consistent and multiplexed protein quantification for clinical applications. Mol. Cell. Proteomics 12, 1005–1016 (2013).

  22. Hüttenhain, R. et al. Reproducible quantification of cancer-associated proteins in body fluids using targeted proteomics. Sci. Transl. Med. 4, 142ra94 (2012).

  23. Rosenberger, G. et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci. Data 1, 140031 (2014).

  24. Deutsch, E.W. et al. A guided tour of the trans-proteomic pipeline. Proteomics 10, 1150–1159 (2010).

  25. Chambers, M.C. et al. A cross-platform toolkit for mass spectrometry and proteomics. Nat. Biotechnol. 30, 918–920 (2012).

  26. Sturm, M. et al. OpenMS: an open-source software framework for mass spectrometry. BMC Bioinformatics 9, 163 (2008).

  27. Cox, J. & Mann, M. MaxQuant enables high peptide identification rates, individualized p.p.b.-range mass accuracies and proteome-wide protein quantification. Nat. Biotechnol. 26, 1367–1372 (2008).

  28. Lam, H. et al. Development and validation of a spectral library searching method for peptide identification from MS/MS. Proteomics 7, 655–667 (2007).

  29. Lam, H. & Aebersold, R. Building and searching tandem mass (MS/MS) spectral libraries for peptide identification in proteomics. Methods 54, 424–431 (2011).

  30. Weisbrod, C.R., Eng, J.K., Hoopmann, M.R., Baker, T. & Bruce, J.E. Accurate peptide fragment mass analysis: multiplexed peptide identification and quantification. J. Proteome Res. 11, 1621–1632 (2012).

  31. Selevsek, N. et al. Reproducible and consistent quantification of the Saccharomyces cerevisiae proteome by SWATH-MS. Mol. Cell. Proteomics, http://dx.doi.org/10.1074/mcp.M113.035550 (2015).

  32. Heller, M. et al. Added value for tandem mass spectrometry shotgun proteomics data validation through isoelectric focusing of peptides. J. Proteome Res. 4, 2273–2282 (2005).

  33. Stergachis, A.B., MacLean, B., Lee, K., Stamatoyannopoulos, J.A. & MacCoss, M.J. Rapid empirical discovery of optimal peptides for targeted proteomics. Nat. Methods 8, 1041–1043 (2011).

  34. Qeli, E. et al. Improved prediction of peptide detectability for targeted proteomics using a rank-based algorithm and organism-specific data. J. Proteomics 108, 269–283 (2014).

  35. Mallick, P. et al. Computational prediction of proteotypic peptides for quantitative proteomics. Nat. Biotechnol. 25, 125–131 (2007).

  36. Eyers, C.E. et al. CONSeQuence: prediction of reference peptides for absolute quantitative proteomics using consensus machine learning approaches. Mol. Cell. Proteomics 10, M110.003384 (2011).

  37. Fusaro, V.A., Mani, D.R., Mesirov, J.P. & Carr, S.A. Prediction of high-responding peptides for targeted protein assays by mass spectrometry. Nat. Biotechnol. 27, 190–198 (2009).

  38. Tang, H. et al. A computational approach toward label-free protein quantification using predicted peptide detectability. Bioinformatics 22, e481–e488 (2006).

  39. Webb-Robertson, B.-J.M. et al. A support vector machine model for the prediction of proteotypic peptides for accurate mass and time proteomics. Bioinformatics 24, 1503–1509 (2008).

  40. Li, S., Arnold, R.J., Tang, H. & Radivojac, P. On the accuracy and limits of peptide fragmentation spectrum prediction. Anal. Chem. 83, 790–796 (2011).

  41. Escher, C. et al. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics 12, 1111–1121 (2012).

  42. Toprak, U.H. et al. Conserved peptide fragmentation as a benchmarking tool for mass spectrometers and a discriminating feature for targeted proteomics. Mol. Cell. Proteomics 13, 2056–2071 (2014).

  43. de Graaf, E.L., Altelaar, A.F.M., van Breukelen, B., Mohammed, S. & Heck, A.J.R. Improving SRM assay development: a global comparison between triple quadrupole, ion trap, and higher energy CID peptide fragmentation spectra. J. Proteome Res. 10, 4334–4341 (2011).

  44. Deutsch, E. mzML: a single, unifying data format for mass spectrometer output. Proteomics 8, 2776–2777 (2008).

  45. Keller, A., Eng, J., Zhang, N., Li, X.-J. & Aebersold, R. A uniform proteomics MS/MS analysis platform utilizing open XML file formats. Mol. Syst. Biol. 1, 2005.0017 (2005).

  46. Elias, J.E. & Gygi, S.P. Target-decoy search strategy for increased confidence in large-scale protein identifications by mass spectrometry. Nat. Methods 4, 207–214 (2007).

  47. Shteynberg, D. et al. iProphet: multi-level integrative analysis of shotgun proteomic data improves peptide and protein identification rates and error estimates. Mol. Cell. Proteomics 10, M111.007690 (2011).

  48. Shteynberg, D., Nesvizhskii, A.I., Moritz, R.L. & Deutsch, E.W. Combining results of multiple search engines in proteomics. Mol. Cell. Proteomics 12, 2383–2393 (2013).

  49. Picotti, P., Aebersold, R. & Domon, B. The implications of proteolytic background for shotgun proteomics. Mol. Cell. Proteomics 6, 1589–1598 (2007).

  50. Walmsley, S.J. et al. Comprehensive analysis of protein digestion using six trypsins reveals the origin of trypsin as a significant source of variability in proteomics. J. Proteome Res. 12, 5666–5680 (2013).

  51. Kim, J.-S., Monroe, M.E., Camp, D.G., Smith, R.D. & Qian, W.-J. In-source fragmentation and the sources of partially tryptic peptides in shotgun proteomics. J. Proteome Res. 12, 910–916 (2013).

  52. Eng, J.K., Searle, B.C., Clauser, K.R. & Tabb, D.L. A face in the crowd: recognizing peptides through database search. Mol. Cell. Proteomics 10, R111.009522 (2011).

  53. Keller, A., Nesvizhskii, A.I., Kolker, E. & Aebersold, R. Empirical statistical model to estimate the accuracy of peptide identifications made by MS/MS and database search. Anal. Chem. 74, 5383–5392 (2002).

  54. Reiter, L. et al. Protein identification false discovery rates for very large proteomics data sets generated by tandem mass spectrometry. Mol. Cell. Proteomics 8, 2405–2417 (2009).

  55. Liu, J. et al. Methods for peptide identification by spectral comparison. Proteome Sci. 5, 3 (2007).

  56. Röst, H.L., Malmström, L. & Aebersold, R. A computational tool to detect and avoid redundancy in selected reaction monitoring. Mol. Cell. Proteomics 11, 540–549 (2012).

  57. Deutsch, E.W. et al. TraML--a standard format for exchange of selected reaction monitoring transition lists. Mol. Cell. Proteomics 11, R111.015040 (2012).

  58. Vizcaíno, J.A. et al. ProteomeXchange provides globally coordinated proteomics data submission and dissemination. Nat. Biotechnol. 32, 223–226 (2014).

  59. Picotti, P., Bodenmiller, B., Mueller, L.N., Domon, B. & Aebersold, R. Full dynamic range proteome analysis of S. cerevisiae by targeted proteomics. Cell 138, 795–806 (2009).

  60. Nesvizhskii, A.I., Keller, A., Kolker, E. & Aebersold, R. A statistical model for identifying proteins by tandem mass spectrometry. Anal. Chem. 75, 4646–4658 (2003).

Acknowledgements

We thank C. Ludwig and S. Bader for their discussions and feedback on the manuscript, L. Blum for implementation of iProphet support in MAYU, H. Röst for packaging msproteomicstools, J. Slagel for including the qtofpeakpicker and the new MAYU version in the TPP, and the PRIDE Team for maintaining the ProteomeXchange platform. This work has been financially supported by the Framework Programme 7 of the European Commission through SysteMTb (241587), UNICELLSYS (201142), PRIME-XS (262067) and ProteomeXchange (260558), a European Research Council advanced grant Proteomics v3.0 (233226), the Federal Ministry of Education and Research (e:Bio Express2Present, 0316179C) and the Forschungszentrum Immunologie of the University Medical Center Mainz.

Author information

Contributions

O.T.S., L.C.G. and B.C.C. developed the workflow and wrote the manuscript; P.N. developed the tools spectrast2tsv.py and spectrast_cluster.py; G.R. and H.L. developed and implemented the retention time normalization and iRT calibration in SpectraST; W.E.W. developed the qtofpeakpicker; D.A., P.M. and B.M. implemented automated SWATH library import into Skyline; and R.A. directed the project and contributed to writing the manuscript.

Corresponding author

Correspondence to Ruedi Aebersold.

Ethics declarations

Competing interests

R.A. holds shares of Biognosys AG, which operates in the field covered by the article (products are Spectronaut software and iRT-kit).

Integrated supplementary information

Supplementary Figure 1 Comparison of converters.

(a) Violin plots show the intrinsic fragment ion spectrum variability of three yeast sample injections converted with three DDA centroiding algorithms.

(b) Violin plots show how well the relative intensities of the six most intense fragment ions obtained with each centroiding algorithm compare with the relative intensities of the same six fragments in SWATH MS data.

The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers42. N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra).

(c) Depending on the centroiding algorithm implemented in the different peak pickers, the numbers of peptide and protein identifications from a database search vary slightly (after filtering at a protein-level FDR of 1%). The size of the circles is proportional to the number of identifications.
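The normalised spectral contrast angle used in Supplementary Figures 1 and 3 compares two fragment ion intensity vectors. In its commonly used form it equals 1 − 2θ/π, where θ is the angle between the vectors (the arc cosine of their cosine similarity). Identical relative intensities therefore score 1 and orthogonal vectors score 0. The sketch below assumes raw intensities of the same, already matched fragment ions, for example the six most intense fragments. Any intensity transformation or peak matching used by Toprak and co-workers (ref. 42) is not reproduced, and the example intensities are hypothetical.

```python
import math
from typing import Sequence


def normalized_spectral_contrast_angle(a: Sequence[float],
                                       b: Sequence[float]) -> float:
    """Return 1 - 2*arccos(cos_sim)/pi for two aligned fragment intensity vectors.

    Both vectors must list intensities of the same fragment ions in the same
    order. The result is 1.0 (up to floating-point error) for proportional
    spectra and 0.0 for orthogonal ones.
    """
    if len(a) != len(b) or not a:
        raise ValueError("vectors must be non-empty and of equal length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("vectors must contain non-zero intensities")
    cos_sim = sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    cos_sim = max(-1.0, min(1.0, cos_sim))
    return 1.0 - 2.0 * math.acos(cos_sim) / math.pi


if __name__ == "__main__":
    # Hypothetical intensities of the same six fragments in a library
    # spectrum and in extracted SWATH data.
    library = [100.0, 80.0, 60.0, 40.0, 20.0, 10.0]
    swath = [90.0, 85.0, 50.0, 45.0, 15.0, 12.0]
    print(round(normalized_spectral_contrast_angle(library, swath), 3))
```

Compared with the plain cosine similarity, the angle-based score spreads out values near 1. This makes small differences between highly similar spectra easier to see.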

Supplementary Figure 2 MAYU-estimated FDRs with respect to iProphet probability thresholds.

The software MAYU estimates the FDR at the PSM (mFDR), peptide (pepFDR) and protein (protFDR) levels as a function of the applied iProphet probability threshold. The data are based on the case study described in the protocol, which consists of three whole-cell lysates of yeast. For comparison, the protein FDR of a very large data set of 331 runs of different human cells and tissues is shown (Pan Human Library, as described by Rosenberger and colleagues23).
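Supplementary Figure 2 relates estimated FDRs to iProphet probability thresholds. MAYU uses its own estimation model, particularly at the protein level. The sketch below illustrates only the basic target-decoy principle (ref. 46) at the PSM level, using the common decoys/targets estimator. The input format, a list of hypothetical `(probability, is_decoy)` pairs, and the function names are assumptions made for illustration.

```python
from typing import Optional, Sequence, Tuple

# Hypothetical PSM representation: (probability, is_decoy).
PSM = Tuple[float, bool]


def decoy_fdr(psms: Sequence[PSM], threshold: float) -> float:
    """Estimate the FDR among PSMs with probability >= threshold as decoys / targets.

    Assumes target and decoy sequence sets of equal size, so the number of
    accepted decoys estimates the number of accepted false targets.
    """
    targets = sum(1 for p, is_decoy in psms if p >= threshold and not is_decoy)
    decoys = sum(1 for p, is_decoy in psms if p >= threshold and is_decoy)
    if targets == 0:
        # Report 0.0 when nothing is accepted and infinity when only decoys are.
        return 0.0 if decoys == 0 else float("inf")
    return decoys / targets


def loosest_threshold(psms: Sequence[PSM], thresholds: Sequence[float],
                      max_fdr: float) -> Optional[float]:
    """Return the lowest threshold whose estimated FDR is at most max_fdr."""
    passing = [t for t in thresholds if decoy_fdr(psms, t) <= max_fdr]
    return min(passing) if passing else None


if __name__ == "__main__":
    # Hypothetical PSMs, listed by decreasing probability.
    psms = [(0.99, False), (0.98, False), (0.95, False), (0.90, True),
            (0.85, False), (0.60, True), (0.50, False), (0.40, True)]
    for t in (0.5, 0.9, 0.95):
        print(t, round(decoy_fdr(psms, t), 3))
    print(loosest_threshold(psms, [0.5, 0.9, 0.95, 0.99], max_fdr=0.05))  # 0.95
```

Choosing the loosest threshold that still meets the FDR target keeps as many identifications as possible. The FDR estimated at the PSM level is typically lower than at the protein level for the same threshold, which is why protein-level control needs its own estimate.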

Supplementary Figure 3 Comparison of consensus and best replicate spectral libraries.

(a) Violin plots show the intrinsic fragment ion spectrum variability of three injections of a yeast sample converted with the qtofpeakpicker and summarised using either the best-replicate or the consensus algorithm implemented in SpectraST11.

(b) Violin plots show how well the relative intensities of the six most intense fragment ions compare to the relative intensities of the same six fragments in SWATH MS data.

The spectrum similarity was determined using the normalised spectral contrast angle as described by Toprak and co-workers42. N indicates the number of pairwise comparisons of fragment ion spectra for the same peptide precursors between each data file and S indicates the estimated level of dissimilarity obtained by comparing the experimental score distribution to the score distribution of a perturbation benchmark dataset (the lower S, the more similar the spectra). Common neutral losses were included in the comparison.
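Supplementary Figure 3 contrasts consensus spectra with best-replicate spectra. In the simplified sketch below, each replicate spectrum is a mapping from annotated fragment ion labels to intensities; this representation is hypothetical. The sketch averages the normalized intensities of fragments observed in enough replicates. As the alternative, it picks the single replicate with the highest identification score. This is not SpectraST's consensus algorithm (ref. 11), which operates on m/z peaks with its own merging, weighting and quality filters.

```python
from typing import Dict, Sequence

# Hypothetical representation: fragment ion label (e.g. "y7") -> intensity.
Spectrum = Dict[str, float]


def normalize(spectrum: Spectrum) -> Spectrum:
    """Scale intensities so that the most intense fragment equals 1.0."""
    if not spectrum:
        return {}
    top = max(spectrum.values())
    if top <= 0:
        raise ValueError("spectrum needs at least one positive intensity")
    return {ion: value / top for ion, value in spectrum.items()}


def consensus_spectrum(replicates: Sequence[Spectrum],
                       min_fraction: float = 0.5) -> Spectrum:
    """Average normalized intensities of fragments present in enough replicates.

    A fragment is kept if it occurs in at least `min_fraction` of the replicates;
    replicates lacking it contribute zero to its mean intensity.
    """
    if not replicates:
        raise ValueError("need at least one replicate spectrum")
    normed = [normalize(s) for s in replicates]
    n = len(normed)
    ions = {ion for s in normed for ion in s}
    merged: Spectrum = {}
    for ion in ions:
        present = [s[ion] for s in normed if ion in s]
        if len(present) / n >= min_fraction:
            merged[ion] = sum(present) / n
    return normalize(merged)


def best_replicate(replicates: Sequence[Spectrum],
                   scores: Sequence[float]) -> Spectrum:
    """Return the replicate with the highest identification score."""
    if not replicates or len(replicates) != len(scores):
        raise ValueError("need one score per replicate")
    best = max(range(len(replicates)), key=lambda k: scores[k])
    return replicates[best]


if __name__ == "__main__":
    # Hypothetical replicate spectra of one peptide precursor.
    reps = [
        {"y7": 100.0, "y6": 50.0, "b3": 10.0},
        {"y7": 80.0, "y6": 60.0},
        {"y7": 90.0, "y6": 45.0, "y5": 5.0},
    ]
    print(consensus_spectrum(reps))
    print(best_replicate(reps, [0.90, 0.99, 0.95]))
```

The sketch divides by the total number of replicates rather than only by those containing the fragment. This penalizes fragments that are detected inconsistently, and the `min_fraction` filter drops sporadic peaks altogether.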

Supplementary information

Supplementary Text and Figures

Supplementary Figures 1–3, Supplementary Notes 1–5, Supplementary Tables 1–6 and Supplementary Tutorial (PDF 6958 kb)

About this article

Cite this article

Schubert, O., Gillet, L., Collins, B. et al. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc 10, 426–441 (2015). https://doi.org/10.1038/nprot.2015.015

  • DOI: https://doi.org/10.1038/nprot.2015.015
