Abstract
Mass spectrometry is used routinely for large-scale protein identification from complex biological mixtures. Recently, a relative quantitation approach based on spectra count has been applied in several cancer proteomic studies. In this review, we examine the mechanism of this technique and highlight several important parameters associated with its application.
Introduction
Today, liquid chromatography-coupled tandem mass spectrometry (LC-MS/MS) is used routinely for large-scale protein identification and global profiling of post-translational modifications (PTMs) from complex biological mixtures (1-4). To quantitatively characterize and compare two or more proteomes by MS, a variety of methods involving the incorporation of stable isotope labels have been developed (5). Because stable isotope-labeled peptides have physical and chemical properties similar to those of their unlabeled equivalents but differ in mass, which a mass spectrometer can recognize, quantification is achieved by comparing their respective signal intensities. The labels can be introduced into samples by methods such as isotope-coded affinity tag (ICAT) (6), isobaric tags for relative and absolute quantitation (iTRAQ) (7), and stable isotope labeling by amino acids in cell culture (SILAC) (8). However, clinical samples and most animal-based samples are often unsuitable for metabolic labeling. Moreover, chemical isotopic labeling of many samples can be extremely expensive and time-consuming (9). As a result, alternative label-free quantitation approaches, which either measure and compare the MS signal intensity of peptide precursor ions or count and compare the number of matched MS2 spectra of a given protein, have gained increasing popularity over the past several years. Investigators have validated this approach as a viable method for MS-based differential display and have shown excellent correlation between spectral counts and relative quantitation (10-12). The pros and cons of these methods have been reported in previous publications (9, 13, 14). Here, we discuss the features of the commonly used spectra count label-free quantitation technique based on the authors' experience with ion trap mass spectrometers, focusing on both theoretical and practical aspects of the technique.
CID versus ETD
The shotgun MS strategy consists of enzymatic digestion of proteins by trypsin to yield peptides, which are analyzed by a mass spectrometer, either directly or after enrichment, using a data-dependent MS/MS method (15, 16). Collision-induced dissociation (CID) and electron-transfer dissociation (ETD) are the most widely used techniques for fragmenting peptide ions in a mass spectrometer and have proven to be extremely useful for amino acid sequence assignment and for PTM studies. Notably, low-energy CID of peptides with labile PTMs such as phosphorylation and glycosylation typically results predominantly in neutral loss with inadequate fragmentation of the peptide backbone, so that only limited sequence information is obtained, and large peptides (>2500 Da) are difficult to fragment by CID. On the other hand, ETD has the advantage of enabling efficient fragmentation of peptides with labile PTMs and of larger peptides with charge states of 3+ or higher (17, 18). However, ETD can yield poorer fragmentation efficiency for smaller peptides with lower (1+, 2+) charge states (19), and it takes longer to fragment peptides and acquire an MS2 spectrum by ETD. If the same complex biological mixture is analyzed by either CID or ETD, more CID spectra than ETD spectra can be collected within the same amount of time for peptide identification. As a result, CID is still the preferred method for general proteomic studies whose goal is to identify as many unique peptides as possible, as quickly as possible, whereas ETD is mostly used as a complementary technique for characterization of multiply charged peptides or peptides with labile PTMs that could be missed by CID. The recent development of new mass spectrometers, such as the LTQ-Orbitrap Velos, which provides even faster CID scans (20, 21), further consolidates CID as a major player in general proteomic research.
Trypsin is commonly used for LC-MS/MS analysis because tryptic peptides can be positively charged at both the N- and C-termini, and the most abundant fragment ions are b and y ions, which are well suited to database searching for peptide matching (22). In contrast, the C-termini of non-tryptic peptides generated by enzymes such as Glu-C are not positively charged, and the resulting CID spectra are poorly suited to database search algorithms such as Sequest because they lack abundant y-series ions. Furthermore, peptides generated by enzymes such as Lys-C are likely to be longer than tryptic peptides, and long peptides do not fragment well by CID. Nonetheless, proteolytic digestion by trypsin together with other proteases helps to increase proteome coverage (23-25).
MS2 Spectra Count
The mass spectrometer, such as an ion trap from ThermoFisher, is set to perform a full scan (survey scan) to determine the m/z values and intensities of ionized peptides, and subsequently to perform several MS/MS scans on the most intense ions in the full-scan spectrum with “Dynamic Exclusion” enabled. The Dynamic Exclusion function has several important parameters, such as Repeat Count, Exclusion Mass Width, Exclusion List Size, and Exclusion Duration. Dynamic Exclusion temporarily puts a mass on an exclusion list after its MS2 spectrum is acquired, providing the opportunity to collect MS2 information on less intense ions in the next cycle. The obtained MS2 spectra are then processed by computer algorithms to deduce peptide sequences. It is noteworthy that (a) although thousands of MS2 spectra can be collected during a typical LC-MS/MS experiment, this number is small relative to the number of peptides generated by tryptic digestion of a large proteome such as the human proteome, and the signal intensities of tryptic peptides from an abundant protein are more likely to be above the threshold that triggers CID events; (b) with electrospray ionization (ESI), one tryptic peptide can exist in multiple charged forms, carrying one or more protons. If their signal intensities are above the threshold and their m/z values are within the full-scan mass range, all of them can trigger CID events. Consequently, for an abundant peptide, multiple MS2 spectra corresponding to different charged forms can be obtained; (c) many proteins have PTMs, and the tryptic peptides are subject to variable modifications. In addition, if a peptide contains methionine (Met), variable methionine oxidation during sample preparation and MS analysis is not unusual; (d) abundant peptides have wider peaks in the liquid chromatogram. If a peptide's peak width exceeds the dynamic exclusion duration, the peptide can be selected again for CID after its exclusion time expires, and more than one MS2 spectrum can be acquired from the same peptide. Hence, the total number of MS2 spectra matched to a protein is a combination of spectra from different partially tryptic and fully tryptic peptides, spectra from the same peptide with different charges, spectra of the same peptide with variable modifications, and repeated spectra from the same peptide due to expired dynamic exclusion (Figure 1).
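The way a protein's total count is assembled can be illustrated with a short sketch. The records, protein names, peptide sequences and scan numbers here are hypothetical; they only show how charge states, variable modifications and repeated selections each add to the spectra count while the number of distinct peptide sequences stays small.

```python
from collections import Counter, defaultdict

# Hypothetical peptide-spectrum matches: one tuple per matched MS2 spectrum.
# Fields: (protein, peptide sequence, charge, modification, scan number).
psms = [
    ("PROT_A", "LVNELTEFAK", 2, "", 1001),
    ("PROT_A", "LVNELTEFAK", 3, "", 1002),            # same peptide, other charge
    ("PROT_A", "LVNELTEFAK", 2, "", 1450),            # re-selected after exclusion expired
    ("PROT_A", "AMDTGLK", 2, "", 1100),
    ("PROT_A", "AMDTGLK", 2, "Oxidation(M)", 1105),   # variable Met oxidation
    ("PROT_B", "GYSFTTTAER", 2, "", 1210),
]


def spectral_counts(psms):
    """Count every matched MS2 spectrum per protein."""
    return Counter(protein for protein, *_ in psms)


def unique_peptides(psms):
    """Count distinct peptide sequences per protein, ignoring charge and modification."""
    sequences = defaultdict(set)
    for protein, sequence, *_ in psms:
        sequences[protein].add(sequence)
    return {protein: len(seqs) for protein, seqs in sequences.items()}


counts = spectral_counts(psms)
peptides = unique_peptides(psms)
for protein in counts:
    print(protein, "spectra:", counts[protein], "unique peptides:", peptides[protein])
```

In this toy data set PROT_A is matched by five spectra that come from only two peptide sequences, which is why spectra count and unique peptide count should not be used interchangeably.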
Spectra Count Label-free Relative Quantitation
Based on the empirical observation that proteins present at higher concentration are associated with a larger number of peptide MS2 spectra, relative quantitation of the identified proteins can be achieved (5, 9). In general, abundant proteins have more than 10 spectra counts, medium-abundance proteins from 2 to 10, and low-abundance proteins fewer than 2. Although there is an approximate correlation between the number of spectra per protein and the amount of protein present in the mixture, such a correlation should be viewed in the context of the following sources of error: (a) a protein that was not identified (zero count) could still be in the sample at low abundance; (b) a great deal of caution should be exercised in comparing the absolute quantities of two different proteins, because for the same number of molecules, a larger protein can give rise to more tryptic peptides for CID; (c) the MS signal for any given peptide is determined by many factors, such as ionizability in electrospray and fragmentation efficiency, that affect the number of spectra matched to a protein. Frequently, the number of MS2 spectra contributing to the identification of a protein can be used as an indication of that protein's abundance in samples from one-dimensional or two-dimensional electrophoresis gel pieces, in which proteins have similar molecular weights (26). To quantify proteins of different molecular weights from a complex biological mixture such as a cell lysate, some researchers have proposed methods using a normalized spectra count that takes the length of the protein into account (27).
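One widely used form of length normalization is the normalized spectral abundance factor (NSAF), in which each protein's spectra count is divided by its length and then by the sum of these ratios over all proteins in the run. The sketch here assumes this form purely for illustration; the text does not state which normalization reference 27 uses, and the protein names, counts and lengths are hypothetical.

```python
def nsaf(counts, lengths):
    """Return NSAF values: (count / length) divided by the sum of count / length.

    counts  -- dict mapping protein name to spectra count
    lengths -- dict mapping protein name to length in amino acids
    """
    saf = {protein: counts[protein] / lengths[protein] for protein in counts}
    total = sum(saf.values())
    return {protein: value / total for protein, value in saf.items()}


# Two hypothetical proteins with equal spectra counts but different lengths.
counts = {"SHORT_PROT": 20, "LONG_PROT": 20}
lengths = {"SHORT_PROT": 200, "LONG_PROT": 800}

normalized = nsaf(counts, lengths)
for protein, value in normalized.items():
    print(f"{protein}: {value:.2f}")
```

Although both proteins have 20 spectra, the shorter protein receives the larger normalized value (0.80 versus 0.20), reflecting that a long protein yields more tryptic peptides per molecule.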
For LC-MS/MS analysis, the mass spectrometer is configured to sequentially select and fragment ions from high intensity to low intensity with dynamic exclusion; however, in very complex peptide mixtures the number of co-eluting ions can significantly exceed the number of ions for which tandem mass spectra can be acquired. As a result, high-abundance and medium-abundance proteins will be identified repeatedly across multiple iterations, whereas some low-abundance proteins will not. The proteins newly identified in succeeding iterations are mainly of relatively low abundance, and the total number of identified proteins increases somewhat with repeated iterations (28). Since the MS2 spectra count obtained for each protein can differ between iterations, measurement variability can be reduced by using the average spectra count. Because the variation in spectra count from one iteration to another is relatively higher for low-abundance proteins, a great deal of caution should be exercised in assessing the relative quantity of a protein that is identified as low-abundance in both groups (28). Nevertheless, reliable relative quantitation can be obtained if the protein is identified as low-abundance in one group but high-abundance in the other, or if the protein is identified as high-abundance in both groups but the difference in spectra count is dramatic. For proteins that are identified as low-abundance in both groups and are possibly differentially expressed, it is highly recommended to verify the difference by targeted quantitative MS analysis after enrichment or by a more sensitive technique such as Western blot.
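Averaging over iterations, and the larger relative variation seen for low-count proteins, can be made concrete with a small calculation. The replicate counts here are invented for illustration, and the coefficient of variation (standard deviation divided by mean) is one possible measure of run-to-run variability, not necessarily the one used in reference 28.

```python
from statistics import mean, stdev


def summarize(replicate_counts):
    """Return (mean, coefficient of variation) for one protein's replicate counts."""
    average = mean(replicate_counts)
    cv = stdev(replicate_counts) / average if average > 0 else float("nan")
    return average, cv


# Hypothetical spectra counts from three repeated LC-MS/MS iterations.
runs = {
    "HIGH_ABUNDANCE_PROT": [48, 52, 50],
    "LOW_ABUNDANCE_PROT": [1, 3, 2],
}

summary = {protein: summarize(counts) for protein, counts in runs.items()}
for protein, (average, cv) in summary.items():
    print(f"{protein}: mean = {average}, CV = {cv:.2f}")
```

With these numbers both proteins vary by about two counts between runs, yet the coefficient of variation is 0.04 for the abundant protein and 0.50 for the low-abundance one, which is why small count differences between low-abundance proteins call for independent verification.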
The Factors Affecting MS2 Spectra Count
Many LC-MS/MS instrument parameters, such as “ion injection time”, “automatic gain control”, “micro scan” and “dynamic exclusion”, have a significant impact on the number of identified proteins and their spectra counts, and some of them have been described in previous publications (29-34).
Microscans. This parameter specifies the number of microscans per scan. Each microscan is one mass analysis, comprising ion injection, ion storage and ion detection. Microscans are summed to produce one scan, which improves the signal-to-noise ratio of the mass spectral data. The number of microscans per scan is an important factor in determining the overall scan time: increasing the number of microscans results in a longer scan time, and the instrument will acquire fewer MS2 spectra for peptide identification. For proteomic analysis, the number of “Microscans” is generally set to “1” in order to increase proteome coverage (Table I), as confirmed by the study of Kalli et al. (32).
Repeat count and repeat duration. When the Dynamic Exclusion function is enabled, the mass of an ion that has been chosen for data-dependent fragmentation is put on a “pre-exclusion list”. The Repeat Count is the number of times that a mass may be selected as a data-dependent mass before it goes to the dynamic exclusion list; the Repeat Duration is the amount of time an ion stays on the “pre-exclusion list”. If an ion triggers a data-dependent scan the number of times specified by Repeat Count within the Repeat Duration, it is removed from the pre-exclusion list and added to the dynamic exclusion list. If the Repeat Count is greater than “1” and the peptide's chromatographic peak width is greater than the MS analysis cycle time (~3 s for a commonly used instrument method such as “Top 8” on the LTQ-Orbitrap), the instrument will very likely perform a repeated MS2 scan of the same precursor ion before putting the mass on the dynamic exclusion list. This favors analysis of the abundant ions in the sample, but less abundant ions may lose the chance to be picked for fragmentation. In contrast, if the Repeat Count is set to “1”, the mass of an ion that has triggered a data-dependent scan is added directly to the dynamic exclusion list, allowing the instrument to analyze less abundant ions and increasing proteome coverage.
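The pre-exclusion logic described above can be sketched as a simplified model. This is not the instrument's actual control software: the function name, the assumption that the ion is eligible to trigger a scan in every cycle, and the rule that the counter restarts once the Repeat Duration has elapsed are all illustrative assumptions.

```python
def scans_before_exclusion(trigger_times, repeat_count, repeat_duration):
    """Return the times at which MS2 scans are acquired before the ion is excluded.

    trigger_times   -- times (s) at which the ion could trigger a scan, ascending
    repeat_count    -- selections allowed before moving to the exclusion list
    repeat_duration -- time (s) the ion stays on the pre-exclusion list
    """
    acquired = []
    window_start = None
    hits = 0
    for t in trigger_times:
        # Assumption: once the repeat duration has elapsed, the ion has left
        # the pre-exclusion list and counting starts again.
        if window_start is None or t - window_start > repeat_duration:
            window_start, hits = t, 0
        hits += 1
        acquired.append(t)
        if hits >= repeat_count:
            break  # the ion now moves to the dynamic exclusion list
    return acquired


# A peptide eluting for 30 s, sampled with a 3 s cycle time.
triggers = [3.0 * i for i in range(10)]

once = scans_before_exclusion(triggers, repeat_count=1, repeat_duration=30.0)
twice = scans_before_exclusion(triggers, repeat_count=2, repeat_duration=30.0)
print(once, twice)
```

Under these assumptions a Repeat Count of 1 gives a single MS2 scan before exclusion, whereas a Repeat Count of 2 spends a second scan on the same precursor in the next cycle.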
Exclusion mass width. This parameter sets the window for determining whether the mass of an ion matches a mass on the dynamic exclusion list. An ion will not trigger a dependent scan if its mass is within the window. For the ion trap mass spectrometer LTQ-XL, which has low mass accuracy, the width of the window is ~1 amu, excluding both the analyzed peptide and its likely isotopic forms, whereas for the hybrid mass spectrometer LTQ-Orbitrap, which has high mass accuracy, the width of the window can be set to ~0.1 amu, since isotopic forms will be rejected for CID when the “Monoisotopic precursor selection” mode is activated (Figure 2). The merit of Dynamic Exclusion is that it temporarily puts recently analyzed peptides on an exclusion list and allows the mass spectrometer to obtain CID spectra from other, less intense peptides; however, co-eluting peptides with similar m/z can fall into the “black window” and be rejected for CID, decreasing the spectra count of the corresponding proteins. If the Exclusion Mass Width is too large, the odds that other peptides fall into these “black windows” will be higher, and those peptides will be skipped for CID. Therefore, an Exclusion Mass Width appropriate to the type of mass spectrometer should be applied.
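The effect of the window width on a co-eluting peptide can be checked with a minimal sketch. It assumes a symmetric window whose total width equals the Exclusion Mass Width, which may differ from how a given instrument defines the setting; the m/z values are hypothetical.

```python
def is_excluded(mz, excluded_masses, mass_width):
    """Return True if mz falls inside the window around any excluded mass.

    Assumption: the window is symmetric, spanning mass_width / 2 on each side.
    """
    half_width = mass_width / 2
    return any(abs(mz - excluded) <= half_width for excluded in excluded_masses)


excluded_masses = [650.30]   # hypothetical precursor already analyzed
co_eluting_mz = 650.62       # hypothetical, unrelated peptide with similar m/z

wide = is_excluded(co_eluting_mz, excluded_masses, mass_width=1.0)
narrow = is_excluded(co_eluting_mz, excluded_masses, mass_width=0.1)
print("1 amu window:", wide, "| 0.1 amu window:", narrow)
```

With the ~1 amu window the unrelated peptide is skipped, whereas with the ~0.1 amu window it remains eligible for CID.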
Exclusion list size. This parameter defines the maximum number of masses that can be on the dynamic exclusion list. It is usually on the order of 100-500. When the list is full, the mass of a newly analyzed peptide is added and the mass of the earliest analyzed peptide is removed. Since each mass on the list creates a “black window”, this parameter also determines how many “black windows” may exist during MS analysis. A relatively large Exclusion List Size ensures that an analyzed peptide stays on the dynamic exclusion list until its exclusion duration expires. If the Exclusion List Size is too large, the odds that other peptides fall into these “black windows” will be higher, and those peptides will be skipped for CID. In contrast, a relatively small Exclusion List Size can remove an analyzed peptide from the dynamic exclusion list when the list is full, even though its exclusion duration has not expired. Thus, a relatively large value of this parameter favors qualitative proteomic analysis, while a relatively small value can be selected in favor of quantitative analysis by allowing repeated analysis of abundant peptides.
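The first-in, first-out behavior of a full list can be sketched with a bounded queue. The class, its method names and the 1 amu default window are hypothetical and stand in for the instrument's internal bookkeeping.

```python
from collections import deque


class ExclusionList:
    """Simplified dynamic exclusion list with a size limit and a duration."""

    def __init__(self, max_size, duration):
        # deque(maxlen=...) silently drops the oldest entry when full.
        self.entries = deque(maxlen=max_size)
        self.duration = duration

    def add(self, mz, time):
        self.entries.append((mz, time))

    def contains(self, mz, time, mass_width=1.0):
        """True if mz is within the window of an entry that has not expired."""
        return any(
            abs(mz - entry_mz) <= mass_width / 2 and time - added < self.duration
            for entry_mz, added in self.entries
        )


def still_excluded(max_size):
    """Add three masses, then ask whether the first is still excluded at t = 5 s."""
    exclusion = ExclusionList(max_size=max_size, duration=60.0)
    for time, mz in enumerate([500.0, 600.0, 700.0]):
        exclusion.add(mz, float(time))
    return exclusion.contains(500.0, time=5.0)


print("list size 2:", still_excluded(2), "| list size 3:", still_excluded(3))
```

With a list size of 2 the first mass is pushed out after only a few seconds and can be fragmented again, even though its 60 s exclusion duration has not expired; with a list size of 3 it stays excluded.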
Exclusion duration. This parameter determines how long an analyzed peptide stays on the dynamic exclusion list after its MS2 spectrum is acquired, and it determines the lifetime of each “black window” as well. After the exclusion time expires, the peptide can be selected again for CID if its intensity is above the threshold that triggers CID. Consequently, a longer exclusion duration prevents the mass spectrometer from repeatedly collecting MS2 spectra from the same peptide, providing the opportunity to obtain CID spectra from other unique peptides. As a result, a relatively long exclusion duration favors qualitative proteomic analysis by identifying as many unique peptides as possible, whereas it is disadvantageous for quantification because it treats abundant and low-abundance peptide ions alike. In contrast, a shorter exclusion duration favors quantitative analysis by collecting multiple MS2 spectra from the same peptide, whereas it is disadvantageous for general proteomic analysis since the mass spectrometer spends less time collecting MS2 spectra from other, low-abundance peptides (Figure 3). Thus, an intermediate value of this parameter should be considered in order to achieve satisfactory results in both qualitative and quantitative MS analysis of complex mixtures. Given that the chromatographic peak widths of most peptides are on the order of 10-300 s in nano-flow liquid chromatography separations, the Exclusion Duration is generally set to 20-200 s. Notably, two research groups observed that the optimal Exclusion Duration was around 90 s in their MS method development (29, 30).
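The trade-off can be illustrated with a simplified count of how often one peptide is re-selected while it elutes. The model assumes a Repeat Count of 1, a fixed cycle time, and a peptide that stays above the trigger threshold and is always among the selected ions for its whole peak width; real runs are less regular, and the numbers are illustrative only.

```python
def repeated_ms2(peak_width, exclusion_duration, cycle_time=3.0):
    """Count MS2 spectra acquired from one peptide during its elution.

    All arguments are in seconds. The peptide is selected in the first cycle
    in which it is not on the exclusion list, then excluded for
    exclusion_duration.
    """
    count = 0
    excluded_until = 0.0
    t = 0.0
    while t < peak_width:
        if t >= excluded_until:
            count += 1
            excluded_until = t + exclusion_duration
        t += cycle_time
    return count


for duration in (30.0, 90.0, 200.0):
    print(f"peak width 120 s, exclusion {duration:.0f} s:",
          repeated_ms2(120.0, duration))
```

For a 120 s peak this model gives four spectra with a 30 s exclusion, two with 90 s and one with 200 s, while a narrow 20 s peak yields a single spectrum at any of these settings.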
HPLC gradient. High performance liquid chromatography (HPLC) is coupled to tandem mass spectrometry to separate the peptide mixture from trypsin digestion, and reproducible chromatography is essential for relative quantitation. For analyzing complex biological samples by LC-MS/MS, a relatively longer/shallower HPLC gradient is generally applied in order to improve resolution and acquire more MS2 spectra for protein identification; however, if the gradient is too long, it is challenging to obtain reproducible elution profiles from multiple samples, and MS sensitivity will drop because the eluted peptides have wider peaks and lower peak intensities. For a given dynamic exclusion duration, more repeated MS2 spectra are expected to be collected from a peptide if a longer/shallower HPLC gradient is applied (Figure 4).
Other factors. The spectra count associated with the identified proteins varies depending on the instrumentation, protein database, spectral interpretation software and algorithm used to estimate the false discovery rate (FDR) (35-38). Because peptides differ in fragmentation efficiency, a high-quality MS2 spectrum containing abundant b and y ions will be matched to the same peptide by different spectral interpretation software such as Sequest and Mascot, whereas a relatively low-quality MS2 spectrum, containing more fragment ions from neutral loss or fragment ions other than b and y ions, may be assigned to different peptides by these programs, thus affecting the spectra count of identified proteins. Separately, a filter criterion is generally applied after the database search to remove low-scoring candidate peptides that could be false positives. More often than not, a loose filter criterion will yield more identified proteins with higher spectra counts and a higher FDR, whereas a stringent filter criterion will yield fewer identified proteins with lower spectra counts and a lower FDR. Furthermore, many human proteins have splicing variants and PTMs, and the spectra counts of these proteins may be incomplete if this information is not included in the database search for peptide matching.
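The link between filter stringency, spectra count and FDR can be shown with a toy example. It assumes a target-decoy search, in which the FDR is estimated as the number of accepted decoy matches divided by the number of accepted target matches; the text does not state which FDR estimate the cited studies use, and the scores here are invented.

```python
# Hypothetical search results: (score, is_decoy) for each matched spectrum.
matches = [
    (4.5, False), (4.1, False), (3.8, False), (3.2, False), (3.0, True),
    (2.7, False), (2.4, False), (2.2, True), (2.0, False), (1.8, True),
]


def filter_matches(matches, min_score):
    """Return (accepted target spectra, estimated FDR) at a score cutoff."""
    kept = [(score, is_decoy) for score, is_decoy in matches if score >= min_score]
    targets = sum(1 for _, is_decoy in kept if not is_decoy)
    decoys = sum(1 for _, is_decoy in kept if is_decoy)
    fdr = decoys / targets if targets else 0.0
    return targets, fdr


stringent = filter_matches(matches, min_score=3.5)
loose = filter_matches(matches, min_score=2.0)
print("stringent:", stringent)
print("loose:", loose)
```

In this example the stringent cutoff keeps three target spectra with an estimated FDR of 0, while the loose cutoff keeps seven target spectra at an estimated FDR of about 0.29.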
Conclusion
Quantitative and qualitative methods are complementary, not mutually exclusive. Indeed, all assays have limits with regard to analytical sensitivity and precision, the dynamic range for quantitation and the lowest level of detection. An ongoing challenge for MS practitioners is to identify all the expressed proteins and their PTMs in a complex proteome such as that of a human cell or body fluid. Frequently, some proteins and PTMs are detected in one sample but not in the other, due to low abundance. Thus, absolute quantitation can be hampered, since this calculation requires that the analyte be detected in measurable quantities in both samples, whereas relative quantitation by spectra count and qualitative comparison are not limited in this way. For instance, using the spectra count label-free quantitation approach, we recently characterized the proteomes of pancreatic ductal adenocarcinoma (PDAC) cells and normal pancreatic duct cells and identified a large number of differentially expressed metabolic enzymes. Identification of these differences in abundance can facilitate our understanding of cancer cell survival under hypoxic and metabolic stress (28). We also qualitatively characterized the phosphoproteomes of PDAC cells and normal duct cells. The analysis revealed differential phosphorylation of cell adhesion, cell junction, and structural proteins, providing clues to the complex dynamics of tumor invasion and metastasis in pancreatic cancer (39). Notably, more and more studies based on MS2 spectra count label-free relative quantitation have been carried out in the fields of molecular cell biology and cancer research (40-43). In situations where there is no immediate need for absolute quantitation, relative quantitation by spectra count is well suited to comparative analysis because it is fast, direct and simple.
Acknowledgements
The authors wish to acknowledge the support of the College of Science at George Mason University.
- Received April 18, 2012.
- Accepted April 26, 2012.
- Copyright© 2012 International Institute of Anticancer Research (Dr. John G. Delinassios), All rights reserved