In-Search Assignment of Monoisotopic Peaks Improves the
Identification of Cross-Linked Peptides
Swantje Lenz, Sven H. Giese, Lutz Fischer, and Juri Rappsilber*
Bioanalytics, Institute of Biotechnology, Technische Universität Berlin, 13355 Berlin, Germany
Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom
Supporting Information
ABSTRACT: Cross-linking/mass spectrometry has undergone
a maturation process akin to standard proteomics by adapting
key methods such as false discovery rate control and
quantification. A poorly evaluated search setting in proteomics
is the consideration of multiple (lighter) alternative values for
the monoisotopic precursor mass to compensate for possible
misassignments of the monoisotopic peak. Here, we show that
monoisotopic peak assignment is a major weakness of current
data handling approaches in cross-linking. Cross-linked
peptides often have high precursor masses, which reduces the
presence of the monoisotopic peak in the isotope envelope.
Paired with generally low peak intensity, this generates a challenge that may not be completely solvable by precursor mass
assignment routines. We therefore took an alternative route by "in-search assignment of the monoisotopic peak" in the cross-link
database search tool Xi (Xi-MPA), which considers multiple precursor masses during database search. We compare and
evaluate the performance of established preprocessing workflows that partly correct the monoisotopic peak and Xi-MPA on
three publicly available data sets. Xi-MPA always delivered the highest number of identifications, with a 2- to 4-fold increase in
PSMs without compromising identification accuracy as determined by FDR estimation and comparison to crystallographic
models.
KEYWORDS: cross-linking, mass spectrometry, data processing, proteomics, software, structural proteomics, BS3, SDA, peptides,
monoisotopic mass
INTRODUCTION
Several approaches have been utilized to increase the numbers
of identified cross-links, for example enriching for cross-linked
peptides,¹⁻⁴ using different proteases,¹,⁵,⁶ or optimizing
fragmentation methods.⁷,⁸ In parallel with experimental
developments, data analysis has also progressed to extract
even more cross-links to be used as distance restraints for
modeling of proteins and their complexes.⁹,¹⁰ Search software
has been designed for the identification of cross-linked
peptides, for example Kojak,¹¹ xQuest,¹² pLink,¹³ XlinkX,¹⁴
or Xi.⁵ In addition, cross-linking workflows can make use of
preprocessing methods to improve data quality and reduce file
sizes,¹⁵ as well as postprocessing methods to filter out false
identifications¹¹,¹⁶ and custom-tailored false discovery rate
(FDR) estimation.¹⁷⁻¹⁹ Preprocessing can improve peptide
identification by correcting the MS1 precursor ion m/z and
simplifying MS2 fragment spectra. Established proteomics
software packages perform such preprocessing, including
MaxQuant²⁰,²¹ and OpenMS.²²,²³ For example, MaxQuant performs a variety
of preprocessing steps: it corrects the precursor m/z by an
intensity-weighted average if a suitable peptide feature is found,
reassigns the monoisotopic peak, and contains options for
intensity filtering of MS2 peaks. Despite such correction of the
precursor mass, many linear search engines have integrated the
possibility of considering multiple monoisotopic peaks during
search.²⁴⁻²⁶ However, the benefits of this search feature are
currently unclear. It seems that the assignment of the monoisotopic
mass for tryptic peptides is already achieved
adequately either during acquisition or as part of preprocessing.
Cross-linked peptides have characteristics that may render
MS1 monoisotopic precursor mass assignment as used for
linear peptides nonoptimal: high charge states, large masses,
and low abundances. Several cross-link search engines include
MS1 correction in their pipeline: pLink corrects monoisotopic
peaks based on previous work with linear peptides²⁷ but
does not include a parameter for searching multiple precursor
masses. Kojak averages precursor ion signals of neighboring
scans to create a composite spectrum and infer the true
monoisotopic mass of the precursor. If this step fails, precursor
masses up to 2 Da lighter are searched.¹¹ For previous
searches in Xi, MaxQuant was used to perform preprocessing.
Neither xQuest nor XlinkX describes precursor correction in
its workflow documentation, and neither offers an option for
additionally searched masses in its search
parameters. We are not aware of a detailed evaluation of the
impact of different preprocessing techniques on cross-link
identification, independent of the search software. Correcting
the monoisotopic mass of precursors, although acknowledged
as an issue,¹¹,²⁸ awaits systematic evaluation.
Received: August 4, 2018
Published: October 8, 2018
pubs.acs.org/jpr
Cite This: J. Proteome Res. 2018, 17, 3923–3931
© 2018 American Chemical Society. DOI: 10.1021/acs.jproteome.8b00600
This is an open access article published under a Creative Commons Attribution (CC-BY)
License, which permits unrestricted use, distribution and reproduction in any medium,
provided the author and source are cited.
In this study, we show that errors in assigning monoisotopic
peaks during data acquisition are frequent for cross-linked
peptides because of their size and generally low abundance.
This adversely affects their identification. We show that the
standard software suites MaxQuant and OpenMS correct
monoisotopic precursor masses of cross-linked peptides with
variable success. We then implement an option in Xi to
consider multiple precursor masses during search, to minimize
the impact of false monoisotopic precursor mass assignment
on the identification of cross-links.
METHODS
Data Sets
In this study, we used three publicly available data sets (Table
1). The three data sets were chosen to reflect a range of
applications of cross-linking mass spectrometry as well as a
range of data complexity. The first data set is human serum
albumin (HSA) cross-linked with succinimidyl 4,4′-azipentanoate
(SDA) and fragmented using five different methods
(PXD003737).²⁹ The second data set is a pooled pseudocomplex
sample of seven proteins that were separately cross-linked with
bis(sulfosuccinimidyl) suberate (BS3) (PXD006131).⁷ This
data set includes data from four different fragmentation
methods. The third data set is the most complex sample,
composed of 15 size exclusion chromatography fractions of
Chaetomium thermophilum lysate cross-linked with BS3 and
fragmented only with HCD (PXD006626).³⁰ The first and last
size exclusion fractions were used to optimize the search
parameters for this data set. All samples were analyzed on an
Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo
Fisher Scientific, San Jose, CA) using Xcalibur (versions 2.0 and
2.1).
Preprocessing
Raw files were preprocessed independently using MaxQuant
(1.5.5.30), OpenMS (2.0.1), and the ProteoWizard³¹ tool
msconvert (3.0.9576) for comparison. Scripts automating the
preprocessing, search, and evaluation were written in Python
(2.7).
The essential steps during the preprocessing can be divided
into two parts: (1) correction of the m/z or charge of the
precursor peak for MS2 spectra and (2) denoising of MS2
spectra. MaxQuant and OpenMS both try to correct the
precursor information via additional feature finding steps, i.e.,
identifying a peptide feature from the retention time, m/z, and
intensity domain of the LC-MS run. Additionally, denoising of
the MS2 spectra is performed by retaining only the most
intense peaks in defined m/z windows. The preprocessing is by
default enabled in MaxQuant and was run using the partial
processing option (steps 1–5) with default settings except for
inactivated "deisotoping" and "top peaks per 100 Da", which
was set to 20. The OpenMS preprocessing workflow includes
centroiding, feature finding,³² precursor correction (mass and
charge) using the identified features, and MS2 denoising as
described above (Supporting Information (SI) Figure S1).
Msconvert was used to convert the raw files to mgf files
without any correction. These peak files were denoted as
"uncorrected" and used as our baseline to quantify improvements
in the subsequent database search. For the "in-search
assignment of the monoisotopic peak" in Xi (Xi-MPA), we
used msconvert to convert raw files to mgf files and included an
MS2 peak filter for the 20 most intense peaks in a 100 m/z
window.
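The MS2 denoising step described above (keeping the 20 most intense peaks per 100 m/z window) can be illustrated with a minimal sketch. The function name `filter_top_peaks` and the use of fixed, non-overlapping bins starting at m/z 0 are assumptions made for illustration; MaxQuant, OpenMS, and msconvert may define their windows differently.

```python
from collections import defaultdict


def filter_top_peaks(peaks, window=100.0, top_n=20):
    """Keep the top_n most intense peaks in each m/z window.

    peaks: iterable of (mz, intensity) tuples.
    Assumption of this sketch: windows are fixed bins
    [0, window), [window, 2 * window), ...
    """
    bins = defaultdict(list)
    for mz, intensity in peaks:
        bins[int(mz // window)].append((mz, intensity))

    kept = []
    for members in bins.values():
        # Sort each bin by intensity, highest first, and keep the top_n.
        members.sort(key=lambda peak: peak[1], reverse=True)
        kept.extend(members[:top_n])
    return sorted(kept)  # return peaks ordered by m/z


# Hypothetical spectrum: 50 peaks in one window plus one peak elsewhere.
spectrum = [(100.0 + i, float(i)) for i in range(50)] + [(250.0, 1.0)]
filtered = filter_top_peaks(spectrum)
print(len(filtered))
```

A lone low-intensity peak in a sparse window survives the filter, since the ranking is done per window rather than across the whole spectrum.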
Data Analysis
Peak files were searched separately in Xi (1.6.731) with the
following settings: MS accuracy 3 ppm, MS/MS accuracy 10
ppm, oxidation of methionine as variable modification, tryptic
digestion, two missed cleavages. For samples cross-linked with
SDA, linkage sites were allowed on lysine, serine, tyrosine,
threonine, and the protein N-terminus on one end and all amino
acids on the other end of the cross-linker. Variable
modifications were monolink SDA (110.048 Da), SDA loop-links
(82.0419 Da), SDA hydrolyzed (100.0524 Da), SDA
oxidized (98.0368 Da)³¹ as well as carbamidomethylation on
cysteine. For searches with BS3, linkage sites were lysine,
serine, threonine, tyrosine, and the protein N-terminus.
Carbamidomethylation on cysteine was set as fixed modification.
Allowed variable modifications of the cross-linker
were aminated BS3 (155.0946 Da), hydrolyzed BS3 (156.0786
Da), and loop-linked BS3 (138.0681 Da). For collision-induced
dissociation (CID) and beam-type CID, also referred to as
higher-energy C-trap dissociation (HCD), b- and y-ions were
searched for, whereas for electron transfer dissociation (ETD)
c- and z-ions were allowed. For ETciD and EThcD, b-, c-, z-,
and y-ions were allowed. The HSA and pseudocomplex data
sets were searched against the known proteins in the sample.
For each protein fraction of the C. thermophilum data set, the
databases of the original publication were used, where a
database was created for each fraction by taking the most
abundant proteins (iBAQ value above 10⁶). For searches
employing Xi-MPA, the parameter "missing_isotope_peaks"
was set to the respective mass range searched. Data sets 1 and
2 were searched with a reversed decoy database, whereas data
set 3 was searched with a shuffled decoy database due to
palindromic sequences. For the reversed decoy database,
lysines and arginines were swapped with the preceding amino
acid before peptide generation.¹⁷,²⁰
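The reversed decoy construction described above can be sketched as follows. The function name `reversed_decoy` is hypothetical, and the order of operations (reverse the protein sequence first, then swap each lysine or arginine with the residue preceding it) is an assumption; the text only states that the swap happens before peptide generation.

```python
def reversed_decoy(sequence, cleavage_residues="KR"):
    """Build a decoy protein sequence by reversal plus K/R swapping.

    Assumption of this sketch: the sequence is reversed first, and each
    K or R is then exchanged with the residue directly before it.
    """
    chars = list(sequence[::-1])
    for i in range(1, len(chars)):
        if chars[i] in cleavage_residues:
            # Swap the cleavage residue with its preceding neighbour.
            chars[i - 1], chars[i] = chars[i], chars[i - 1]
    return "".join(chars)


# Hypothetical toy sequence, not taken from any of the data sets.
print(reversed_decoy("AKCDR"))
```

The decoy keeps the amino acid composition of the target protein, while the shifted cleavage sites mean that tryptic decoy peptides are not simply the mirror images of the target peptides.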
Table 1. Overview of Datasets Used
data set   sample            database sizeᵃ   reference
1          HSA               1                Giese et al. 2016
2          pseudocomplex     7                Kolbowski et al. 2017
3          C. thermophilum   198–400ᵇ         Kastritis et al. 2017
ᵃ Database size refers to the number of proteins in the database.
ᵇ Multiple size exclusion chromatography fractions (n = 15).
For cross-linking, there are different information levels:
PSMs, peptide pairs, residue pairs (links), and protein pairs.
The false discovery rate (FDR) can be calculated on each
of these levels and should be reported for the level at which the
information is given.¹² The FDR was calculated as described in
Fischer et al.¹⁷ using xiFDR (1.0.14.34) according to the
following equation:
$$\mathrm{FDR} = \frac{TD - DD}{TT}$$
where TT, TD, and DD denote the numbers of target-target,
target-decoy, and decoy-decoy matches. A 5% PSM-level cutoff
was imposed. The setting "uniquePSMs" was enabled and the
FDR was calculated separately on self- and between-links.
Minimal peptide length was set to 6. In data set 2, identified
cross-linked residues were mapped to the crystal structure of
the respective protein and the Euclidean distance between the
alpha-carbons was calculated. Structures were downloaded
from the PDB (IDs: 1AO6, 5GKN, 2CRK, 3NBS, 1OVT,
2FRJ). Kojak (1.5.5) was run via the Trans-Proteomic Pipeline
(5.1.0)³³ using default settings except: MS1 resolution
120 000, BS3 allowed on lysine, serine, threonine, tyrosine, and the
protein N-terminus, aminated BS3 (155.0946 Da) as variable
modification of the cross-linker, 3 ppm mass tolerance on MS1
level. For the uncorrected search, the isotope error was set to 0
and precursor refinement was disabled. PSMs were validated
using PeptideProphet³⁴ and the FDR was calculated as described above
on the resulting probability.
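The target-decoy FDR estimate used in this section, FDR = (TD − DD)/TT, can be sketched in a few lines. The function names `estimate_fdr` and `score_cutoff` and the toy score list are hypothetical; this is not the xiFDR implementation, which additionally handles unique PSMs, separate self/between groups, and boosting.

```python
def estimate_fdr(tt, td, dd):
    """FDR estimate for cross-link matches: (TD - DD) / TT."""
    if tt <= 0:
        raise ValueError("at least one target-target match is required")
    return (td - dd) / float(tt)


def score_cutoff(psms, max_fdr=0.05):
    """Return the lowest score at which the estimated FDR is <= max_fdr.

    psms: list of (score, label) with label in {"TT", "TD", "DD"}.
    Returns None if no threshold satisfies the FDR limit.
    """
    counts = {"TT": 0, "TD": 0, "DD": 0}
    cutoff = None
    # Walk down the score-sorted list and track the running FDR estimate.
    for score, label in sorted(psms, key=lambda p: p[0], reverse=True):
        counts[label] += 1
        if counts["TT"] > 0:
            fdr = estimate_fdr(counts["TT"], counts["TD"], counts["DD"])
            if fdr <= max_fdr:
                cutoff = score
    return cutoff


# Hypothetical toy example.
toy = [(10.0, "TT"), (9.0, "TT"), (8.0, "TD"), (7.0, "TT")]
print(score_cutoff(toy, max_fdr=0.05))
```

Subtracting DD accounts for target-decoy matches in which both peptides are wrong, which would otherwise be counted twice when estimating the number of false target-target matches.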
The mass spectrometry data have been deposited to the
ProteomeXchange Consortium via the PRIDE³⁵ partner
repository with the data set identifier PXD011121. For
transparency, Python scripts are available on GitHub under:
https://github.com/Rappsilber-Laboratory/Xi-MPA_scripts.
RESULTS AND DISCUSSION
We evaluated the impact on cross-link identification in Xi of
changing the precursor monoisotopic mass that was initially
assigned during data acquisition ("uncorrected"). In this
analysis, MaxQuant and OpenMS were used as preprocessing
tools. We used three different data sets that differ in complexity
and fragmentation regimes. To measure the improvements
from using the preprocessing tools, a simple conversion from
raw files to mgf format was done with msconvert and used as a
baseline. Note that in the spectrum header, there are two m/z
values: the trigger mass of the MS2 and the assigned
monoisotopic peak of the isotope cluster. Msconvert extracts
the assigned monoisotopic mass. Processed data were searched
separately in Xi and evaluated on PSM level or link (= unique
residue pair) level, with a 5% FDR. Finally, the newly
implemented in-search assignment of monoisotopic peaks in Xi
was compared to the elaborate preprocessing pipelines in
OpenMS and MaxQuant.
Preprocessing Increases the Number of Cross-Link PSMs
by Finding the Correct Monoisotopic Peak
The data sets were preprocessed in MaxQuant and OpenMS
and numbers of identified PSMs were compared to those
obtained using uncorrected data. Data sets 1 (HSA) and 2
(pseudocomplex) were acquired with different acquisition
methods. For comparability to data set 3 (complex mixture),
we focused on the HCD acquired data. Cross-links between
proteins were excluded, either because they were experimen-
tally not possible (data set 2) or observed in too low numbers
for reliable FDR calculation (data set 3).
For uncorrected data, 672, 354, and 2157 cross-link PSMs
resulted for the HSA data set (data set 1), the pseudocomplex
(data set 2), and the first and last fractions of C. thermophilum
(data set 3), respectively. Both preprocessing approaches
improved the numbers of identified PSMs for all data sets:
preprocessing in MaxQuant led to 1127 (68% increase), 966
(173% increase), and 2966 (38% increase) PSMs, while for OpenMS,
1044 (55% increase), 598 (69% increase), and 2394 (11%
increase) PSMs were identified (Figure 1A).
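The percentage gains quoted above follow directly from the PSM counts. The short sketch below recomputes them; the helper name `percent_increase` and the dictionary layout are illustrative choices, while the counts themselves are the ones reported in the text.

```python
def percent_increase(baseline, improved):
    """Relative gain of `improved` over `baseline`, in percent."""
    return 100.0 * (improved - baseline) / baseline


# PSM counts at 5% FDR as reported in the text.
uncorrected = {"HSA": 672, "pseudocomplex": 354, "C. thermophilum": 2157}
maxquant = {"HSA": 1127, "pseudocomplex": 966, "C. thermophilum": 2966}
openms = {"HSA": 1044, "pseudocomplex": 598, "C. thermophilum": 2394}

for name in sorted(uncorrected):
    base = uncorrected[name]
    print("%s: MaxQuant %+.0f%%, OpenMS %+.0f%%" % (
        name,
        percent_increase(base, maxquant[name]),
        percent_increase(base, openms[name]),
    ))
```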
We assessed the gains in identified PSMs of preprocessed
data compared to uncorrected data (focusing on data set 2)
regarding three forms of precursor correction: (1) correction
of the monoisotopic mass, (2) charge state correction, and (3)
small corrections of the m/z value based on averaging the m/z
values across the peptide feature (Figure 1B). Precursor mass
and charge state of spectra identified solely in MaxQuant-Xi
were compared to their counterparts when searching
uncorrected data in Xi. Of the 756 newly identified spectra,
686 (91%) had a different monoisotopic precursor mass.
Precursors were primarily corrected to lighter masses by
MaxQuant, that is, the monoisotopic peak was corrected by −1
(208 spectra), −2 (215 spectra), −3 (149 spectra), and −4 Da
(62 spectra). Greater shifts (−5 to −7 Da) only occurred 30
times, and corrections to heavier masses were observed 22
times. Only 30 spectra (4%) were corrected in their charge
state. Of the 60 spectra (8%) without correction in charge
state or monoisotopic peak, only nine
had an error greater than 3 ppm before preprocessing,
indicating a small correction of the initial precursor m/z (by
averaging of peptide feature peaks). The main proportion of
these identifications is likely a product of noise removal in MS2
spectra or small changes in the score distribution. Similarly, for
OpenMS-Xi, the monoisotopic peak correction had the
greatest impact: of the 314 spectra that OpenMS added
over uncorrected data, 139 were precursor corrected by −1 Da
and 108 by −2 Da. In contrast to MaxQuant, corrections by −3 Da
or lighter were not observed, which might explain the higher
number of identifications obtained with MaxQuant-Xi.
Figure 1. Correction of the monoisotopic peak is crucial in cross-link identification. (A) The data sets were preprocessed using MaxQuant and
OpenMS, leading to more identified PSMs in all cases. Fold changes from uncorrected data (msconvert conversion of Xcalibur data) were
calculated for each file separately and the mean plotted. Error bars represent the standard error of the mean between different acquisitions (HSA: n
= 3, pseudocomplex: n = 3, C. thermophilum: n = 8). (B) The majority of additional identifications after preprocessing are due to correction of the
precursor mass to lighter monoisotopic masses. Spectra that are unique to MaxQuant preprocessed searches of HCD acquisitions from data set 2
were evaluated in terms of precursor correction. The main proportion of the gain was corrected to lighter masses of up to 3 Da, while charge state
correction or correction to heavier masses rarely occurred. (C) Isotope cluster of a corrected precursor of m/z 992.71 (z = 5, m = 4958.6 Da) that was
solely identified in MaxQuant preprocessed results. In OpenMS preprocessed and uncorrected data, the wrong monoisotopic mass was selected for
unknown reasons.
For data sets 1 and 3, the gains of preprocessing are smaller
than for data set 2. The median peptide mass of data set 1
(3368 Da) is smaller than the median mass of data set 2 (3946
Da), and we later show that this is a major factor in precursor
mass assignment. This is reflected in the distribution of lighter
masses assigned: 46%, 33%, and 21% were corrected by −1 Da,
−2 Da, and to even lighter peaks, respectively. Data set 3 was
acquired with a different version of Xcalibur, for which we saw
better mass assignment than for earlier versions (data not
shown). As a consequence, 67% of the lighter
corrected masses are shifted by −1 Da, 21% by −2 Da, and 12% to
even lighter masses. However, we were not able to follow up
on this to our satisfaction, since the source code of the vendor
software is not available.
In summary, preprocessing, especially monoisotopic peak
correction, leads to a notable increase in identifications. Using
the 3-dimensional peptide feature is advantageous compared to
on-the-fly detection of the monoisotopic peak. If the preceding
MS1 spectrum was acquired during the beginning (or end) of
the elution profile of a peptide, the intensity will be low. Thus,
the monoisotopic peak might not even be detectable at the
time of fragmentation. For large (cross-linked) peptides, this
effect might be exacerbated by the monoisotopic peak usually
being less intense than other isotope peaks. Therefore, using
the additional information from the retention time domain will
be beneficial. The same feature information can also be used to
determine or validate the assigned charge state of the
precursor. However, the instrument software almost always
assigned the same charge state as MaxQuant or OpenMS.
Thus, the main advantage for identifying cross-linked peptides
arises from monoisotopic peak correction.
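Why the monoisotopic peak fades for large peptides can be illustrated with a rough back-of-the-envelope model. The sketch below is an assumption-laden approximation, not part of the original analysis: it treats the number of ¹³C atoms in a peptide as Poisson distributed, uses the averagine composition (about 4.9384 carbon atoms per 111.1254 Da), and ignores all other heavy isotopes.

```python
import math

# Assumed model constants (averagine approximation, 13C only).
AVERAGINE_MASS = 111.1254   # Da per averagine unit
CARBONS_PER_UNIT = 4.9384   # carbon atoms per averagine unit
C13_ABUNDANCE = 0.0107      # natural abundance of 13C


def isotope_peak_fractions(mass, n_peaks=6):
    """Approximate relative abundance of the first n_peaks isotope peaks.

    Index 0 is the monoisotopic peak. The number of 13C atoms is
    modelled as Poisson distributed with mean `lam`.
    """
    lam = mass / AVERAGINE_MASS * CARBONS_PER_UNIT * C13_ABUNDANCE
    return [math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(n_peaks)]


# Compare a small peptide with a mass near the median of data set 2.
for mass in (1000.0, 3946.0):
    fractions = isotope_peak_fractions(mass)
    print("%.0f Da: monoisotopic fraction %.2f" % (mass, fractions[0]))
```

Under these assumptions the first isotope peak overtakes the monoisotopic peak at roughly 2100 Da, and for masses near 4000 Da the monoisotopic peak carries only a small share of the total signal, so a weak precursor can easily lose it in the noise.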
Interestingly, OpenMS and MaxQuant did not always agree
on or find the same monoisotopic peak (Figure 1C). Of the
total MaxQuant-corrected spectra with a different monoisotopic
mass, 81% were not corrected and 6% were corrected
differently by OpenMS. Vice versa, 15% of the monoisotopic
peaks corrected by OpenMS were not corrected by MaxQuant
and 25% were corrected differently. Both MaxQuant and
OpenMS have their own implementations for precursor
correction; therefore, there might be instances where
MaxQuant is able to find a corresponding peptide feature
where OpenMS does not and vice versa. Although OpenMS
did not lead to the same improvements in the number of
identifications as MaxQuant, it did correct some precursors
that the latter did not. We therefore suspect that there are also
precursors with a falsely assigned monoisotopic peak that were
corrected by neither algorithm. Furthermore, 3-dimensional
detection of peptide features is challenging for low intensity
peptides. In conclusion, there likely remain falsely assigned
monoisotopic peaks in the data, ultimately leading to missed or
false identifications.
In-Search Monoisotopic Peak Assignment Increases the
Number of Identifications
We observed multiple cases where MaxQuant and OpenMS
disagreed in their monoisotopic peak choice, indicating that
the problem of monoisotopic peak assignment (MPA) cannot
be solved easily at MS1 level. Indeed, we found instances
where the monoisotopic peak is simply not distinguishable
from noise, so a feature-based correction would not be feasible.
Nevertheless, the associated MS2 spectra could be matched to
a cross-linked peptide when considering multiple different
monoisotopic masses during search. This shows that the extra
information gained from a peptide-spectrum match
benefits MPA compared to considering MS1 information
alone. Therefore, we implemented a monoisotopic peak
assignment in Xi: for each MS2 spectrum, multiple precursor
masses are considered during a single search and the highest
scoring peptide pair determines the precursor mass. Note that this
is different from simply searching with a wide mass error for
MS1. The mass accuracy of MS1 is minimally compromised as
multiple candidates for the monoisotopic mass are taken and
each is considered with the original mass accuracy of the measurement.
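The idea of in-search monoisotopic peak assignment can be sketched as follows. This is a simplified illustration, not the Xi implementation: the function names, the isotope spacing of 1.00335 Da, and the representation of the database as a list of (theoretical mass, score) pairs are assumptions. In a real search the score would come from matching the MS2 spectrum against each candidate peptide pair.

```python
PROTON_MASS = 1.007276      # Da
ISOTOPE_SPACING = 1.00335   # Da, assumed spacing between isotope peaks


def neutral_mass(mz, charge):
    """Neutral precursor mass from m/z and charge."""
    return (mz - PROTON_MASS) * charge


def candidate_masses(mz, charge, max_missing=4):
    """Assigned mass plus up to max_missing lighter alternatives."""
    mass = neutral_mass(mz, charge)
    return [mass - k * ISOTOPE_SPACING for k in range(max_missing + 1)]


def within_ppm(observed, theoretical, tol_ppm=3.0):
    return abs(observed - theoretical) / theoretical * 1e6 <= tol_ppm


def assign_monoisotopic(mz, charge, database, max_missing=4, tol_ppm=3.0):
    """Return (shift in isotope peaks, matched mass, score) of the best match.

    database: list of (theoretical_mass, score) pairs; hypothetical
    stand-in for scored peptide-pair candidates.
    """
    best = None
    for shift, mass in enumerate(candidate_masses(mz, charge, max_missing)):
        for theoretical, score in database:
            # Each candidate keeps the narrow MS1 tolerance.
            if within_ppm(mass, theoretical, tol_ppm):
                if best is None or score > best[2]:
                    best = (-shift, theoretical, score)
    return best


# Hypothetical precursor and candidate list.
m = neutral_mass(992.71, 5)
toy_db = [(m, 2.0), (m - 2 * ISOTOPE_SPACING, 7.5), (m - 0.5, 99.0)]
print(assign_monoisotopic(992.71, 5, toy_db))
```

The third toy candidate lies 0.5 Da from the assigned mass and is never matched, although it has the highest score: only a handful of narrow mass windows are searched, which is what distinguishes this approach from widening the MS1 tolerance to several Da.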
To find a good trade-off between increased search space and
sensitivity, we tested different mass range settings on the data
sets. For data set 2 (HCD subset), the number of PSMs
increased with ranges up to −5 Da on the considered
monoisotopic masses (Figure 2A). However, the increase in
identifications from −4 to −5 Da was only 3% and, considering
the increase in search time, we continued with a maximal
correction of −4 Da as the optimal setting for this data set. Xi-MPA
yielded 1508 PSMs, which is a 326% increase compared
to searching uncorrected data and a 56% increase compared to
MaxQuant-Xi. Similar improvements are observed for the
other fragmentation methods in this data set (SI Figure S2).
Additionally, we corrected up to −7 Da to test if a large
increase in search space increases random spectra matches as
measured by the target-decoy approach. The number of
identifications at 5% FDR decreased only slightly compared to
−5 Da (1%), but still led to more identifications than up to
−4 Da (3%). In the HSA data set, Xi-MPA with up to −4 Da
increased the number of identified PSMs by 170% compared
to uncorrected data (Figure S3).
As a final evaluation of in-search monoisotopic peak
assignment, we searched the complete data set of C.
thermophilum. We used 0 to −3 Da as the range of Xi-MPA,
since an initial analysis of the first and last fraction of the C.
thermophilum data set returned a similar number of
identifications when running Xi-MPA up to −4 Da or −3
Da (Figure S4). As a comparison, we took the original peak
files obtained from PRIDE. The FDR was calculated separately
on self- and between-links, with boosting enabled (automatic
prefiltering on PSM and peptide pair level¹⁷), a minimum
of three fragments per peptide, and a minimal delta score of 0.5.
For the original peak files, which were preprocessed in
MaxQuant, we identified 3848 PSMs, 2594 peptide pairs, and
1653 cross-links, with a 5% FDR on each respective level
(Figure 2B). Xi-MPA resulted in 4952 PSMs (29% increase),
3566 peptide pairs (37% increase), and 2273 cross-links (38%
increase).
Next, we looked into the complementarity of search results
with the different approaches, using data set 2 at 5% link-FDR.
Preprocessing via MaxQuant and OpenMS led to 172 and 158
links, respectively, while Xi-MPA resulted in 243 links. While
the overlap between links of OpenMS-Xi and MaxQuant-Xi is
only 50%, Xi-MPA identifications cover 76% of both searches
(Figure 2C). Nineteen and 23 links are uniquely found in
MaxQuant and OpenMS preprocessed data, respectively.
However, there are five decoy links as well in each unique
set (resulting in a link-FDR of 26% and 22%). For Xi-MPA,
there are 75 unique target links with 12% link-FDR.
Identification-based monoisotopic peak assignment as
employed by Xi-MPA results in more identifications than the
feature-based assignment algorithms of OpenMS and MaxQuant.
Neither OpenMS nor MaxQuant corrects all precursor
masses that are incorrectly assigned during data acquisition. In
Xi-MPA, spectra are searched with multiple monoisotopic
masses, thereby relying less on the MS1 information. The
quality of the precursor isotope cluster does not contribute to
the decision of monoisotopic mass, and spectra for which
correction failed will be identifiable. One could hypothesize
that increasing the search space by considering multiple masses
will lead to more false positives, thereby reducing the number
of true identifications. This is not the case, as we match
substantially more PSMs at constant FDR by considering
alternative monoisotopic masses. As a second plausible caveat,
this approach increases the search time. However, the use of
relatively cheap computational time appears balanced by the
notable increase in identified cross-links. The optimal range of
additional monoisotopic peaks to search will, however,
depend on the complexity and quality of MS1 acquisition and
the instrument software. To reduce the mass range considered
in Xi-MPA, we developed an MS1 level-based approach. For
each precursor, we search lighter isotope peaks in MS1 and use
this to narrow the search space (explained in detail in the SI).
This led to an average of 24% fewer values to be considered,
while only reducing the number of identifications by 3%. We
hope that our observation of the monoisotopic peak detection
challenge in cross-linking together with our publicly available
data sets will lead to further improvements in monoisotopic
peak-assignment algorithms in the future, possibly tailored to
cross-link data.
The cross-link search engine Kojak employs a precursor
correction in its pipeline.¹¹ As we could not find a detailed
evaluation of the impact of precursor correction in Kojak, we
searched the HCD data of the pseudocomplex data set without
correction as well as with its default correction settings. We
focused on 10% FDR data as there were too few identifications
for a reliable calculation at 5% FDR. Just 171 cross-linked
PSMs passed for the unprocessed data, whereas for the default
search, 1088 PSMs passed (536% increase). Of those, 862
(79%) were corrected in their monoisotopic precursor peak.
These results support our observations with Xi.
In-Search Monoisotopic Peak Assignment Does Not
Compromise Search Accuracy
Changing the search could lead to several problems. We
already ruled out that the increased search space leads to
high-scoring decoy matches that in turn reduce the number of
identifications at a given FDR cutoff. As an additional
validation, we assessed our results against known PDB
structures using the HCD data from the pseudocomplex data
set (data set 2), at 5% link-FDR. Assuming a crystal structure is
correct, a cross-link can be unexpectedly long either because
the link is false or because of in-solution structural dynamics.
If, however, the proportion of long-distance links in results of
two approaches is identical, then at least the two results have
equal quality.
We rst tested the results of all three approaches against
crystal structures. Residue pairs were mapped to PDB
structures and the distance between the two alpha-carbons
was calculated (see Methods). Thirty Å was set as the maximal
distance for BS3; links with a greater distance were classified as
long-distance. In this evaluation, we excluded the protein C3B
because its flexible regions make it unsuitable for this analysis.
For MaxQuant and OpenMS preprocessed results, 11.8% and
6.1% long-distance cross-links were identied, respectively. In
Xi-MPA, 8.1% long distance cross-links were identied (Figure
3A). Of the links uniquely identied through Xi-MPA, only
5.3% were long distance links. Therefore, Xi-MPA as such does
not lead to an enrichment in long-distance cross-links.
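The structural validation described above reduces to computing Cα–Cα distances for each identified residue pair and counting how many exceed the 30 Å limit. A minimal sketch, assuming the Cα coordinates have already been extracted from the PDB file into a dictionary keyed by residue number; the function names and toy coordinates are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two 3D points given as (x, y, z)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def long_distance_fraction(links, ca_coords, max_distance=30.0):
    """Fraction of links whose C-alpha distance exceeds max_distance (in Å).

    links: list of (residue_1, residue_2) pairs.
    ca_coords: dict mapping residue number to C-alpha (x, y, z) coordinates.
    """
    distances = [euclidean_distance(ca_coords[r1], ca_coords[r2])
                 for r1, r2 in links]
    n_long = sum(1 for d in distances if d > max_distance)
    return n_long / float(len(distances))


# Hypothetical coordinates in Å, not taken from any PDB entry.
toy_coords = {1: (0.0, 0.0, 0.0), 2: (3.0, 4.0, 0.0), 3: (0.0, 0.0, 40.0)}
toy_links = [(1, 2), (1, 3)]
print(long_distance_fraction(toy_links, toy_coords))
```

Comparing this fraction between search strategies, rather than interpreting it in absolute terms, sidesteps the question of whether an individual long link is a false match or reflects in-solution dynamics.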
However, it could be that mass-corrected precursors tend to
have a higher proportion of long-distance links. We therefore
split the Xi-MPA results into ve groups corresponding to the
monoisotopic mass change (0, 1, 2, 3, 4 Da) and looked
at their match to crystal structures. If a link originated from
PSMs with dierent mass corrections, all of those were
considered. We conducted a nonparametric ANOVA
(KruskalWallis test) to detect any signicant changes in the
distance distributions of Xi-MPA identications with dierent
Figure 2. In-search monoisotopic peak assignment outperforms
preprocessing. (A) Performance of Xi-MPA on data set 2. HCD data
from the pseudocomplex data set were searched assuming dierent
ranges of missing monoisotopic peaks. With increased ranges, the
number of identied PSMs also increases. (B) Performance of Xi-
MPA on the complete C. thermophilum data set. All 15 fractions were
searched with the original preprocessed data as well as with Xi-MPA.
(C) Overlap of identied residue pairs of MaxQuant-Xi and OpenMS-
Xi to residue pairs gained from Xi-MPA (data set 2). Numbers in
brackets are the proportion of decoys in the respective regions.
Journal of Proteome Research Article
DOI: 10.1021/acs.jproteome.8b00600
J. Proteome Res. 2018, 17, 39233931
3927
shifts and decoy distribution. However, we fail to reject the
null hypothesis at the predetermined signicance level of α=
0.05 (p-value: 0.13), indicating that the distance distributions
for all subsets are similar. This matches the visual inspection of
distance distributions (Figure 3B). Furthermore, all individual
distance distributions were significantly smaller than the
derived reference distribution (one-sided Wilcoxon test, see
SI Table S5). In conclusion, we do not see any evidence of in-
search monoisotopic mass assignment leading to increased
conflicts with crystal structures. We then evaluated the effect of
in-search monoisotopic mass assignment on PSM quality as
assessed by the search score. First, we compared the scores of
PSMs with a mass shift (Xi-MPA identifications) to the scores
of the same spectrum without a mass shift (uncorrected data).
While scores with shifted mass have a median of 6.7, the
median score is 2.3 when using the uncorrected masses (Figure
3C). As one would expect from an increased search space, the
scores of decoy hits also improve, albeit only marginally. We
find that the score difference of target PSMs is significantly
larger than that of decoy PSMs (one-sided Wilcoxon test, p-value:
<2.2 × 10⁻¹⁶). We then turned to a "decoy mass search", for
which we not only searched the range from 0 Da to −4 Da, but
also +1 Da to +4 Da. Assuming the monoisotopic peak in the
uncorrected data is rarely lighter than the true monoisotopic
peak, the new identifications should score like decoy
identifications. Indeed, the resulting score distributions for
targets with a positive mass shift follow the decoy distribution
(Figure 3D). In contrast, identifications with a negative shift
are distributed like the identifications without mass shift. In
conclusion, in-search monoisotopic mass change leads to
significantly improved scores with a distribution that resembles
that of precursors that did not see a mass change (0 Da).
Importantly, these improvements are not random since an
equally large search space increase (+1 Da to +4 Da) results in
a completely different score distribution that resembles the
decoy distribution but not the distribution of identifications
without a mass shift.
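As an illustration of the search space compared here, the following minimal Python sketch enumerates candidate monoisotopic masses for one precursor. It is not Xi's implementation: the function names, the proton mass constant, and the use of the 13C–12C mass difference (about 1.00335 Da) as the spacing between isotope peaks are assumptions made for this example, whereas the text refers to nominal 1 Da steps. The example precursor (m/z 758.16, z = 4) is the one shown in Figure 4C.

```python
PROTON_MASS = 1.007276     # Da; assumed constant for this sketch
ISOTOPE_SPACING = 1.00335  # Da; assumed 13C-12C spacing between isotope peaks


def neutral_mass(mz, charge):
    """Neutral precursor mass from the measured m/z and charge state."""
    return (mz - PROTON_MASS) * charge


def candidate_masses(mz, charge, shifts):
    """Return {shift: candidate monoisotopic mass} for integer isotope shifts."""
    base = neutral_mass(mz, charge)
    return {shift: base + shift * ISOTOPE_SPACING for shift in shifts}


# Shifts considered in the text: 0 to -4 Da for the regular search,
# +1 to +4 Da for the "decoy mass search" control.
TARGET_SHIFTS = (0, -1, -2, -3, -4)
DECOY_SHIFTS = (1, 2, 3, 4)

if __name__ == "__main__":
    for label, shifts in (("target", TARGET_SHIFTS), ("decoy", DECOY_SHIFTS)):
        for shift, mass in candidate_masses(758.16, 4, shifts).items():
            print(f"{label:6s} shift {shift:+d}: {mass:.3f} Da")
```

In an actual search, each candidate mass would be matched against the database and the best-scoring match retained; that step depends on the search engine and is omitted here.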
Figure 3. Matches with and without in-search mass shift show similar
quality metrics. (A) Evaluation of Xi-MPA derived links on crystal
structures (data set 2). Distances between α-carbon atoms of
identified cross-linked residues in the crystal structure of the proteins
are shown in light gray while a reference distribution of all possible
pairwise C-alpha distances of cross-linkable residues is shown in dark
gray. Thirty Å is set as a limit, above which links are defined as long
distance. (B) Distance distribution of identifications with different
mass corrections. There was no significant difference between the
different mass shifts, while all had a significant difference to the decoy
distribution. (C) PSM scores of spectra identified with a mass shift are
significantly higher than the corresponding score in uncorrected data.
Shown are the score distributions of uncorrected and Xi-MPA results,
as well as the corresponding decoy distribution. (D) Score
distribution of PSM matches of the decoy mass search.
Identifications with a positive mass shift generally follow the decoy
distribution (note that there are correct identifications with a positive
mass shift, albeit few, see Figure 1B) while identifications with a
negative shift resemble unshifted identifications. The scores of
negative-shifted PSMs are significantly higher than those of positive-
shifted PSMs (one-sided Wilcoxon test, p-value: <2.2 × 10⁻¹⁶).
Heavy and Low Intensity Peptides Are Corrected More Frequently
One would especially expect to observe shifted mass
assignment for peptides of high mass and low abundance.
For large peptides (approximately >2000 Da), the monoisotopic
peak will not be the most intense peak in the isotope
cluster. If the peptide is of low abundance, the monoisotopic
peak may be of too low intensity to be detected. We therefore
analyzed the monoisotopic peak assignment in Xi-MPA
regarding the precursor mass and intensity. Indeed, precursors
with higher masses are more often corrected to lighter
monoisotopic peaks (Figure 4A). While the median precursor
mass for uncorrected matches is 2952 Da, for matches
corrected by −2 Da it is already 4062 Da and for −4 Da it
is 4684 Da. Of the identifications with a mass above 3000 Da,
88% were identified with a lighter mass. For precursors lighter
than 3000 Da, the proportion was 42%. As with the mass
dependency, there is a trend toward larger correction ranges
for lower intensity peptides (SI Figure S5). However, this trend is
weaker than the one observed for precursor mass.
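The mass dependence can be made plausible with a rough model of the isotope envelope. The sketch below approximates the number of heavy isotopes in a peptide by a Poisson distribution whose mean grows linearly with mass. The rate constant (about 5.4 × 10⁻⁴ heavy isotopes per Da, loosely derived from an averagine-like composition) and all function names are assumptions for illustration and are not taken from this work; an actual isotope calculator would use the elemental composition.

```python
import math

# Assumed mean number of heavy isotopes (mostly 13C) per Da of peptide mass.
# This is an illustrative averagine-style estimate, not a value from the paper.
HEAVY_ISOTOPES_PER_DA = 5.4e-4


def isotope_envelope(mass, n_peaks=10):
    """Poisson approximation of relative isotope peak abundances.

    Index 0 is the monoisotopic peak, index k the peak about k Da heavier.
    """
    lam = mass * HEAVY_ISOTOPES_PER_DA
    return [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(n_peaks)]


def most_intense_peak(mass):
    """Index of the most abundant peak in the approximated envelope."""
    envelope = isotope_envelope(mass)
    return max(range(len(envelope)), key=envelope.__getitem__)


if __name__ == "__main__":
    # 1000 Da as a small peptide; the other masses are the medians quoted above.
    for mass in (1000.0, 2952.0, 4062.0, 4684.0):
        mono = isotope_envelope(mass)[0]
        print(f"{mass:6.0f} Da: monoisotopic fraction {mono:.2f}, "
              f"most intense peak +{most_intense_peak(mass)}")
```

Under this assumed rate, the monoisotopic peak stops being the most intense peak once the Poisson mean exceeds 1, at roughly 1850 Da, which is in line with the approximate 2000 Da threshold mentioned above. Its relative abundance keeps falling with mass, so for low-intensity precursors it is increasingly likely to drop below the detection limit.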
When evaluating the newly matched precursors of Xi-MPA,
the advantage of not having to rely on MS1 identification is
evident. Matches not made through any of the preprocessing
methods are generally much less intense (Figure 4B) and larger
(SI Figure S6) than matches that are common to all
approaches. Manual analysis of isotope clusters of corrected
precursors from data set 2 revealed many cases where the
monoisotopic peak was present in the MS1 spectrum but was
not recognized during acquisition. For some, this might be due
to the peak being of low intensity and discarded as noise, or
because of other interfering peaks (Figure 4C). However, there
are also cases where the cluster is well resolved (Figure 1C).
Without details of how the instrument software determines the
monoisotopic peak, a full evaluation is difficult. For a complete
list of precursor m/z for Xi-MPA identifications and
corresponding m/z of uncorrected, MaxQuant and OpenMS
data, see SI Tables S1–S3.
Note that in many acquisition methods, the machine only
fragments peaks where it can successfully identify a full isotope
cluster. Therefore, there might be instances of cross-linked
peptides not being fragmented because of insufficient isotopic
cluster quality, leading to lost identifications.
CONCLUSION
The size and low abundance of cross-linked peptides lead to
frequent misassignment of the monoisotopic mass by instrument
software, which in some instances even escapes
correction by the sophisticated correction approaches employed
by MaxQuant and OpenMS. Considering multiple monoisotopic
masses during search increases the number of cross-link
PSMs 1.8–4.2-fold, without compromising search
accuracy as judged by multiple assessment strategies including
comparison of the gains against solved protein structures. The
problem of wrongly assigned monoisotopic peaks will have an
impact on most cross-link search engines since these all rely in
some part on the precursor mass. The extent of the
misassignment will, however, be sample- and software-dependent.
Even with improved acquisition or correction software,
there will remain instances where the monoisotopic peak
cannot be determined correctly before searching due to low
intensity. Our search-assisted monoisotopic peak assignment
provides a general solution to this problem by relying on MS2
identification in addition to precursor information.
ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge on the
ACS Publications website at DOI: 10.1021/acs.jproteome.8b00600.
Table S4: Xi-MPA mass range reduction. Figure S1:
OpenMS preprocessing workflow. Figure S2: Performance
of Xi-MPA on EThcD, CID, and ETciD
acquisitions of the pseudocomplex data set. Table S5:
Summary of conducted significance tests for Figure 3B
in the main text. Figure S3: Performance of Xi-MPA on
HCD data of the HSA data set. Figure S4: Performance
of Xi-MPA on the test fractions of the C. thermophilum
data set. Figure S5: Dependency of the monoisotopic
mass correction on precursor intensity. Figure S6:
Dependency of identifications in Xi-MPA on mass
(PDF)
Table S1: Precursor m/z of different processing methods
of data set 1. Table S2: Precursor m/z of different
processing methods of data set 2. Table S3: Precursor
m/z of different processing methods of data set 3.
Supporting Information: MS1 based mass range
reduction (XLS)
Figure 4. Correction is dependent on precursor mass and intensity. (A) Box plot of the precursor mass and monoisotopic mass correction of
identified PSMs after Xi-MPA. PSMs with higher mass more often require monoisotopic mass correction to lighter masses. Whiskers show the 5%
and 95% quantiles of the data. Asterisks denote the significance calculated by a one-tailed t test (****: p-value < 0.0001). (B) Precursors of cross-
linked PSMs identified in all three approaches, MaxQuant-Xi, OpenMS-Xi, and Xi-MPA (common), are more intense than precursors of PSMs
that are only identified in Xi-MPA. In other words, successful correction happens more often for abundant precursors, whereas Xi-MPA identifies
precursors of lower intensity. (C) MS1 isotope cluster of a cross-linked peptide. The monoisotopic peak of m/z 758.16 (z = 4, m = 3028.6 Da) was
falsely assigned during acquisition and not corrected in any preprocessing approach. Xi-MPA identifies a PSM for a precursor with a mass that is 3
Da lighter.
AUTHOR INFORMATION
Corresponding Author
ORCID
Swantje Lenz: 0000-0002-8839-5371
Sven H. Giese: 0000-0002-9886-2447
Lutz Fischer: 0000-0003-4978-0864
Juri Rappsilber: 0000-0001-5999-1310
Notes
The authors declare no competing financial interest.
ACKNOWLEDGMENTS
We thank Dr. Francis O'Reilly for comments and helpful
discussions. This work was supported by the Einstein
Foundation, the DFG [RA 2365/4-1], and the Wellcome
Trust through a Senior Research Fellowship to JR [103139]
and a multiuser equipment grant [108504]. The Wellcome
Centre for Cell Biology is supported by core funding from the
Wellcome Trust [203149].
REFERENCES
(1) Leitner, A.; Reischl, R.; Walzthoeni, T.; Herzog, F.; Bohn, S.;
Förster, F.; Aebersold, R. Expanding the chemical cross-linking
toolbox by the use of multiple proteases and enrichment by size
exclusion chromatography. Mol. Cell. Proteomics 2012, 11,
M111.014126.
(2) Chen, Z. A.; Jawhari, A.; Fischer, L.; Buchen, C.; Tahir, S.;
Kamenski, T.; Rasmussen, M.; Larivière, L.; Bukowski-Wills, J.-C.;
Nilges, M.; et al. Architecture of the RNA polymerase II-TFIIF
complex revealed by cross-linking and mass spectrometry. EMBO J.
2010, 29, 717–726.
(3) Tan, D.; Li, Q.; Zhang, M.-J.; Liu, C.; Ma, C.; Zhang, P.; Ding,
Y.-H.; Fan, S.-B.; Tao, L.; Yang, B.; et al. Trifunctional cross-linker for
mapping protein-protein interaction networks and comparing protein
conformational states. eLife 2016, 5, DOI: 10.7554/eLife.12509.
(4) Rampler, E.; Stranzl, T.; Orbán-Németh, Z.; Hollenstein, D. M.;
Hudecz, O.; Schlögelhofer, P.; Mechtler, K. Comprehensive Cross-
Linking Mass Spectrometry Reveals Parallel Orientation and Flexible
Conformations of Plant HOP2-MND1. J. Proteome Res. 2015, 14,
5048–5062.
(5) Mendes, M. L.; Fischer, L.; Chen, Z. A.; Barbon, M.; O'Reilly, F.
J.; Bohlke-Schneider, M.; Belsom, A.; Dau, T.; Combe, C. W.;
Graham, M.; et al. An integrated workflow for cross-linking/mass
spectrometry. 2018.
(6) Belsom, A.; Schneider, M.; Fischer, L.; Mabrouk, M.; Stahl, K.;
Brock, O.; Rappsilber, J. Blind testing cross-linking/mass spectrometry
under the auspices of the 11th critical assessment of methods of
protein structure prediction (CASP11). Wellcome Open Research 2016,
1, 24.
(7) Kolbowski, L.; Mendes, M. L.; Rappsilber, J. Optimizing the
Parameters Governing the Fragmentation of Cross-Linked Peptides in
a Tribrid Mass Spectrometer. Anal. Chem. 2017, 89, 5311–5318.
(8) Liu, F.; Lössl, P.; Scheltema, R.; Viner, R.; Heck, A. J. R.
Optimized fragmentation schemes and data analysis strategies for
proteome-wide cross-link identification. Nat. Commun. 2017, 8,
15473.
(9) Orbán-Németh, Z.; Beveridge, R.; Hollenstein, D. M.; Rampler,
E.; Stranzl, T.; Hudecz, O.; Doblmann, J.; Schlögelhofer, P.; Mechtler,
K. Structural prediction of protein models using distance restraints
derived from cross-linking mass spectrometry data. Nat. Protoc. 2018,
13, 478–494.
(10) Schneider, M.; Belsom, A.; Rappsilber, J. Protein Tertiary
Structure by Crosslinking/Mass Spectrometry. Trends Biochem. Sci.
2018, 43, 157–169.
(11) Hoopmann, M. R.; Zelter, A.; Johnson, R. S.; Riffle, M.;
MacCoss, M. J.; Davis, T. N.; Moritz, R. L. Kojak: efficient analysis of
chemically cross-linked protein complexes. J. Proteome Res. 2015, 14,
2190–2198.
(12) Leitner, A.; Walzthoeni, T.; Aebersold, R. Lysine-specific
chemical cross-linking of protein complexes and identification of
cross-linking sites using LC-MS/MS and the xQuest/xProphet
software pipeline. Nat. Protoc. 2014, 9, 120–137.
(13) Yang, B.; Wu, Y.-J.; Zhu, M.; Fan, S.-B.; Lin, J.; Zhang, K.; Li,
S.; Chi, H.; Li, Y.-X.; Chen, H.-F.; et al. Identification of cross-linked
peptides from complex samples. Nat. Methods 2012, 9, 904–906.
(14) Liu, F.; Rijkers, D. T. S.; Post, H.; Heck, A. J. R. Proteome-wide
profiling of protein assemblies by cross-linking mass spectrometry.
Nat. Methods 2015, 12, 1179–1184.
(15) Renard, B. Y.; Kirchner, M.; Monigatti, F.; Ivanov, A. R.;
Rappsilber, J.; Winter, D.; Steen, J. A. J.; Hamprecht, F. A.; Steen, H.
When less can yield more - Computational preprocessing of MS/MS
spectra for peptide identification. Proteomics 2009, 9, 4978–4984.
(16) Käll, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss,
M. J. Semi-supervised learning for peptide identification from shotgun
proteomics datasets. Nat. Methods 2007, 4, 923–925.
(17) Fischer, L.; Rappsilber, J. Quirks of Error Estimation in Cross-
Linking/Mass Spectrometry. Anal. Chem. 2017, 89, 3829–3833.
(18) Walzthoeni, T.; Claassen, M.; Leitner, A.; Herzog, F.; Bohn, S.;
Förster, F.; Beck, M.; Aebersold, R. False discovery rate estimation for
cross-linked peptides identified by mass spectrometry. Nat. Methods
2012, 9, 901–903.
(19) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.;
Tarricone, C.; Musacchio, A.; Rappsilber, J. Structural analysis of
multiprotein complexes by cross-linking, mass spectrometry, and
database searching. Mol. Cell. Proteomics 2007, 6, 2200–2211.
(20) Cox, J.; Mann, M. MaxQuant enables high peptide
identification rates, individualized p.p.b.-range mass accuracies and
proteome-wide protein quantification. Nat. Biotechnol. 2008, 26,
1367–1372.
(21) Tyanova, S.; Temu, T.; Cox, J. The MaxQuant computational
platform for mass spectrometry-based shotgun proteomics. Nat.
Protoc. 2016, 11, 2301–2319.
(22) Sturm, M.; Bertsch, A.; Gröpl, C.; Hildebrandt, A.; Hussong,
R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.;
et al. OpenMS - an open-source software framework for mass
spectrometry. BMC Bioinf. 2008, 9, 163.
(23) Röst, H. L.; Sachsenberg, T.; Aiche, S.; Bielow, C.; Weisser, H.;
Aicheler, F.; Andreotti, S.; Ehrlich, H.-C.; Gutenbrunner, P.; Kenar,
E.; et al. OpenMS: a flexible open-source software platform for mass
spectrometry data analysis. Nat. Methods 2016, 13, 741–748.
(24) Craig, R.; Beavis, R. C. TANDEM: matching proteins with
tandem mass spectra. Bioinformatics 2004, 20, 1466–1467.
(25) Eng, J. K.; Jahan, T. A.; Hoopmann, M. R. Comet: an open-
source MS/MS sequence database search tool. Proteomics 2013, 13,
22–24.
(26) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.;
Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass
spectrometry search algorithm. J. Proteome Res. 2004, 3, 958–964.
(27) Yuan, Z.-F.; Liu, C.; Wang, H.-P.; Sun, R.-X.; Fu, Y.; Zhang, J.-
F.; Wang, L.-H.; Chi, H.; Li, Y.; Xiu, L.-Y.; et al. pParse: a method for
accurate determination of monoisotopic peaks in high-resolution mass
spectra. Proteomics 2012, 12, 226–235.
(28) Iacobucci, C.; Sinz, A. To Be or Not to Be? Five Guidelines to
Avoid Misassignments in Cross-Linking/Mass Spectrometry. Anal.
Chem. 2017, 89, 7832–7835.
(29) Giese, S. H.; Belsom, A.; Rappsilber, J. Optimized
Fragmentation Regime for Diazirine Photo-Cross-Linked Peptides.
Anal. Chem. 2016, 88, 8239–8247.
(30) Kastritis, P. L.; O'Reilly, F. J.; Bock, T.; Li, Y.; Rogon, M. Z.;
Buczak, K.; Romanov, N.; Betts, M. J.; Bui, K. H.; Hagen, W. J.; et al.
Capturing protein communities by structural proteomics in a
thermophilic eukaryote. Mol. Syst. Biol. 2017, 13, 936.
(31) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P.
ProteoWizard: open source software for rapid proteomics tools
development. Bioinformatics 2008, 24, 2534–2536.
(32) Weisser, H.; Nahnsen, S.; Grossmann, J.; Nilse, L.; Quandt, A.;
Brauer, H.; Sturm, M.; Kenar, E.; Kohlbacher, O.; Aebersold, R.; et al.
An automated pipeline for high-throughput label-free quantitative
proteomics. J. Proteome Res. 2013, 12, 1628–1644.
(33) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Slagel, J.; Sun, Z.;
Moritz, R. L. Trans-Proteomic Pipeline, a standardized data
processing pipeline for large-scale reproducible proteomics
informatics. Proteomics: Clin. Appl. 2015, 9, 745–754.
(34) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical
Statistical Model To Estimate the Accuracy of Peptide Identifications
Made by MS/MS and Database Search. Anal. Chem. 2002, 74,
5383–5392.
(35) Vizcaíno, J. A.; Csordas, A.; del-Toro, N.; Dianes, J. A.; Griss, J.;
Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.;
et al. 2016 update of the PRIDE database and its related tools. Nucleic
Acids Res. 2016, 44, D447–56.