Document [original]

In-Search Assignment of Monoisotopic Peaks Improves the

Identiﬁcation of Cross-Linked Peptides

Swantje Lenz,

†

Sven H. Giese,

†

Lutz Fischer,

‡

and Juri Rappsilber*

,†,‡

†

Bioanalytics, Institute of Biotechnology, Technische Universitat Berlin, 13355 Berlin, Germany

‡

Wellcome Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom

SSupporting Information

ABSTRACT: Cross-linking/mass spectrometry has undergone

a maturation process akin to standard proteomics by adapting

key methods such as false discovery rate control and

quantiﬁcation. A poorly evaluated search setting in proteomics

is the consideration of multiple (lighter) alternative values for

the monoisotopic precursor mass to compensate for possible

misassignments of the monoisotopic peak. Here, we show that

monoisotopic peak assignment is a major weakness of current

data handling approaches in cross-linking. Cross-linked

peptides often have high precursor masses, which reduces the

presence of the monoisotopic peak in the isotope envelope.

Paired with generally low peak intensity, this generates a challenge that may not be completely solvable by precursor mass

assignment routines. We therefore took an alternative route by ‘”in-search assignment of the monoisotopic peak”in the cross-

link database search tool Xi (Xi-MPA), which considers multiple precursor masses during database search. We compare and

evaluate the performance of established preprocessing workﬂows that partly correct the monoisotopic peak and Xi-MPA on

three publicly available data sets. Xi-MPA always delivered the highest number of identiﬁcations with ∼2 to 4-fold increase of

PSMs without compromising identiﬁcation accuracy as determined by FDR estimation and comparison to crystallographic

models.

KEYWORDS: cross-linking, mass spectrometry, data processing, proteomics, software, structural proteomics, BS3, SDA, peptides,

monoisotopic mass

■INTRODUCTION

Several approaches have been utilized to increase the numbers

of identiﬁed cross-links, for example enriching for cross-linked

peptides,

1−4

using diﬀerent proteases

1,5,6

or optimizing

fragmentation methods.

7,8

In parallel with experimental

developments, data analysis has also progressed to extract

even more cross-links to be used as distance restraints for

modeling of proteins and their complexes.

9,10

Search software

has been designed for the identiﬁcation of cross-linked

peptides, for example Kojak,

xQuest,

pLink,

XlinkX,

or Xi.

In addition, cross-linking workﬂows can make use of

preprocessing methods to improve data quality and reduce ﬁle

sizes,

as well as postprocessing methods to ﬁlter out false

identiﬁcations

11,16

and custom-tailored false discovery rate

(FDR) estimation.

17−19

Preprocessing can improve peptide

identiﬁcation by correcting the MS1 precursor ion m/zand

simplifying MS2 fragment spectra. Established proteomics

software perform such preprocessing, including MaxQuant

20,21

and OpenMS.

22,23

For example, MaxQuant performs a variety

of preprocessing steps: it corrects the precursor m/zby an

intensity-weighted average if a suitable peptide feature is found,

reassigns the monoisotopic peak and contains options for

intensity ﬁltering of MS2 peaks. Despite such correction of the

precursor mass, many linear search engines have integrated the

possibility of considering multiple monoisotopic peaks during

search.

24−26

However, the beneﬁts of this search feature are

currently unclear. It seems that the assignment of mono-

isotopic mass for tryptic peptides is already achieved

adequately either during acquisition or as part of preprocess-

ing.

Cross-linked peptides have characteristics that may render

MS1 monoisotopic precursor mass assignment as used for

linear peptides nonoptimal: high-charge states, large masses,

and low abundances. Several cross-link search engines include

MS1 correction in their pipeline: pLink corrects monoisotopic

peaks based on previous work with linear peptides,

however

does not include a parameter for searching multiple precursor

masses. Kojak averages precursor ion signals of neighboring

scans to create a composite spectrum and infer the true

monoisotopic mass of the precursor. If this step fails, precursor

masses up to −2 Da lighter are searched.

For previous

searches in Xi, MaxQuant was used to perform preprocessing.

Neither xQuest nor XLinkX describe precursor correction in

their workﬂow documentation and there is no option for

additionally searched masses available in the respective search

Received: August 4, 2018

Published: October 8, 2018

Article

pubs.acs.org/jpr

Cite This: J. Proteome Res. 2018, 17, 3923−3931

J. Proteome Res. 2018, 17, 3923−3931

This is an open access article published under a Creative Commons Attribution (CC-BY)

License, which permits unrestricted use, distribution and reproduction in any medium,

provided the author and source are cited.

Downloaded via TU BERLIN on March 29, 2023 at 13:03:04 (UTC).

See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

parameters. We are not aware of a detailed evaluation of the

impact of diﬀerent preprocessing techniques for cross-link

identiﬁcation, independent of the search software. Correcting

the monoisotopic mass of precursors, although acknowledged

as an issue,

11,28

awaits systematic evaluation.

In this study, we show that errors in assigning monoisotopic

peaks during data acquisition are frequent for cross-linked

peptides because of their size and generally low abundance.

This adversely aﬀects their identiﬁcation. We show that

standard software suites, MaxQuant and OpenMS correct

monoisotopic precursor masses of cross-linked peptides with

variable success. We then implement an option in Xi to

consider multiple precursor masses during search, to minimize

the impact of false monoisotopic precursor mass assignment

on the identiﬁcation of cross-links.

■METHODS

Data Sets

In this study, we used three publicly available data sets (Table

1). The three data sets were chosen to reﬂect a range of

applications of cross-linking mass spectrometry as well as a

range of data complexity: the ﬁrst data set is Human Serum

Albumin (HSA) cross-linked with succinimidyl 4,4-azipenta-

noate (SDA) and fragmented using ﬁve diﬀerent methods

(PXD003737).

The second data set is a pooled pseudocom-

plex sample with seven separately cross-linked proteins with

bis(sulfosuccinimidyl) suberate (BS3) (PXD006131).

This

data set includes data from four diﬀerent fragmentation

methods. The third data set is the most complex sample,

composed of 15 size exclusion chromatography fractions of

Chaetomium thermophilum lysate cross-linked with BS3 and

fragmented only with HCD (PXD006626).

The ﬁrst and last

size exclusion fractions were used to optimize the search

parameters for this data set. All samples were analyzed on an

Orbitrap Fusion Lumos Tribrid mass spectrometer (Thermo

Fisher Scientiﬁc, San Jose, CA) using Xcalibur (version 2.0 and

2.1).

Preprocessing

Raw ﬁles were preprocessed independently using MaxQuant

(1.5.5.30), OpenMS (2.0.1) and the ProteoWizard

tool

msconvert (3.0.9576) for comparison. Scripts automating the

preprocessing, search and evaluation were written in Python

(2.7).

The essential steps during the preprocessing can be divided

into two parts: (1) correction of the m/zor charge of the

precursor peak for MS2 spectra and (2) denoising of MS2

spectra. MaxQuant and OpenMS both try to correct the

precursor information via additional feature ﬁnding steps, i.e.

identifying a peptide feature from the retention time, m/zand

intensity domain of the LC-MS run. Additionally, denoising of

the MS2 spectra is performed by simply ﬁltering the most

intense peaks in deﬁned m/zwindows. The preprocessing is by

default enabled in MaxQuant and was run using the partial

processing option (steps 1−5) with default settings except for

inactivated “deisotoping”and “top peaks per 100 Da”, which

was set to 20. The OpenMS preprocessing workﬂow includes

centroiding, feature ﬁnding,

precursor correction (mass and

charge) using the identiﬁed features and MS2 denoising as

described above (Supporting Information (SI) Figure S1).

Msconvert was used to convert the raw ﬁles to mgf ﬁles

without any correction. These peak ﬁles were denoted as

“uncorrected”and used as our baseline to quantify improve-

ments in the subsequent database search. For the “in-search

assignment of the monoisotopic peak”in Xi (Xi-MPA), we

used msconvert to convert raw ﬁles to mgf ﬁles and included a

MS2 peak ﬁlter for the 20 most intense peaks in a 100 m/z

window.

Data Analysis

Peak ﬁles were searched separately in Xi (1.6.731) with the

following settings: MS accuracy 3 ppm, MS/MS accuracy 10

ppm, oxidation of methionine as variable modiﬁcation, tryptic

digestion, two missed cleavages. For samples cross-linked with

SDA, linkage sites were allowed on lysine, serine, tyrosine,

threonine, and protein n-terminus on one end and all amino

acids on the other end of the cross-linker. Variable

modiﬁcations were monolink SDA (110.048 Da), SDA loop-

links (82.0419 Da), SDA hydrolyzed (100.0524 Da), SDA

oxidized (98.0368 Da)

as well as carbamidomethylation on

cysteine. For searches with BS3, linkage sites were lysine,

serine, threonine, tyrosine, and the protein n-terminus.

Carbamidomethylation on cysteine was set as ﬁxed mod-

iﬁcation. Allowed variable modiﬁcations of the cross-linker

were aminated BS3 (155.0946 Da), hydrolyzed BS3 (156.0786

Da) and loop-linked BS3 (138.0681 Da). For collision-induced

dissociation (CID) and beam-type CID, also referred to as

higher-energy C-trap dissociation (HCD), b- and y-ions were

searched for, whereas for electron transfer dissociation (ETD)

c- and z-ions were allowed. For ETciD and EThcD, b-, c-, z-,

and y-ions were allowed. The HSA and pseudocomplex data

sets were searched against the known proteins in the sample.

For each protein fraction of the C. thermophilum data set, the

databases of the original publication were used, where a

database was created for each fraction by taking the most

abundant proteins (iBAQ value above 106). For searches

employing Xi-MPA, the parameter "missing_isotope_peaks"

was set to the respective mass range searched. Data sets 1 and

2 were searched with a reversed decoy database, whereas data

set 3 was searched with a shuﬄed decoy database due to

palindromic sequences. For the reversed decoy database,

lysines and arginines were swapped with the preceding amino

acid before peptide generation.

17,20

For cross-linking, there are diﬀerent information levels:

PSMs, peptide pairs, residue pairs (links) and protein pairs.

The false discovery rate (FDR) can be calculated on each one

of these levels and should be reported for the level at which the

information is given.

The FDR was calculated as described in

Fischer et al.

using xiFDR (1.0.14.34) according to the

following equation: =

−

FDR TD DD

TT . A 5% PSM level cutoﬀ

was imposed. The setting “uniquePSMs”was enabled and the

FDR was calculated separately on self-and between links.

Minimal peptide length was set to 6. In data set 2, identiﬁed

cross-linked residues were mapped to the crystal structure of

the respective protein and the Euclidian distance between the

alpha-carbons was calculated. Structures were downloaded

Table 1. Overview of Datasets Used

data set sample database size

reference

1 HSA 1 Giese et al. 2016

2 pseudocomplex 7 Kolbowski et al. 2017

3C. thermophilum 198−400

Kastritis et al. 2017

Database size refers to the number of proteins in the database.

Multiple size exclusion chromatography fractions (n= 15).

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3924

from the PDB (IDs: 1AO6, 5GKN, 2CRK, 3NBS, 1OVT,

2FRJ). Kojak (1.5.5) was run via the Trans-Proteomic Pipeline

(5.1.0)

using default settings except: MS1 resolution 120

000, BS3 allowed on lysine, serine, threonine, tyrosine, and the

protein n-terminus, aminated BS3 (155.0946 Da) as variable

modiﬁcation of the cross-linker, 3 ppm mass tolerance on MS1

level. For the uncorrected search, the isotope error was set to 0

and precursor reﬁnement was disabled. PSMs were validated

using PeptideProphet

and FDR calculated as described above

on the resulting probability.

The mass spectrometry data have been deposited to the

ProteomeXchange Consortium via the PRIDE

partner

repository with the data set identiﬁer PXD011121. For

transparency, python scripts are available on GitHub under:

https://github.com/Rappsilber-Laboratory/Xi-MPA_scripts.

■RESULTS AND DISCUSSION

We evaluated the impact on cross-link identiﬁcation in Xi of

changing the precursor monoisotopic mass that was initially

assigned during data acquisition (“uncorrected”). In this

analysis, MaxQuant and OpenMS were used as preprocessing

tools. We used three diﬀerent data sets that diﬀer in complexity

and fragmentation regimes. To measure the improvements

from using the preprocessing tools, a simple conversion from

raw ﬁles to mgf format was done with msconvert and used as a

baseline. Note that in the spectrum header, there are two m/z

values: the trigger mass of the MS2 and the assigned

monoisotopic peak of the isotope cluster. Msconvert extracts

the assigned monoisotopic mass. Processed data were searched

separately in Xi and evaluated on PSM level or link (= unique

residue pair) level, with a 5% FDR. Finally, the newly

implemented in-search assignment of monoisotopic peaks in Xi

was compared to the elaborate preprocessing pipelines in

OpenMS and MaxQuant.

Preprocessing Increases the Number of Cross-Link PSMs

by Finding the Correct Monoisotopic Peak

The data sets were preprocessed in MaxQuant and OpenMS

and numbers of identiﬁed PSMs were compared to those

obtained using uncorrected data. Data sets 1 (HSA) and 2

(pseudocomplex) were acquired with diﬀerent acquisition

methods. For comparability to data set 3 (complex mixture),

we focused on the HCD acquired data. Cross-links between

proteins were excluded, either because they were experimen-

tally not possible (data set 2) or observed in too low numbers

for reliable FDR calculation (data set 3).

For uncorrected data, 672, 354, and 2157 cross-link PSMs

resulted for the HSA data set (data set 1), pseudocomplex

(data set 2), and ﬁrst and last fractions of C. thermophilum

respectively (data set 3). Both preprocessing approaches

improved numbers of identiﬁed PSMs for all data sets:

Preprocessing in MaxQuant led to 1127 (68% increase), 966

(173% increase), and 2966 (38% increase), while for OpenMS,

1044 (55% increase), 598 (69% increase) and 2394 (11%

increase) PSMs were identiﬁed (Figure 1A).

We assessed the gains in identiﬁed PSMs of preprocessed

data compared to uncorrected data (focusing on data set 2)

regarding three forms of precursor correction: (1) correction

of the monoisotopic mass, (2) charge state correction, and (3)

small corrections of the m/zvalue based on averaging the m/z

values across the peptide feature (Figure 1B). Precursor mass

and charge state of spectra identiﬁed solely in MaxQuant-Xi

were compared to their counterparts when searching

uncorrected data in Xi. Of the 756 newly identiﬁed spectra,

686 (91%) had a diﬀerent monoisotopic precursor mass.

Precursors were primarily corrected to lighter masses by

MaxQuant, that is, the monoisotopic peak correction by −1

(208 spectra), −2 (215 spectra), −3 (149 spectra), and −4Da

(62 spectra). Greater shifts (−5to−7 Da) only occurred 30

times, and corrections to heavier masses were observed 22

times. Only 30 spectra (4%) were corrected in their charge

state. For the 60 spectra (8%) without correction in charge

state or monoisotopic peak, we only identiﬁed nine spectra

that had a higher error than 3 ppm before preprocessing,

indicating a small correction of the initial precursor m/z(by

averaging of peptide feature peaks). The main proportion of

these identiﬁcations is likely a product of noise removal in MS2

spectra or small changes in the score distribution. Similarly, for

OpenMS-Xi, the monoisotopic peak correction had the

greatest impact: Of the 314 spectra that OpenMS added

over uncorrected data, 139 were precursor corrected by −1Da

and 108 to −2 Da. In contrast to MaxQuant, corrections to −3

or lighter were not observed, which might explain the higher

number of identiﬁcations obtained with MaxQuant-Xi.

Figure 1. Correction of the monoisotopic peak is crucial in cross-link identiﬁcation. (A) The data sets were preprocessed using MaxQuant and

OpenMS, leading to more identiﬁed PSMs in all cases. Fold changes from uncorrected data (msconvert conversion of Xcalibur data) were

calculated for each ﬁle separately and the mean plotted. Error bars represent the standard error of the mean between diﬀerent acquisitions (HSA: n

= 3, pseudocomplex: n=3,C. thermophilum:n= 8). (B) The majority of additional identiﬁcations after preprocessing are due to correction of the

precursor mass to lighter monoisotopic masses. Spectra that are unique to MaxQuant preprocessed searches of HCD acquisitions from data set 2

were evaluated in terms of precursor correction. The main proportion of the gain was corrected to lighter masses of up to −3 Da, while charge state

correction or correction to heavier masses rarely occurred. (C) Isotope cluster of a corrected precursor of m/z992.71 (z=5,m= 4958.6 Da) was

solely identiﬁed in MaxQuant preprocessed results. In OpenMS preprocessed and uncorrected data, the wrong monoisotopic mass was selected for

unknown reasons.

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3925

For data set 1 and 3, the gains of preprocessing are smaller

than for data set 2. The median peptide mass of data set 1

(3368 Da) is smaller than the median mass of data set 2 (3946

Da) and we later show that this is a major factor in precursor

mass assignment. This reﬂects in the distribution of lighter

masses assigned: 46%, 33%, and 21% were corrected by −1 Da,

−2 Da, and to even lighter peaks, respectively. Data set 3 was

acquired with a diﬀerent version of Xcalibur, for which we saw

better mass assignment than for earlier versions (data not

shown). As an implication of this, already 67% of the lighter

corrected masses are shifted by −1 Da, 21% by −2 Da, 12% to

even lighter masses. However, we were not able to follow up

on this to our content, since the source code of the vendor

software is not available.

In summary, preprocessing, especially monoisotopic peak

correction, leads to a notable increase in identiﬁcations. Using

the 3-dimensional peptide feature is advantageous compared to

on-the-ﬂy detection of the monoisotopic peak. If the preceding

MS1 spectrum was acquired during the beginning (or end) of

the elution proﬁle of a peptide, the intensity will be low. Thus,

the monoisotopic peak might not even be detectable at the

time of fragmentation. For large (cross-linked) peptides, this

eﬀect might be exacerbated by the monoisotopic peak usually

being less intense than other isotope peaks. Therefore, using

the additional information from the retention time domain will

be beneﬁcial. The same feature information can also be used to

determine or validate the assigned charge state of the

precursor. However, the instrument software almost always

assigned the same charge state as MaxQuant or OpenMS.

Thus, the main advantage for identifying cross-linked peptides

arises from monoisotopic peak correction.

Interestingly, OpenMS and MaxQuant did not always agree

on or ﬁnd the same monoisotopic peak (Figure 1C). Of the

total MaxQuant-corrected spectra with a diﬀerent mono-

isotopic mass, 81% were not corrected and 6% corrected

diﬀerently with OpenMS. Vice versa, 15% of the monoisotopic

peaks corrected by OpenMS were not corrected by MaxQuant

and 25% were corrected diﬀerently. Both MaxQuant and

OpenMS have their own implementations for precursor

correction - therefore, there might be instances where

MaxQuant is able to ﬁnd a corresponding peptide feature

where OpenMS does not and vice versa. Although OpenMS

did not lead to the same improvements in the number of

identiﬁcations as MaxQuant, it did correct some precursors

that the latter did not. We therefore suspect that there are also

precursors with a falsely assigned monoisotopic peak that were

corrected with neither algorithm. Furthermore, 3-dimensional

detection of peptide features is challenging for low intensity

peptides. In conclusion, there likely remain falsely assigned

monoisotopic peaks in the data, ultimately leading to missed or

false identiﬁcations.

In-Search Monoisotopic Peak Assignment Increases the

Number of Identiﬁcations

We observed multiple cases where MaxQuant and OpenMS

disagreed in their monoisotopic peak choice, indicating that

the problem of monoisotopic peak assignment (MPA) cannot

be solved easily at MS1 level. Indeed, we found instances

where the monoisotopic peak is simply not distinguishable

from noise, so a feature-based correction would not be feasible.

Nevertheless, the associated MS2 spectra could be matched to

a cross-linked peptide when considering multiple diﬀerent

monoisotopic masses during search. This shows that the extra

information on obtaining a peptide-spectrum match is

advantageous to MPA over considering MS1 information

alone. Therefore, we implemented a monoisotopic peak

assignment in Xi: for each MS2 spectrum, multiple precursor

masses are considered during a single search and the highest

scoring peptide-pair assigns the precursor mass. Note that this

is diﬀerent from simply searching with a wide mass error for

MS1. The mass accuracy of MS1 is minimally compromised as

multiple candidates for the monoisotopic mass are taken and

considered with the original mass accuracy of the measure-

ment.

To ﬁnd a good trade-oﬀbetween increased search space and

sensitivity, we tested diﬀerent mass range settings on the data

sets. For data set 2 (HCD subset), the number of PSMs

increased with ranges up to −5 Da on the considered

monoisotopic masses (Figure 2A). However, the increase in

identiﬁcations from −4to−5 Da was only 3% and considering

the increase in search time, we continued with a maximal

correction to −4 Da as the optimal setting for this data set. Xi-

MPA yielded 1508 PSMs, which is a 326% increase compared

to searching uncorrected data and a 56% increase compared to

MaxQuant-Xi. Similar improvements are observed for the

other fragmentation methods in this data set (SI Figure S2).

Additionally, we corrected up to −7 Da to test if a large

increase in search space increases random spectra matches as

measured by the target-decoy approach. The number of

identiﬁcations at 5% FDR decreased only slightly compared to

−5Da(−1%), but still led to more identiﬁcations than up to

−4 Da (3%). In the HSA data set, Xi-MPA with up to −4Da

increased the number of identiﬁed PSMs by 170% compared

to uncorrected data (Figure S3).

As a ﬁnal evaluation of in-search monoisotopic peak

assignment, we searched the complete data set of C.

thermophilum. We used 0 to −3 Da as the range of Xi-MPA,

since an initial analysis of the ﬁrst and last fraction of the C.

thermophilum data set returned a similar number of

identiﬁcations when running Xi-MPA up to −4Daor−3

Da (Figure S4). As a comparison, we took the original peak

ﬁles obtained from PRIDE. The FDR was calculated separately

on self-and between links, enabled boosting (automatic

preﬁltering on PSM and peptide pair level

), with a minimum

of three fragments per peptide and a minimal delta score of 0.5.

For the original peak ﬁles, which were preprocessed in

MaxQuant, we identiﬁed 3848 PSMs, 2594 peptide pairs and

1653 cross-links, with a 5% FDR on each respective level

(Figure 2B). Xi-MPA resulted in 4952 PSMs (29% increase),

3566 peptide pairs (37% increase), and 2273 cross-links (38%

increase).

Next, we looked into the complementarity of search results

with the diﬀerent approaches, using data set 2 at 5% link-FDR.

Preprocessing via MaxQuant and OpenMS led to 172 and 158

links, respectively, while Xi-MPA resulted in 243 links. While

the overlap between links of OpenMS-Xi and MaxQuant-Xi is

only 50%, Xi-MPA identiﬁcations cover 76% of both searches

(Figure 2C). Nineteen and 23 links are uniquely found in

MaxQuant and OpenMS preprocessed data, respectively.

However, there are ﬁve decoy links as well in each unique

set (resulting in a link-FDR of 26% and 22%). For Xi-MPA,

there are 75 unique target links with 12% link-FDR.

Identiﬁcation-based monoisotopic peak assignment as

employed by Xi-MPA results in more identiﬁcations than the

feature-based assignment algorithms of OpenMS and Max-

Quant. Neither OpenMS nor MaxQuant correct all precursor

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3926

masses that are incorrectly assigned during data acquisition. In

Xi-MPA, spectra are searched with multiple monoisotopic

masses, thereby relying less on the MS1 information. The

quality of the precursor isotope cluster does not contribute to

the decision of monoisotopic mass and spectra for which

correction failed will be identiﬁable. One could hypothesize

that increasing the search space by considering multiple masses

will lead to more false positives, thereby reducing the number

of true identiﬁcations. This is not the case, as we match

substantially more PSMs at constant FDR by considering

alternative monoisotopic masses. As a second plausible caveat,

this approach increases the search time. However, the use of

relatively cheap computational time appears balanced by the

notable increase in identiﬁed cross-links. The optimal range of

additional monoisotopic peaks to search will however be

dependent on complexity and quality of MS1 acquisition and

the instrument software. To reduce the mass range considered

in Xi-MPA, we developed a MS1 level-based approach. For

each precursor, we search lighter isotope peaks in MS1 and use

this to narrow the search space (explained in detail in the SI).

This led to an average of 24% less values to be considered,

while only reducing the number of identiﬁcations by 3%. We

hope that our observation of the monoisotopic peak detection

challenge in cross-linking together with our publicly available

data sets will lead to further improvements in monoisotopic

peak-assignment algorithms in the future, possibly tailored to

cross-link data.

The cross-link search engine Kojak employs a precursor

correction in its pipeline.

As we could not ﬁnd a detailed

evaluation of the impact of precursor correction in Kojak, we

searched the HCD data of the pseudocomplex data set without

correction as well as with their default correction settings. We

focused on FDR 10% data as there were too few identiﬁcations

for a reliable calculation of FDR 5%. Just 171 cross-linked

PSMs passed for the unprocessed data, whereas for the default

search, 1088 PSMs passed (536% increase). Of those, 862

(79%) were corrected in their monoisotopic precursor peak.

These results support our observations with Xi.

In-Search Monoisotopic Peak Assignment Does Not

Compromise Search Accuracy

Changing the search could lead to several problems. We

already excluded that the increased search space leads to high-

scoring decoy matches that in turn reduce the number of

identiﬁcations at a given FDR cutoﬀ. As an additional

validation, we assessed our results against known PDB

structures using the HCD data from the pseudocomplex data

set (data set 2), at 5% link-FDR. Assuming a crystal structure is

correct, a cross-link can be unexpectedly long either because

the link is false or because of in-solution structural dynamics.

If, however, the proportion of long-distance links in results of

two approaches is identical, then at least the two results have

equal quality.

We ﬁrst tested the results of all three approaches against

crystal structures. Residue pairs were mapped to PDB

structures and the distance between the two alpha-carbons

was calculated (see Methods). Thirty Å was set as the maximal

distance for BS3, links with a greater distance were classiﬁed as

long-distance. In this evaluation, we excluded the protein C3B

because its ﬂexible regions make it unsuitable for this analysis.

For MaxQuant and OpenMS preprocessed results, 11.8% and

6.1% long-distance cross-links were identiﬁed, respectively. In

Xi-MPA, 8.1% long distance cross-links were identiﬁed (Figure

3A). Of the links uniquely identiﬁed through Xi-MPA, only

5.3% were long distance links. Therefore, Xi-MPA as such does

not lead to an enrichment in long-distance cross-links.

However, it could be that mass-corrected precursors tend to

have a higher proportion of long-distance links. We therefore

split the Xi-MPA results into ﬁve groups corresponding to the

monoisotopic mass change (0, −1, −2, −3, −4 Da) and looked

at their match to crystal structures. If a link originated from

PSMs with diﬀerent mass corrections, all of those were

considered. We conducted a “nonparametric ANOVA”

(Kruskal−Wallis test) to detect any signiﬁcant changes in the

distance distributions of Xi-MPA identiﬁcations with diﬀerent

Figure 2. In-search monoisotopic peak assignment outperforms

preprocessing. (A) Performance of Xi-MPA on data set 2. HCD data

from the pseudocomplex data set were searched assuming diﬀerent

ranges of missing monoisotopic peaks. With increased ranges, the

number of identiﬁed PSMs also increases. (B) Performance of Xi-

MPA on the complete C. thermophilum data set. All 15 fractions were

searched with the original preprocessed data as well as with Xi-MPA.

Xi to residue pairs gained from Xi-MPA (data set 2). Numbers in

brackets are the proportion of decoys in the respective regions.

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3927

shifts and decoy distribution. However, we fail to reject the

null hypothesis at the predetermined signiﬁcance level of α=

0.05 (p-value: 0.13), indicating that the distance distributions

for all subsets are similar. This matches the visual inspection of

distance distributions (Figure 3B). Furthermore, all individual

distance distributions were signiﬁcantly smaller than the

derived reference distribution (one-sided Wilcoxon test, see

SI Table S5). In conclusion, we do not see any evidence of in-

search monoisotopic mass assignment leading to increased

conﬂicts with crystal structures. We then evaluated the eﬀect of

in-search monoisotopic mass assignment on PSM quality as

assessed by the search score. First, we compared the scores of

PSMs with a mass shift (Xi-MPA identiﬁcations) to the scores

of the same spectrum without a mass shift (uncorrected data).

While scores with shifted mass have a median of 6.7, the

median score is 2.3 when using the uncorrected masses (Figure

3C). As one would expect from an increased search space, the

scores of decoy hits also improve, albeit only marginally. We

ﬁnd that the score diﬀerence of target PSMs is signiﬁcantly

larger than of decoy PSMs (one-sided Wilcoxon test, p-value:

<2.2 ×10−16). We then turned to a “decoy mass search”for

which we not only searched the range from 0 Da to −4 Da, but

also +1 Da to +4 Da. Assuming the monoisotopic peak in the

uncorrected data is rarely lighter than the true monoisotopic

peak, the new identiﬁcations should score like decoy

identiﬁcations. Indeed, the resulting score distributions for

targets with a positive mass shift follow the decoy distribution

(Figure 3D). In contrast, identiﬁcations with a negative shift

are distributed like the identiﬁcations without mass shift. In

conclusion, in-search monoisotopic mass change leads to

signiﬁcantly improved scores with a distribution that resembles

that of precursors that did not see a mass change (0 Da).

Importantly, these improvements are not random since an

equally large search space increase (+1 Da to +4 Da) results in

a completely diﬀerent score distribution that resembles the

decoy distribution but not the distribution of identiﬁcations

without a mass shift.

Heavy and Low Intensity Peptides Are Corrected More

Frequently

One would especially expect to observe shifted mass

assignment for peptides of high mass and low abundance.

For large peptides (approximately >2000 Da), the mono-

isotopic peak will not be the most intense peak in the isotope

cluster. If the peptide is of low abundance, the monoisotopic

Figure 3. Matches with and without in-search mass shift show similar

quality metrics. (A) Evaluation of Xi-MPA derived links on crystal

structures (data set 2). Distances between αcarbon atoms of

identiﬁed cross-linked residues in the crystal structure of the proteins

are shown in light gray while a reference distribution of all possible

Figure 3. continued

pairwise C-alpha distances of cross-linkable residues is shown in dark

gray. Thirty Å is set as a limit, above which links are deﬁned as long

distance. (B) Distance distribution of identiﬁcations with diﬀerent

mass corrections. There was no signiﬁcant diﬀerence between the

diﬀerent mass shifts, while all had a signiﬁcant diﬀerence to the decoy

distribution. (C) PSM scores of spectra identiﬁed with a mass shift are

signiﬁcantly higher than the corresponding score in uncorrected data.

Shown are the score distributions of uncorrected and Xi-MPA results,

as well as the corresponding decoy distribution. (D) Score

distribution of PSM matches of the “decoy mass search”.

Identiﬁcations with a positive mass shift generally follow the decoy

distribution (note that there are correct identiﬁcations with a positive

mass shift, albeit few, see Figure 1B) while identiﬁcations with a

negative shift resemble unshifted identiﬁcations. The scores of

negative-shifted PSMs are signiﬁcantly higher than those of positive-

shifted PSMs (one-sided Wilcoxon test, p-value: <2.2×10−16).

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3928

peak may be of too low intensity to be detected. We therefore

analyzed the monoisotopic peak assignment in Xi-MPA

regarding the precursor mass and intensity. Indeed, precursors

with higher masses are more often corrected to lighter

monoisotopic peaks (Figure 4A). While the median precursor

mass for uncorrected matches is 2952 Da, for matches

corrected by −2 Da it is already 4062 Da and for −4Dait

is 4684 Da. Of the identiﬁcations with a mass above 3000 Da,

88% were identiﬁed with a lighter mass. For precursors lighter

than 3000 Da, the proportion was 42%. Like for mass

dependency, there is a trend toward larger correction ranges

for lower intensity peptides (SI Figure S5). However, this is

less strong than it is for precursor mass.

When evaluating the newly matched precursors of Xi-MPA,

the advantage of not having to rely on MS1 identiﬁcation is

evident. Matches not made through any of the preprocessing

methods are generally much less intense (Figure 4B) and larger

(SI Figure S6) than matches that are common to all

approaches. Manual analysis of isotope clusters of corrected

precursors from data set 2 revealed many cases where the

monoisotopic peak was present in the MS1 spectrum but was

not recognized during acquisition. For some, this might be due

to the peak being of low intensity and discarded as noise, or

because of other interfering peaks (Figure 4C). However, there

are also cases where the cluster is well resolved (Figure 1C).

Without details of how the instrument software determines the

monoisotopic peak, a full evaluation is diﬃcult. For a complete

list of precursor m/zfor Xi-MPA identiﬁcations and

corresponding m/zof uncorrected, MaxQuant and OpenMS

data, see SI Table S1−S3.

Note that in many acquisition methods, the machine only

fragments peaks where it can successfully identify a full isotope

cluster. Therefore, there might be instances of cross-linked

peptides not being fragmented because of insuﬃcient isotopic

cluster quality, leading to lost identiﬁcations.

■CONCLUSION

The size and low abundance of cross-linked peptides leads to

frequent misassignment of the monoisotopic mass by instru-

ment software, which in some instances even escapes

correction by sophisticated correction approaches employed

by MaxQuant and OpenMS. Considering multiple mono-

isotopic masses during search increases the number of cross-

link PSMs 1.8−4.2-fold, without compromising search

accuracy as judged by multiple assessment strategies including

comparison of the gains against solved protein structures. The

problem of wrongly assigned monoisotopic peaks will have an

impact on most cross-link search engines since these all rely in

some part on the precursor mass. The extent of the

misassignment will however be sample and software-depend-

ent. Even with improved acquisition or correction software,

there will remain instances where the monoisotopic peak

cannot be determined correctly before searching due to low

intensity. Our search-assisted monoisotopic peak assignment

provides a general solution to this problem by relying on MS2

identiﬁcation in addition to precursor information.

■ASSOCIATED CONTENT

SSupporting Information

The Supporting Information is available free of charge on the

ACS Publications website at DOI: 10.1021/acs.jproteo-

me.8b00600.

Table S4: Xi-MPA mass range reduction. Figure S1:

OpenMS preprocessing workﬂow. Figure S2: Perform-

ance of Xi-MPA on EThcD, CID, and ETciD

acquisitions of the pseudocomplex data set. Table S5:

Summary for conducted signiﬁcance tests for Figure 3B

in the main text. Figure S3: Performance of Xi-MPA on

HCD data of the HSA data set. Figure S4: Performance

of Xi-MPA on the test fractions of the C. thermophi-

lumdata set. Figure S5: Dependency of the monoisotopic

mass correction on precursor intensity. Figure S6:

Dependency of identiﬁcations in Xi-MPA on mass

(PDF)

Table S1: Precursor m/zof diﬀerent processing methods

of data set 1. Table S2: Precursor m/zof diﬀerent

processing methods of data set 2. Table S3: Precursor

m/zof diﬀerent processing methods of data set 3.

Supporting Information: MS1 based mass range

reduction (XLS)

Figure 4. Correction is dependent on precursor mass and intensity. (A) Box plot of the precursor mass and monoisotopic mass correction of

identiﬁed PSMs after Xi-MPA. PSMs with higher mass more often require monoisotopic mass correction to lighter masses. Whiskers show the 5

and 95% quantiles of the data. Asterisks denote the signiﬁcance calculated by a one tailed ttest (****:p-value: <0.0001). (B) Precursors of cross-

linked PSMs identiﬁed in all three approaches, MaxQuant-Xi, OpenMS-Xi, and Xi-MPA (“common”), are more intense than precursors of PSMs

that are only identiﬁed in Xi-MPA. In other words, successful correction happens more often for abundant precursors, whereas Xi-MPA identiﬁes

precursors of lower intensity. (C) MS1 isotope cluster of a cross-linked peptide. The monoisotopic peak of m/z758.16 (z=4,m= 3028.6 Da) was

falsely assigned during acquisition and not corrected in any preprocessing approach. Xi-MPA identiﬁes a PSM for a precursor with a mass that is 3

Da lighter.

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3929

■AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

ORCID

Swantje Lenz: 0000-0002-8839-5371

Sven H. Giese: 0000-0002-9886-2447

Lutz Fischer: 0000-0003-4978-0864

Juri Rappsilber: 0000-0001-5999-1310

Notes

The authors declare no competing ﬁnancial interest.

■ACKNOWLEDGMENTS

We thank Dr Francis O’Reilly for comments and helpful

discussions. This work was supported by the Einstein

Foundation, the DFG [RA 2365/4-1], and the Wellcome

Trust through a Senior Research Fellowship to JR [103139]

and a multiuser equipment grant [108504]. The Wellcome

Centre for Cell Biology is supported by core funding from the

Wellcome Trust [203149].

■REFERENCES

(1) Leitner, A.; Reischl, R.; Walzthoeni, T.; Herzog, F.; Bohn, S.;

Forster, F.; Aebersold, R. Expanding the chemical cross-linking

toolbox by the use of multiple proteases and enrichment by size

exclusion chromatography. Mol. Cell. Proteomics 2012,11,

M111.014126.

(2) Chen, Z. A.; Jawhari, A.; Fischer, L.; Buchen, C.; Tahir, S.;

Kamenski, T.; Rasmussen, M.; Lariviere, L.; Bukowski-Wills, J.-C.;

Nilges, M.; et al. Architecture of the RNA polymerase II-TFIIF

complex revealed by cross-linking and mass spectrometry. EMBO J.

2010,29, 717−726.

(3) Tan, D.; Li, Q.; Zhang, M.-J.; Liu, C.; Ma, C.; Zhang, P.; Ding,

Y.-H.; Fan, S.-B.; Tao, L.; Yang, B. et al. Trifunctional cross-linker for

mapping protein-protein interaction networks and comparing protein

conformational states. Elife 2016,5,DOI: DOI: 10.7554/

eLife.12509.

(4) Rampler, E.; Stranzl, T.; Orban-Nemeth, Z.; Hollenstein, D. M.;

Hudecz, O.; Schlogelhofer, P.; Mechtler, K. Comprehensive Cross-

Linking Mass Spectrometry Reveals Parallel Orientation and Flexible

Conformations of Plant HOP2-MND1. J. Proteome Res. 2015,14,

5048−5062.

(5) Mendes, M. L.; Fischer, L.; Chen, Z. A.; Barbon, M.; O’Reilly, F.

J.; Bohlke-Schneider, M.; Belsom, A.; Dau, T.; Combe, C. W.;

Graham, M. et al. An integrated workﬂow for cross-linking/mass

spectrometry. 2018.

(6) Belsom, A.; Schneider, M.; Fischer, L.; Mabrouk, M.; Stahl, K.;

Brock, O.; Rappsilber, J. Blind testing cross-linking/mass spectrom-

etry under the auspices of the 11thcritical assessment of methods of

protein structure prediction (CASP11). Wellcome open research 2016,

1, 24.

(7) Kolbowski, L.; Mendes, M. L.; Rappsilber, J. Optimizing the

Parameters Governing the Fragmentation of Cross-Linked Peptides in

a Tribrid Mass Spectrometer. Anal. Chem. 2017,89, 5311−5318.

(8) Liu, F.; Lossl, P.; Scheltema, R.; Viner, R.; Heck, A. J. R.

Optimized fragmentation schemes and data analysis strategies for

proteome-wide cross-link identification. Nat. Commun. 2017,8,

15473.

(9) Orba

n-Ne

meth, Z.; Beveridge, R.; Hollenstein, D. M.; Rampler,

E.; Stranzl, T.; Hudecz, O.; Doblmann, J.; Schlogelhofer, P.; Mechtler,

K. Structural prediction of protein models using distance restraints

derived from cross-linking mass spectrometry data. Nat. Protoc. 2018,

13, 478−494.

(10) Schneider, M.; Belsom, A.; Rappsilber, J. Protein Tertiary

Structure by Crosslinking/Mass Spectrometry. Trends Biochem. Sci.

2018,43, 157−169.

(11) Hoopmann, M. R.; Zelter, A.; Johnson, R. S.; Riffle, M.;

MacCoss, M. J.; Davis, T. N.; Moritz, R. L. Kojak: efficient analysis of

chemically cross-linked protein complexes. J. Proteome Res. 2015,14,

2190−2198.

(12) Leitner, A.; Walzthoeni, T.; Aebersold, R. Lysine-specific

chemical cross-linking of protein complexes and identification of

cross-linking sites using LC-MS/MS and the xQuest/xProphet

software pipeline. Nat. Protoc. 2014,9, 120−137.

(13) Yang, B.; Wu, Y.-J.; Zhu, M.; Fan, S.-B.; Lin, J.; Zhang, K.; Li,

S.; Chi, H.; Li, Y.-X.; Chen, H.-F.; et al. Identification of cross-linked

peptides from complex samples. Nat. Methods 2012,9, 904−906.

(14) Liu, F.; Rijkers, D. T. S.; Post, H.; Heck, A. J. R. Proteome-wide

profiling of protein assemblies by cross-linking mass spectrometry.

Nat. Methods 2015,12, 1179−1184.

(15) Renard, B. Y.; Kirchner, M.; Monigatti, F.; Ivanov, A. R.;

Rappsilber, J.; Winter, D.; Steen, J. A. J.; Hamprecht, F. A.; Steen, H.

When less can yield more - Computational preprocessing of MS/MS

spectra for peptide identification. Proteomics 2009,9, 4978−4984.

(16) Kall, L.; Canterbury, J. D.; Weston, J.; Noble, W. S.; MacCoss,

M. J. Semi-supervised learning for peptide identification from shotgun

proteomics datasets. Nat. Methods 2007,4, 923−925.

(17) Fischer, L.; Rappsilber, J. Quirks of Error Estimation in Cross-

Linking/Mass Spectrometry. Anal. Chem. 2017,89, 3829−3833.

(18) Walzthoeni, T.; Claassen, M.; Leitner, A.; Herzog, F.; Bohn, S.;

Forster, F.; Beck, M.; Aebersold, R. False discovery rate estimation for

cross-linked peptides identified by mass spectrometry. Nat. Methods

2012,9, 901−903.

(19) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.;

Tarricone, C.; Musacchio, A.; Rappsilber, J. Structural analysis of

multiprotein complexes by cross-linking, mass spectrometry, and

database searching. Mol. Cell. Proteomics 2007,6, 2200−2211.

(20) Cox, J.; Mann, M. MaxQuant enables high peptide

identification rates, individualized p.p.b.-range mass accuracies and

proteome-wide protein quantification. Nat. Biotechnol. 2008,26,

1367−1372.

(21) Tyanova, S.; Temu, T.; Cox, J. The MaxQuant computational

platform for mass spectrometry-based shotgun proteomics. Nat.

Protoc. 2016,11, 2301−2319.

(22) Sturm, M.; Bertsch, A.; Gropl, C.; Hildebrandt, A.; Hussong,

R.; Lange, E.; Pfeifer, N.; Schulz-Trieglaff, O.; Zerck, A.; Reinert, K.;

et al. OpenMS - an open-source software framework for mass

spectrometry. BMC Bioinf. 2008,9, 163.

(23) Rost, H. L.; Sachsenberg, T.; Aiche, S.; Bielow, C.; Weisser, H.;

Aicheler, F.; Andreotti, S.; Ehrlich, H.-C.; Gutenbrunner, P.; Kenar,

E.; et al. OpenMS: a flexible open-source software platform for mass

spectrometry data analysis. Nat. Methods 2016,13, 741−748.

(24) Craig, R.; Beavis, R. C. TANDEM: matching proteins with

tandem mass spectra. Bioinformatics 2004,20, 1466−1467.

(25) Eng, J. K.; Jahan, T. A.; Hoopmann, M. R. Comet: an open-

source MS/MS sequence database search tool. Proteomics 2013,13,

22−24.

(26) Geer, L. Y.; Markey, S. P.; Kowalak, J. A.; Wagner, L.; Xu, M.;

Maynard, D. M.; Yang, X.; Shi, W.; Bryant, S. H. Open mass

spectrometry search algorithm. J. Proteome Res. 2004,3, 958−964.

(27) Yuan, Z.-F.; Liu, C.; Wang, H.-P.; Sun, R.-X.; Fu, Y.; Zhang, J.-

F.; Wang, L.-H.; Chi, H.; Li, Y.; Xiu, L.-Y.; et al. pParse: a method for

accurate determination of monoisotopic peaks in high-resolution mass

spectra. Proteomics 2012,12, 226−235.

(28) Iacobucci, C.; Sinz, A. To Be or Not to Be? Five Guidelines to

Avoid Misassignments in Cross-Linking/Mass Spectrometry. Anal.

Chem. 2017,89, 7832−7835.

(29) Giese, S. H.; Belsom, A.; Rappsilber, J. Optimized

Fragmentation Regime for Diazirine Photo-Cross-Linked Peptides.

Anal. Chem. 2016,88, 8239−8247.

(30) Kastritis, P. L.; O’Reilly, F. J.; Bock, T.; Li, Y.; Rogon, M. Z.;

Buczak, K.; Romanov, N.; Betts, M. J.; Bui, K. H.; Hagen, W. J.; et al.

Capturing protein communities by structural proteomics in a

thermophilic eukaryote. Mol. Syst. Biol. 2017,13, 936.

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3930

(31) Kessner, D.; Chambers, M.; Burke, R.; Agus, D.; Mallick, P.

ProteoWizard: open source software for rapid proteomics tools

development. Bioinformatics 2008,24, 2534−2536.

(32) Weisser, H.; Nahnsen, S.; Grossmann, J.; Nilse, L.; Quandt, A.;

Brauer, H.; Sturm, M.; Kenar, E.; Kohlbacher, O.; Aebersold, R.; et al.

An automated pipeline for high-throughput label-free quantitative

proteomics. J. Proteome Res. 2013,12, 1628−1644.

(33) Deutsch, E. W.; Mendoza, L.; Shteynberg, D.; Slagel, J.; Sun, Z.;

Moritz,R.L.Trans-ProteomicPipeline,astandardizeddata

processing pipeline for large-scale reproducible proteomics infor-

matics. Proteomics: Clin. Appl. 2015,9, 745−754.

(34) Keller, A.; Nesvizhskii, A. I.; Kolker, E.; Aebersold, R. Empirical

Statistical Model To Estimate the Accuracy of Peptide Identifications

Made by MS/MS and Database Search. Anal. Chem. 2002,74, 5383−

5392.

(35) Vizcaíno, J. A.; Csordas, A.; del-Toro, N.; Dianes, J. A.; Griss, J.;

Lavidas, I.; Mayer, G.; Perez-Riverol, Y.; Reisinger, F.; Ternent, T.;

et al. 2016 update of the PRIDE database and its related tools. Nucleic

Acids Res. 2016,44, D447−56.

Journal of Proteome Research Article

DOI: 10.1021/acs.jproteome.8b00600

J. Proteome Res. 2018, 17, 3923−3931

3931