Document [original]

Optimized Fragmentation Regime for Diazirine Photo-Cross-Linked

Peptides

Sven H. Giese,

†,‡

Adam Belsom,

‡

and Juri Rappsilber*

,†,‡

†

Chair of Bioanalytics, Institute of Biotechnology, Technische Universitat Berlin, 13355 Berlin, Germany

‡

Wellcome Trust Centre for Cell Biology, University of Edinburgh, Edinburgh EH9 3BF, United Kingdom

SSupporting Information

ABSTRACT: Cross-linking/mass spectrometry has evolved

into a robust technology that reveals structural insights into

proteins and protein complexes. We leverage a new tribrid

instrument with improved fragmentation capacities in a

systematic comparison to identify which fragmentation

method would be best for the identiﬁcation of cross-linked

peptides. Speciﬁcally, we explored three fragmentation

methods and two combinations: collision-induced dissociation

(CID), beam-type CID (HCD), electron-transfer dissociation

(ETD), ETciD, and EThcD. Trypsin-digested, SDA-cross-

linked human serum albumin (HSA) served as a test sample,

yielding over all methods and in triplicate analysis in total 2602 matched PSMs and 1390 linked residue pairs at 5% false

discovery rate, as conﬁrmed by the crystal structure. HCD wins in number of matched peptide-spectrum-matches (958 PSMs)

and identiﬁed links (446). CID is most complementary, increasing the number of identiﬁed links by 13% (58 links). HCD wins

together with EThcD in cross-link site calling precision, with approximately 62% of sites having adjacent backbone cleavages that

unambiguously locate the link in both peptides, without assuming any cross-linker preference for amino acids. Overall quality of

spectra, as judged by sequence coverage of both peptides, is best for EThcD for the majority of peptides. Sequence coverage

might be of particular importance for complex samples, for which we propose a data dependent decision tree, else HCD is the

method of choice. The mass spectrometric raw data has been deposited in PRIDE (PXD003737).

Current methods of structural biology have left a systematic

and large gap in our knowledge of protein structures.

Cross-linking/mass spectrometry (CLMS) is an emerging tool

that helps to gain structural information for challenging

proteins and protein complexes. In CLMS experiments, protein

complexes are chemically cross-linked, digested into peptides,

and then analyzed via mass spectrometry and bioinformatics.

2−5

Identifying a cross-linked peptide pair or the linked residues

within, deﬁnes their maximal distance in the folded protein.

The derived distance constraints can then be used to determine

the low-resolution arrangement of protein complexes

4,6,7

even the high-resolution structure of a protein by the help of

computational modeling.

To identify cross-linked peptides, fragmentation spectra have

to be matched with peptide sequences by database search. For

this purpose, a number of tools have been developed,

9,10

for

example, pLINK,

Protein Prospector,

12,13

StavroX,

xQuest,

Kojak,

Xi,

6,17

or even search engines

based on

linear peptide identiﬁcation search paradigms such as Mascot.

One of the challenges in identifying cross-linked peptides is the

unequal fragmentation of the two linked peptides,

13,17

that is,

often one of the two peptides is better fragmented and thus also

better characterized by fragment ions. Under collision-induced

dissociation (CID) conditions this has been investigated in

more detail, revealing that the intensity of observed fragment

ions is also aﬀected.

This is important for the scoring of cross-

linked peptides since in general the number of identiﬁed

fragment ions and their intensity is used for spectra matching.

Despite the obvious disadvantage of the unequal fragmentation,

scoring mechanisms managed to successfully exploit this fact:

To judge the complete cross-linked peptide-spectrum match

(PSM), the two individual peptide scores are weighted

diﬀerently.

13,16

However, this should only be an ad hoc

solution; ideally the experimental setup can be changed in such

a way that the sequence coverage for both peptides is increased.

It is plausible that one of the available fragmentation methods

performs better than the others, and a comparative analysis into

the behavior of cross-linked peptides might reveal options for a

reﬁned acquisition strategy.

Throughout the manuscript we use CID for resonant

excitation CID in the linear ion trap and HCD as the

abbreviation for beam-type CID (HCD is also often referred to

as higher-energy collisional dissociation). CID is one of the

standard methods of fragmenting peptides in proteomics and

has been used in many CLMS studies.

6,20−24

The details of

CID of cross-linked peptides have recently been systematically

Received: May 27, 2016

Accepted: July 25, 2016

Published: July 25, 2016

Article

pubs.acs.org/ac

Anal. Chem. 2016, 88, 8239−8247

This is an open access article published under a Creative Commons Attribution (CC-BY)

License, which permits unrestricted use, distribution and reproduction in any medium,

provided the author and source are cited.

Downloaded via TU BERLIN on April 22, 2020 at 22:34:53 (UTC).

See https://pubs.acs.org/sharingguidelines for options on how to legitimately share published articles.

assessed,

but a systematic comparison to other fragmentation

methods such as HCD is lacking. HCD has also been used in

many CLMS studies.

11,13,25,26

Neither a systematic analysis of

cross-linked peptides under HCD exists nor under electron-

transfer dissociation (ETD). ETD-based fragmentation, that is,

ETD with and without supplemental activation of CID

(ETciD) or HCD (EThcD)

has neither routinely been

applied to cross-linked peptides nor investigated in much detail.

A sequential fragmentation scheme of CID and ETD is

reported to increase the identiﬁcation and conﬁdence levels of

cross-linked peptides.

Another study acquired sequential CID

and ETD fragmentation spectra as an optimized method for

CID cleavable cross-linkers with signature peaks. Both spectra

are then matched with their appropriate ion types and scored

together, yielding an improved sequence coverage compared to

CID alone.

Search strategies for noncleavable cross-linkers,

however, do not rely on the detection of signature peaks, and

thus, the time for the reisolation of the precursor can be saved

by simply using ETD with supplemental activation. It was also

shown that ETD alone can generate good ion coverage for both

peptides using a novel cross-linker,

albeit the eﬀect on

peptides cross-linked with another cross-linker remains to be

investigated. In contrast, ETD has been used frequently for

complete proteins

or to characterize post-translational

modiﬁcations (since it leaves the often labile peptide

modiﬁcations intact

). Earlier studies stated that ETD

fragment peptides with charge states higher than two, more

extensively than CID.

However, the underlying eﬀect seems

to correlate with the mass-to-charge ratio (m/z) of the

precursors.

For cross-linked peptides, we expect highly

charged precursors

6,15,18

and, thus, potentially well-suited

targets for ETD.

High-sequence coverage is important to ensure selectivity

during database search when trying to identify the two cross-

linked peptides from the large choice of alternatives oﬀered by

the database. Good backbone fragmentation should also be

beneﬁcial to pinpoint the exact location of the cross-link.

Despite being the intuitive expectation, sequence coverage and

site calling precision do not necessarily have to be linked

directly. Properties of the linkage site might direct fragmenta-

tion toward neighboring backbone bonds or away from them.

Also, for amine-reactive cross-linkers, pinpointing the exact

position of the cross-link is assisted by the restricted chemical

reactivity toward lysine, serine, threonine, tyrosine, or the

protein N-terminus. Hence, depending on the peptide

sequence there might only be a single amino acid amenable

to the cross-linker reaction. For highly reactive cross-linkers

such as succinimidyl 4,4-azipentanoate (SDA) each residue in a

peptide needs to be considered when locating the linkage site.

Therefore, pinpointing the cross-link sites potentially requires

more complete backbone fragmentation than for more speciﬁc

cross-linkers.

In this study we compared three diﬀerent fragmentation

techniques and two combined fragmentation schemes available

on a novel tribrid mass spectrometer (Orbitrap Fusion Lumos,

Thermo Fisher Scientiﬁc), CID, HCD, ETD, ETciD, and

EThcD, on cross-linked peptides obtained by tryptic cleavage of

SDA-cross-linked human serum albumin (HSA). The three-

dimensional structure of HSA has been resolved by X-ray

crystallography

and is used as ground-truth to evaluate the

identiﬁcation results. The right choice of fragmentation method

allows the number of identiﬁed linkage sites to be increased;

increasing the sequence coverage of both linked peptides

boosts the conﬁdence of the matches and also the correct

localization of the cross-link site.

■METHODS

Sample Preparation. Puriﬁed HSA (Sigma-Aldrich, St.

Louis, MO) was cross-linked using diﬀerent cross-linker-to-

protein, weight-to-weight (w/w) ratios: 0.152:1, 0.203:1,

0.303:1, 0.406:1, 0.606:1, 0.811:1, 1.21:1, and 1.62:1. Aliquots

of puriﬁed HSA (15 μg, 0.75 mg/mL) in cross-linking buﬀer

(20 mM HEPES−OH, 20 mM NaCl, 5 mM MgCl2, pH 7.8)

were mixed with sulfo-SDA (Thermo ScientiﬁcPierce,

Rockford, IL) to initiate incomplete reaction of the protein

with the sulfo-NHS ester component of the cross-linker.

Human blood serum from a healthy donor (20 μg, 1.0 mg/mL)

was cross-linked in a similar manner, using cross-linker-to-

protein ratios (w/w) of 0.5:1, 1:1, 2:1, and 4:1. Total reaction

volume in each case was 30 μL. For the second step of the

cross-linking procedure, photoactivation of the diazirine group

was carried out using UV irradiation from a UVP CL-1000 UV

Cross-linker (UVP Inc.). Samples were irradiated for either 25

or 50 min for puriﬁed HSA samples, and either 10, 20, 40, or 60

min in the case of blood serum samples and separated using gel

electrophoresis. Bands corresponding to monomeric HSA were

excised from gels and the proteins reduced with DTT, alkylated

using IAA, and digested using trypsin following standard

protocols.

Peptides were loaded onto self-made C18

StageTips

and eluted using 80% acetonitrile and 20%, 0.1%

TFA in water. The eluates from blood serum HSA and puriﬁed

HSA digests were mixed 0.33:1 as a master mix to be used

throughout this study. The two samples originally used in our

structural analysis of HSA

were mixed here to gain enough

material to perform the experiments of this study in triplicates.

Data Acquisition. Peptides were loaded directly (2% B,

500 nL/min) onto a spray emitter analytical column (75 μm

inner diameter, 8 μm opening, 250 mm length; New

Objectives) packed with C18 material (ReproSil-Pur C18-AQ

3μm; Dr Maisch GmbH, Ammerbuch-Entringen, Germany)

using an air pressure pump (Proxeon Biosystems).

The 0.1%

formic acid served as mobile phase A and 0.1% formic acid/

80% acetonitrile as mobile phase B. Peptides were eluted (200

nL/min, linear gradient of 2−40% B over 139 min) directly

into an Orbitrap Fusion Lumos Tribrid mass spectrometer

(Thermo Fisher Scientiﬁc, San Jose, CA). Survey spectra were

recorded in the Orbitrap at 120000 resolution. Spectra for all

fragmentation methods were acquired using a scan range of

300−1700 m/z. Precursor ion isolation was performed with the

quadrupole and an m/zwindow of 1.6 Th. The precursor

automatic gain control (AGC) target value was 4 ×105,

maximum injection time 50 ms. For CID only, CID collision

energy was set to 30%. For HCD only, HCD collision energy

was set to 35%. For ETD only, the option to inject ions for all

available parallelizable time was selected (anion AGC 5 ×104,

60 ms maximum injection time). Supplemental activation (SA)

collision energy was set to 10% for ETciD, and 25% for EThcD.

Data Analysis. Raw ﬁles were preprocessed with MaxQuant

(v. 1.5.2.8) with “Top MS/MS peaks per 100 Da”set to 100.

Resulting peak ﬁles (APL format) were subjected to Xi (ERI

Edinburgh, v. 1.5.584) and searched with the following settings:

MS accuracy, 6 ppm; MS/MS accuracy, 20 ppm; enzyme,

trypsin; max. missed cleavages, 4; max. number of modiﬁca-

tions, 3; ﬁxed modiﬁcation, none; variable modiﬁcations,

carbamidomethylation on cysteine; oxidation on methionine;

cross-linker, SDA (mass modiﬁcation: 109.0396 Da). In

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8240

addition, variable modiﬁcations by the hydrolyzed cross-linker

(“SDA-hyd”, mass modiﬁcation: 82.0413 Da) and loop-links

(“SDA-loop”, mass modiﬁcation: 83.0491 Da) were allowed.

SDA cross-link reactions were assumed to connect lysine,

serine, threonine, tyrosine, or the protein N-terminus on the

one end of the spacer with any other amino acid on the other

end. FDR was estimated using XiFDR (v. 1.0.6.14)

on a 5%

peptide spectrum match (PSM) level and 5% link-level only

including unique PSMs. The reference database consisted of a

single entry with the protein sequence of HSA (Uniprot:

P02768). For further analysis, PSM information (precursor m/

z, annotated fragments, score, peptide sequences, etc.) were

extracted from a local PostgreSQL database. The annotated

spectra are available in the Supporting Information (Figures

S4−S8).

To derive a decision tree for an optimized fragmentation

scheme for cross-linked peptides we divided the acquisition

range into a grid of m/zbins of size 200 for each charge state

from 3 to 7. After sorting all PSMs into this theoretical grid we

assigned each cell the best performing and second best

performing fragmentation method. The performance was

assessed through the median achieved sequence coverage of

the complete cross-linked peptide. Note, sequence coverage

does not depend on the possible fragment ions but rather on

the actual evidence (fragment ions) for speciﬁc n-terminal or c-

terminal sequences. To decide whether or not a fragmentation

method is favorable over another we conducted a simple, one-

sided permutation test

with label swaps and 10.000 iterations.

Pvalues lower than 0.05 were regarded as signiﬁcant.

Permutation tests were only performed if more than 15

observations were in the best performing class. If the best and

second best were too similar to give signiﬁcant results the best

performing method was also compared to all other methods.

All raw ﬁles are available via the PRIDE repository

(PDX:

PXD003737) along with PSM results and the reference

FASTA.

■RESULTS AND DISCUSSION

We investigated the impact of ﬁve fragmentation techniques

(CID, HCD, ETD, EThcD, ETciD) on the analysis of cross-

linked peptides using a latest generation Orbitrap mass

spectrometer (Orbitrap Fusion Lumos, Thermo Fisher

Scientiﬁc). HSA was used as a model protein with a known

crystal structure. Cross-linking experiments suﬀer under CID

conditions from the underrepresentation of fragment ions from

one of the two peptides.

13,17

Here we deﬁne the peptide with

more intense ions among the ten most intense fragment ions as

the α-peptide and the remaining peptide as the β-peptide.

Note that the nomenclature for the two peptides in a cross-link

is not standardized; other deﬁnitions using the achieved search

score

or the peptide’s chain length or mass

are used. We

hypothesized that the usage of other fragmentation techniques

has an impact on the fragmentation pattern of cross-linked

peptides and subsequently on the success rate of identiﬁcation.

In our analysis we applied two diﬀerent FDR-levels according

to the descriptive features that we evaluated.

For the

evaluation of identiﬁcation results on the crystal structure, a

link-level FDR is used. For the evaluation of PSM properties

(e.g., sequence coverage), a regular PSM FDR is used. An

overview is available in Table S1.

HCD Fragmentation Gives the Highest Number of

Identiﬁed Cross-Links. We compared the number of

identiﬁed cross-links that passed a 5% link-level FDR and a

5% PSM-level FDR to assess which fragmentation approach

leads to the highest identiﬁcation success. The results,

accumulating the three technical replicates for all fragmentation

techniques, show that HCD (958 PSMs) gives the highest

number of identiﬁcations followed by CID (604 PSMs, Figure

1A). ETciD fragmentation achieves the lowest number of

identiﬁed cross-links with 296 PSMs. This order is closely

related to the number of acquired spectra in all replicates.

While HCD is the fastest acquisition technique producing

∼109000 MS2 spectra ETciD and ETD only produce ∼80000

spectra (Table S1). While the number of PSMs is only a proxy

for the success of CLMS experiments, the true value of CLMS

data comes from the corresponding distance constraints.

Therefore, for the comparison of cross-linking data it makes

sense to compare the results on the link-level. For the

comparison on the link-level only unique links are regarded for

further analysis. A unique link is deﬁned by the combination of

residues involved in a cross-link, that is, a unique residue pair.

As is the case for PSMs, HCD fragmentation also returns the

highest number of identiﬁed links (Figure 1A). In total 1390

links (972 unique) were identiﬁed with the various methods:

Of the unique links HCD observed 446 links (46%), CID 297

links (31%), EThCD 240 links (25%), ETciD 205 links (21%),

and ETD 202 (21%). Note, the comparison of the links is not

straightforward if the cross-link site is ambiguous. We applied a

simple heuristic that assigns the linkage site to the c-terminal

Figure 1. Number of SDA-induced cross-links identiﬁed in HSA using

diﬀerent fragmentation techniques. (A) Identiﬁed PSMs and links

were computed for 5% FDR-level on the respective category. (B)

Evaluation of the identiﬁed cross-links against the crystal structure of

HSA. The light gray distribution reﬂects the distance measurement

between identiﬁed residues in a cross-link mapped to the crystal

structure (the median is shown above the vertical line). The dark gray

distribution reﬂects all pairwise combinations of cross-linkable residues

in the crystal structure. The black vertical line at 25 Å is used to classify

cross-links as long distance or not.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8241

residue in ambiguous linkage windows. As HSA’s three-

dimensional structure has been resolved, it is possible to utilize

it as ground truth and further evaluate the quality of the

identiﬁed links. We used 25 Å here for SDA as the maximal α-

carbon distance of two linkable amino acids in the three-

dimensional structure. This provides a clear distinction between

true positive and false positive identiﬁcations. Each identiﬁed

link that lies within 25 Å in the crystal structure is plausibly a

true positive. Accordingly, every link that is further than 25 Å

apart is plausibly a false positive. This is a simpliﬁed approach,

as links shorter than 25 Å will also contain false positives as a

result of random matching, and conversely, longer links may be

true and result from protein structural ﬂexibility. Comparing

the link information from all ﬁve fragmentation techniques

shows that the overall quality of the results is comparable across

all fragmentation modes and distinctly diﬀerent from random

results. The derived distance distributions have a median of

12−13 Å and are very distinct from the random distance

distribution (Figure 1B). In addition, the results are comparable

in meeting the approximated 5% FDR. FDR analysis for the

HCD data and the ETciD data slightly underestimates the

number of false positives by 1% and 2.5%, respectively (Figure

S1). These can be partially explained by the deﬁnition of the

FDR itself, which only gives an approximation of the true false

discovery rate. Furthermore, the hard cutoﬀthat was used has a

large impact on the computed FDR. For example, the ETciD

distances showed a larger peak just to the right of the desired

distance cutoﬀ, indicating that a small increase in the maximal

allowed distance would give an FDR closer to the desired 5%.

The HCD distance distribution looks similar to a small

enrichment of false positives just outside the maximal allowed

distance. Thus, accounting for more ﬂexibility would change

the FDR and suggests that the diﬀerent methods lead to data of

comparable quality but diﬀerent quantity.

Having a preranking of the individual fragmentation

techniques in terms of number of PSMs and unique cross-

links is desirable to maximize the information content in a

single run. Depending on the peptide properties, some

fragmentation methods might be more suited for a certain

group of peptides, and thus, using two (or more) orthogonal

fragmentation techniques may increase the overall yield in

peptide identiﬁcations and thus distance constraints. Disregard-

ing the link information to focus ﬁrst on the identiﬁed peptide

pairs shows that HCD fragmentation also yields the largest

number of unique peptide pairs (Figure 2A). A total of 43%

(201 peptide pairs) are shared between at least two

fragmentation techniques. The remaining 57% (269 peptide

pairs) are unique to one of the ﬁve fragmentation techniques.

To maximize the information content, HCD should be

combined with CID fragmentation to increase the number of

unique links by 58 (Figure 2B). Interestingly, ETD

fragmentation can maximally increase the number of unique

links by 41 by using EThcD. We suspect that the diﬀerence in

the number of acquired spectra and actually identiﬁed PSMs is

the main driver for this eﬀect. We deﬁne the identiﬁcation rate,

IR, as

=IR

acq

, where Nid is the number of identiﬁed unique

PSMs and Nacq is the total number of acquired MS2 spectra

(Table S1). The IR reveals that HCD not only acquires most

spectra, but also has the highest success rate of 0.88% compared

to CID (0.61%), EThcD (0.52%), and ETD/ETciD (0.38%). If

speed and reliability of ETD-based fragmentation should

change in the future, this order of complementarity may

change. In comparison with linear peptide identiﬁcations,

where the IR reaches up to 54%

(depending on the

instrumentation), the success rate of cross-link identiﬁcation

is much lower. A contributing factor will be the generally low

abundance of cross-linked peptides when compared to linear

peptides, which will reduce their frequency of selection for

MS2, especially in competition with the linear peptides also

present. Other factors will include poorer database matching

due to often lower intensity, but also more complex spectra and

a larger search space.

ETD-Aided Fragmentation Improves the Coverage of

the Second Peptide. The identiﬁcation of cross-linked

peptides poses two challenges: First, ﬁnding the correct peptide

pair, and second, assigning the correct cross-link site. High

peptide sequence coverage for both individual peptides should

be beneﬁcial to assigning the correct site. Site calling will be

Figure 2. Pairwise result overlaps of fragmentation techniques. (A)

Overlap of identiﬁed peptide pairs (disregarding link-site positions)

between fragmentation techniques (Venn diagram generated with

Jvenn49). (B) Set diﬀerence matrix shows the number of uniquely

identiﬁed peptide pairs (disregarding link-site positions) by one

fragmentation technique (y-axis) when compared to another one (x-

axis).

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8242

especially challenging when considering cross-linkers such as

SDA, where the number of cross-link target sites is large.

Under HCD conditions the coverage distribution for the α-

peptide is the lowest, with a mean coverage around 50%

(Figure 3A). The other four fragmentation techniques perform

very similar to only small improvements in the coverage of the

α-peptide with CID or EThcD fragmentation. Interestingly,

ETD involving fragmentation schemes do not increase the

fragmentation eﬃciency (measured by the peptide coverage)

much for the α-peptide. In fact, the highest coverage values for

the α-peptide were observed with CID fragmentation. In

contrast, the sequence coverage for the beta peptide largely

depends on the fragmentation method (Figure 3B). ETD,

ETciD, EThcD, and HCD show a much better fragmentation

compared to CID. Previously, ETD was reported to improve

the sequence coverage compared to CID.

32,43

We observe here

that for cross-linked peptides this eﬀect is very pronounced for

the β-peptide, but not for the α-peptide.

In general, in cross-linked peptides, one peptide matches

more and with higher intense fragment ions than the other. All

fragmentation methods yield at least an average coverage of

around 50% for the α-peptide. For the β-peptide, the average

coverage lies between 39% and 50%. CID would be the method

of choice for high α-peptide coverage. However, CID is

systematically disadvantaging the β-peptide. For the β-peptide,

the other fragmentation methods perform much better: EThcD

and HCD almost reach the same fragmentation eﬃciency as for

the α-peptide. In numbers, the largest discrepancy between α-

and β-peptide coverage was observed with CID, with a mean

coverage diﬀerence (MCD) of 19%. EThCD and HCD show

the lowest MCD of 8%. The overall best coverage is observed

with EThcD fragmentation (Figure 3C). ETciD seems to be

less eﬀective, presumably as ETD in the ﬁrst stage leads to

charge reduction, and CID then fragments a single precursor,

while HCD fragments all. Nevertheless, ETciD greatly

improves the coverage of the second peptide when compared

to CID.

To compare the fragmentation eﬃciency on both peptides in

a cross-link more systematically, we deﬁne the symmetry factor

(SF) as

=| − |

αβ

Fcovcov

(1)

where covαand covβrefer to the sequence coverage of the α-

and β-beta peptide, respectively. For convenience, we use the

negation SF′of SF deﬁned as

′= −

F1S

(2)

A large SF′means that α- and β-peptide coverage are very

similar and vice versa. CID shows the smallest among the ﬁve

fragmentation methods of ∼0.8. The other four methods

perform better than CID, with a median of ∼0.9 (Figure 3D).

In addition, ETD, ETciD, EThcD, and HCD have a smaller

spread than CID. In summary, CID exasperates the second

peptide problem. Nevertheless, CID still slightly outperforms

HCD in overall cross-linked peptide sequence coverage. In

order to maximize overall cross-linked peptide coverage ETD,

ETciD, and EThcD are recommended, based on median

coverage of the complete cross-linked peptide.

Precursor m/zHas a Large Eﬀect on the Eﬃciency of

the Fragmentation. To follow-up on the diﬀerent

fragmentation behavior of cross-linked peptides we investigated

how the precursor properties inﬂuence the fragmentation

eﬃciency. We ﬁrst divided the m/zacquisition range into bins

of m/z150 (starting from m/z550). For each bin we then

collected the peptide identiﬁcations of all diﬀerent fragmenta-

tion methods and investigated the sequence coverage based on

the m/zof the precursor.

ETD and EThcD lead to the highest sequence coverage

between m/z500−800 (Figure 4A,B).However,ETD

Figure 3. Achieved sequence coverage comparison. Coverage

distribution of the α-peptide (A; more matches among the 10 most

intense fragment ions) and the β-peptide (B). The vertical line in

(A)−(C) reﬂects a reference value of 50% sequence coverage, meaning

fragments (b, c, y, or z) match to half of the backbone links between

residues along the sequence of the peptide. (C) Coverage distribution

for the complete cross-linked peptide. (D) Symmetry (absolute

coverage diﬀerence between alpha and beta peptide) distributions for

the diﬀerent fragmentation techniques. The data in (A)−(D) were

analyzed using a 5% PSM FDR.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8243

eﬃciency decreases steeply with higher m/z, making HCD and

CID the better choice for precursors larger than m/z1000. The

same trend is observed for all ETD-based methods. These

diﬀerences are more pronounced on the individual α- and β-

peptides. When the complete peptide coverage is compared

(Figure 4C), all methods stick more closely together but

EThcD and ETD still outperform all other methods for

precursors smaller than m/z850. In higher m/zareas, only CID

and HCD are able to still produce enough peptide

identiﬁcations (data in the ﬁgure was limited to only include

m/zbins with at least ﬁve observations).

As demonstrated in the sections above, there are diﬀerences

in the eﬃciency of the fragmentation of cross-linked peptides.

In a more detailed comparison, we divided the acquisition

range into a grid made of charge bins of size one and m/zbins

of size 200. In each of these cells we then tested how well the

ﬁve diﬀerent fragmentation methods performed. The perform-

ance was evaluated on the cross-linked peptide sequence

coverage. For the majority of peptides, EThcD achieved

signiﬁcantly higher sequence coverage (Figure 4D) than the

second best method between 600 and 800 m/z(precursor

charge 3−6). In addition, the m/zcells 400−600 (z= 4) and

800−1000 (z= 5) are also favored by EThcD fragmentation.

Since the majority of cross-linked peptides (71%) lie within

600−1000 m/z, the most important area is dominated by

EThcD fragmentation. However, evaluated by pure numbers of

identiﬁcations, EThcD is not the best performing method. On

average, ∼35 PSMs are missed if EThcD is chosen over the

method that achieves the highest number of identiﬁcations. If

the evaluation metric is changed to the highest number of

identiﬁcations, HCD is outperforming the other fragmentation

methods for all m/zbins (Figure S3). Therefore, HCD was

selected as default method for regions where no signiﬁcant

improvement could be observed by any of the other methods

(Figure 4D, HCD written in gray).

HCD, EThcD, and ETD fragmentation deﬁne the cross-

link site most unambiguously. The overall sequence

coverage is a valuable feature to assess the quality of peptide

identiﬁcations. However, for cross-linked peptides those

fragments ﬂanking the cross-linked residues are important to

deﬁne the linkage site. This resembles the localization of post-

translational modiﬁcations such as phosphorylation, which

greatly beneﬁted from the usage of combined fragmentation

methods.

Limited information about the cross-link site is

available when none of the fragments next to a cross-linked

residue are observed; the cross-link site can then only be

assigned by prior assumptions or to larger sequence windows,

which becomes problematic if the site call is oﬀby ±5 residues

(at least in HSA and using current ab initio structure

computation).

Given the information from correct fragment

identiﬁcations, a combination of one c-terminal and one n-

terminal ion is enough to locate the cross-link site

unambiguously. Utilizing the high-resolution/accurate mass

measurement in our experimental design, we thus assumed that

each assigned fragment is correct for peptides passing the

speciﬁed FDR.

The cross-link site in α-peptides could be assigned to a single

residue in ∼65% of all PSMs identiﬁed with EThcD or HCD

(Figure 5A). The second best performing method was ETD,

with approximately 60% of PSMs where the cross-link could be

assigned to a single residue. CID and ETciD PSMs show the

lowest number of accurate site localizations to a single residue

(below 50% of all PSMs). All methods placed the cross-link site

on average within the critical 5 residue window for 97.2% ±

1.17 (α-peptides) and 95.6% ±1.3 (β-peptides) of all PSMs.

For the β-peptide, this looks very similar; EThcD and HCD

show the best fragmentation behavior to localize the cross-link

Figure 4. Sequence coverage depending on precursor m/zand charge.

The average coverage values from (A) α-peptides, (B) β-peptides, and

z. Each dot represents the median of all identiﬁed peptides in a

window of m/z150. Error bars show the standard deviation. (D)

Decision surface to optimize the sequence coverage of cross-linked

peptide. The acquisition range was divided into bins of 200 m/zper

charge state. In each bin the best performing fragmentation method

(judged by median achieved sequence coverage) is used to color that

particular bin. The “*”denotes a signiﬁcant improvement in sequence

coverage by using the best performing fragmentation method over the

second best. Areas with less than 15 observations are colored in light

red, falling back to HCD as standard fragmentation technique. Gray

annotations show areas where no signiﬁcant improvement could be

obtained by choosing one method over the others.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8244

site (Figure 5B). With approximately 50% of precisely localized

cross-links in the β-peptide, the link-localization is less well for

the β-peptide than for the α-peptide. However, this is not as

pronounced as would be expected from the sequence coverage

asymmetry. This is counterintuitive since the coverage

distributions for HCD is among the lowest of all ﬁve

fragmentation techniques for the β-peptide. For EThcD, the

results for the determination of the cross-link site are more in

line with the observed coverage distributions. Still, the large

diﬀerence in the coverage distribution of the α- and β-peptides

seems not to be as pronounced for the distribution of correct

localizations of the cross-link site. One of the possible reasons is

that the cleavage of the peptides before and after the cross-link

site is preferred. For CID a statistical trend was reported that

cross-linked fragments outnumber linear fragments and tend to

have a higher intensity.

We encounter the opposite for HCD,

linear fragments visibly outnumber cross-linked fragments

(Figure S8).

Data-Dependent Decision Tree for Optimized Acquis-

ition of Cross-Linked Peptides. CLMS studies vary in the

degree of complexity: single proteins, multiple protein

complexes or complete proteomes can be analyzed to generate

protein−protein interaction information or the three-dimen-

sional structure. Depending on the speciﬁc case we propose

two diﬀerent acquisition strategies (Figure 6A): First, for single

proteins or small protein complexes, we recommend HCD as

the method of choice. Since the complexity of the sample is not

very high, cross-linked peptides can often be matched by

precursor mass alone. In addition, HCD fragmentation

generates enough fragments to precisely localize the cross-

link site in the majority of cases. For the second case, that is,

complex samples with many proteins not only the search space

becomes an issue but also the associated random matches. A

fragmentation scheme that generates highly discriminative

scores for target and decoy peptides will identify more peptides

under the same FDR threshold. The optimal fragmentation

scheme for such an experiment is shown in Figure 6B. Earlier

studies on the development of data dependent decision trees

(DDDT) for the acquisition of linear peptides mainly support

our conclusions: HCD gives the highest number of

identiﬁcations, but ETD gives higher search engine scores

or, as in our case, higher sequence coverage. Compared to a

DDDT for linear peptides our results are slightly diﬀerent but

still comparable. For example, linear DDDTs precursors with

charge state 3+ have been analyzed with ETD up to 750 m/z

or 650 m/z,

we only use ETD from 600−800 m/z.In

addition, instead of using ETD alone for 4+, 5+ precursors

below 1000 m/zand 800 m/z, respectively, EThcD is used. In

this study we investigated SDA-cross-linked, tryptic peptides.

Other cross-linkers or enzymes may lead to peptide populations

Figure 5. Cross-link site localization precision. (A) Cumulative

precision curve for the α-peptide. (B) Cumulative precision curve for

the β-peptide. With a precision value of one the cross-link site is

unambiguously located by adjacent backbone fragments (b, c, y, or z)

in the peptide. A value of two limits the cross-link site to two eligible

residues.

Figure 6. Acquisition strategy for cross-linked peptides. (A)

Recommended acquisition scheme for cross-linking samples. (B)

Data-dependent decision tree (DDDT) for cross-linked peptides.

Depending on the precursor charge state (3+, 4+, 5+, 6+, and other)

and the m/z, the appropriate fragmentation technique is selected.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8245

with distinct fragmentation behavior due to diﬀerences in size

or amino acid composition. Note, however, that the proposed

fragmentation scheme is similar to the decision tree for linear

peptides

45,46

and may therefore be of more general value.

■CONCLUSION

For the majority of the peptides EThcD is the method of

choice to achieve the highest sequence coverage. HCD is an

important alternative because of its superior speed, with only

somewhat reduced peptide sequence coverage. CID, ETD, and

ETciD only play minor roles. We advise to adjust the

acquisition scheme to follow the experimental setup: simple

protein samples should be analyzed using only HCD to

maximize number of observed links, which starts having value

in protein structure determination.

8,47

For complex samples, we

propose a decision tree that is mainly based on EThcD and

HCD to maximize search speciﬁcity.

■ASSOCIATED CONTENT

SSupporting Information

The Supporting Information is available free of charge on the

ACS Publications website at DOI: 10.1021/acs.anal-

chem.6b02082.

Quality control results, details regarding the decision tree

areas, the relative performance of the individual

fragmentation techniques, and annotated spectra of all

PSMs (PDF).

■AUTHOR INFORMATION

Corresponding Author

*E-mail: [email protected].

Notes

The authors declare no competing ﬁnancial interest.

■ACKNOWLEDGMENTS

The Wellcome Trust generously funded this work through a

Senior Research Fellowship to J.R. (103139), a Centre Core

Grant (092076), and an Instrument Grant (108504).

■REFERENCES

(1) Perdigao, N.; Heinrich, J.; Stolte, C.; Sabir, K. S.; Buckley, M. J.;

Tabor, B.; Signal, B.; Gloss, B. S.; Hammang, C. J.; Rost, B.; et al. Proc.

Natl. Acad. Sci. U. S. A. 2015,112, 15898−15903.

(2) Rappsilber, J. J. Struct. Biol. 2011,173, 530−540.

(3) Holding, A. N. Methods 2015,89,54−63.

(4) Sinz, A. Mass Spectrom. Rev. 2006,25, 663−682.

(5) Schmidt, C.; Robinson, C. V. Nat. Protoc. 2014,9, 2224−2236.

(6) Chen, Z. A.; Jawhari, A.; Fischer, L.; Buchen, C.; Tahir, S.;

Kamenski, T.; Rasmussen, M.; Lariviere, L.; Bukowski-Wills, J.-C.;

Nilges, M.; et al. EMBO J. 2010,29, 717−726.

(7) Walzthoeni, T.; Leitner, A.; Stengel, F.; Aebersold, R. Curr. Opin.

Struct. Biol. 2013,23, 252−260.

(8) Belsom, A.; Schneider, M.; Fischer, L.; Brock, O.; Rappsilber, J.

Mol. Cell. Proteomics 2016,15, 1105−1116.

(9) Mayne, S. L. N.; Patterton, H.-G. Briefings Bioinf. 2011,12, 660−

671.

(10) Sinz, A.; Arlt, C.; Chorev, D.; Sharon, M. Protein Sci. 2015,24,

1193−1209.

(11) Yang, B.; Wu, Y.-J.; Zhu, M.; Fan, S.-B.; Lin, J.; Zhang, K.; Li, S.;

Chi, H.; Li, Y.-X.; Chen, H.-F.; et al. Nat. Methods 2012,9, 904−906.

(12) Chalkley, R. J.; Baker, P. R.; Medzihradszky, K. F.; Lynn, A. J.;

Burlingame, A. L. Mol. Cell. Proteomics 2008,7, 2386−2398.

(13) Trnka, M. J.; Baker, P. R.; Robinson, P. J. J.; Burlingame, a L.;

Chalkley, R. J. Mol. Cell. Proteomics 2014,13, 420−434.

(14) Gotze, M.; Pettelkau, J.; Schaks, S.; Bosse, K.; Ihling, C. H.;

Krauth, F.; Fritzsche, R.; Kuhn, U.; Sinz, A. J. Am. Soc. Mass Spectrom.

2012,23,76−87.

(15) Rinner, O.; Seebacher, J.; Walzthoeni, T.; Mueller, L. N.; Beck,

M.; Schmidt, A.; Mueller, M.; Aebersold, R. Nat. Methods 2008,5,

315−318.

(16) Hoopmann, M. R.; Zelter, A.; Johnson, R. S.; Riffle, M.;

MacCoss, M. J.; Davis, T. N.; Moritz, R. L. J. Proteome Res. 2015,14,

2190−2198.

(17) Giese, S. H.; Fischer, L.; Rappsilber, J. Mol. Cell. Proteomics

2016,15, 1094−1104.

(18) Maiolica, A.; Cittaro, D.; Borsotti, D.; Sennels, L.; Ciferri, C.;

Tarricone, C.; Musacchio, A.; Rappsilber, J. Mol. Cell. Proteomics 2007,

6, 2200−2211.

(19) Perkins, D. N.; Pappin, D. J.; Creasy, D. M.; Cottrell, J. S.

Electrophoresis 1999,20, 3551−3567.

(20) Fischer, L.; Chen, Z. A.; Rappsilber, J. J. Proteomics 2013,88,

120−128.

(21) Pearson, K. M.; Pannell, L. K.; Fales, H. M. Rapid Commun.

Mass Spectrom. 2002,16, 149−159.

(22) Chu, F.; Maynard, J. C.; Chiosis, G.; Nicchitta, C. V.;

Burlingame, A. L. Protein Sci. 2006,15, 1260−1269.

(23) Muller, M. Q.; Dreiocker, F.; Ihling, C. H.; Schafer, M.; Sinz, A.

Anal. Chem. 2010,82, 6958−6968.

(24) Leitner, A.; Reischl, R.; Walzthoeni, T.; Herzog, F.; Bohn, S.;

Forster, F.; Aebersold, R. Mol. Cell. Proteomics 2012,11, M111.014126.

(25) Shi, Y.; Fernandez-Martinez, J.; Tjioe, E.; Pellarin, R.; Kim, S. J.;

Williams, R.; Schneidman-Duhovny, D.; Sali, A.; Rout, M. P.; Chait, B.

T. Mol. Cell. Proteomics 2014,13, 2927−2943.

(26) Nguyen-Huynh, N.-T.; Sharov, G.; Potel, C.; Fichter, P.;

Trowitzsch, S.; Berger, I.; Lamour, V.; Schultz, P.; Potier, N.; Leize-

Wagner, E. Protein Sci. 2015,24, 1232−1246.

(27) Frese, C. K.; Altelaar, A. F. M.; van den Toorn, H.; Nolting, D.;

Griep-Raming, J.; Heck, A. J. R.; Mohammed, S. Anal. Chem. 2012,84,

9668−9673.

(28) Chowdhury, S. M.; Du, X.; Tolic

, N.; Wu, S.; Moore, R. J.;

Mayer, M. U.; Smith, R. D.; Adkins, J. N. Anal. Chem. 2009,81, 5524−

5532.

(29) Liu, F.; Rijkers, D. T. S.; Post, H.; Heck, A. J. R. Nat. Methods

2015,12, 1179−1184.

(30) Trnka, M. J.; Burlingame, A. L. Mol. Cell. Proteomics 2010,9,

2306−2317.

(31) Syka, J. E. P.; Coon, J. J.; Schroeder, M. J.; Shabanowitz, J.;

Hunt, D. F. Proc. Natl. Acad. Sci. U. S. A. 2004,101, 9528−9533.

(32) Mikesh, L. M.; Ueberheide, B.; Chi, A.; Coon, J. J.; Syka, J. E. P.;

Shabanowitz, J.; Hunt, D. F. Biochim. Biophys. Acta, Proteins Proteomics

2006,1764, 1811−1822.

(33) Kim, M.-S.; Pandey, A. Proteomics 2012,12, 530−542.

(34) Good, D. M.; Wirtala, M.; McAlister, G. C.; Coon, J. J. Mol. Cell.

Proteomics 2007,6, 1942−1951.

(35) Sugio, S.; Kashima, A.; Mochizuki, S.; Noda, M.; Kobayashi, K.

Protein Eng., Des. Sel. 1999,12, 439−446.

(36) Rappsilber, J.; Ishihama, Y.; Mann, M. Anal. Chem. 2003,75,

663−670.

(37) Ishihama, Y.; Rappsilber, J.; Andersen, J. S.; Mann, M. J.

Chromatogr. A 2002,979, 233−239.

(38) Renard, B. Y.; Kirchner, M.; Monigatti, F.; Ivanov, A. R.;

Rappsilber, J.; Winter, D.; Steen, J. A. J.; Hamprecht, F. A.; Steen, H.

Proteomics 2009,9, 4978−4984.

(39) Fischer, L.; Rappsilber, J. 2016, in preparation.

(40) EDGINGTON, E. S. J. Psychol. 1964,57, 445−449.

(41) Vizcaíno, J. A.; Cote

, R. G.; Csordas, A.; Dianes, J. A.; Fabregat,

A.; Foster, J. M.; Griss, J.; Alpi, E.; Birim, M.; Contell, J.; et al. Nucleic

Acids Res. 2013,41, D1063−D1069.

(42) Hebert, A. S.; Richards, A. L.; Bailey, D. J.; Ulbrich, A.;

Coughlin, E. E.; Westphall, M. S.; Coon, J. J. Mol. Cell. Proteomics

2014,13, 339−347.

(43) Molina, H.; Matthiesen, R.; Kandasamy, K.; Pandey, A. Anal.

Chem. 2008,80, 4825−4835.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8246

(44) Frese, C. K.; Zhou, H.; Taus, T.; Altelaar, A. F. M.; Mechtler, K.;

Heck, A. J. R.; Mohammed, S. J. Proteome Res. 2013,12, 1520−1525.

(45) Frese, C. K.; Altelaar, A. F. M.; Hennrich, M. L.; Nolting, D.;

Zeller, M.; Griep-Raming, J.; Heck, A. J. R.; Mohammed, S. J. Proteome

Res. 2011,10, 2377−2388.

(46) Swaney, D. L.; McAlister, G. C.; Coon, J. J. Nat. Methods 2008,

5, 959−964.

(47) Belsom, A.; Schneider, M.; Brock, O.; Rappsilber, J. Trends

Biochem. Sci. 2016,41, 564−567.

Analytical Chemistry Article

DOI: 10.1021/acs.analchem.6b02082

Anal. Chem. 2016, 88, 8239−8247

8247