RESEARCH PAPER
Marta Pawluczyk & Julia Weiss & Matthew G. Links &
Mikel Egaña Aranguren & Mark D. Wilkinson &
Marcos Egea-Cortines
Received: 17 October 2014 / Revised: 15 December 2014 / Accepted: 18 December 2014 / Published online: 11 January 2015
© Springer-Verlag Berlin Heidelberg 2015
Abstract Unbiased identification of organisms by PCR reactions using universal primers followed by DNA sequencing assumes positive amplification. We used six universal loci spanning 48 plant species and quantified the bias at each step of the identification process from end point PCR to next-generation sequencing. End point amplification was significantly different for single loci and between species. Quantitative PCR revealed that Cq threshold for various loci, even within a single DNA extraction, showed 2,000-fold differences in DNA quantity after amplification. Next-generation sequencing (NGS) experiments in nine species showed significant biases towards species and specific loci using adaptor-specific primers. NGS sequencing bias may be predicted to some extent by the Cq values of qPCR amplification.
Keywords Metabarcoding . Next-generation sequencing . Ion torrent . Cq value . PCR efficiency
Introduction
Sequence analysis of complex DNA samples is an important approach for monitoring species distribution in biodiversity and population studies. Genetic material is assessed using universal genomic sequences “barcodes” that are informative regarding the species composition of the sample, as they contain sufficient polymorphisms between species that taxonomic discrimination becomes possible [1]. The barcoding approach has become a mainstream technique to identify species in insects [2], very closely related plant species or hybrids [3], or fungi [4] and bacteria [5].
In plants, seven chloroplast loci have been analyzed as potential barcodes, the spacers atpF-atpH, trnH-psbA, and psbK-psbL and the genes matK, rbcL, rpoB, and rpoC1 [6, 7]. Metabarcoding involves DNA amplification of barcode loci from mixed-population samples, followed by next-generation sequencing (NGS). Sequenced fragments are then either assembled de novo and then aligned to known genome sequences [8] or are directly aligned to these genomic databases, thus becoming connected to specific taxa [9]. Most often, the objective of these analyses is to arrive at a quantitative measure of the relative abundance of the various species in the sample.
Despite being a proven tool for taxonomic identification, the approach of PCR is subject to a wide variety of potential biases throughout the processes of amplification and sequence
M. Pawluczyk · J. Weiss · M. Egea-Cortines (*)
Genetics, Instituto de Biotecnología Vegetal, Universidad Politécnica
de Cartagena, 30202 Cartagena, Spain
e-mail: marcos.egea@upct.es
M. G. Links
Department of Computer Science, University of Saskatchewan,
Saskatoon Research Centre, 107 Science Place, Saskatoon,
SK S7N OX2, Canada
M. Egaña Aranguren · M. D. Wilkinson
Centro de Biotecnología y Genómica de Plantas UPM-INIA
(CBGP), Campus Montegancedo, Autopista M-40 (Km 38),
28223 Pozuelo de Alarcón Madrid, Spain
M. Egaña Aranguren
Genomic Resources, Department of Genetics, Physical
Anthropology and Animal Physiology, Faculty of Science and
Technology, University of Basque Country (UPV/EHU), Sarriena
auzoa z/g, 48940 Leioa-Bilbo, Spain
Anal Bioanal Chem (2015) 407:1841–1848
DOI 10.1007/s00216-014-8435-y
Quantitative evaluation of bias in PCR amplification
and next-generation sequencing derived from metabarcoding
samples
analysis, particularly when applied to mixed-population samples. These biases fall into three main categories. The first relates to differential barcode amplification success as a result of the barcode’s universal primers. Depending on the marker/species combination, false-negative results can occur when sequence variation at the universal priming sites in one of the species prevents efficient annealing of the universal barcode primer for that species. A second type of bias relates to the efficiency of the amplification reaction, which may differ from species to species based on the sequence composition of their specific variant of the barcode. As a result, the proportion of sequences representing each species in the original sample may bear little resemblance to the proportion of that species in that population. Finally, there may also be biases introduced during the preparation of DNA libraries for sequencing. For instance, sample dilution has a strong effect on the correlation between biological and read quantities in bacterial samples [10]. A combination of barcoding and NGS has been in some cases confirmed by qPCR, showing that while the exact quantification is not precise, trends in the population structure are faithful [11].
Despite knowing that these potential biases exist, the degree to which each source of bias affects the outcome of a metabarcoding experiment and their relative importance have not been well quantified. Moreover, by quantifying these biases and relating them to the specific sequences being studied, it may be possible to formulate approaches for post facto normalization of metabarcode data to better reflect the population makeup. For example, PCR efficiency is an important parameter for quantitative PCR analysis of gene expression [12–14], and while a variety of algorithms exist that predict the efficiency of PCR amplification, these are currently not considered in any of the normal barcoding or metabarcoding pipelines. Amplification efficiency of a given DNA sequence depends heavily on the G + C content of the amplicon [14], DNA secondary structure [15], and previous sample treatment [16]. Under optimal PCR conditions with 100 % amplification efficiency, two copies of DNA are generated from each template during exponential phase of amplification, and such a reaction is said to have an efficiency of 2. This efficiency can also affect another important statistic, namely Cq, a relative measure of the predicted concentration of the target amplicon in a PCR reaction and a measurement that is widely used in qPCR analysis [17,18]. These kinds of statistics will be even more relevant for NGS technologies that introduce additional PCR amplification steps, such as Ion Torrent or 454/Roche that utilizes an emulsion PCR during library construction [19].
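The link between efficiency and Cq described above can be illustrated with a minimal sketch, assuming idealized exponential amplification with a constant per-cycle factor E (2 at 100 % efficiency). The function names, the starting copy number, and the detection threshold are illustrative assumptions and are not taken from this study.

```python
import math


def copies_after(n0: float, efficiency: float, cycles: int) -> float:
    """Amplicon copies after `cycles` cycles, assuming a constant per-cycle factor."""
    return n0 * efficiency ** cycles


def cycles_to_threshold(n0: float, efficiency: float, threshold: float) -> float:
    """Fractional cycle at which n0 * efficiency**c reaches `threshold` copies."""
    return math.log(threshold / n0) / math.log(efficiency)


if __name__ == "__main__":
    # Same hypothetical input (1e3 copies) and threshold (1e10 copies), three efficiencies.
    for eff in (2.0, 1.9, 1.8):
        cq = cycles_to_threshold(n0=1e3, efficiency=eff, threshold=1e10)
        print(f"E = {eff:.1f}: threshold crossed at cycle {cq:.2f}")
```

Under these assumptions, identical input crosses the threshold several cycles later when E drops from 2.0 to 1.8, so a Cq difference reflects template quantity only when the efficiencies being compared are similar.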
The present study, therefore, aims to first quantitatively analyze PCR success and evaluate amplification efficiency and Cq values as a tool for predicting amplification success. In this study, we undertake a survey of six well-known plant barcoding markers and apply them to 48 species from 34 different plant families. In addition, we apply the Ion Torrent sequencing method simultaneously to mixed-species PCR products of three barcoding primers rbcL, rpoB, and rpoC1 starting with equal amounts of PCR products, to quantitatively measure the bias introduced by this step of the metabarcoding study.
Our results reveal that quantitative and even qualitative interpretation of metabarcoding data based on read abundance is fraught with potential, serious biases. We present, in detail, a dissection of the degree of bias introduced at each step in the typical laboratory practice of barcode marker analysis from mixed DNA samples.
Materials and methods
Plant material
Plant material, 48 plant species belonging to 33 different families, was gathered from the local fruit market, field sampling, botanical records, and our own collections (Table 1).
DNA extraction and real-time PCR
Two independent genomic DNA samples were extracted from fresh leaf using the commercial kit “Plant NucleoSpin” (Macherey and Nagel, Düren, Germany). All extracted samples were quantified with a Nanodrop 2000 and, after isopropanol-ethanol precipitation, all samples were diluted to 50 ng/μl in order to have identical concentrations. Single species reactions were performed from the two independent DNA extractions with three technical replicas for a total of six PCR reactions per species using 100-ng DNA/reaction. Real-time PCR reactions were performed as described previously [14]. The primers used in this experiment (rbcL-a, matK, rpoB, rpoC1, trnL-F, trnH-psbA) have been described previously [6].
Equal amounts of genomic DNA from three species were used to create the mixed-species metabarcoding templates. Amplifications were performed using an initial DNA quantity of 150 ng corresponding to 50 ng of each of the three genomes. Sequencing reactions comprised nine species.
qPCR efficiency and Cq calculation
qPCR efficiency and Cq were computed using qpcR, R package [20]. Efficiency value (E) was calculated as $E_{\mathrm{cpD2}} = F(\mathrm{cpD2}) / F(\mathrm{cpD2} - 1)$, in which $F$ is raw fluorescence at cycle x, and cpD2 is cycle number at second derivative maximum of the curve [21].
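The efficiency definition above, the ratio of fluorescence at the second-derivative-maximum cycle to fluorescence one cycle earlier, can be sketched as follows. This is a simplified stand-in and not the qpcR implementation: it takes discrete second differences at integer cycles instead of fitting a sigmoidal model, and the simulated logistic trace with its parameters is a hypothetical example.

```python
import math


def logistic_curve(cycles, f_max=10.0, midpoint=20.0, slope=1.5, baseline=0.05):
    """Simulated raw fluorescence per cycle; a stand-in for a measured qPCR trace."""
    return [baseline + f_max / (1.0 + math.exp(-(c - midpoint) / slope)) for c in cycles]


def second_derivative_max_index(fluor):
    """Index of the reading whose discrete second difference is largest."""
    best_idx, best_val = None, -math.inf
    for i in range(1, len(fluor) - 1):
        d2 = fluor[i + 1] - 2.0 * fluor[i] + fluor[i - 1]
        if d2 > best_val:
            best_idx, best_val = i, d2
    return best_idx


def efficiency_at_cpd2(fluor):
    """Return (cycle number, F(cpD2) / F(cpD2 - 1)); readings are assumed to start at cycle 1."""
    i = second_derivative_max_index(fluor)
    return i + 1, fluor[i] / fluor[i - 1]


if __name__ == "__main__":
    trace = logistic_curve(range(1, 41))
    cycle, eff = efficiency_at_cpd2(trace)
    print(f"cpD2 near cycle {cycle}, efficiency {eff:.2f}")
```

Because the ratio is taken on raw fluorescence, the baseline level influences the result; a fitted model evaluated at a fractional cpD2 would give a smoother estimate than this integer-cycle approximation.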
1842 M. Pawluczyk et al.
Determination of relative abundance of sequences from PCR products of mixed genomic DNA by semiconductor sequencing
PCR products generated by amplifying, separately, the chloroplast barcoding sequences rbcL-a, rpoC1, and rpoB from mixed genomic DNAs (100 ng each) were pooled equivalently to yield a final amount of 100 ng. Initial time of digestion was adjusted to yield 300-bp fragments. Preparation of samples for library construction and sequencing were performed using the Ion Torrent Next-Generation Sequencing Kits (Life Technologies, CA, USA) according to the manufacturer’s instructions. Briefly, PCR products were fragmented using the Ion Shear Plus reagent to a fragment size of 200 bp. The corresponding fragments were ligated to adaptors and size fractionated using E-Gel electrophoresis, obtaining fragments of average 330 bp. Emulsion PCR was performed using one-touch system according to the manufacturer’s protocol, and sequencing was performed using 314 Ion Torrent chips. A total of 333,274 reads with a mean read length of 159 bp were computationally analyzed in order to identify species origin of each fragment by aligning the reads with a library of known chloroplast sequences using Bowtie2 [22]. We extracted from the resulting SAM file a map of reads to the known chloroplast sequences using a Perl script from the mPuma pipeline [8]. The analysis can be reproduced, with the same parameters and data, at the following Galaxy installation (page: http://biordf.org:8983/u/mikel-egana-aranguren/p/sources-of-bias-in-applying-barcoding-markers-for-sequence-analysis-of-environmental-samples).
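The step from a SAM file to per-reference read counts can be sketched as below. This is not the mPuma Perl script, whose logic is not reproduced here; it is a minimal Python stand-in that relies only on the standard SAM layout (FLAG in column 2, reference name in column 3). The reference names in the usage example are hypothetical.

```python
from collections import Counter


def count_reads_per_reference(sam_lines):
    """Count primary mapped reads per reference name (RNAME, SAM column 3)."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header records
        fields = line.rstrip("\n").split("\t")
        flag, rname = int(fields[1]), fields[2]
        if flag & 0x4 or rname == "*":
            continue  # read is unmapped
        if flag & 0x900:
            continue  # secondary (0x100) or supplementary (0x800) alignment
        counts[rname] += 1
    return counts


def relative_abundance(counts):
    """Percentage of counted reads assigned to each reference."""
    total = sum(counts.values())
    return {ref: 100.0 * n / total for ref, n in counts.items()} if total else {}


if __name__ == "__main__":
    example = [
        "@SQ\tSN:rbcL_species_A\tLN:600",
        "r1\t0\trbcL_species_A\t1\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r2\t16\trbcL_species_B\t5\t42\t10M\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
        "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGTACGTAC\tIIIIIIIIII",
    ]
    print(relative_abundance(count_reads_per_reference(example)))
```

Counting only primary alignments keeps each read from contributing more than once; whether the original pipeline applied the same filter is not stated in the text.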
Results
This work aimed to reveal and quantify the biases that can occur during metabarcoding analyses. We executed our analyses using the most widely accepted plant barcodes, quantitated our results using widely accepted practices such as qPCR, and followed normal protocols for library construction and NGS. At each stage, we re-normalized the samples such that we knew the precise quantities and relative abundances of the input DNA. In addition, although it is known that the size
Table 1 List of plant species analyzed
Plant species               Family            Location/donor population
Spinacia oleracea           Amaranthaceae     Murcia, Spain/commercial
Pistacia lentiscus          Anacardiaceae     Murcia, Spain/natural
Daucus carota               Apiaceae          Murcia, Spain/commercial
Nerium oleander             Apocynaceae       Murcia, Spain/artificial
Arisarum vulgare            Araceae           Murcia, Spain/natural
Phoenix dactylifera         Arecaceae         Murcia, Spain/commercial
Aloe vera                   Asphodelaceae     Murcia, Spain/artificial
Lactuca sativa              Asteraceae        Murcia, Spain/commercial
Cynara scolymus             Asteraceae        Murcia, Spain/commercial
Brassica oleracea botrytis  Brassicaceae      Murcia, Spain/commercial
Brassica oleracea italica   Brassicaceae      Murcia, Spain/commercial
Diplotaxis erucoides        Brassicaceae      Murcia, Spain/natural
Lobularia maritima          Brassicaceae      Murcia, Spain/natural
Arabidopsis thaliana        Brassicaceae      Murcia, Spain/artificial
Silene vulgaris             Caryophyllaceae   Murcia, Spain/natural
Cistus albidus              Cistaceae         Murcia, Spain/natural
Cistus heterophyllus        Cistaceae         Murcia, Spain/natural
Aeonium arboreum            Crassulaceae      Murcia, Spain/natural
Cucumis sativus             Cucurbitaceae     Biala Podlaska, Poland/commercial
Ecballium elaterium         Cucurbitaceae     Murcia, Spain/natural
Chamaecyparis sp.           Cupressaceae      Murcia, Spain/artificial
Arbutus unedo               Ericaceae         Murcia, Spain/artificial
Ricinus communis            Euphorbiaceae     Murcia, Spain/artificial
Ceratonia siliqua           Fabaceae          Murcia, Spain/natural
Pisum sativum               Fabaceae          Murcia, Spain/artificial
Vicia faba                  Fabaceae          Murcia, Spain/artificial
Quercus coccifera           Fagaceae          Murcia, Spain/natural
Pelargonium × hortorum      Geraniaceae       Murcia, Spain/artificial
Leucobryum glaucum          Leucobryaceae     Biala Podlaska, Poland/natural
Anagallis arvensis          Myrsinaceae       Murcia, Spain/natural
Callistemos sp.             Myrtaceae         Murcia, Spain/artificial
Olea europaea               Oleaceae          Murcia, Spain/artificial
Oxalis pes-caprae           Oxalidaceae       Murcia, Spain/natural
Pinus silvestres            Pinaceae          Biala Podlaska, Poland/natural
Antirrhinum majus           Plantaginaceae    Murcia, Spain/artificial
Zea mays                    Poaceae           Murcia, Spain/commercial
Oryza sativa                Poaceae           Murcia, Spain/artificial
Hordeum vulgare             Poaceae           Murcia, Spain/commercial
Piptatherum miliaceum       Poaceae           Murcia, Spain/natural
Portulacaria afra           Portulacaceae     Murcia, Spain/artificial
Galium verrucosum           Rubiaceae         Murcia, Spain/natural
Populus alba                Salicaceae        Murcia, Spain/artificial
Petunia hybrida             Solanaceae        Murcia, Spain/artificial
Solanum tuberosum           Solanaceae        Murcia, Spain/commercial
Solanum lycopersicum        Solanaceae        Murcia, Spain/commercial
Thymelaea hirsuta           Thymelaeaceae     Murcia, Spain/natural
Vitis vinifera              Vitaceae          Murcia, Spain/commercial
Asphodelus fistulosus       Xanthorrhoeaceae  Murcia, Spain/natural
Quantification of biases in QPCR and NGS 1843
of the PCR amplification product plays a major role in bias within bacterial community pyrosequencing projects [23], the size of the amplicons analyzed here is below the 1-Kb threshold identified in those studies. Thus, we should be able to safely exclude that as a possible cause of bias in this study.
Suitability of barcodes depending on plant species
The worst possible outcome of a metabarcode analysis is a false negative, i.e., lack of amplification of a species barcode despite presence of that taxon in the population. As such, our first analysis assessed PCR success. As expected, it varied both between barcode markers and between the 48 plant species tested. Barcode primers for the matK gene were the least successful, giving positive results in only 50 % of the tested species, followed by rbcL which amplified in 82 % of species. The rpoB and rpoC1 genes as well as the short intergenic spacers trnL-F and trnH-psbA proved to be the most universally successful barcoding markers, amplifying in close to 90 % of the investigated species. Our data, however, give a within-species assessment of PCR success based on six independent amplifications. As none of the samples had a complete failure of amplification with all primer combinations, we can conclude that DNA quality was not a limiting factor for amplification.
qPCR parameters of specific barcodes depending on plant species
The second phase of the analysis addressed whether end point PCR results are the outcome of PCR efficiency. As shown in Fig. 1, amplification efficiency during qPCR varied between barcode markers. The highest average efficiency, based on amplification from all species, corresponded to the markers trnL-F and trnH-psbA followed by rpoB, rpoC1, and rbcL. The matK barcode showed the lowest average efficiency among all species. The efficiencies of matK, rbcL, and rpoC1, but not rpoB and trnH-psbA, were significantly different from high-efficiency marker trnL-F (p<0.0001 for matK and rbcL and p=0.0013 for rpoC1). PCR efficiencies considering all barcode markers for selected species are summarized in Table 2, showing that both the barcode target and the species from which it is amplified govern efficiency.
As PCR success could be the result of initial priming and some samples gave no amplification, we compared the priming sites of the worst performing pair of primers (2.1 matK and 5 matK) with the corresponding priming sites of the negative performers Zea mays, Quercus coccifera, and Brassica oleracea, of Oryza sativa as middle quality, and of Vitis vinifera, which had the best overall amplification with this marker (Fig. 2). Indeed, mispriming may explain the lack of amplification in the case of Z. mays, but the cause of the differences in the other samples is not obvious. Furthermore, amplification efficiency may be affected by other parameters beyond priming (see below).
Looking at intra-species variation of all barcodes, Cq values varied widely in this case also (Fig. 3 and Table 3). Some extreme cases of intraspecific variation were found in Oryza sativa where rbcL showed no amplification, whereas trnL-F had a Cq of 11.93 (Table 3). Beyond the false-negatives, other important differences in Cq were observed for the various markers. In O. sativa, the difference in Cq between matK (28.55) and trnL-F (11.93) is extremely large. If one were to apply the delta-CT formula [18], and assumed an average efficiency for both markers (efficiency=1.9), the predicted differences in starting DNA level would be 2,116-fold based on the estimates from these two barcodes. This was not an isolated case as we found negative amplification for rbcL or matK and positive albeit differing Cq values in 20 % of the species tested for this parameter (Z. mays, Daucus carota, Q. coccifera, and Asphodelus fistulosus).
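The comparative CT reasoning invoked above can be written as a one-line relation: if two assays share a per-cycle efficiency E, a difference of ΔCq cycles implies an E^ΔCq ratio in apparent starting template. The sketch below is an illustration of that relation only; the function name is hypothetical and the Cq values in the usage lines are round illustrative numbers, not entries from Table 3.

```python
def fold_difference(cq_a: float, cq_b: float, efficiency: float = 2.0) -> float:
    """Apparent starting-template ratio (b over a), assuming one shared efficiency."""
    return efficiency ** (cq_a - cq_b)


if __name__ == "__main__":
    # Three cycles apart at perfect doubling corresponds to an 8-fold ratio.
    print(fold_difference(23.0, 20.0, efficiency=2.0))
    # Ten cycles apart at E = 1.9 corresponds to a ratio of roughly 613.
    print(round(fold_difference(30.0, 20.0, efficiency=1.9), 1))
```

The exponential form means that small errors in the assumed efficiency are amplified over large Cq gaps, so fold estimates of this kind are best read as orders of magnitude.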
Cq values also varied significantly among species considering all six markers together, and these differences did not correlate with the average efficiency of the PCR amplification. For example, Z. mays exhibited an average efficiency over all barcodes of 1.88±0.08 and an average Cq of 30.76±4.67, while Solanum tuberosum exhibited a similar average efficiency of 1.86±0.15, yet had a Cq of 15.98±5.30. Moreover, for any given barcode, PCR efficiency and Cq values also proved to be independent variables, based on regression analysis ($R^2$ between 0.37 and 0.003).
Differences in efficiency or Cq may be related to amplification bias among template DNAs in environmental samples. We analyzed abundance of reads after sequencing in order to address this question.
Biases during pre-amplification and during emulsion PCR
The identification of genomic DNAs corresponding to different organisms in environmental samples requires sequencing of barcode-PCR products. Not all barcodes successfully
Fig. 1 Boxplot of PCR efficiency data of six barcoding markers derived from qPCRs of 48 plant species. The graphic shows only successful amplification data with an efficiency >1
amplify in each species. Table 4 shows the result of simultaneous sequencing of equal amounts of PCR products from mixed-species templates amplified with barcode markers rbcL, rpoB, and rpoC1. The results reveal a strong bias in the number of reads corresponding to each species contained in the equimolar starting sample. In the case of marker rpoB, most reads (95 %) corresponded to S. tuberosum and only 0.02 % to Z. mays. The number of reads was not related to the PCR efficiencies of the species but was related to their Cq values when amplified separately (Table 4).
Analysis of read numbers also showed a strong bias in the number of total reads corresponding to each of the barcodes (Table 4). Although equal amounts of PCR product from pre-amplification were used to create the amplicon library, only 11.2 % of all reads were identified as rbcL fragments, 36.5 % as rpoB fragments, and 52.3 % as rpoC1 fragments. These
Table 2 PCR efficiency evaluated in a selection of plant species
Plant family                              rbcL-a  matK    rpoC1   rpoB    trnL-F  trnH-psbA  Average ± SD
Oxalidaceae (Oxalis pes-caprae)           1.89    1.83    1.70    1.78    1.91    1.90       1.84±0.08
Cistaceae (Cistus heterophyllus)          1.83    1.80    1.66    1.71    1.90    1.95       1.81±0.11
Poaceae (Zea mays)                        1.85    NA      1.72    1.97    1.80    1.91       1.85±0.10
Oleaceae (Olea europaea)                  1.76    1.51    1.79    1.88    1.93    1.95       1.80±0.16
Salicaceae (Populus alba)                 1.78    1.78    1.78    1.89    1.98    1.98       1.87±0.10
Poaceae (Oryza sativa)                    NA      1.82    1.79    1.72    1.98    1.81       1.82±0.10
Apiaceae (Daucus carota)                  1.94    NA      1.85    2.00    1.98    2.00       1.95±0.06
Solanaceae (Solanum tuberosum)            1.70    1.70    1.85    1.84    1.95    2.00       1.80±0.12
Scrophulariaceae (Antirrhinum majus)      1.79    1.82    1.98    1.99    2.00    2.00       1.93±0.1
Arecaceae (Phoenix dactylifera)           1.87    1.90    1.97    1.97    2.00    1.84       1.92±0.06
Cucurbitaceae (Cucumis sativus)           1.84    1.80    1.91    1.99    1.98    1.91       1.9±0.07
Amaranthaceae (Spinacia oleracea)         1.90    1.42    1.99    2.00    2.00    1.99       1.88±0.23
Vitales (Vitis vinifera)                  1.82    1.85    1.75    1.94    1.89    1.95       1.87±0.08
Solanaceae (Petunia hybrida)              1.73    1.73    1.86    1.85    1.93    1.94       1.84±0.09
Fabaceae (Ceratonia siliqua)              1.83    1.70    1.84    1.79    1.91    1.91       1.83±0.08
Fagaceae (Quercus coccifera)              NA      NA      1.68    1.72    1.90    1.86       1.79±0.11
Thymelaeaceae (Thymelaea hirsuta)         1.88    NA      1.73    1.78    1.81    1.75       1.79±0.06
Xanthorrhoeaceae (Asphodelus fistulosus)  1.81    NA      1.73    1.76    1.78    1.84       1.78±0.04
Brassicaceae (Brassica oleracea)          1.70    NA      1.76    1.82    1.76    1.67       1.74±0.06
Asteraceae (Cynara scolymus)              1.49    1.62    1.50    1.49    1.49    1.40       1.5±0.07
Average                                   1.80    1.73    1.79    1.84    1.89    1.88
Standard deviation                        0.10    0.14    0.12    0.13    0.12    0.14
Samples with NA were non-successful PCR amplifications
Oryza_sativa         GACTATTTCGGTTCCTATATAACTCTTATGTATCAG 36
5 matK               ------------------------------------
Phoenix_dactylifera  GAGTATTTCGGTTCCCATATAATTCTTATGTATCTG 36
Quercus_coccifera    GATTCTTTTTATTCCTATATAATTCTTATATATGTG 36
Vitis_vinifera       GATTTTTCTTATTCCTATATAATTTTCATGTATGTG 36
Brassica_oleracea    GATTTTTATTGTTCTTATATAATTCTCATGTATGTG 36
2.1 matK             -------CCTATCCATCTGGAAATCTTAG------- 22
Zea_mays             GTGCCTTTTGATGCA---AGAATTGCCTTTCCTTGA 33
                     ..1120......1130......1140......1150
Oryza_sativa         TATATACTTCGACTTTCATGCGCTAGAACTTTAGCT 36
5 matK               ---------CGACTTTCTTGTGCTAGAAC------- 20
Phoenix_dactylifera  ------------------------------------
Quercus_coccifera    TATATACTTCGGCTTTCTTGTGTTAAAACTTTGGCC 36
Vitis_vinifera       TATATACTTCGACTTTCTTGTGCTCGAACTTTGGCT 36
Brassica_oleracea    TATATACTTCGTCTTTGTTGTGTTAAAACTTTGGCT 36
2.1 matK             ------------------------------------
Zea_mays             TA-ATTAACCGAATTAATTAAAAAATTCTGCTGATA 35
                     0......1800......1810......1820.....
Fig. 2 Annealing of primers 2.1 matK and 5 matK to sequences rendering negative amplification (Quercus coccifera, Brassica oleracea, and Zea mays) and positive amplification (Oryza sativa, Vitis vinifera, and Phoenix dactylifera)
results are significantly different from an expected 33.3 % per reaction (chi-square test p<2.2e-16). The relative percentages in read number proved independent of PCR efficiencies of the specific markers but correlated with average Cq values of the marker for the three species amplified.
As emulsion PCR for NGS sequencing is performed with primers that correspond to ligated adaptors, and nevertheless a relationship between Cq values and final number of reads is maintained, we can conclude that the main bias that can be encountered in metabarcoding projects is related to the specific sequence of the barcode fragment. This seems to be independent of any primer-specific effect such as internal priming, etc., as it is consistent over two different primer pairs. Library construction can produce at least 4.6-fold differences when comparing rbcL against rpoC1.
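The departure of the per-barcode read totals from an even split, and the fold difference between rbcL and rpoC1, can be checked with a short sketch using the total reads listed in Table 4. It assumes a chi-square goodness-of-fit test on raw read counts against equal thirds; the text does not state whether the authors tested counts or percentages, so this is one plausible reading. With three categories there are 2 degrees of freedom, for which the upper-tail probability is exactly exp(-x/2).

```python
import math


def chi_square_equal_split(observed):
    """Chi-square goodness-of-fit statistic against an equal split of the total."""
    total = sum(observed)
    expected = total / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


def p_value_df2(chi2: float) -> float:
    """Upper-tail p-value for 2 degrees of freedom, where the survival function is exp(-x/2)."""
    return math.exp(-chi2 / 2.0)


if __name__ == "__main__":
    reads = {"rbcL": 34239, "rpoB": 111407, "rpoC1": 159923}  # total reads per barcode, Table 4
    total = sum(reads.values())
    for locus, n in reads.items():
        print(f"{locus}: {100.0 * n / total:.1f} % of reads")
    chi2 = chi_square_equal_split(list(reads.values()))
    print(f"chi-square = {chi2:.0f}, p = {p_value_df2(chi2):.3g}")
    print(f"rpoC1/rbcL read ratio = {reads['rpoC1'] / reads['rbcL']:.2f}")
```

The statistic is so large that the p-value underflows to zero in double precision, which is consistent with a reported bound of p<2.2e-16.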
Fig. 3 Boxplot of Cq values of six barcoding markers derived from qPCRs of 48 plant species
Table 3 Cq qPCR values obtained in a selection of plant species
Plant family                              rbcL-a  matK    rpoC1   rpoB    trnL-F  trnH-psbA  Average ± SD
Oxalidaceae (Oxalis pes-caprae)           30.99   36.24   22.63   23.44   19.41   27.76      26.75±6.18
Cistaceae (Cistus heterophyllus)          25.83   28.80   24.85   25.01   16.74   18.86      23.35±4.58
Poaceae (Zea mays)                        34.74   NA      22.35   25.17   20.15   26.06      25.69±5.57
Oleaceae (Olea europaea)                  26.05   23.86   17.82   15.18   16.74   17.52      19.53±4.36
Salicaceae (Populus alba)                 24.13   29.89   15.29   13.82   13.25   13.90      18.38±6.96
Poaceae (Oryza sativa)                    NA      28.55   14.52   22.77   11.93   25.02      20.56±7.06
Apiaceae (Daucus carota)                  15.82   NA      13.06   9.77    20.15   25.95      26.95±6.31
Solanaceae (Solanum tuberosum)            16.77   20.55   10.16   8.65    10.53   10.90      12.93±4.66
Scrophulariaceae (Antirrhinum majus)      27.81   33.83   13.06   12.72   12.06   15.08      19.09±9.34
Arecaceae (Phoenix dactylifera)           31.39   16.06   10.81   15.32   10.12   19.95      17.28±7.81
Cucurbitaceae (Cucumis sativus)           27.17   29.71   9.89    9.13    9.02    23.57      18.08±9.77
Amaranthaceae (Spinacia oleracea)         29.66   19.59   8.94    25.32   9.40    10.40      17.22±8.97
Vitales (Vitis vinifera)                  33.15   18.17   17.65   13.66   13.88   15.48      18.67±7.34
Solanaceae (Petunia hybrida)              28.38   19.47   11.02   10.28   10.42   11.03      15.10±7.40
Fabaceae (Ceratonia siliqua)              32.84   23.26   16.13   18.73   14.99   20.09      21.01±6.50
Fagaceae (Quercus coccifera)              NA      NA      23.39   18.43   17.06   25.14      21.01±3.87
Thymelaeaceae (Thymelaea hirsuta)         29.52   NA      14.70   24.30   16.52   27.4       22.49±6.58
Xanthorrhoeaceae (Asphodelus fistulosus)  26.73   NA      19.38   18.13   18.91   22.84      21.20±3.58
Brassicaceae (Brassica oleracea)          24.55   NA      14.76   13.57   14.35   21.83      17.81±5.02
Asteraceae (Cynara scolymus)              34.47   32.27   23.89   23.45   23.27   22.94      26.72±5.21
Average                                   27.78   25.73   16.22   17.34   14.95   20.09
Standard deviation                        5.28    6.41    5.09    5.90    4.13    5.69
Samples with NA correspond to unsuccessful amplifications
Discussion
Similarities between primer and template, as well as the regional G + C content of a template, are factors that influence PCR efficiency [22,24]. The low PCR success, particularly in case of matK with 50 % PCR failure in a screening of 48 species, is probably due to lack of similarity between primer and template, since no highly conserved sites flanking the most variable parts of this barcoding marker exist [7]. Indeed, indels and mispriming may account for lack of success in PCR amplification (see Fig. 2). However, explaining a lack of amplification is not straightforward, as it may also be the result of specific features of the DNA strand amplified.
The Cq parameter is widely used in qPCR analysis [17, 18], and we applied it to assess intraspecific and interspecific variability in PCR success and as a possible parameter to estimate final read numbers in NGS experiments. Surprisingly, there was a wide range of Cq values identified within a single species, and even within a single DNA extraction, something completely unexpected as Cq values are thought to relate to DNA/cDNA quantities. These ranges were far beyond the 1–2 cycles that might arise from sampling and manipulation errors.
Our results show that PCR efficiency varies among barcoding markers and species but that these differences in efficiency do not relate to the corresponding Cq values as a measure of PCR success. The Cq values, in contrast, proved to be a valuable parameter for the estimation of PCR success, as matK and rbcL showed the highest Cq values during qPCR. The late take-off in the qPCR assay of rbcL and matK probably reflects an excess of mismatches between primers and templates. Cq values also varied significantly among species over the whole range of markers, which may be related to DNA quality and/or PCR-inhibiting substances contained in the sample.
One of the most common aims in analyzing environmental samples is to estimate the relative abundance of species based on determining the quantity of their template DNAs. In principle, equal amounts of template DNA from different species should lead to 1:1 amplicon numbers. However, Suzuki and Giovannoni (1996) observed preferential amplification of certain bacterial fragments in mixed templates with lower G + C content [23]. Our results show the situation is similar in plants, with a strong bias in relative read number among three species after Ion Torrent sequencing. Low read numbers corresponded to species with high Cq values for a given marker, whereas PCR efficiency seemed unrelated, indicating that species with lower Cqs for a given marker are preferentially amplified.
As such, further improving the reliability of amplification and utilization of sequence content features to derive and apply quantitative data normalization algorithms are certainly areas of significant interest for future development in metabarcoding and NGS analysis.
Acknowledgments This work was performed as partial fulfillment of the PhD of Marta Pawluczyk. This work was funded by the Comunidad Autónoma de la Región de Murcia Project “Molecular markers in conservation and management of the flora of Murcia Region” (“Marcadores moleculares en conservación y gestión de la flora murciana”). Part of the work was performed under the Proyecto Vitalis Campus Mare Nostrum
Table 4 Average PCR efficiencies (PCR_eff), Cq values, and sequence reads derived from PCR products of barcodes rbcL, rpoB, and rpoC1 using ion semiconductor sequencing
Barcoding locus / summary                            Value       Species               % of reads  PCR_eff of the species  Cq of the species
rbcL
Average PCR_eff of the amplified species (together)  1.81±0.09   Oxalis pes-caprae     0.87        1.89±0.04               30.99±0.82
Average Cq of the amplified species (together)       26.97±7.52  Vitis vinifera        4.21        1.82±0.02               33.15±0.78
Total reads                                          34,239      Solanum tuberosum     94.92       1.69±0.04               16.77±0.88
% of total reads                                     11.2
rpoB
Average PCR_eff of the amplified species (together)  1.85±0.14   Zea mays              0.02        1.71±0.13               25.01±0.7
Average Cq of the amplified species (together)       21.79±5.00  Cistus heterophyllus  1.13        1.97±0.06               25.17±0.27
Total reads                                          111,407     Olea europaea         98.85       1.86±0.01               16.28±0.26
% of total reads                                     36.5
rpoC1
Average PCR_eff of the amplified species (together)  1.74±0.06   Cistus heterophyllus  0.34        1.66±0.04               24.85±1.24
Average Cq of the amplified species (together)       18.22±4.96  Oryza sativa          36.57       1.79±0.02               14.52±0.54
Total reads                                          159,923     Populus alba          63.09       1.78±0.03               15.29±1.51
% of total reads                                     52.3
“Espacio Mediterráneo de Investigación en Red en Alimentos y Salud”-CEI10-2-0002.
Data availability Raw and processed data will be made publicly available via entries in Data Dryad, and a formal Data Descriptor will be published detailing the methodologies and workflows used, as well as rich descriptions of the data elements themselves. The analytical workflow for sequence processing and mapping is already publicly available as a Galaxy workflow, as described in the manuscript, and can be freely re-run at any time. The analysis can be reproduced, with the same parameters and data, at the following Galaxy installation (page: http://biordf.org:8983/u/mikel-egana-aranguren/p/sources-of-bias-in-applying-barcoding-markers-for-sequence-analysis-of-environmental-samples).
References
1. Hajibabaei M, Singer GAC, Hebert PDN, Hickey DA (2007) DNA barcoding: how it complements taxonomy, molecular phylogenetics and population genetics. Trends Genet 23:167–172. doi:10.1016/J.Tig.2007.02.001
2. Hajibabaei M, Janzen DH, Burns JM et al (2006) DNA barcodes distinguish species of tropical Lepidoptera. Proc Natl Acad Sci U S A 103:968–971
3. Pawluczyk M, Weiss J, Vicente-Colomer MJ, Egea-Cortines M (2012) Two alleles of rpoB and rpoC1 distinguish an endemic European population from Cistus heterophyllus and its putative hybrid (C. x clausonis) with C. albidus. Plant Syst Evol 298:409–419
4. Krüger M, Stockinger H, Krüger C et al (2009) DNA-based species level detection of Glomeromycota: one PCR primer set for all arbuscular mycorrhizal fungi. New Phytol 183:212–223. doi:10.1111/j.1469-8137.2009.02835.x
5. Links MG, Dumonceaux TJ, Hemmingsen SM, Hill JE (2012) The chaperonin-60 universal target is a barcode for bacteria that enables de novo assembly of metagenomic sequence data. PLoS ONE 7:e49755. doi:10.1371/journal.pone.0049755
6. Hollingsworth PM, Forrest LL, Spouge JL et al (2009) A DNA barcode for land plants. Proc Natl Acad Sci U S A 106:12794–12797. doi:10.1073/Pnas.0905845106
7. Kress WJ, Erickson DL (2007) A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE 2:e508. doi:10.1371/journal.pone.0000508
8. Links MG, Chaban B, Hemmingsen SM et al (2013) mPUMA: a computational approach to microbiota analysis by de novo assembly of operational taxonomic units based on protein-coding barcode sequences. Microbiome 1:23. doi:10.1186/2049-2618-1-23
9. Coissac E, Riaz T, Puillandre N (2012) Bioinformatic challenges for DNA metabarcoding of plants and animals. Mol Ecol 21:1834–1847. doi:10.1111/j.1365-294X.2012.05550.x
10. Amend AS, Seifert KA, Bruns TD (2010) Quantifying microbial communities with 454 pyrosequencing: does read abundance count? Mol Ecol 19:5555–5565. doi:10.1111/j.1365-294X.2010.04898.x
11. Links MG, Demeke T, Gräfenhan T et al (2014) Simultaneous profiling of seed-associated bacteria and fungi reveals antagonistic interactions between microorganisms within a shared epiphytic microbiome on Triticum and Brassica seeds. New Phytol. doi:10.1111/nph.12693
12. Platts AE, Johnson GD, Linnemann AK, Krawetz SA (2008) Real-time PCR quantification using a variable reaction efficiency model. Anal Biochem 380:315–322
13. Pfaffl MW, Horgan GW, Dempfle L (2002) Relative expression software tool (REST(C)) for group-wise comparison and statistical analysis of relative expression results in real-time PCR. Nucleic Acids Res 30:e36. doi:10.1093/nar/30.9.e36
14. Mallona I, Weiss J, Egea-Cortines M (2011) pcrEfficiency: a Web tool for PCR amplification efficiency prediction. BMC Bioinforma 12:404. doi:10.1186/1471-2105-12-404
15. D’haene B, Vandesompele J, Hellemans J (2010) Accurate and objective copy number profiling using real-time quantitative PCR. Methods 50:262–270. doi:10.1016/j.ymeth.2009.12.007
16. Von Holst C, Boix A, Marien A, Prado M (2012) Factors influencing the accuracy of measurements with real-time PCR: the example of the determination of processed animal proteins. Food Control 24:142–147. doi:10.1016/j.foodcont.2011.09.017
17. Bustin SA, Benes V, Garson JA et al (2009) The MIQE guidelines: minimum information for publication of quantitative real-time PCR experiments. Clin Chem 55:611–622. doi:10.1373/clinchem.2008.112797
18. Schmittgen TD, Livak KJ (2008) Analyzing real-time PCR data by the comparative CT method. Nat Protoc 3:1101–1108. doi:10.1038/nprot.2008.73
19. Mardis ER (2008) The impact of next-generation sequencing technology on genetics. Trends Genet 24:133–141. doi:10.1016/j.tig.2007.12.007
20. Ritz C, Spiess AN (2008) qpcR: an R package for sigmoidal model selection in quantitative real-time polymerase chain reaction analysis. Bioinformatics 24:1549–1551. doi:10.1093/Bioinformatics/Btn227
21. Spiess A-N, Feig C, Ritz C (2008) Highly accurate sigmoidal fitting of real-time PCR data by introducing a parameter for asymmetry. BMC Bioinforma 9:221. doi:10.1186/1471-2105-9-221
22. Polz MF, Cavanaugh CM (1998) Bias in template-to-product ratios in multitemplate PCR. Appl Environ Microbiol 64:3724–3730
23. Suzuki M, Giovannoni S (1996) Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl Environ Microbiol 62:625–630
24. Benita Y, Oosting RS, Lok MC et al (2003) Regionalized GC content of template DNA as a predictor of PCR success. Nucleic Acids Res 31:1–7. doi:10.1093/nar/gng101