
Spatial and temporal characterization of the rich fraction of plastid DNA present in the nuclear genome of Moringa oleifera reveals unanticipated complexity in NUPTs' formation

Authors: Salmerón Cerdán, Antonio; Marczuk Rojas, Juan Pablo; Álamo-Sierra, Angélica; Alcayde García, Alfredo; Isanbaev, Viktor; Carretero Paulet, Lorenzo
Publisher: Universidad de Almería
Year: 2024
DOI: 10.1186/s12864-024-09979-5
Source: https://repositorio.ual.es/bitstream/10835/18027/1/s12864-024-09979-5.pdf
Marczuk-Rojas et al. BMC Genomics (2024) 25:60
https://doi.org/10.1186/s12864-024-09979-5
RESEARCH
Spa ial and empo al cha ac e iza ion
o  he ich ac ion o plas id DNA p esen
in henuclea genome o Mo inga olei e a
e eals unan icipa ed complexi y inNUPTs´
o ma ion
Juan Pablo Marczuk-Rojas1,2, Angélica María Álamo-Sierra1,2, Antonio Salmerón3, Alfredo Alcayde4, Viktor Isanbaev4 and Lorenzo Carretero-Paulet1,2*
Abstract
Background Beyond the massive amounts of DNA and genes transferred from the protoorganelle genome to the nucleus during the endosymbiotic event that gave rise to the plastids, stretches of plastid DNA of varying size are still being copied and relocated to the nuclear genome in a process that is ongoing and does not result in the concomitant shrinking of the plastid genome. As a result, plant nuclear genomes feature a small, but variable, fraction of their genomes of plastid origin, the so-called nuclear plastid DNA sequences (NUPTs). However, the mechanisms underlying the origin and fixation of NUPTs are not yet fully elucidated, and research on the topic has mostly focused on a limited number of species and on limited amounts of plastid DNA.
Results Here, we leveraged a chromosome-scale version of the genome of the orphan crop Moringa oleifera, which features the largest fraction of plastid DNA in any plant nuclear genome known so far, to gain insights into the mechanisms of origin of NUPTs. For this purpose, we examined the chromosomal distribution and arrangement of NUPTs, explicitly modeled and tested the correlation between their age and size distribution, characterized their sites of origin in the chloroplast genome and their sites of insertion in the nuclear one, and investigated their arrangement in clusters. We found a bimodal distribution of NUPT relative ages, which implies that NUPTs in moringa were formed through two separate events. Furthermore, NUPTs from each event showed markedly distinctive features, suggesting they originated through distinct mechanisms.
Conclusions Our results reveal an unanticipated complexity of the mechanisms at the origin of NUPTs and of the evolutionary forces behind their fixation, and highlight moringa species as an exceptional model to assess the impact of plastid DNA on the evolution of the architecture and function of plant nuclear genomes.
Keywords Moringa, NUPTs, Plastid DNA, Chloroplast, Genome Evolution
Open Access
© The Author(s) 2024. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
BMC Genomics
*Correspondence:
Lorenzo Carretero-Paulet
Full list of author information is available at the end of the article
Background
Nearly all plants contain a small, but significant, fraction of their nuclear genomes composed of DNA sequences derived from their chloroplasts [1]; these nuclear integrants of plastid DNA are commonly known as nuclear plastid DNA sequences (NUPTs) [2]. The process of NUPTs' formation has commonly been associated with the process by which most genes present in the bacterial ancestor of plastids were transferred to the nuclear genome, and their products eventually retargeted to their ancestral compartment, after the endosymbiotic event that gave rise to the chloroplast organelle. However, whereas the latter entails the loss of vast amounts of DNA, with the subsequent reduction of the plastid genome size, and the transfer of most of the genes originally present in the protoorganelle organism to the nuclear genome [3, 4], the former involves copying stretches of DNA from the chloroplast genome. Even though most NUPTs are less than 1 kb in length, NUPTs of recent origin spanning the whole chloroplast chromosome have been detected in Oryza sativa (rice) and Populus trichocarpa [5, 6], and their formation did not result in the shrinking of the plastid genome.
Although the process of NUPTs' formation is still poorly understood, it is expected to involve the following sequence of events. First, the duplication of a stretch of DNA present in the chloroplast genome. Second, the lysis of chloroplast organelle membranes, allowing the leakage of the duplicated plastid DNA. Third, the import of the leaked plastid DNA into the nucleus. Fourth, the integration of plastid DNA into the nuclear genome. At present, no mechanism has been formally proposed to explain the recurrent duplication of stretches of plastid DNA of varying sizes that are at the origin of NUPTs. The biological mechanisms involved in the leakage of plastid DNA to the cytoplasm and its subsequent import by the nucleus are not yet completely elucidated either, although gametogenesis and cell stress (especially pollen development and mild heat stress, respectively) have been reported to induce the disruption of chloroplast organelle membranes [2, 7–10]. It has also been suggested that certain kinds of stresses, such as ionizing radiation and pathogen infections, may not only trigger the leakage of plastid DNA to the nucleocytosolic compartment, but also favor its integration into the nuclear genome [11]. The molecular mechanisms of NUPTs' integration into the nuclear genome are not fully described either, but they are probably diverse, generally involve double-stranded breaks (DSBs) and DNA damage, and are thus potentially mutagenic. For example, if, as has been hypothesized, NUPTs' integration is mediated by non-homologous end joining (NHEJ) during DSB repair events [12–14], most NUPTs are expected to be rapidly fragmented and shuffled away through transpositions and genome rearrangements and, eventually, purged from the nuclear genome [15–17]. As a consequence, the distribution of NUPTs by age should follow an exponential distribution, indicating a continuous rate of NUPTs' formation and decay throughout time [15]. Although such a pattern has been suggested for rice, Medicago truncatula, P. trichocarpa and Zea mays [15, 17, 18], different patterns have been observed in other species such as Arabidopsis, Carica papaya, Fragaria vesca, Moringa oleifera (moringa) and Vitis vinifera [17–19]. A second consequence is the expected correlation between NUPTs' size and age, with larger NUPTs expected to be younger (i.e., a positive correlation between size and sequence identity), an observation that has been suggested for several species, despite not having been explicitly tested statistically [7, 16, 17, 20].
Indeed, the fraction of nuclear genomes occupied by NUPTs varies enormously among species and even among different populations of the same species [5, 21, 22]. Most species show around 0.1% of plastid DNA in their nuclear genome, with very few showing more than 1% [1]. These large variations in the fraction of nuclear genomes occupied by NUPTs raise the question of what evolutionary forces may lie behind the fixation of variable fractions of plastid DNA in plant nuclear genomes. However, previous studies on the mechanisms of origin and evolutionary fate of NUPTs mostly focused on a limited number of species and involved a reduced number of NUPTs. A more detailed picture will certainly benefit from a larger number of NUPTs and a higher fraction of the nuclear genome occupied by plastid DNA.
So far, the largest fraction of DNA of plastid origin found in any plant nuclear genome (4.71%) has been detected in the orphan crop moringa [19]. In the present study, we leveraged a recent chromosome-scale version of the moringa genome [23] to examine the spatial distribution of NUPTs, to explicitly model and test the correlation between their age and size distribution, to characterize their origin within the chloroplast genome and their sites of insertion in the nuclear one, and to investigate their arrangement in clusters. Our results reveal an unanticipated complexity of the mechanisms at the origin of NUPTs as well as of the evolutionary forces behind their fixation.
Results
Widespread distribution of NUPTs in the moringa nuclear genome
In order to detect NUPTs present in the moringa nuclear genome, a chromosome-scale assembly of the moringa genome, AOCC 2 [23], was scanned using BLASTN with the moringa chloroplast genome sequence (NCBI RefSeq number: NC_041432.1) [24] as query, resulting in 13,901 total alignments. We visually inspected the alignments and detected a significant fraction of them
(8657; 62.28%) arising from two specific regions of the chloroplast genome. Those two regions were 200 bp and 350 bp in length and were essentially composed of As and Ts (Additional File 1), thus likely corresponding to low complexity regions, which are known to produce spurious alignments that reflect artifacts rather than true homology. Indeed, BLASTN searches on NCBI databases using those two regions as queries resulted in matches to seemingly unrelated genomes with high percent identity, indicating they probably correspond to artifacts (results not shown). Therefore, we reran BLASTN with the -dust option turned on in order to mask alignments resulting from low complexity regions. This search detected 5203 hits, which we confidently defined as NUPTs in our analysis (Supplemental Table S1). Eleven of the 14 chromosomes hosted more than 100 NUPTs (ranging from 118 to 1072), and seven chromosomes plus one scaffold contained NUPTs summing up to more than 160,600 bp, i.e., the size of the moringa chloroplast genome (Supplemental Table S1).
The total aligned region between the chloroplast genome and the nuclear genome, i.e., the total region of the nuclear genome occupied by NUPTs, summed up to 9,781,275 bp, which represents 4.14% of the size of the nuclear genome assembly, close to the estimates obtained with previous versions of the genome [25–27] (Table 1). After correcting for redundancy in BLASTN hits resulting from the Inverted Repeat (IR) regions of the moringa chloroplast genome (1272), the fraction of the moringa nuclear genome corresponding to NUPTs was 3.29%, again very similar to the estimates obtained with the three other versions of the moringa genome [25–27] (Table 1), further supporting that these results were not due to genome assembly errors.
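As a rough illustration of how a genome fraction can be reported both with every hit counted and with redundancy collapsed, the sketch below sums BLASTN hit lengths on the nuclear genome and, separately, the length of the per-chromosome union of those intervals, so that a nuclear locus hit twice (for instance once from IRA and once from IRB) is counted once. This is an assumption-laden approximation, not the authors' pipeline: the function names are hypothetical, coordinates are assumed to be 0-based, half-open and already ordered (start < end), and merging overlapping intervals also collapses any other overlapping hits.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Union of half-open [start, end) intervals, returned sorted and non-overlapping."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Overlapping or abutting: extend the current merged interval
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return [tuple(iv) for iv in merged]


def nupt_genome_fraction(hits, genome_size):
    """hits: (chrom, start, end) of BLASTN hits on the nuclear genome.

    Returns (fraction counting every hit, fraction of the per-chromosome union).
    """
    hits = list(hits)
    raw_bp = sum(end - start for _, start, end in hits)
    per_chrom = defaultdict(list)
    for chrom, start, end in hits:
        per_chrom[chrom].append((start, end))
    union_bp = sum(
        end - start
        for intervals in per_chrom.values()
        for start, end in merge_intervals(intervals)
    )
    return raw_bp / genome_size, union_bp / genome_size
```

For example, two identical hits on one chromosome (the same locus matched from both IR copies) plus one hit elsewhere yield a raw fraction that counts the shared locus twice and a collapsed fraction that counts it once.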
Mos NUPTs inmo inga o igina ed h ough wo dis inc
o ma ion episodes sepa a ed in ime
In order to gain insights into the timing of plastid DNA acquisition by the moringa nuclear genome, we examined the relative age distribution of NUPTs using the percent identity of the corresponding BLASTN hits as a proxy for evolutionary time. Assuming that the number of accumulated mutations is proportional to evolutionary time, i.e., that the molecular clock hypothesis holds, the lower the percent identity, the older the NUPT. Percent identity of BLASTN hits ranged between 72.37 and 100% and showed an apparent bimodal distribution (Fig. 1A). Indeed, when Gaussian mixture models were fitted to the corresponding density curves, two clear peaks, centered around 79.05 and 93.1%, respectively, were detected (Fig. 1A). According to the posterior probabilities of assigning a NUPT to one or the other peak, and using a threshold of 95%, 776 NUPTs (14.91% of the total) summing up to 253,096 bp (2.59% of the total) belonged to the older peak (from now on Episode I, or NUPTs-I), while 3855 NUPTs (74.09% of the total) summing up to 9,189,682 bp (93.95% of the total) belonged to the younger peak (from now on Episode II, or NUPTs-II). The remaining NUPTs (572, summing up to 338,497 bp, i.e., 3.46% of the total) could not be confidently assigned to either peak. Taken as a whole, these results support two main episodic formation events at the origin of most NUPTs.
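The EM fit and the 95% posterior threshold described above can be sketched in plain Python as follows. This is a minimal one-dimensional implementation for illustration, not the mixtools normalmixEM() call the authors used; the initialization (quantile-spaced means, pooled standard deviation), the convergence tolerance and the function names are assumptions.

```python
import math


def _log_terms(x, weights, means, sds):
    """log(w_j * N(x | mu_j, sd_j)) for every component j."""
    return [
        math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi) - 0.5 * ((x - m) / s) ** 2
        for w, m, s in zip(weights, means, sds)
    ]


def posteriors(x, weights, means, sds):
    """Posterior probability of x belonging to each component (log-sum-exp for stability)."""
    logs = _log_terms(x, weights, means, sds)
    top = max(logs)
    norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
    return [math.exp(lp - norm) for lp in logs]


def fit_gmm(xs, k=2, n_iter=500, tol=1e-8):
    """Fit a k-component 1-D Gaussian mixture by EM; returns (weights, means, sds)."""
    n = len(xs)
    srt = sorted(xs)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    sds = [math.sqrt(sum((x - grand) ** 2 for x in xs) / n)] * k
    weights = [1.0 / k] * k
    prev_ll = -math.inf
    for _ in range(n_iter):
        # E-step: responsibilities and log-likelihood under the current parameters
        resp, ll = [], 0.0
        for x in xs:
            logs = _log_terms(x, weights, means, sds)
            top = max(logs)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            resp.append([math.exp(lp - norm) for lp in logs])
            ll += norm
        # M-step: weighted updates of mixing weights, means and standard deviations
        for j in range(k):
            nk = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[j] = math.sqrt(max(var, 1e-6))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return weights, means, sds


def assign_episodes(xs, weights, means, sds, threshold=0.95):
    """Component index per value, or None when no posterior reaches the threshold."""
    labels = []
    for x in xs:
        post = posteriors(x, weights, means, sds)
        best = max(range(len(post)), key=post.__getitem__)
        labels.append(best if post[best] >= threshold else None)
    return labels
```

Component indices are arbitrary, so in a sketch like this the component with the lower mean identity would be read as the older episode; values labeled `None` correspond to the unassigned group described above.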
Next, we examined the size distribution of NUPTs, partitioned by each of the retrieved episodes. While NUPTs-I ranged in size from 69 to 3591 bp, NUPTs-II ranged from 33 to 71,935 bp (Fig. 1B). Both followed a non-normal, right-skewed unimodal distribution (Fig. 1B), with mean and median sizes of 326.2 and 127 bp for NUPTs-I, and of 2384 and 778 bp for NUPTs-II, respectively.
Table 1 Summary of the moringa nuclear genome versions used in this study. The fraction of plastid DNA detected in each version, before and after correcting for redundant NUPTs, is also indicated
Nuclear genome version | Total fraction of plastid DNA (%) | Total fraction of plastid DNA after removing redundant NUPTs (%) | Reference
AOCC 2 | 4.14 | 3.29 | [23]
Shyamli et al., 2021 | 4.73 | 3.81 | [27]
AOCC 1 | 4.25 | 3.28 | [26]
Tian et al., 2015 | 4.19 | 3.12 | [25]

Studies in rice and other plant species had suggested an apparent positive correlation between the size and sequence identity of NUPTs, i.e., large NUPTs tend to be more conserved at the sequence level. This observation can be interpreted as young, large, conserved NUPTs decaying and fragmenting over time, and eventually being purged from the genome [7, 15–17, 20, 28]. To test whether this observation also applied to moringa NUPTs, we studied the correlation between size and sequence identity by means of two different tests appropriate for non-normally distributed data, again partitioned by episode (Fig. 1C and Table 2). Interestingly, while for the younger NUPTs from episode II size was negatively correlated with sequence identity in both tests (Table 2), no significant correlation was found for NUPTs-I (Table 2), suggesting that different mechanisms might have been at the origin of NUPTs from each episode and/or that, once integrated, they might have followed different evolutionary trajectories.
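For readers who want to reproduce the kind of rank-based tests reported in Table 2, the following dependency-free sketch computes Spearman's rho (the Pearson correlation of average ranks) and Kendall's tau-b. It returns only the coefficients; in practice `scipy.stats.spearmanr` and `scipy.stats.kendalltau` also return P values. The text does not state which tau variant was used, so tau-b here is an assumption, and the function names are hypothetical.

```python
import math


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def kendall_tau_b(x, y):
    """O(n^2) Kendall tau-b: (concordant - discordant) / sqrt(pairs untied in x * pairs untied in y)."""
    concordant = discordant = untied_x = untied_y = 0
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            dx, dy = x[i] - x[j], y[i] - y[j]
            untied_x += dx != 0
            untied_y += dy != 0
            if dx * dy > 0:
                concordant += 1
            elif dx * dy < 0:
                discordant += 1
    return (concordant - discordant) / math.sqrt(untied_x * untied_y)
```

Applied to lists of NUPT sizes and percent identities for one episode, a negative coefficient corresponds to the pattern reported for NUPTs-II.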
To further support the accuracy of the results and rule out that they arose from genome assembly errors, we repeated all the analyses using the three previously published versions of the moringa nuclear genome assembly [25–27]. In each case, when fitting Gaussian mixture models to each distribution of percent identities, the two main peaks were similarly retrieved (Supplemental Fig. S1 and Supplemental Table S2). Negative correlations between size and sequence identity were also similarly retrieved for NUPTs-II (Supplemental Table S3), while no significant, or only a marginally significant positive, correlation was found for NUPTs-I.

Fig. 1 Modeling the distribution of percent identity and size of moringa NUPTs. A Histogram of the distribution of NUPTs' percent identity values. The two density plots resulting from fitting Gaussian mixture models, putatively corresponding to distinct events of NUPTs' formation (I and II), are shown. B Histogram of the distribution of NUPTs' size values partitioned by formation event. C Scatter plot of percent identity versus size of moringa NUPTs partitioned by formation event. For easier visualization, NUPT size values have been log10-transformed

Table 2 Correlation analysis between NUPTs' sequence identity and size by NUPTs' formation event
Method | Correlation coefficient | P | Episode
Kendall's rank correlation (tau) | −0.05 | 0.06 | I
Spearman's rank correlation (rho) | −0.03 | 0.42 | I
Kendall's rank correlation (tau) | −0.05 | 6.72 × 10^-7 | II
Spearman's rank correlation (rho) | −0.07 | 6.67 × 10^-7 | II
We found 61 NUPTs, 51 of them nonredundant, spanning a total of 14,177 bp and showing 100% identity with the chloroplast genome. These NUPTs might not represent a real biological phenomenon but rather the result of a misassembly that erroneously incorporated plastid regions into the nuclear genome sequence. In order to rule out this possibility, we sampled the sequences of six representative NUPTs showing 100% identity and of various sizes, plus 100 bp of their flanking regions in the nuclear genome, and scanned for their occurrence in the three additional versions of the moringa nuclear genome available. As revealed by the corresponding multiple sequence alignments, the six selected NUPTs plus flanking regions could be retrieved identically in at least one of the remaining three genome versions (Additional files 2, 3, 4, 5, 6 and 7), further validating our findings.
Characterization of the differential distribution of NUPTs' insertion sites in the moringa nuclear genome
The distribution and frequency of NUPTs across the 14 chromosomes making up the moringa nuclear genome were represented in a Circos plot as independent density plots for each episode (Fig. 2). In contrast to NUPTs-I, which showed an apparently homogeneous distribution throughout the moringa nuclear genome, most NUPTs-II appeared to be highly concentrated in some specific regions of chromosomes one, four, five, six and 10, which showed prominent peaks in the density plots, likely corresponding to hotspots where NUPTs' integration and/or subsequent fixation is favored (Fig. 2).
Fig. 2 Circos plot representation of NUPTs in the moringa nuclear genome. Nuclear and chloroplast chromosomes are represented as grey and green filled blocks, respectively, forming a circumference. Results are shown for the 14 nuclear chromosomes, hosting 4812 NUPTs (92.49% of the total number) spanning 8,928,478 bp (91.28% of the total length). The block corresponding to the chloroplast genome is located at 12 o'clock, and the 14 nuclear chromosomes are arranged clockwise. Nuclear chromosomes are drawn to scale, with lengths proportional to size and expressed in Mb, while the chloroplast genome has been upscaled to occupy a quarter of the image circumference; its size unit was set to 10,000 bp. Line plots representing the respective density distributions of NUPTs-I (red) and NUPTs-II (blue) are displayed. Windows of 500,000 and 100 bp were selected for the nuclear and chloroplast chromosomes, respectively. Local BLASTN sequence alignments between the chloroplast and the nuclear genome corresponding to individual NUPTs are represented as ribbons. Ribbons are colored according to the percentage of sequence identity of the local alignments (NUPTs) grouped by quartiles (with yellow, light orange, orange, and red corresponding to the first, second, third and fourth quartiles, respectively). LSC, Large Single Copy; IRA, Inverted Repeat A; IRB, Inverted Repeat B; SSC, Small Single Copy

A recent survey in African and Asian rice reported a compositional bias at the flanking regions of NUPTs' insertion sites [22]. Similarly, we examined whether the 100 bp flanking regions of NUPTs in moringa also showed any compositional bias. While the 100 bp flanking regions of NUPTs-I featured a higher average GC content (36.4%) than the rest of the genome after excluding NUPT sequences (35.72%), the opposite trend was observed for NUPTs-II, whose flanking regions displayed a lower average GC content (32.3%), with differences being significant according to Mann-Whitney U tests (P = 2.07 × 10^-14 and P = 2.99 × 10^-103, respectively).
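A minimal sketch of the GC comparison: compute the GC content of each flanking sequence and compare two groups of values with a Mann-Whitney U test using the normal approximation. Assumptions: GC is computed over unambiguous A/C/G/T bases only, no tie correction is applied to the variance, the function names are hypothetical, and how the comparison group for the rest of the genome is sampled (for example, per-window GC values) is not described here and is left to the caller.

```python
import math


def gc_content(seq):
    """GC fraction over unambiguous bases (A, C, G, T); NaN if there are none."""
    s = seq.upper()
    acgt = sum(s.count(base) for base in "ACGT")
    return (s.count("G") + s.count("C")) / acgt if acgt else float("nan")


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """U statistic for sample a and a two-sided P value (normal approximation, no tie correction)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    return u1, math.erfc(abs(z) / math.sqrt(2))
```

With lists of per-flank GC values for one episode and for the background, `mann_whitney_u(flank_gc, background_gc)` returns the U statistic and its approximate two-sided P value; `scipy.stats.mannwhitneyu` is a tested alternative that also handles ties.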
Moreover, previous analyses of NUPTs from Arabidopsis and rice identified their tendency to group in clusters, defined as groups of two or more non-overlapping NUPTs in which the distance between two consecutive integrants is less than 5 kb [7]. We sought to determine whether NUPTs in moringa also formed clusters. A total of 880 NUPTs (16.91% of the total), summing up to 1,232,888 bp (12.6% of the total), were found grouped into 282 clusters, which were detected in the 14 chromosomes plus nine scaffolds and whose sizes ranged from 122 to 46,929 bp (Supplemental Table S4).
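The cluster definition above (two or more NUPTs in which consecutive integrants are less than 5 kb apart) translates into a single pass over NUPTs sorted by chromosome and start coordinate. In this sketch the coordinates are assumed to be 0-based, half-open and non-overlapping, the gap is measured from the end of one NUPT to the start of the next, and the function names are hypothetical.

```python
def find_clusters(nupts, max_gap=5000):
    """Group NUPTs into clusters of >= 2 members whose consecutive gaps are < max_gap.

    nupts: iterable of (chrom, start, end), assumed non-overlapping.
    """
    clusters, current = [], []
    for nupt in sorted(nupts):
        chrom, start, _ = nupt
        if current and chrom == current[-1][0] and start - current[-1][2] < max_gap:
            current.append(nupt)  # close enough to the previous NUPT: extend the cluster
        else:
            if len(current) >= 2:
                clusters.append(current)
            current = [nupt]  # start a new candidate cluster
    if len(current) >= 2:
        clusters.append(current)
    return clusters


def cluster_span(cluster):
    """Size of a cluster from the start of its first NUPT to the end of its last one."""
    return cluster[-1][2] - cluster[0][1]
```

Running the same function separately on NUPTs-I and NUPTs-II coordinates would give per-episode clusters like those summarized in Fig. 3, although the treatment of clusters mixing both episodes is left open here.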
We then separately examined clusters grouping NUPTs from each episode. A total of 56 NUPTs-I (i.e., 7.22%), summing up to 18,145 bp (i.e., 7.17%), were found forming 24 clusters that hosted up to five integrants (Fig. 3) (Supplemental Table S4), whereas 476 NUPTs-II (i.e., 12.35%), summing up to 976,761 bp (i.e., 10.63%), were found inside 150 clusters that hosted up to 11 integrants (Fig. 3) (Supplemental Table S4). The rest of the clusters (108) hosted 380 NUPTs from one or both episodes and/or unclassified NUPTs (Supplemental Table S4).
We further checked whether NUPTs within individual clusters were arranged collinearly with respect to the chloroplast genome or were instead shuffled in some way. For this purpose, we graphically represented the ten largest clusters, in terms of number of integrants, from each episode together with the corresponding donor regions in the chloroplast genome (Fig. 4). While clusters formed by NUPTs-I showed a tendency to be arranged collinearly with the chloroplast genome (Fig. 4A), no such collinearity could be observed for clusters of NUPTs-II (Fig. 4B).
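The collinearity assessment in Fig. 4 was done visually with genoPlotR. A simple programmatic proxy, assumed here for illustration only, is to ask whether donor start coordinates are strictly monotonic (increasing or decreasing) when the cluster members are ordered along the nuclear chromosome; the function names and input format are hypothetical.

```python
def is_collinear(members):
    """members: (nuclear_start, donor_start) pairs for the NUPTs of one cluster.

    Collinear if donor coordinates strictly increase or strictly decrease
    along the nuclear chromosome (a decrease allows for an inverted block).
    """
    donors = [donor for _, donor in sorted(members)]
    steps = list(zip(donors, donors[1:]))
    increasing = all(a < b for a, b in steps)
    decreasing = all(a > b for a, b in steps)
    return increasing or decreasing


def collinear_fraction(clusters):
    """Share of clusters whose members keep the donor order."""
    return sum(is_collinear(c) for c in clusters) / len(clusters)
```

This rule ignores strand and partial overlaps between donor regions, so it is stricter than a visual inspection and is meant only to make the criterion explicit.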
The grouping of NUPTs into clusters at specific positions might reflect either large NUPTs fragmenting over time after their integration into the nuclear genome or chromosomal hotspots. If the former were the case, the sequence identity of NUPTs should correlate with their tendency to group into clusters. To test this hypothesis, we examined the correlation between the average sequence identity of the NUPTs in each cluster and the number of integrants. The tests were performed independently on clusters formed exclusively by NUPTs-I and by NUPTs-II. No significant correlation was found for NUPTs from either episode (Supplemental Table S5).
Biased dis ibu ion o NUPTs‑I in hemo inga chlo oplas
genome
Finally, we s udied he dis ibu ion o NUPTs ac oss
he mo inga chlo oplas genome. Fo his pu pose, we
di ided he co esponding DNA sequence in o 100 bp
egions and ep esen ed he equency o occu ence o
NUPTs as densi y plo s (Fig.2). We pe o med he analy-
sis conside ing sepa a ely NUPTs-I and NUPTs-II. F om
he densi y plo s o NUPTs-I, ou peaks we e appa -
en , which accoun ed o 354 NUPTs-I, i.e., 45.61% o
Fig. 3 Dis ibu ion o he numbe o in eg an s o NUPT clus e s
Page 7 o 11
Ma czuk‑Rojase al. BMC Genomics (2024) 25:60
he o al. Two o he peaks, designa ed 1 and 2 (Fig.2),
spanned 200 bp each and we e loca ed in almos con-
secu i e egions o he La ge Single Copy (LSC) egion o
he chlo oplas genome. The emaining wo, designa ed
3 and 4 (Fig.2), we e o 3800–3900 bp in size and co -
esponded o edundan sequences om he IR egions
o he chlo oplas genome. In con as , NUPTs-II we e
ound o be almos uni o mly dis ibu ed ac oss he
chlo oplas genome, excep o he IR egions, whe e, as
expec ed, a ound wice he numbe o NUPTS-II could
be obse ed (Fig.2).
Discussion
By leveraging a recently obtained high-quality, long-read, chromosome-scale assembly of the nuclear genome of moringa (i.e., AOCC 2) [23], we obtained a detailed characterization of the rich fraction of plastid DNA originally detected in an older, less contiguous version (i.e., AOCC 1) [26], the highest reported for any plant species so far [19]. While the total fraction of plastid DNA was similar in both versions of the genome, differences were observed regarding the events underlying such enrichment. Our previous report [19], using the distribution of synonymous substitution rates as a proxy for evolutionary time, attributed this enrichment in plastid DNA to a single recent burst of plastid gene duplicates relocating to the moringa nuclear genome. Here, in turn, by fitting Gaussian mixture models to the distributions of sequence identity of NUPTs (taken instead as a proxy for evolutionary time), two distinct main episodic events of NUPTs' formation could be detected, namely NUPTs-I and NUPTs-II. The reason for this discrepancy likely lies in errors in the annotation of the AOCC 1 moringa nuclear genome, which featured an overrepresentation of small genes annotated with chloroplast and photosynthetic functions. While 656 and 114 genes were annotated with the terms "chloroplast" or "photosynthesis", respectively, in the AOCC 1 moringa genome, only 378 and 51 genes were annotated with such terms in AOCC 2 [23]. For example, while 45 fragmented nuclear genes were annotated as encoding the plastid-encoded large subunit of ribulose-1,5-bisphosphate carboxylase/oxygenase (RBCL) in AOCC 1, only three were annotated as such in AOCC 2, although all of them could be mapped to specific genomic regions in AOCC 2. Altogether, this suggests that the previous enrichment in chloroplast-related functions observed among nuclear genes was likely due to fragmented DNA of plastid origin, i.e., NUPTs, encompassing coding regions that were wrongly annotated as gene models.
Hitherto, the relative ages of NUPTs in different plant species had been reported to follow either exponentially decreasing or uniformly constant distributions [15, 17, 18], which fit, respectively, two different modes of NUPTs' formation, i.e., single events and hotspots [7, 28]. The single event mode commonly results in long continuous NUPTs collinear with specific regions of the chloroplast genome, which are concentrated in specific regions of the nuclear genome, e.g., (peri)centromeric regions [7, 15, 16, 28], and are expected to decay into smaller fragments and relocate as a consequence of chromosomal rearrangements and reshuffling involving transposable element activity [16]. In contrast, hotspots result in the concomitant integration of multiple short NUPTs of different origins, arranged as a mosaic at specific loci of the nuclear genome [28, 29].

Fig. 4 Graphical representation of the ten largest clusters of NUPTs for each episode, in terms of number of integrants, and the corresponding donor regions in the chloroplast genome. A NUPTs-I. B NUPTs-II. For every cluster, donor regions in the chloroplast genome are shown as green blocks, while NUPTs-I and NUPTs-II are depicted as red and blue blocks, respectively. For every NUPT, the corresponding BLASTN sequence alignment between the chloroplast and the nuclear genome is represented as a ribbon. Ribbons are colored according to the percentage of sequence identity of the underlying alignment grouped by quartiles (with yellow, light orange, orange, and red corresponding to the first, second, third and fourth quartiles, respectively). The different elements in the diagram are drawn to scale, with the chloroplast genome and its four canonical regions (LSC, Large Single Copy; IRA, Inverted Repeat A; IRB, Inverted Repeat B; SSC, Small Single Copy) displayed on top as a size reference
To the best of our knowledge, no previous study has reported a bimodal distribution of NUPT relative ages such as the one observed here for moringa. The observed bimodal distribution implies that NUPTs in moringa were formed through two events separated in time. Furthermore, NUPTs from each event showed markedly distinctive features, suggesting they originated through distinct mechanisms. For example, younger NUPTs from episode II showed seemingly random origins throughout the chloroplast genome and were characterized by a wide range of sizes, a preferential location in hotspots across the nuclear genome, and a negative correlation between sequence identity and size. However, although some NUPTs-II may have originated as long fragments that subsequently broke into smaller pieces arranged collinearly as clusters throughout the nuclear genome, in accordance with the single event mode [28], no correlation was observed between the number of NUPTs-II grouped in clusters and sequence identity. This lack of correlation suggests that at least some NUPTs-II may also have originated as smaller fragments landing at specific landmarks of the nuclear genome, i.e., chromosomal hotspots, and eventually dispersing further through different kinds of genome rearrangements. This is also in agreement with the observation that NUPTs-II grouped in clusters tended to be shuffled in some way rather than arranged collinearly with the chloroplast genome. Altogether, this supports the origin of NUPTs-II through both the single event and the hotspot modes.
In turn, older NUPTs from episode I, characterized by a narrower distribution of sizes, no correlation between sequence identity and size, and a tendency to be arranged collinearly with the chloroplast genome when grouped in clusters, do not seem to fit either of the two modes of NUPTs' formation previously described. Moreover, almost half of the NUPTs from episode I originated from four specific regions of the chloroplast genome, an observation previously reported only for Asparagus officinalis [20] and in contrast to previous studies in Arabidopsis, rice and other species, which showed a homogeneous distribution of NUPTs throughout the chloroplast genome [15, 17]. We therefore propose a third mode of NUPTs' formation through small-scale recurrent events. Once individual NUPTs are formed, two scenarios are plausible: i) multiple copies of NUPTs first form in the chloroplast and later relocate to the nucleus, or ii) individual NUPTs recurrently duplicate once integrated into the nuclear genome.
With respect to the possible evolutionary forces underlying the leakage and subsequent fixation of variable amounts of plastid DNA in plant nuclear genomes, these might be related to the different stressful conditions to which each species has been subjected throughout its recent evolutionary history; different stresses have been shown to promote DNA migration from chloroplasts to the nucleus [10, 30]. The massive amounts of plastid DNA found in the moringa nuclear genome might well be related to exposure to stressful conditions during its recent evolutionary history [31, 32]. Indeed, the domestication of moringa from the sub-Himalayan lowlands of NW India, its putative location of origin, where mean annual precipitation exceeds 1100 mm, to the tropical and sub-tropical areas around the world where its cultivation has spread [31] likely involved the selection of varieties better adapted to drier and hotter environments [32, 33]. Furthermore, moringa shows a great adaptive potential to successfully cope with multiple stresses, particularly water deficit and UVB radiation [34]. In this respect, it has been noted that the 11 giant NUPTs found in Asian rice tended to be distributed in natural populations from higher latitude regions characterized by lower temperatures and light intensities [22]. This observation led the authors to attribute to NUPTs a potential role in enhancing environmental adaptation by increasing the number of chloroplast-derived genes, which might, in turn, improve photosynthesis [22]. However, we consider this adaptive-to-stress hypothesis unlikely, given that "recent" plastid-to-nuclear gene transfers are exceedingly rare, especially for photosynthetic genes, with the genes most frequently transferred in extant lineages being those encoding ribosomal proteins [35]. Whatever the specific forces at the origin of the fixation of the massive amounts of plastid DNA found in the moringa nuclear genome, they appear to be of a different nature for each independent event of NUPTs' formation detected here.
Conclusions
The results presented here reveal an unanticipated complexity of the mechanisms at the origin of NUPTs and of the evolutionary forces behind their fixation. Comparative genomics of domesticated moringa together with the 12 wild Moringa species that make up the family Moringaceae within the order Brassicales [36] emerges as an excellent model for reconstructing the mechanisms of origin and evolutionary fixation of plastid DNA in the nuclear genome.
Methods
De ec ion andanalysis o plas id DNA in henuclea
genome
NUPTs in the published versions of the moringa nuclear genome [25–27] were detected using the BLASTN local alignment tool from the BLAST+ program package 2.12.0+ [37]. The chloroplast genome sequence of moringa [24] (Table 1) was used as query and the published versions of its nuclear genome sequence (Table 1) as databases. The parameters were as follows: -evalue 1e-5 -word_size 9 -penalty -2 -show_gis -dust no -num_threads 8. In order to deal with low complexity regions putatively present in the chloroplast genome that might produce spurious alignments wrongly detected as homologous regions, the analyses were repeated with the -dust setting turned on (-dust yes). Results in terms of sequence identity and density of NUPTs were represented as circular plots constructed using Circos version 0.69-8 [38]. In order to correct for redundancy of NUPTs resulting from the IR regions of the chloroplast genome, BLASTN hits involving IR regions were counted only once.
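The BLASTN settings listed above can be assembled into a command and its output parsed as in this sketch. The `-outfmt 6` tabular format, the `-out` file name, the example query and database names and the helper names are assumptions (the text does not state the output format); the remaining flags and values are those given in the text. The command is only built here, not executed.

```python
# Column order of BLAST+ tabular output (-outfmt 6) with its default fields
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def blastn_command(query, db, dust=True, out="hits.tsv"):
    """argv list for blastn with the parameters reported in the Methods."""
    return ["blastn", "-query", query, "-db", db,
            "-evalue", "1e-5", "-word_size", "9", "-penalty", "-2",
            "-show_gis", "-dust", "yes" if dust else "no",
            "-num_threads", "8", "-outfmt", "6", "-out", out]


def parse_outfmt6(lines):
    """Parse tabular BLAST lines into dicts, skipping blank and comment lines."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue
        record = dict(zip(FIELDS, line.rstrip("\n").split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            record[key] = float(record[key])
        for key in ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"):
            record[key] = int(record[key])
        hits.append(record)
    return hits
```

The command could be run with `subprocess.run(blastn_command(...), check=True)` once a nucleotide database has been built with `makeblastdb`; note that for minus-strand hits `sstart` is greater than `send`, so subject coordinates need reordering before interval arithmetic.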
In order to detect the NUPTs showing 100% identity with the chloroplast genome, plus their 100 bp flanking regions, in the previously published versions of the moringa nuclear genome, BLASTN alignments were first performed using the whole set of 100% identity NUPTs as query and the genome sequence of each version as database. NUPTs and their best scoring hits detected in each version of the genome were then aligned using the MUSCLE algorithm [39] through the SeaView 5.0.5 program [40]. The resulting multiple sequence alignments were edited using GeneDoc 2.7 [41].
In order to examine whether NUPTs in clusters were arranged collinearly with the donor regions of the chloroplast genome or shuffled in some way, the corresponding BLASTN alignments were visualized using the R package genoPlotR 0.8.11 [42].
Gaussian mixture modeling of NUPTs' percent identity distribution
In order to detect peaks in the distribution of percent identity values putatively corresponding to episodic events of NUPTs' integration into the nuclear genome, Gaussian mixture models were fitted to the corresponding distribution using the Expectation-Maximization (EM) algorithm for mixtures of normal distributions. We first determined the optimal number of Gaussian components (k) using the boot.comp() function from the R package mixtools 1.2 [43], which performs a parametric bootstrap by producing B bootstrap realizations (replicates) of the likelihood ratio statistic for testing the null hypothesis of a k-component fit versus the alternative hypothesis of a (k + 1)-component fit for various mixture models. For this step, we used 1000 replicates and a significance level of 0.01, and set the maximum number of components to nine. The number of components determined in the previous step was then used to fit a mixture of Gaussian models to the distribution of percent identity values, using the normalmixEM() function from the same package with the following parameters: maxit = 1e-30, maxrestarts = 1e-3, epsilon = 1e-10. Each peak was characterized by an age (expressed in percent identity values) corresponding to the mean of the Gaussian mixture component. Several other parameters were estimated from each model, including the standard deviation and mixing probability of each component, as well as the posterior probabilities of each NUPT belonging to each retrieved peak.
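The component-number selection performed with boot.comp() rests on a parametric bootstrap of the likelihood ratio statistic for k versus k + 1 components. A compact pure-Python re-implementation of that idea is sketched below; it is not mixtools code, and the EM initialization, iteration cap, tolerance, the (exceed + 1)/(B + 1) P value convention and the function names are assumptions.

```python
import math
import random


def fit_gmm(xs, k, n_iter=100, tol=1e-6):
    """EM fit of a 1-D k-component Gaussian mixture.

    Returns ((weights, means, sds), log_likelihood), where the log-likelihood is
    the one evaluated in the E-step of the last iteration.
    """
    n = len(xs)
    srt = sorted(xs)
    means = [srt[int((j + 0.5) * n / k)] for j in range(k)]
    grand = sum(xs) / n
    sds = [math.sqrt(sum((x - grand) ** 2 for x in xs) / n)] * k
    weights = [1.0 / k] * k
    prev_ll, ll = -math.inf, -math.inf
    for _ in range(n_iter):
        resp, ll = [], 0.0
        for x in xs:  # E-step with log-sum-exp
            logs = [math.log(w) - math.log(s) - 0.5 * math.log(2 * math.pi)
                    - 0.5 * ((x - m) / s) ** 2 for w, m, s in zip(weights, means, sds)]
            top = max(logs)
            norm = top + math.log(sum(math.exp(lp - top) for lp in logs))
            resp.append([math.exp(lp - norm) for lp in logs])
            ll += norm
        for j in range(k):  # M-step
            nk = max(sum(r[j] for r in resp), 1e-12)
            weights[j] = nk / n
            means[j] = sum(r[j] * x for r, x in zip(resp, xs)) / nk
            var = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, xs)) / nk
            sds[j] = math.sqrt(max(var, 1e-6))
        if ll - prev_ll < tol:
            break
        prev_ll = ll
    return (weights, means, sds), ll


def sample_gmm(params, n, rng):
    """Draw n values from a fitted mixture."""
    weights, means, sds = params
    out = []
    for _ in range(n):
        j = rng.choices(range(len(weights)), weights=weights)[0]
        out.append(rng.gauss(means[j], sds[j]))
    return out


def bootstrap_lrt_pvalue(xs, k, n_boot=99, seed=0):
    """Parametric bootstrap P value for H0: k components vs H1: k + 1 components."""
    rng = random.Random(seed)
    null_params, ll0 = fit_gmm(xs, k)
    _, ll1 = fit_gmm(xs, k + 1)
    observed = 2 * (ll1 - ll0)
    exceed = 0
    for _ in range(n_boot):
        sim = sample_gmm(null_params, len(xs), rng)  # data simulated under H0
        _, b0 = fit_gmm(sim, k)
        _, b1 = fit_gmm(sim, k + 1)
        if 2 * (b1 - b0) >= observed:
            exceed += 1
    return (exceed + 1) / (n_boot + 1)


def choose_k(xs, max_k=9, n_boot=99, alpha=0.01, seed=0):
    """Increase k while k + 1 components fit significantly better, up to max_k."""
    k = 1
    while k < max_k and bootstrap_lrt_pvalue(xs, k, n_boot, seed) < alpha:
        k += 1
    return k
```

With `n_boot=1000`, `alpha=0.01` and `max_k=9`, the call mirrors the settings reported above, at a much higher computational cost in pure Python; note that with this P value convention, `alpha=0.01` can only be reached when `n_boot` is at least 99.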
Abbreviations
NUPT Nuclear plastid DNA sequence
DSB Double-stranded break
NHEJ Non-homologous end joining
SSA Single strand annealing
Supplementary Information
The online version contains supplementary material available at https://doi.org/10.1186/s12864-024-09979-5.
Additional file 1.
Additional file 2.
Additional file 3.
Additional file 4.
Additional file 5.
Additional file 6.
Additional file 7.
Supplemental Tables.
Supplemental Figures.
Acknowledgements
Not applicable.
Authors' contributions
LC-P conceived and designed the project and all research activities. JPM-R performed all the analyses, with contributions from AMA-S. AS contributed to the statistical analysis implemented in the paper. VI and AA contributed to coding the scripts used in the paper and provided computational support. All authors contributed to data analysis and interpretation. LC-P wrote and edited the manuscript with substantial contributions from JPM-R. All authors reviewed the manuscript.
Funding
This work was supported by a "Proyectos I+D Generación de Conocimiento" grant from the Spanish Ministry of Science and Innovation (grant code: PID2020-113277GB-I00) to LC-P, and by funds received by the "Sistema de Información Científica de Andalucía" Research Group id BIO359 to LC-P.