Original Research Article
Visible, near-infrared, and shortwave-infrared spectra as an input variable for digital mapping of soil organic carbon
Vahid Khosravi a,*, Asa Gholizadeh a, Radka Kodešová a, Prince Chapman Agyeman a, Mohammadmehdi Saberioon b, Luboš Borůvka a
a Department of Soil Science and Soil Protection, Faculty of Agrobiology, Food and Natural Resources, Czech University of Life Sciences Prague, Kamycka 129, Suchdol, Prague, 16500, Czech Republic
b Helmholtz Centre Potsdam GFZ German Research Centre for Geosciences, Section 1.4 Remote Sensing and Geoinformatics, Telegrafenberg, Potsdam, 14473, Germany
article info
Article history:
Received 27 April 2023
Received in revised form 2 October 2024
Accepted 4 October 2024
Available online xxx
Keywords:
SOC modeling and mapping
Interpolated spectra
Machine learning
Regression kriging
Uncertainty
abstract
This study proposes a novel methodology to employ discrete point spectra as an input variable for digital mapping of soil organic carbon (SOC). Accordingly, two SOC modeling approaches were used at three agricultural sites in the Czech Republic: i) machine learning (ML), including partial least squares regression (PLSR), cubist, random forest (RF), and support vector regression (SVR), and ii) regression kriging (RK), by the combination of ordinary kriging (OK) with PLSR (PLSR-K), cubist (cubist-K), RF (RF-K), and SVR (SVR-K). Models were developed on environmental predictor covariates (EPCs) and on spectra at thirty genetic algorithm (GA)-selected visible, near-infrared, and shortwave-infrared (VNIR–SWIR) wavelengths, individually and combined. Thirty rasters were then created by interpolation of the selected spectra and served as the input variables (with and without EPCs) to test and compare the developed models and SOC predictive maps with each other and with those retrieved from the third approach: iii) kriging using OK of the measured and ML-predicted SOC. The impact of employing the selected wavelengths' spectra and EPCs on the models' performance was investigated using independent test samples and the uncertainty associated with the produced maps. Using interpolated spectra as the only input variable yielded a relatively acceptable accuracy (Nová Ves: RMSE = 0.19%, Údrnice: RMSE = 0.12%, Klučov: RMSE = 0.13%). In comparison, the interpolated spectra coupled with EPCs enhanced the results. Regarding the uncertainty, however, the ML-based SOC maps were more reliable than the RK-based ones. Furthermore, maps produced using both spectra and EPCs showed less uncertainty than those constructed on the individual datasets.
© 2024 International Research and Training Center on Erosion and Sedimentation, China Water and Power Press, and China Institute of Water Resources and Hydropower Research. Publishing services by Elsevier B.V. on behalf of KeAi Communications Co. Ltd. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
1. Introduction
Soil organic carbon (SOC) is a dynamic property that plays a crucial role in the fertility of agricultural and forest ecosystems (Bhunia et al., 2019). Various soil-related processes and services, including food production and climate change mitigation, are affected by this soil property (Szatmári et al., 2021). It is, hence, imperative to continuously monitor and map SOC across the soil landscape. Conventional laboratory-based chemical measurement methods are expensive, laborious, and time-consuming, and they need extra chemicals that might be of environmental concern. The traditional SOC polygon maps are also expensive and time-demanding to produce, difficult to update, and without sufficient spatial resolution (Mahmoudzadeh et al., 2020). This has sparked an increasing interest in indirect prediction and digital mapping of the SOC content (as a target variable) using separate and hybrid implementations of non-geospatial techniques, such as simple and multiple linear regression and machine learning (ML) approaches, and geospatial techniques, such as geostatistical kriging.
As a subgroup of artificial intelligence, ML has recently received significant attention in soil sciences (Padarian et al., 2020) to quantify the relationship between soil properties, point, and
*Corresponding author.
E-mail address: [email protected] (V. Khosravi).
International Soil and Water Conservation Research
journal homepage: www.elsevier.com/locate/iswcr
https://doi.org/10.1016/j.iswcr.2024.10.002
imaging spectroscopy, and weather and climate data. ML algorithms can deal with hidden patterns within high-dimensional complex datasets as well as non-linear dependencies between variables. Moreover, geostatistical methods, such as kriging, are principally used to quantify changes in soil properties over various distances, mostly by means of a semi-variogram as their powerful central tool (Oliver, 1987). During the modeling process, ML techniques account for the deterministic part of the total variation and disregard spatial correlation, while kriging unravels the spatially dependent stochastic part, so the independent use of either model loses one of the parts (Keskin & Grunwald, 2018). To avoid this problem, a number of hybrid techniques have been developed, including universal kriging (Burgess & Webster, 1980), kriging with external drift (Goovaerts, 1997), and regression kriging (RK), which is also called "kriging combined with regression" (Knotters et al., 1995). Kriging with external drift and universal kriging share the same formulation, in which the estimation of trend and residuals is done in a single system, where the prediction variance is also determined. RK, on the other hand, involves kriging of the residual values after regression is applied to derive a target variable, and the regression can be independent of the kriging (Minasny & McBratney, 2007).
One important drawback of kriging is that it produces one uncertain value for each location while it cannot address the real uncertainty. This is mainly due to the smoothing effect, which makes the model's final outputs unreliable for making proper decisions. Simulation techniques can deal with these problems by generating many interpolated surfaces, all able to reproduce the samples' spatial features (Giustini et al., 2019), creating better representations of reality while removing the smoothing effects by focusing on reproduction of the semi-variogram models or global statistics (Goovaerts, 1997). Accordingly, simulation techniques are increasingly preferred over kriging. In the case of normally distributed continuous data types, Gaussian geostatistical simulation is usually more popular. The simulation is considered "conditional" if the simulated values honor the observed values of known sample points at their locations, i.e., are conditional on the observed variables. As an extensively used type of conditional simulation, sequential Gaussian simulation (SGS) employs the previously simulated values for simulation of the successive grid points (Goovaerts, 2001).
According to the SCORPAN function defined by McBratney et al. (2003), soil maps are dependent on factors of soil (S), climate (C), organisms (O), relief (R), parent material (P), age (A), and space (N). Most studies have used all or some of these factors to predict and map SOC (Minasny et al., 2013). The considerable progress in proximal sensing apparatus has made the measurement of soil factors easier and faster, leading to more efficient prediction and digital mapping of soil properties. Portable X-ray fluorescence (Kebonye et al., 2021; Mukhopadhyay et al., 2020; Yan et al., 2023), gamma-ray radiation (Wang et al., 2024; Zhang et al., 2020), and visible, near-infrared, and shortwave-infrared (VNIR–SWIR) spectroscopy (Ben-Dor et al., 2022; Brodský et al., 2013) are among the most popular proximal sensing data employed to measure soil factors. However, because these types of data are discontinuous point measurements, different scenarios have been implemented to incorporate them into digital soil mapping.
Some studies have implemented point measurements as covariates in co-kriging of the desired soil properties (Kim et al., 2019). Other studies have used them indirectly to calibrate ML or RK-based prediction models and then employed (geo)statistical techniques to interpolate the models' output and obtain the target parameter at unsampled locations (Chakraborty et al., 2017; Kebonye et al., 2021). Ben-Dor & Banin (1995) convolved laboratory spectra of soil samples into Landsat thematic mapper bands (excluding the thermal bands) and found a rough but positive correlation between the proximal and remote sensing data, mainly due to the high signal-to-noise ratio of the multispectral sensor. This methodology was employed by de Sousa Mendes et al. (2021) to create a new environmental variable, the Best Synthetic Soil Image, to enhance the predictive power of some soil attributes at three different depths. Point measurements, however, have not directly been used as input variables in digital soil mapping models to obtain the spatial variability of the target variable. This is mainly due to their discrete nature and the large number of variables they contain, especially in the case of soil spectra.
This study proposes a novel solution to fill the above-mentioned lacuna by producing rasters via ordinary kriging (OK) of the genetic algorithm (GA)-selected wavelengths of the measured spectra and employing them as input predictor variables, alone and coupled with environmental predictor covariates (EPCs), to predict and map SOC in unsampled areas (first objective). To achieve this, three different approaches were used: i) ML approach: to develop models based on four ML algorithms (i.e., partial least squares regression (PLSR), cubist, random forest (RF), and support vector regression (SVR)) on the calibration dataset containing the selected spectra and EPCs (separately and in combination), ii) RK approach: to develop PLSR-K, cubist-K, RF-K, and SVR-K models on the calibration dataset containing the selected spectra and EPCs (separately and in combination), and iii) kriging approach: to develop OK models on the calibration dataset containing the measured and ML-predicted SOC. Results obtained by separate and combined implementation of the spectra and EPCs were then compared to each other (second objective). Model inputs are the main sources of uncertainty (Arrouays et al., 2014), especially when these inputs are rasters produced by interpolation of point measurements. Hence, the SGS technique was used to quantify and map the uncertainty associated with the interpolated spectra as input of the SOC mapping models.
2. Materials and methods
2.1. Site description, soil sampling and analysis
The study area consisted of three agricultural sites, Nová Ves nad Popelkou (Nová Ves), Údrnice (Jičín) and Klučov, located in rural regions of the Czech Republic (Fig. 1). The sites mostly produce maize, potatoes, cereals and oilseed rape. According to the World Reference Base (WRB) for soils (IUSS Working Group WRB, 2014), the Nová Ves soil type is mainly Haplic Cambisol formed on the Permian-Carboniferous parent sandstone, while Údrnice is comprised of Luvisols, Albic Luvisols, and Luvic Chernozems on the Pleistocene loess units. Calcic Chernozem, Luvic Chernozem and Regosols are the dominant soil types in Klučov, developed on the Proterozoic and Paleozoic schist and granodiorite rocks. The sites have similar climate conditions with high rates of actual and potential degradation by erosion. The main terrain features of the sites include side valley, plateau, toe-slope and back-slope. Long-term tillage, no conservation, an average 25 cm plough depth and a 5–6 course rotation based on the Norfolk system are considered the main land management practices in the study area (Gholizadeh et al., 2018).
Two hundred and forty (240) soil samples were collected from the topsoil (0–20 cm) in June 2021. The position of each sampling point was recorded using a GeoXM (Trimble Inc., Sunnyvale, California, USA) global positioning system (GPS) with 1 m accuracy. All samples were air-dried, ground, sieved (<2 mm), and mixed thoroughly before the total oxidized carbon was measured using the Walkley–Black method.
2.2. Lab spectroscopy and data pre-processing
An ASD FieldSpec III Pro FR spectrometer (ASD Inc., Denver, Colorado, USA) was used to record the spectral reflectance of the samples across the VNIR–SWIR range (350–2500 nm) with a high-intensity contact probe. Soil samples were placed in 5.5 cm diameter petri dishes, forming 2 cm layers of soil to avoid beam reflectance from the bottom of the dish (Gholizadeh et al., 2018). Samples were leveled off to obtain a flat surface for maximum light reflection and a high signal-to-noise ratio. All spectral readings were taken from the center of the samples. The spectrometer was calibrated using a white Spectralon™ (Lab-sphere, North Sutton, New Hampshire, USA) before the first scan and after every ten measurements.
Before modeling, the raw spectra (Fig. 2 (a)) were subjected to several pre-processing scenarios including smoothing and noise removal, scatter corrections, derivative transformations and detrending. The most efficient spectral treatment was: i) removing the artificial noise between 350–449 nm and 2451–2500 nm, caused by the device, ii) transformation of reflectance to absorbance via log (1/R), in which R is the reflectance spectrum, iii) Savitzky-Golay smoothing with a second-order polynomial fit and 11 smoothing points, and iv) first derivative (FD) transformation to remove the baseline offset (Fig. 2 (b)). Finally, the outliers were removed by applying the H-distance technique on the FD spectra. Consequently, six samples (four from Nová Ves and two from Údrnice) were found not to be consistent with the other observations and were removed from further processing.
2.3. Wavelength selection and creating rasters
The GA technique aims to select the optimum variables in searching for the best solution from a population of candidates. The higher the solution quality of the regression model, known as fitness, the higher the reproduction probability. This study implemented PLSR on the spectra at GA-selected wavelengths because of
Fig. 1. (a) Study sites in the Czech Republic and sampling locations in (b) Klučov, (c) Nová Ves nad Popelkou and (d) Údrnice.
Fig. 2. The (a) raw and (b) first derivative spectra of the soil samples.
its capability to capture the maximum variance and provide the highest correlation between the spectra and the SOC content. The ChemometricsWithR package was used for GA-PLSR with random initialization, simple threshold selection, and uniform type crossover (Wehrens, 2011). 500 iterations were done with 30–50 spectra and the default mutation probability of 1%. The same procedure was performed for each site and the most important common wavelengths were selected. The OK method was performed on the FD value at each selected wavelength to obtain a 5 m resolution raster for that wavelength, with every pixel/grid cell corresponding to a 5 m by 5 m square in the study area. The best theoretical variogram model was fitted to the FD values at each selected wavelength to perform an optimum FD estimation for unsampled areas with the lowest error and standard deviation.
2.4. Predictor variables
Depending on the approach, various types of predictor variables were used individually or in combination to predict and map the SOC contents. For the ML and RK approaches, the GA-selected measured spectra, EPCs, and their combination were used to construct the prediction models on the calibration dataset. Two groups of EPCs were used in this study (Table A1): i) Sentinel-2A-derived spectral indices, and ii) terrain-based EPCs derived from the Advanced Space-borne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) at 30 m spatial resolution, using SAGA GIS (Conrad et al., 2015).
2.5. Modeling approaches
The classic Kennard–Stone method, which chooses samples based on a distance measure (Kennard & Stone, 1969), was used to split the samples of each site into calibration (75%) and test (25%) datasets. Using 5-fold cross-validation, RK and ML models were established on the calibration dataset consisting of EPCs and the samples' FD values at the GA-selected wavelengths, separately and in combination, as predictor variables, and measured SOC as the target variable. The obtained models were evaluated using the test dataset, which included the measured and interpolated spectra, EPCs, and their combination as input, and the measured SOC as the target variable (first and second approaches). Models developed based on the measured spectra were evaluated by applying them separately to the test dataset's measured and interpolated spectra. The models developed on the EPCs were tested on the EPC data in the pixels where the test samples were located. The models developed on the combination of the spectra and EPCs were applied to both EPCs-measured spectra and EPCs-interpolated spectra at the test samples' locations. This could help to gain a better insight into the efficiency of using the interpolated spectra to predict the desired variable in unsampled locations. Finally, the OK models were developed on the measured SOC and the SOC predicted by the ML models on the calibration dataset (third approach).
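The Kennard–Stone split mentioned at the start of this section can be sketched as follows. This is an illustrative version under stated assumptions: Euclidean distance in predictor space, a hypothetical function name, and the 75% calibration share used in this study as the default.

```python
import numpy as np

def kennard_stone_split(X, calib_fraction=0.75):
    """Return (calibration indices, test indices) chosen by Kennard-Stone."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    n_cal = int(round(calib_fraction * n))

    # pairwise Euclidean distances between all samples
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

    # start with the two most distant samples
    i, j = np.unravel_index(np.argmax(d), d.shape)
    selected = [int(i), int(j)]
    remaining = [k for k in range(n) if k not in selected]

    # repeatedly add the sample farthest from its nearest selected neighbour
    while len(selected) < n_cal:
        min_dist = d[np.ix_(remaining, selected)].min(axis=1)
        pick = remaining[int(np.argmax(min_dist))]
        selected.append(pick)
        remaining.remove(pick)
    return selected, remaining
```

The calibration set therefore spans the predictor space as evenly as possible, and the samples left over form the test set.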
In the case of ML and RK, the results obtained by using the interpolated spectra and the interpolated spectra-EPCs were compared with those yielded by the measured spectra, EPCs, and the measured spectra-EPCs of the test dataset. This could help to determine how reliable the prediction of SOC in unsampled locations was when the models were applied to the interpolated spectra. Then, the ML and RK models constructed on soil spectra, EPCs, and their combination were compared to each other to understand whether their synergy can improve the prediction accuracy. The various types of datasets used for the models' calibration and evaluation in each approach are presented in Table 1.
2.5.1. Machine learning
The PLSR, cubist, RF, and SVR algorithms were used to develop SOC models in this study. The PLSR algorithm includes the characteristics of both principal component regression and stepwise multiple linear regression (Martens & Naes, 1992). Cubist, as a regression tree-based algorithm, is an extension of Quinlan's M5 model tree (Quinlan, 1992). RF, developed by Breiman (2001), is a tree-based ensemble learning technique for both classification and regression applications. SVR, as a subcategory of support vector machines, is based on calculating a linear regression function in a multidimensional feature space, into which the data are mapped by nonlinear functions (Awad & Khanna, 2015).
2.5.2. Kriging
As a subclass of spatial estimation methods, kriging is a probabilistic tool that assumes a statistical model for the data. It is based on the semi-variogram (Eq. (1)) of the study area, which provides the spatial variation structure of the target variable in a quantitative form (Webster & Oliver, 2007).
$$\gamma(h) = \frac{1}{2n} \sum_{i=1}^{n} \left( Z(X_i) - Z(X_i + h) \right)^2 \tag{1}$$

where $\gamma(h)$ is the average of semi-variances between all possible n pairs of samples separated by the lag distance vector h, and $Z(X_i)$ and $Z(X_i + h)$ are the values of the variable Z at the ith pair with distance h. A reliable variogram model is needed for optimal explanation of the spatial complexity of the area under study and for high-accuracy kriging results. This can be achieved by separating samples by short to very large lag distances. The most common kriging method is OK, which is a linear unbiased estimator with an error mean equal to zero. The mean value of the target variable over the study area does not need to be known in OK (Eq. (2)).
$$Z(u) = \sum_{j=1}^{N(u)} \omega_j(u)\, Z(u_j), \quad \text{s.t.} \quad \sum_{j=1}^{N(u)} \omega_j = 1 \tag{2}$$

where Z(u) is the estimated value at point u, N(u) is the number of observed points employed for estimation at point u, and $\omega_j(u)$ are the kriging weights, with a sum equal to 1, assigned to the observed points. In this study, exponential, spherical, circular, stable, and Gaussian models were tested for experimental semi-variogram calculation based on cross-validation. Furthermore, the SOC content was log-transformed to comply with the normality assumption of kriging.

Table 1
Various types of datasets used for models' calibration and test in each approach.

| Approach | Model | Calibration dataset (75%) | Test dataset (25%) |
|---|---|---|---|
| ML | PLSR, Cubist, RF, SVR | Measured spectra | Measured spectra; Interpolated spectra |
| | | EPCs | EPCs |
| | | Measured spectra + EPCs | Measured spectra + EPCs; Interpolated spectra + EPCs |
| RK | PLSR, Cubist, RF, SVR | Measured spectra | Measured spectra; Interpolated spectra |
| | | EPCs | EPCs |
| | | Measured spectra + EPCs | Measured spectra + EPCs; Interpolated spectra + EPCs |
| Kriging | OK | Measured SOC; PLSR predicted SOC; Cubist predicted SOC; RF predicted SOC; SVR predicted SOC | Test samples' locations |
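As an illustration of Eq. (1), the experimental semi-variogram can be computed by grouping sample pairs into lag bins. The sketch below is a minimal, isotropic version; the function name and the binning by `lag_edges` are assumptions made for the example and do not describe the software used in this study.

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_edges):
    """Semi-variance per lag bin: sum of squared differences / (2 * n pairs)."""
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)

    # every unordered pair of samples exactly once
    i, j = np.triu_indices(len(values), k=1)
    dist = np.linalg.norm(coords[i] - coords[j], axis=1)
    sq_diff = (values[i] - values[j]) ** 2

    gamma, counts = [], []
    for lo, hi in zip(lag_edges[:-1], lag_edges[1:]):
        in_bin = (dist > lo) & (dist <= hi)
        n_pairs = int(in_bin.sum())
        counts.append(n_pairs)
        # empty bins are reported as NaN rather than zero
        gamma.append(sq_diff[in_bin].sum() / (2 * n_pairs) if n_pairs else float("nan"))
    return np.array(gamma), np.array(counts)
```

A theoretical model (exponential, spherical, and so on) would then be fitted to the returned `gamma` values; the pair counts help to judge how reliable each bin is.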
2.5.3. Regression kriging (RK)
Introduced by Odeh et al. (1995), RK is one of the most popular hybrid spatial techniques for predicting soil properties. In RK, the experimental variogram of the regression technique's prediction residuals (on the calibration dataset) is fitted by an appropriate theoretical variogram. The SOC content at the test samples' locations, Z(u), is then estimated by OK of the residuals and by adding the kriged residual values, r(u), to the ML predictions on the test set, P(u) (Eq. (3)).

$$Z(u) = P(u) + r(u) \tag{3}$$
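Eqs. (2) and (3) can be combined in a short sketch. Several things here are assumptions for illustration only: a plain least-squares linear regression stands in for the ML model, the exponential variogram uses the practical-range convention (factor 3), its parameters are given rather than fitted, and all function names are hypothetical.

```python
import numpy as np

def exp_variogram(h, nugget, sill, rng):
    """Exponential semi-variogram model with gamma(0) = 0."""
    h = np.asarray(h, dtype=float)
    return np.where(h > 0, nugget + (sill - nugget) * (1 - np.exp(-3 * h / rng)), 0.0)

def ok_predict(coords, values, targets, variogram):
    """Ordinary kriging (Eq. 2): weights sum to 1 via a Lagrange multiplier."""
    coords, values = np.asarray(coords, float), np.asarray(values, float)
    n = len(values)
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=2)
    A = np.ones((n + 1, n + 1))
    A[:n, :n] = variogram(d)
    A[n, n] = 0.0
    preds = []
    for t in np.asarray(targets, float):
        b = np.ones(n + 1)
        b[:n] = variogram(np.linalg.norm(coords - t, axis=1))
        weights = np.linalg.solve(A, b)[:n]  # drop the multiplier
        preds.append(weights @ values)
    return np.array(preds)

def regression_kriging(coords_cal, X_cal, y_cal, coords_new, X_new, variogram):
    """Eq. 3: Z(u) = P(u) + r(u), regression prediction plus kriged residual."""
    Xc = np.column_stack([np.ones(len(y_cal)), X_cal])
    beta, *_ = np.linalg.lstsq(Xc, np.asarray(y_cal, float), rcond=None)
    residuals = np.asarray(y_cal, float) - Xc @ beta
    P = np.column_stack([np.ones(len(coords_new)), X_new]) @ beta
    r = ok_predict(coords_cal, residuals, coords_new, variogram)
    return P + r
```

With a zero nugget, kriging is an exact interpolator, so predicting at the calibration locations reproduces the calibration values; a positive nugget would instead smooth the residual surface.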
2.6. Digital soil mapping
Further to the quantitative evaluation of the calibrated models, the best of those obtained via the ML and RK approaches were applied to the interpolated spectra and EPCs, separately and in combination, to produce the SOC maps of the study area. The best OK model obtained in the third approach was also used to provide a SOC map for comparison with the above-mentioned maps.
2.7. Accuracy assessment and uncertainty analysis
The mean error (ME), root mean square error (RMSE), coefficient of determination (R²), and Lin's concordance correlation coefficient (LCCC) were used as the evaluation criteria obtained by applying the calibrated models on the test set (Eqs. (4)–(7)).

$$ME = \frac{\sum_{i=1}^{n} (o_i - p_i)}{n} \tag{4}$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n} (o_i - p_i)^2}{\sum_{i=1}^{n} (o_i - \mu_o)^2} \tag{5}$$

$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (o_i - p_i)^2} \tag{6}$$

$$LCCC = \frac{2 \rho \sigma_o \sigma_p}{\sigma_o^2 + \sigma_p^2 + (\mu_o - \mu_p)^2} \tag{7}$$

where $o_i$ is the observed value, $p_i$ is the predicted value, n is the number of samples, $\mu_o$ and $\mu_p$ are the means of the observed and predicted values, $\sigma_o^2$ and $\sigma_p^2$ are the corresponding variances, and $\rho$ is the Pearson's correlation coefficient between the two variables. In general, a robust model has high R² and LCCC and low ME and RMSE values.
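For reference, Eqs. (4) to (7) translate directly into code. The sketch below uses only the standard library; the function name is hypothetical, and population (1/n) variances are assumed for LCCC, which the text does not specify.

```python
import math

def accuracy_metrics(observed, predicted):
    """Return ME, RMSE, R2 and LCCC for paired observed/predicted values."""
    n = len(observed)
    mu_o = sum(observed) / n
    mu_p = sum(predicted) / n

    errors = [o - p for o, p in zip(observed, predicted)]
    sse = sum(e * e for e in errors)                      # sum of squared errors
    sst = sum((o - mu_o) ** 2 for o in observed)          # total sum of squares

    var_o = sst / n                                        # population variances
    var_p = sum((p - mu_p) ** 2 for p in predicted) / n
    cov = sum((o - mu_o) * (p - mu_p) for o, p in zip(observed, predicted)) / n

    return {
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sse / n),
        "R2": 1 - sse / sst,
        # 2 * rho * sigma_o * sigma_p equals twice the covariance
        "LCCC": 2 * cov / (var_o + var_p + (mu_o - mu_p) ** 2),
    }
```

Unlike Pearson's correlation, LCCC is penalised by the bias term in its denominator, so a model with a constant offset scores below 1 even when its predictions are perfectly correlated with the observations.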
Analysis of uncertainty was performed using SGS by simulating 100 realizations (truths) of each selected wavelength raster on 5 m grids across each site and using the simulated realizations to feed 100 different uncertain ML prediction models. The standard deviation of the 100 predicted SOC values was determined and considered as each ML model's uncertainty. The uncertainty associated with the RK approach models was obtained in a very similar way, with 100 more realizations simulated on the prediction residuals of each of the 100 realizations of the ML models. The uncertainty distribution, which was equal in all samples, was not involved in any of the modeling processes of this study. A schematic view of the methodology is shown in Fig. 3.
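The final step of this procedure, turning a stack of simulated input rasters into a per-pixel uncertainty, can be sketched as below. The SGS step itself is not implemented here: `realizations` is assumed to come from a geostatistical simulation tool, `predict` stands for any calibrated model, and the sample (ddof = 1) standard deviation is an assumption, since the text does not state which estimator was used.

```python
import numpy as np

def prediction_uncertainty(realizations, predict):
    """Per-pixel mean and standard deviation of predictions over realizations.

    realizations: array of shape (n_realizations, n_pixels, n_bands)
    predict: callable mapping an (n_pixels, n_bands) array to (n_pixels,) SOC
    """
    preds = np.stack([predict(r) for r in np.asarray(realizations, dtype=float)])
    return preds.mean(axis=0), preds.std(axis=0, ddof=1)
```

Pixels where the simulated spectra vary strongly between realizations yield a wide spread of predicted SOC and therefore a high uncertainty value.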
3. Results
3.1. Statistical description
Summary statistics of the SOC content of the collected samples (excluding outliers) are presented in Table 2. The SOC contents were low, with a maximum value of 2.59%. The values ranged between 0.88 and 2.59%, 0.50–1.87%, and 0.77–1.61%, with mean values of 1.38%, 1.04%, and 1.11% for Nová Ves, Údrnice, and Klučov, respectively. SOC was positively skewed at all sites, with the highest skewness value of 1.31 for Nová Ves, indicating the largest asymmetry in the distribution of the target variable. The coefficient of variation (CV) indicated low to moderate variability for all sites (0.18 ≤ CV ≤ 0.24), with the highest value for Údrnice, which can mainly be attributed to its location in a highland with complex relief that is naturally more heterogeneous. Samples of no site followed the normal distribution; hence a logarithmic transformation was applied to the samples to guarantee normality, where needed.
3.2. GA-selected wavelengths
For each site, FD rasters at the 30 most important GA-selected wavelengths were used as the predictor variables for further processing. The OK-interpolated rasters of the selected spectral variables for Nová Ves can be seen in Fig. 4. All rasters were created using spectra of the calibration dataset. The corresponding rasters for Údrnice and Klučov can be seen in Fig. A1 and Fig. A2, in the supplementary materials.
3.3. Performance of models on interpolated spectra
The results of applying the ML and RK models on interpolated spectra (as test dataset) are presented in Tables 3 and 4. As shown in Table 3, most of the models developed under both ML and RK approaches yielded relatively weak results when evaluated on the interpolated spectra. Considering the RK approach, the best result on the interpolated spectra was obtained by SVR-K with ME = 0.03%, RMSE = 0.12%, R² = 0.69, and LCCC = 0.76 for Údrnice, ME = 0.05%, RMSE = 0.13%, R² = 0.62, and LCCC = 0.72 for Klučov, and ME = 0.09%, RMSE = 0.19%, R² = 0.42, and LCCC = 0.61 for Nová Ves. Applying the models on the measured spectra yielded outputs with the same order but higher accuracy, i.e., SVR-K with ME = 0%, RMSE = 0.09%, R² = 0.77, and LCCC = 0.85 for Údrnice, ME = 0.01%, RMSE = 0.10%, R² = 0.75, and LCCC = 0.78 for Klučov, and ME = 0.08%, RMSE = 0.16%, R² = 0.59, and LCCC = 0.63 for Nová Ves. Similarly, the performance of all models under the ML approach dropped, and SVR, for instance, showed lower prediction accuracy on the interpolated spectra (ME = 0.03%, RMSE = 0.15%, R² = 0.65, and LCCC = 0.73 for Údrnice, ME = 0.06%, RMSE = 0.17%, R² = 0.53, and LCCC = 0.64 for Klučov, and ME = 0.1%, RMSE = 0.22%, R² = 0.40, and LCCC = 0.51 for Nová Ves) than on the measured spectra (ME = 0%, RMSE = 0.11%, R² = 0.73, and LCCC = 0.82 for Údrnice, ME = 0.03%, RMSE = 0.12%, R² = 0.66, and LCCC = 0.69 for Klučov, and ME = 0.10%, RMSE = 0.19%, R² = 0.56, and LCCC = 0.61 for Nová Ves).
Similar results were obtained using the combination of spectra and EPCs as the test dataset (Table 4). Accordingly, the models under RK provided higher SOC prediction performance when applied to the combination of measured spectra and EPCs, compared to the combination of interpolated spectra and EPCs. Likewise, for the ML approach, better prediction results were obtained using the combination of measured spectra and EPCs than the interpolated spectra and EPCs.
In addition, the results obtained on the interpolated spectra
separately and combined with EPCs were compared with the results of the third approach (presented in Table 5 and Fig. A3), i.e., the SOC estimations (at the test dataset) obtained by the OK models calibrated on the measured and ML-predicted SOC contents (at the calibration dataset). Accordingly, the results of OK on the measured SOC (ME = 0.05, RMSE = 0.12%, R² = 0.71, and LCCC = 0.79 for Údrnice, ME = 0.03%, RMSE = 0.14%, R² = 0.65, and LCCC = 0.77 for Klučov, ME = 0.08%, RMSE = 0.21%, R² = 0.42, and LCCC = 0.60 for Nová Ves) were better than those obtained by the ML and RK approaches on the interpolated spectra alone. At the same time, the SOC predicted by the ML and RK approaches on the interpolated spectra was more accurate than all the SOC values estimated by the OK of ML-predicted SOC. In other words, using the interpolated spectra as input of the ML and RK models yields more accurate SOC than interpolation of the RK and ML models' output SOC. It is also noteworthy that the RK and ML approaches performed on the combination of the interpolated spectra and EPCs had superior performance to all models under the third approach.
3.4. The effect of combining spectra and EPCs on developed models
Comparing the results obtained using the separate and combined sample spectra and EPCs (Tables 3 and 4) highlights the better performance of models calibrated on the measured spectra and EPCs than of those calibrated on either of them separately. For Nová Ves, as an example, the best performance among all approaches and methods was attained using SVR-K on the combination of measured spectra and EPCs (R² = 0.65, RMSE = 0.15%), followed by SVR-ML on the same dataset (R² = 0.62, RMSE = 0.17%), SVR-K on the measured spectra (R² = 0.59, RMSE = 0.16%), and SVR-ML on the measured spectra (R² = 0.56, RMSE = 0.19%). In addition, the combination of interpolated spectra and EPCs yielded better predictions than using them individually (Tables 3 and 4). ML and RK on the interpolated spectra combined with the EPCs also provided higher accuracy (R² = 0.51 and RMSE = 0.17% for SVR-K and R² = 0.49 and RMSE = 0.19% for SVR-ML in Nová Ves) than OK on the measured SOC (R² = 0.42 and RMSE = 0.21%). This further indicated that data types had the highest impact on SOC prediction in this study.
By comparing the results obtained on all sites and types of datasets, the following decreasing performance order can be observed: the ML and RK approaches calibrated and tested on the measured spectra combined with the EPCs > the ML and RK approaches calibrated and tested on the measured spectra > the ML and RK approaches calibrated on the measured spectra combined with the EPCs and tested on the interpolated spectra combined with the EPCs > the OK calibrated on the measured SOC > the ML and RK approaches calibrated and tested on the EPCs > the ML and RK approaches calibrated on the measured spectra and tested on the interpolated point spectra > the OK calibrated on the ML-predicted SOC.
3.5. General comparison between approaches and algorithms
Regardless of the sites and datasets, SVR presented the best accuracies among all algorithms, followed by RF. Between PLSR and cubist, no method showed a clear superiority. However, by accepting that RMSE is a more important criterion than R² (Willmott, 1981), PLSR was more successful than cubist. The poorest SOC prediction was yielded using OK applied on the cubist-predicted SOC values, with ME = 0.14%, RMSE = 0.25%, R² = 0.21, and LCCC = 0.29.
3.6. Spatial distribution maps
Our main motivation for spectra interpolation was to employ it for digital mapping of SOC. As obtained previously (Tables 3 and 4), SVR, under the RK (SVR-K) and ML (SVR) approaches, calibrated
Fig. 3. Methodology flowchart of this study.

Table 2
Summary statistics of SOC content (%) of the samples gathered from the study area.

| Site | n | Min | Max | Mean | StD | CV | Skewness |
|---|---|---|---|---|---|---|---|
| Nová Ves | 76 | 0.88 | 2.59 | 1.38 | 0.29 | 0.21 | 1.31 |
| Údrnice | 78 | 0.50 | 1.87 | 1.04 | 0.25 | 0.24 | 0.94 |
| Klučov | 80 | 0.77 | 1.61 | 1.11 | 0.20 | 0.18 | 0.33 |

n: number of samples, Min: Minimum, Max: Maximum, StD: Standard deviation, CV: Coefficient of variation.
on the separate and combined spectra and EPCs, provided the highest accuracies for SOC prediction. These models were used to create the 5 m resolution SOC spatial distribution maps for all sites, along with the distribution map obtained by OK of the measured SOC (Fig. 5).
As can be seen in all five maps of Nová Ves, the lowest SOC
contents are in the northern and southern parts and increase
gradually toward the middle of the site, which shows higher SOC
contents of over 1.3%. For Údrnice, the southern and middle parts
(upper portion) show relatively higher SOC contents in all the
maps. Low SOC is also evident in the northern and middle parts
(lower portion) of the site, with contents lower than about 1%. In the
case of Klučov, the southern and middle parts show the highest SOC
values, while the northern and eastern parts have relatively lower
SOC contents of under about 1.1%. In general, comparing the
approaches and datasets used, the spatial pattern of SOC was almost
similar, though some differences were observed in the range of
predicted SOC. As an example, for all sites, the ranges of the
predicted SOC using SVR (Fig. 5(b)) and SVR-K (Fig. 5(d)) on the
interpolated spectra combined with EPCs were more similar to the
actual SOC limits than the ranges of the predicted SOC using SVR
(Fig. 5(a)) and SVR-K (Fig. 5(c)) on the interpolated spectra alone. These
results were compatible with those reported in Tables 3 and 4.
Finally, the maps obtained using SVR-K and OK (Fig. 5(c), (d), (e))
are smoother than the maps obtained using
SVR (Fig. 5(a) and (b)), which might be due to the inherent
smoothing effect of the OK method.
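The RK maps discussed above add an ordinary-kriged residual surface to a regression prediction. The sketch below illustrates that combination under loudly stated assumptions: a least-squares trend stands in for SVR, the exponential variogram parameters (`nugget`, `sill`, `range_`) are invented rather than fitted, all function names are hypothetical, and the data are synthetic. It is not the workflow used in the study.

```python
import numpy as np

def exp_variogram(h, nugget, sill, range_):
    """Exponential variogram model; gamma(0) is 0 by definition."""
    gamma = nugget + (sill - nugget) * (1.0 - np.exp(-3.0 * h / range_))
    return np.where(h > 0, gamma, 0.0)

def ordinary_kriging(xy_obs, values, xy_new, nugget, sill, range_):
    """Ordinary kriging predictions at xy_new from point values at xy_obs."""
    n = len(values)
    d_obs = np.linalg.norm(xy_obs[:, None, :] - xy_obs[None, :, :], axis=2)
    d_new = np.linalg.norm(xy_new[:, None, :] - xy_obs[None, :, :], axis=2)
    # Kriging system: variogram matrix bordered by the sum-to-one constraint.
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_variogram(d_obs, nugget, sill, range_)
    lhs[n, n] = 0.0
    rhs = np.ones((n + 1, len(xy_new)))
    rhs[:n, :] = exp_variogram(d_new, nugget, sill, range_).T
    weights = np.linalg.solve(lhs, rhs)[:n, :]  # drop the Lagrange multiplier row
    return weights.T @ values

def regression_kriging(xy_obs, X_obs, y_obs, xy_new, X_new, nugget, sill, range_):
    """Trend prediction plus ordinary kriging of the trend residuals."""
    # Least-squares trend: a stand-in for the SVR (or PLSR, cubist, RF) model.
    design_obs = np.column_stack([np.ones(len(X_obs)), X_obs])
    design_new = np.column_stack([np.ones(len(X_new)), X_new])
    coef, *_ = np.linalg.lstsq(design_obs, y_obs, rcond=None)
    residuals = y_obs - design_obs @ coef
    kriged = ordinary_kriging(xy_obs, residuals, xy_new, nugget, sill, range_)
    return design_new @ coef + kriged

# Synthetic illustration only: 30 sample points, 3 covariates, 4 target cells.
rng = np.random.default_rng(0)
xy_obs = rng.uniform(0, 100, size=(30, 2))
X_obs = rng.normal(size=(30, 3))
y_obs = 1.2 + X_obs @ np.array([0.10, -0.05, 0.02]) + rng.normal(scale=0.05, size=30)
xy_new = np.array([[25.0, 25.0], [25.0, 75.0], [75.0, 25.0], [75.0, 75.0]])
X_new = rng.normal(size=(4, 3))
soc_rk = regression_kriging(xy_obs, X_obs, y_obs, xy_new, X_new,
                            nugget=0.0005, sill=0.003, range_=40.0)
print(soc_rk.round(2))
```

Because each kriged value is a weighted average of neighbouring residuals, the residual surface varies more gradually than the point residuals themselves, which is consistent with the smoother appearance of the SVR-K and OK maps.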
3.7. Uncertainty
The average uncertainty associated with the SOC prediction
models applied on the interpolated spectra is shown in Table 6. The
Fig. 4. Rasters obtained by OK-based interpolation of the samples' FD spectra at the selected wavelengths (Nová Ves).
average uncertainty resulting from applying the SGS on the
measured SOC is also added for comparison. As evidenced, the ML
approach mostly showed lower uncertainty than RK, regardless of
the data types, algorithms, and sites (Table 6). Compared to the
kriging approach, the uncertainty obtained by SVR-ML was lower
for both types of datasets, while the uncertainty of the other
algorithms was higher than that of the measured SOC. Models developed on
the combination of spectra and EPCs yielded lower uncertainty
than those calibrated solely on the spectra. Comparing the
algorithms, SVR produced the lowest uncertainty in both approaches and
datasets, followed by RF. No significant difference was observed
between the uncertainties of the PLSR and cubist models.
Considering the sites, Klučov had lower uncertainty than Nová Ves and
Údrnice.
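One common way to turn a set of simulated realizations (for example, SGS outputs) into a per-cell uncertainty map and a single site-average figure is sketched below. This is an assumption for illustration only: the uncertainty measure behind Table 6 is not defined in this section, so the per-cell standard deviation used here, the function name `uncertainty_from_realizations`, and the toy array are all hypothetical.

```python
import numpy as np

def uncertainty_from_realizations(realizations):
    """Summarise a stack of simulated SOC rasters.

    realizations: array-like of shape (n_realizations, n_rows, n_cols).
    Returns the per-cell standard deviation map and its site-wide mean.
    """
    stack = np.asarray(realizations, dtype=float)
    # Assumed uncertainty measure: sample standard deviation across realizations.
    uncertainty_map = stack.std(axis=0, ddof=1)
    # Cells outside the field boundary may be NaN, so average with nanmean.
    average_uncertainty = float(np.nanmean(uncertainty_map))
    return uncertainty_map, average_uncertainty

# Toy stack of two 2 x 3 realizations (not data from the study).
toy_stack = np.stack([np.zeros((2, 3)), np.full((2, 3), 2.0)])
toy_map, toy_average = uncertainty_from_realizations(toy_stack)
print(toy_map.shape, round(toy_average, 4))
```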
The uncertainty maps of SOC obtained by applying the best ML
and RK models on the interpolated spectra and the combination of
Table 3
Results of ML and RK approaches applied on measured and interpolated point spectra (test samples).
                               Measured point spectra        Interpolated point spectra
Site      Approach  Model      ME    RMSE  R²    LCCC        ME    RMSE  R²    LCCC
Nová Ves  ML        PLSR       0.07  0.20  0.46  0.54        0.09  0.24  0.22  0.33
                    Cubist     0.12  0.22  0.43  0.49        0.14  0.23  0.26  0.37
                    RF         0.08  0.22  0.56  0.59        0.11  0.23  0.34  0.44
                    SVR        0.10  0.19  0.56  0.61        0.10  0.22  0.40  0.51
          RK        PLSR       0.09  0.21  0.45  0.58        0.13  0.24  0.25  0.54
                    Cubist     0.10  0.22  0.49  0.53        0.14  0.22  0.29  0.47
                    RF         0.06  0.19  0.58  0.62        0.09  0.22  0.34  0.59
                    SVR        0.08  0.16  0.59  0.63        0.09  0.19  0.42  0.61
Údrnice   ML        PLSR       0.06  0.12  0.77  0.84        0.08  0.15  0.35  0.53
                    Cubist     0.03  0.13  0.65  0.79        0.10  0.18  0.39  0.44
                    RF         0.04  0.14  0.59  0.68        0.06  0.16  0.47  0.56
                    SVR        0     0.11  0.73  0.82        0.03  0.15  0.65  0.73
          RK        PLSR       0.02  0.11  0.78  0.86        0.05  0.14  0.42  0.60
                    Cubist     0.04  0.14  0.67  0.81        0.09  0.16  0.49  0.57
                    RF         0.01  0.12  0.70  0.73        0.07  0.13  0.59  0.68
                    SVR        0     0.09  0.77  0.85        0.03  0.12  0.69  0.76
Klučov    ML        PLSR       0     0.15  0.43  0.71        0.04  0.18  0.41  0.59
                    Cubist     0.05  0.12  0.57  0.56        0.09  0.18  0.51  0.48
                    RF         0.02  0.13  0.72  0.75        0.08  0.16  0.52  0.58
                    SVR        0.03  0.12  0.66  0.69        0.06  0.17  0.53  0.64
          RK        PLSR       0.05  0.12  0.59  0.73        0.04  0.16  0.55  0.62
                    Cubist     0.03  0.14  0.46  0.59        0.07  0.15  0.40  0.51
                    RF         0.05  0.13  0.74  0.78        0.08  0.14  0.52  0.65
                    SVR        0.01  0.10  0.75  0.78        0.05  0.13  0.62  0.72
Table 4
Results of ML and RK approaches applied on environmental predictor covariates
(EPCs), measured point spectra and EPCs, and interpolated point spectra and EPCs
(test samples).
                               EPCs           Measured point       Interpolated
                                              spectra & EPCs       spectra & EPCs
Site      Approach  Model      RMSE  R²       RMSE  R²             RMSE  R²
Nová Ves  ML        PLSR       0.23  0.24     0.18  0.44           0.22  0.36
                    Cubist     0.22  0.28     0.22  0.46           0.22  0.38
                    RF         0.23  0.35     0.20  0.59           0.21  0.42
                    SVR        0.20  0.43     0.17  0.62           0.19  0.49
          RK        PLSR       0.23  0.28     0.18  0.51           0.23  0.39
                    Cubist     0.21  0.32     0.19  0.47           0.21  0.41
                    RF         0.22  0.37     0.19  0.62           0.21  0.45
                    SVR        0.18  0.45     0.15  0.65           0.17  0.51
Údrnice   ML        PLSR       0.14  0.41     0.12  0.79           0.14  0.45
                    Cubist     0.18  0.42     0.12  0.68           0.16  0.43
                    RF         0.16  0.49     0.11  0.63           0.15  0.53
                    SVR        0.14  0.68     0.10  0.79           0.12  0.74
          RK        PLSR       0.12  0.44     0.11  0.81           0.12  0.49
                    Cubist     0.16  0.43     0.11  0.69           0.14  0.58
                    RF         0.13  0.54     0.10  0.71           0.12  0.62
                    SVR        0.11  0.71     0.09  0.87           0.11  0.75
Klučov    ML        PLSR       0.17  0.52     0.12  0.59           0.15  0.55
                    Cubist     0.17  0.41     0.14  0.47           0.16  0.43
                    RF         0.16  0.56     0.13  0.75           0.14  0.58
                    SVR        0.15  0.59     0.12  0.68           0.13  0.61
          RK        PLSR       0.16  0.44     0.12  0.61           0.14  0.59
                    Cubist     0.15  0.59     0.13  0.50           0.15  0.46
                    RF         0.13  0.61     0.12  0.75           0.13  0.66
                    SVR        0.11  0.67     0.10  0.78           0.11  0.71
Table 5
Results of OK applied on measured SOC and ML predicted SOC (test samples).
Site      Metrics  Measured SOC  PLSR SOC  Cubist SOC  RF SOC  SVR SOC
Nová Ves  ME       0.08          0.16      0.14        0.13    0.11
          RMSE     0.21          0.24      0.25        0.24    0.22
          R²       0.42          0.20      0.21        0.30    0.37
          LCCC     0.60          0.32      0.29        0.41    0.46
Údrnice   ME       0.05          0.10      0.11        0.08    0.05
          RMSE     0.12          0.19      0.21        0.17    0.16
          R²       0.71          0.33      0.31        0.51    0.64
          LCCC     0.79          0.50      0.42        0.63    0.67
Klučov    ME       0.03          0.07      0.1         0.09    0.06
          RMSE     0.14          0.21      0.18        0.18    0.16
          R²       0.65          0.39      0.40        0.46    0.52
          LCCC     0.77          0.47      0.43        0.49    0.60
interpolated spectra and EPCs are presented in Fig. 6. The measured
SOC uncertainty map (obtained by SGS under the kriging approach)
is also added for comparison.
The uncertainty distribution patterns of the sites were almost
similar for all ML, RK, and kriging approaches. However, the
uncertainty ranges varied among the different approaches, as is
also evident in Table 6. For instance, for Nová Ves, the range of
uncertainty was higher for RK (Fig. 6(c) and (d): 0.17%–0.35%) than
for ML (Fig. 6(a) and (b): 0.07%–0.18%) and OK (Fig. 6(e): 0.11%–0.18%).
Likewise, for the other sites, the uncertainty ranges of the maps
created using the RK approach (Fig. 6(c) and (d): 0.12%–0.32% for
Údrnice and 0.11%–0.28% for Klučov) were higher than those
obtained from ML and OK. Compatible with the results reported in
Table 6, the lowest uncertainty variations were observed for SVR-ML
(applied on the interpolated spectra combined with EPCs),
followed by OK. The uncertainty maps illustrated in Fig. 6
presented higher spatial detail than the prediction maps, which is
mainly due to the higher variability in prediction confidence and
the SGS-related details of the uncertainty maps. The inherent
smoothing effect of OK, which eliminates the fine-scale SOC
variability in the prediction maps, can be considered the other
important reason.
4. Discussion
4.1. Interpolated spectra as input variables
Focusing on the main objective of this study, applying the
developed models on the interpolated spectra yielded relatively
acceptable results. But how logical is incorporating rasters of the
Fig. 5. The spatial distribution maps of SOC produced by (a) SVR-ML applied on the interpolated spectra (ML approach), (b) SVR-ML applied on the interpolated spectra combined
with EPCs (ML approach), (c) SVR-K applied on the interpolated spectra (RK approach), (d) SVR-K applied on the interpolated spectra combined with EPCs (RK approach), and (e) OK
of measured SOC (kriging approach).
Table 6
Average uncertainty (%) of SOC prediction models under different approaches.
                    Lab spectra                    Lab spectra & EPCs             Measured SOC
Approach  Model     Nová Ves  Údrnice  Klučov      Nová Ves  Údrnice  Klučov      Nová Ves  Údrnice  Klučov
RK        PLSR      14.13     15.26    11.02       12.11     10.84    9.36        –         –        –
          Cubist    15.73     11.29    10.81       11.89     9.97     10.65       –         –        –
          RF        14.19     10.86    10.76       10.25     9.36     8.29        –         –        –
          SVR       13.51     10.91    9.35        11.48     8.91     7.17        –         –        –
ML        PLSR      12.16     13.20    10.89       11.07     10.15    8.97        –         –        –
          Cubist    11.62     12.77    11.72       10.45     9.21     10.53       –         –        –
          RF        11.87     9.74     8.92        10.23     8.73     6.97        –         –        –
          SVR       10.76     7.81     7.65        9.70      6.62     5.49        –         –        –
Kriging   SGS       –         –        –           –         –        –           10.18     7.07     6.91