Open Raman spectral library for biomolecule identification
Marcelo Terán a,*,1, José Javier Ruiz b,1, Pablo Loza-Alvarez b, David Masip a, David Merino a
a Faculty of Computer Science and Telecommunications, Universitat Oberta de Catalunya (UOC), 08018, Barcelona, Spain
b ICFO-Institut de Ciencies Fotoniques, The Barcelona Institute of Science and Technology, Castelldefels, 08860, Barcelona, Spain
ARTICLE INFO
Keywords:
Raman spectroscopy
Spectral library
Biomolecules
Biomedicine
Database
Open-source
ABSTRACT
Raman spectroscopy combined with Multivariate Curve Resolution (MCR) analysis is widely used in biomedical applications. However, assigning biomolecules to the components extracted by MCR can be challenging due to the absence of an open Raman spectral library of biomolecules. Raman experts typically identify unmixed component spectra as biomolecules by comparing them with reference spectra from the literature. This process can be time-consuming and subject to human bias. In this work, we created an open Raman spectral database with 140 biomolecules by implementing an algorithm to digitize the spectra plots and most relevant peaks from articles available in the literature. Additionally, we implemented two search algorithms. The first one uses the spectral linear kernel or cosine similarity on the full spectra. The second algorithm is based on peak matching, and relies on the intersection over the union of the matched peaks with a defined tolerance for peak matching. Our experimental validation showed 100 % top 10 accuracy in molecule identification (e.g. collagen) and 100 % accuracy in molecule type identification (e.g. protein), both in pure biomolecule measurements and when replicating results from prior studies. Objectively narrowing the identification to the top 10 ranked candidates and providing type identification can significantly reduce both the time required for visual identification and the need to purchase reference component samples. We publish our spectral library as an open-source tool so it can be expanded collaboratively by the research community. It is available at: https://github.com/mteranm/ramanbiolib.
1. Introduction
Raman spectroscopy (RS) is an optical technique that uses the phenomenon of inelastic scattering of monochromatic radiation to provide vibrational fingerprints of the different types of molecules present in the sample under study [1]. In recent years, RS has gained attention as an analytical tool in biomedical applications due to its non-invasive and label-free characteristics [2,3]. RS does not require any specific sample preparation, making it particularly advantageous for real-time and in vivo analysis of biological systems without the need for biopsies. This makes it especially valuable in clinical diagnosis, as it enables the examination of tissue composition and the detection of disease markers without invasive procedures, reducing patient discomfort and eliminating the risks associated with sample collection [4].
Although RS can be used to study the chemical composition of samples formed by the mixture of multiple components, it is usually difficult to identify each of these components, especially when measuring complex mixtures, such as biological tissue [5]. Molecules with similar chemical characteristics generate vibrational bands that overlap in the same region of the Raman spectrum. This usually happens when multiple biomolecules of the same type are present in the mixture, for instance proteins, lipids, carbohydrates or nucleic acids. In addition, the Raman scattering signal is usually weak and is often masked by other effects, such as auto-fluorescence, which can be intense in cells and tissues [6]. Therefore, complex analytical techniques are required to extract the relevant information from spectral data.
To overcome the overlapping challenge that mixtures present, the different spectral variables must be considered together to identify which Raman bands or peaks belong to one molecule and which to others, which justifies the use of multivariate analysis. Specifically, Multivariate Curve Resolution (MCR) analysis has been widely used to unmix Raman spectral data [2]. It is a matrix factorization method that decomposes a matrix of mixture spectra into a matrix of component spectra and another matrix with their relative abundances [7]. It is also usually unsupervised, which means that it does not need any a priori information regarding the chemical content of the sample.
* Corresponding author.
E-mail address: [email protected] (M. Terán).
1 These authors contributed equally to this work.
Contents lists available at ScienceDirect
Chemometrics and Intelligent Laboratory Systems
journal homepage: www.elsevier.com/locate/chemometrics
https://doi.org/10.1016/j.chemolab.2025.105476
Received 18 March 2025; Received in revised form 19 May 2025; Accepted 27 June 2025
Chemometrics and Intelligent Laboratory Systems 264 (2025) 105476
Available online 27 June 2025
0169-7439/© 2025 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
In addition, it is possible to impose meaningful constraints that improve its interpretation, for instance non-negativity of the component spectra and abundances it provides.
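As a compact reading of the factorization described here, the bilinear model commonly used for MCR can be written as below. The symbols D, C, S and E are notation assumed for illustration and do not come from the article.

```latex
% D: (mixtures x wavenumbers) matrix of measured spectra
% C: (mixtures x k) matrix of relative abundances
% S: (wavenumbers x k) matrix of component spectra
% E: residual not explained by the k components
D = C\,S^{\mathsf{T}} + E, \qquad C \ge 0, \quad S \ge 0
```

The two inequalities express the non-negativity constraints on abundances and component spectra.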
Importantly, the first step towards a biochemical interpretation of spectral data analysis consists of assigning each spectral component provided by the MCR method to its corresponding molecule. This step is usually performed by comparing the most prominent bands in the unknown spectrum with published reference Raman spectra and band positions of the biological molecules expected in the measurement. Often, this task becomes challenging and requires Raman spectroscopy experience due to: the high complexity of mixtures, which leads to the decomposition of components that can be assigned to multiple biomolecules; the effect of background signal; and the variations of the spectra related to the specific setup used [8]. This task can be time-consuming, is subject to human bias, and allows the validation of only a limited subset of biomolecule candidates. Although reference Raman spectra of some of the main biological molecules are reviewed and shared in the bibliography [9–12], there is currently no open digital database of Raman spectra of biological molecules, nor any open search library tool that can assist researchers in the identification process. This can be a limitation for the broad usage of Raman spectroscopy in biomedical analysis. An open Raman spectral library can provide a standardized tool for identifying the Raman spectra of biomolecules, which can be expanded through collaborative efforts by research community members. Spectral digital databases and search tools are available in other areas of application of Raman spectroscopy, such as the identification of minerals and painting pigments [13–16], and, closer to the biological field, in microbiology, although in this case the spectra do not specify biomolecules [17].
Considering this, in this work we aim to create a digital database containing the Raman spectra of the main biological molecules and to develop and provide an open-source tool for Raman spectral search and identification.
2. Materials and methods
2.1. Extraction of the Raman spectra biological molecules reference database
We have identified and digitized already published Raman spectra. Even though spectra files are not commonly shared in the literature, two different types of Raman information can commonly be found in publications: plots of full Raman spectra, or tables with the positions of the most relevant Raman bands or peaks, in wavenumbers (cm⁻¹). For this reason, we created two different files in the database: one with the spectra, and another with a list of Raman peaks. The Raman plots file includes the Raman shift and the intensity values of the spectrum plot, while the peak positions file includes the list of relevant peak positions and their intensities. An additional file contains the metadata of the spectra, such as the component type (lipid or protein, for instance), the DOI of the reference, sample information, and details of spectra acquisition and processing. The full list of metadata is defined in Supplementary Data Table ST1. All files are in the CSV format.
Although original data is always desirable, and despite the possible loss in data quality, we extracted the spectra plots from the images of figures in articles using a custom-developed Python script based on classical computer vision contour detection techniques, using the OpenCV library [18,19]; see Supplementary Data S2 for more details. Since the spectra acquired by different authors differ in signal-to-noise ratio, fluorescence baseline, Raman spectral range, spectral resolution and intensity values, we standardized the database spectra by applying the following preprocessing.
1. Smoothing by means of a Savitzky-Golay filter.
2. Baseline removal using the airPLS algorithm [20], implemented in the BaselineRemoval library [21].
3. Spectrum range standardization from 450 cm⁻¹ to 1800 cm⁻¹ with a step of 1 cm⁻¹, using linear interpolation.
4. Min-max normalization of intensity values to the range [0, 1].
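Steps 3 and 4 of the list above can be sketched in plain Python as follows. This is a minimal illustration, not the authors' code: the function names are hypothetical, clamping outside the measured range is an assumption, and the smoothing and baseline-removal steps are left out because they depend on external libraries.

```python
def interpolate_linear(shift, intensity, start=450, stop=1800, step=1):
    """Resample a spectrum onto a regular wavenumber grid by linear interpolation.

    `shift` must be sorted in ascending order. Grid points outside the measured
    range are clamped to the nearest measured intensity (assumption of this sketch).
    """
    grid = list(range(start, stop + 1, step))
    out = []
    j = 0  # index of the left neighbour in `shift`
    for g in grid:
        if g <= shift[0]:
            out.append(intensity[0])
        elif g >= shift[-1]:
            out.append(intensity[-1])
        else:
            # advance until shift[j] < g <= shift[j + 1]
            while shift[j + 1] < g:
                j += 1
            x0, x1 = shift[j], shift[j + 1]
            y0, y1 = intensity[j], intensity[j + 1]
            out.append(y0 + (y1 - y0) * (g - x0) / (x1 - x0))
    return grid, out


def min_max_normalize(values):
    """Rescale intensities to the range [0, 1]; a flat spectrum maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]
```

With the default arguments the grid has 1351 points, one per cm⁻¹ between 450 and 1800 cm⁻¹.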
We obtained the peak position information from article tables using custom-developed text processing code, in Python, applied to the PDF or HTML sources of the articles.
A total of 17 articles were processed to analyze 202 sets of data and 140 different molecule components, since some of the sets were repeated in 2 or more articles [9,10,12,22–34]. Table 1 shows the summary of the database components by type, and the full database information is defined in Supplementary Data Table ST1. The database contains multiple spectrum entries for 36 components, see Table 2. Among them, 11 of the duplicated entries were measured by different authors using different spectrometers and data acquisition conditions. Additionally, 25 components were obtained by the same authors using the same spectrometer equipped with lasers operating at 488 nm and 532 nm. As will be shown later, these repeated components are used to evaluate the search algorithms.
We have made the database available on GitHub with the aim of enabling the research community to expand it by including additional Raman spectra from biological molecules that were not considered in the scope of this article. The guidelines for the contribution process can be found in Supplementary Data S4.
2.2. Raman spectra based biological molecules search algorithms
Once the databases of Raman spectra have been built, we propose two spectral search algorithms to assist in the identification of an unknown Raman spectrum from a biological molecule. The algorithms aim to provide a ranking of the similarity of the unknown spectrum to each of the database spectra, reducing the identification to a manageable list of likely candidates. They were implemented in an open-source Python library available on GitHub (see footnote 2) and through the Python package installer. Additionally, a desktop application was developed to provide a Graphical User Interface (GUI) to the core library for non-technically adept users, which can be downloaded from GitHub (see footnote 2).
2.2.1. Search algorithm based on Raman spectra plot similarity
The first algorithm uses the Raman spectra plots. It relies on a similarity ranking between the unknown query Raman spectrum, S_u, and each Raman spectrum of the database, S_db, using cosine similarity (CS) or spectral linear kernel (SLK) as a similarity score. In the final ranking, we deduplicate the multiple occurrences of S_db of a single component by retaining only the S_db with the highest similarity to S_u.
CS is a general similarity metric used in a wide range of applications and data types [35]. It calculates the cosine of the angle between two vectors, in this case the query and database Raman spectra, which reflects their similarity in terms of direction rather than magnitude. In terms of Raman signal, it means that it is based on the shape of the spectrum, rather than on its absolute intensity. This is particularly useful, since absolute intensities are more prone to change if different experimental conditions are used, such as different laser power, acquisition time, molecular concentration, different system components or even a different system alignment. Considering the two spectra arrays S_u and S_db with the same length N, the CS is defined as:
$$\mathrm{CS} = \frac{S_u \cdot S_{db}}{\sqrt{S_u \cdot S_u}\,\sqrt{S_{db} \cdot S_{db}}} \tag{1}$$
where S_u ⋅ S_db is the dot product between both spectra.
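Eq. (1) translates almost directly into code. The sketch below uses plain Python lists; the function name and the choice to return 0 for an all-zero spectrum are assumptions made for illustration.

```python
import math


def cosine_similarity(s_u, s_db):
    """Cosine similarity between two spectra sampled on the same grid (Eq. 1)."""
    if len(s_u) != len(s_db):
        raise ValueError("spectra must have the same length")
    dot = sum(a * b for a, b in zip(s_u, s_db))
    norm_u = math.sqrt(sum(a * a for a in s_u))
    norm_db = math.sqrt(sum(b * b for b in s_db))
    if norm_u == 0 or norm_db == 0:
        return 0.0  # assumption: an all-zero spectrum matches nothing
    return dot / (norm_u * norm_db)
```

Scaling one spectrum by a positive constant leaves the score unchanged, which is the insensitivity to absolute intensity described above.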
2 https://github.com/mteranm/ramanbiolib.
SLK is a similarity metric that was specifically designed for Raman spectra similarity calculations by Khan et al. [36]. SLK takes into consideration the intensity at each wavenumber and at its neighbouring points, within a window. They designed and evaluated SLK considering not only pure molecule spectra but also mixtures. The SLK point similarity is defined as:
$$\mathrm{Sim}_{SLK}(x_i, z_i) = x_i z_i + \sum_{j=i-W}^{i+W} (x_i - x_j)(z_i - z_j) \tag{2}$$
where x and z are the spectra values of S_u and S_db to compare and W is the window size, for which we selected a value of W = 25 cm⁻¹ to fit with the most relevant peaks in the database. The SLK score is defined as the sum of the point-to-point similarity:
$$\mathrm{SLK}(x, z) = \sum_{i=1}^{N} \mathrm{Sim}_{SLK}(x_i, z_i) \tag{3}$$
Analogously, to align the SLK score range with the range of the CS, [−1, 1], we propose the following similarity normalization:
$$\mathrm{SLK}(x, z)_{norm} = \frac{\mathrm{SLK}(x, z)}{\sqrt{\mathrm{SLK}(x, x)}\,\sqrt{\mathrm{SLK}(z, z)}} \tag{4}$$
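Eqs. (2) to (4) can be sketched as below. Two assumptions are made that the text does not state: the window is truncated at the edges of the spectrum, and the window is expressed in samples, so that `w=25` corresponds to 25 cm⁻¹ only on a grid with a 1 cm⁻¹ step. The function names are hypothetical.

```python
import math


def slk(x, z, w=25):
    """Spectral linear kernel: sum of the point similarities of Eq. (2) over all i (Eq. 3)."""
    n = len(x)
    total = 0.0
    for i in range(n):
        point = x[i] * z[i]
        # neighbours j in [i - w, i + w], clipped to the spectrum bounds (assumption)
        for j in range(max(0, i - w), min(n, i + w + 1)):
            point += (x[i] - x[j]) * (z[i] - z[j])
        total += point
    return total


def slk_normalized(x, z, w=25):
    """Normalized SLK score of Eq. (4); returns 0 if either self-similarity is 0."""
    denom = math.sqrt(slk(x, x, w)) * math.sqrt(slk(z, z, w))
    return slk(x, z, w) / denom if denom else 0.0
```

The double loop costs on the order of N × W operations per pair of spectra, which is small for a 1351-point grid but could be vectorized for large databases.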
The search consists of computing both similarities between the unknown query spectrum and all spectra from the dataset. Then, the components of the library are ranked in descending order of matching similarity.
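The ranking step, including the deduplication of components that have several database spectra, might look like the sketch below. The input layout (a list of name and score pairs) and the function name are assumptions for illustration.

```python
def rank_components(scores):
    """Rank components by similarity, keeping the best score per component.

    `scores` is an iterable of (component_name, similarity) pairs, one pair
    per database spectrum; a component may appear more than once.
    """
    best = {}
    for name, sim in scores:
        if name not in best or sim > best[name]:
            best[name] = sim
    # highest similarity first
    return sorted(best.items(), key=lambda item: item[1], reverse=True)
```

For example, two collagen entries scoring 0.80 and 0.95 collapse into a single collagen candidate with score 0.95.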
2.2.2. Search algorithm based on Raman spectra peak positions matching
For components in the database that are described by a table of Raman peak positions, we propose a peak matching (PM) algorithm that considers a symmetrical tolerance T when matching the peak positions of the unknown query spectrum and the database spectra.
We defined a peak matching score to assess whether two sets of peak positions contain the same peak positions, considering a symmetrical tolerance T (cm⁻¹). The score is called the intersection over the union ratio, IUR@T, and aims to provide a score that considers which peaks are matched and which are missed in both sets.
Considering a spectrum S_u, represented by a set of peak positions P_u, and a spectrum S_db, represented by a set of peak positions P_db, we defined the IUR@T as the number of peak positions in the intersection, that is, the peaks in P_u that are also found in P_db considering T, over the number of peak positions in the union of P_u and P_db:
$$\mathrm{IUR@T} = \frac{|P_u \cap P_{db}|_T}{|P_u \cup P_{db}|} \tag{5}$$
Based on our experimental observations, we selected a symmetrical tolerance T = 5 cm⁻¹ to compensate for the peak position shift that can occur due to different spectral resolution and calibration between different spectrometers and the spectra pixel images used to create the database. This value was found to provide a balance between correcting for instrumental variations and preserving the differentiation of spectral bands.
The proposed algorithm is defined as follows.
1. For each P_db in the database:
a. Find the peaks of P_u that are within a distance T from a peak in P_db.
b. Deduplicate the cases where a single peak in P_db was assigned to multiple peaks in P_u. The assignment combination that maximizes the number of matched peaks is used.
c. Calculate the IUR score between P_u and P_db.
2. Deduplicate the multiple occurrences of P_db of a single component by retaining only the P_db with the highest matching score to P_u.
3. Rank the components in descending order of matching score.
Since this algorithm only uses the peak positions database, it is not affected by the variations in peak shape and intensity due to different acquisition conditions [8].
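Steps 1a to 1c can be sketched as below. The function name is hypothetical and two assumptions are made: matched peaks are paired one-to-one with a two-pointer sweep over the sorted positions (which yields the largest possible number of pairs for points on a line), and the union is counted as |P_u| + |P_db| minus the number of matched pairs.

```python
def iur(p_u, p_db, tol=5.0):
    """Intersection-over-union ratio between two peak lists with tolerance `tol` (cm^-1)."""
    a, b = sorted(p_u), sorted(p_db)
    i = j = matched = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            # pair the two peaks; each peak is used at most once
            matched += 1
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # a[i] is too low to match any remaining peak in b
        else:
            j += 1  # b[j] is too low to match any remaining peak in a
    union = len(a) + len(b) - matched  # assumption: matched pairs count once
    return matched / union if union else 0.0
```

For instance, query peaks at 1003, 1450 and 1660 cm⁻¹ against database peaks at 1001, 1300 and 1448 cm⁻¹ give two matches out of four distinct peaks, so the score is 0.5.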
The input of the peak matching algorithm is the set of peak positions of the spectrum, in wavenumber units. It is therefore necessary to first extract the most prominent peaks of the spectrum plot, a step we call peak extraction in this article. We performed the peak extraction using the SciPy find_peaks function [37], which finds all local maxima by comparison of neighbouring values. A minimum peak prominence threshold was defined for each spectrum to filter out the less prominent peaks that are attributed to the noise of the signal.
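To make the prominence filter concrete, the sketch below is a simplified pure-Python stand-in for this step. It is not SciPy's implementation: it only handles strict local maxima (no flat plateaus), and the function name is hypothetical.

```python
def find_prominent_peaks(y, min_prominence):
    """Return indices of strict local maxima whose prominence is >= min_prominence."""
    peaks = []
    for i in range(1, len(y) - 1):
        if not (y[i - 1] < y[i] > y[i + 1]):
            continue  # keep strict local maxima only
        # walk left until a higher sample or the border, tracking the lowest value
        left_min = y[i]
        k = i - 1
        while k >= 0 and y[k] <= y[i]:
            left_min = min(left_min, y[k])
            k -= 1
        # same on the right side
        right_min = y[i]
        k = i + 1
        while k < len(y) and y[k] <= y[i]:
            right_min = min(right_min, y[k])
            k += 1
        # prominence: height above the higher of the two surrounding minima
        prominence = y[i] - max(left_min, right_min)
        if prominence >= min_prominence:
            peaks.append(i)
    return peaks
```

The returned values are indices; on the standardized grid of section 2.1 an index i corresponds to a wavenumber of 450 + i cm⁻¹.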
2.2.3. Component type identification using k-NN
We used the k-nearest neighbors (k-NN) algorithm for the molecular type identification, by identifying the k nearest neighbors in the database with respect to the unknown component spectrum [38]. To identify them, the CS, SLK and IUR@T similarities were considered as distance
Table 1
Summary of the database component count by biomolecule type.
| Type | Count | References |
|---|---|---|
| Lipids | 48 | [9,11,24,25,34] |
| Proteins | 30 | [10,26,27,32,33] |
| Saccharides | 26 | [9,12,31] |
| Amino Acids | 13 | [9,29,30] |
| Primary Metabolites | 10 | [9] |
| Nucleic Acids | 7 | [9,22,23] |
| Pigments | 2 | [9,28] |
| Other | 4 | [9] |
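Table 2
Components with duplicated spectrum in the database.
| Type | Subtype | Component | References |
|---|---|---|---|
| Lipids | Fatty acids | myristic acid | [9,11] |
| Lipids | Fatty acids | oleic acid | [9,11] |
| Lipids | Fatty acids | palmitic acid | [9,11] |
| Lipids | Fatty acids | stearic acid | [9,11] |
| Lipids | Triglycerides | trilinolein | [9,11] |
| Lipids | Triglycerides | trilinolenin | [9,11] |
| Lipids | Triglycerides | triolein | [9,11] |
| Saccharides | Polysaccharides | amylopectin | [9,12] |
| Saccharides | Polysaccharides | amylose | [9,12] |
| Saccharides | Monosaccharides | d-(+)-xylose | [9,12] |
| Saccharides | Monosaccharides | d-(−)-fructose | [9,12] |
| Proteins | | carbonic anhydrase | [10] |
| Proteins | | collagen | [10] |
| Proteins | | cytochrome c | [10] |
| Proteins | | elastase | [10] |
| Proteins | | ferritin | [10] |
| Proteins | | glutathione transferase | [10] |
| Proteins | | hemoglobin | [10] |
| Proteins | | horseradish peroxidase | [10] |
| Proteins | | albumin | [10] |
| Proteins | | lactalbumin | [10] |
| Proteins | | lectin | [10] |
| Proteins | | major proteinase | [10] |
| Proteins | | myoglobin | [10] |
| Proteins | | papain | [10] |
| Proteins | | pepsin | [10] |
| Proteins | | pepsinogen | [10] |
| Proteins | | superoxide dismutases | [10] |
| Proteins | | thaumatin | [10] |
| Proteins | | triosephosphate isomerase | [10] |
| Proteins | | trypsin | [10] |
| Proteins | | trypsin inhibitor | [10] |
| Proteins | | trypsinogen | [10] |
| Proteins | | ubiquitin | [10] |
| Proteins | | xylanase | [10] |
| Proteins | | α-chymotrypsinogen a | [10] |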
metrics. The type of the unknown component was determined by majority voting among the types of the k nearest neighbors, and ties were resolved based on the mean similarity of each type. We selected k = 5 nearest neighbors.
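The voting rule can be sketched as below. The input layout (type and similarity pairs already sorted by descending similarity) and the function name are assumptions for illustration.

```python
from collections import defaultdict


def knn_type(ranked, k=5):
    """Predict the biomolecule type from the k most similar database entries.

    `ranked` is a list of (component_type, similarity) pairs sorted by
    descending similarity.
    """
    if not ranked:
        raise ValueError("at least one neighbour is required")
    by_type = defaultdict(list)
    for comp_type, sim in ranked[:k]:
        by_type[comp_type].append(sim)
    # most votes wins; a tie is broken by the mean similarity of the type
    return max(
        by_type,
        key=lambda t: (len(by_type[t]), sum(by_type[t]) / len(by_type[t])),
    )
```

With k = 5 and three lipid neighbours against two protein neighbours the result is the lipid type, regardless of the individual scores.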
2.2.4. Search evaluation
To evaluate the search algorithms, we determined whether a database component entry that is known to be present in the spectrum appears among the top 1, 5 and 10 elements of the search result. Additionally, we evaluated whether we could provide type identification of the query spectrum by defining the following experiments.
● We identified a duplicated spectrum in the database, removed it from the database, and used the search algorithm expecting to find the duplicated spectrum that is still in the database amongst the top results.
● We measured Raman spectra from pure biomolecule samples, expecting to find the correct component identification in the top results.
● We used two open datasets of mixtures of saccharides and amino acids to assess component identification when using controlled aqueous mixtures [39–42]. We compared the direct identification in mixtures against the identification after MCR application. When identifying directly in mixtures, we evaluated the top 6 and top 10 accuracy for each component present in the mixture, since the mixtures can contain up to 6 biomolecules. We analyzed how the biomolecule concentration affects the results by defining 3 ranges of concentration: low, medium and high. In the saccharides dataset, the concentrations of 30 μl, 75 μl and 120 μl were respectively marked as low, medium and high. In the amino acids dataset, we split the concentration range into 3 subranges of equal length.
● We replicated published articles that assigned a biological component to measured spectra, where spectral unmixing algorithms were applied to spectra of biomolecule mixtures.
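The top-k criterion used in all of these experiments can be sketched as below. The function name and the input layout (expected component plus ranked candidate names) are assumptions for illustration.

```python
def top_k_accuracy(results, k):
    """Percentage of queries whose expected component is within the first k candidates.

    `results` is a list of (expected_component, ranked_component_names) pairs.
    """
    if not results:
        return 0.0
    hits = sum(1 for expected, ranked in results if expected in ranked[:k])
    return 100.0 * hits / len(results)
```

Calling it with k = 1, 5 and 10 on the same search results gives the three component accuracies reported per method.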
3. Results and discussion
3.1. Search evaluation using components with duplicated spectra in the database
In this experiment, our goal is to evaluate how Raman spectra of the same component acquired under different acquisition conditions affect the search performance. Table 3 shows the results of the search for each component with a duplicated spectrum in the database, as listed in Table 2. In the case of stearic acid, the measurement with the 532 nm laser appears in first position for PM, but in 10th and 12th position for CS and SLK respectively. Fig. 1 shows that the Raman spectra of the top-ranking components in the search for stearic acid, a fatty acid, are actually very
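Table 3
Duplicated spectrum position in search results, for each duplicated spectrum in the database. Protein data was acquired with the same spectrometer, but with different laser wavelengths. The rest, lipids and saccharides, were acquired with different spectrometers and laser wavelengths.
| Type | Component | CS 488 nm | CS 532 nm | SLK 488 nm | SLK 532 nm | PM (IUR) 488 nm | PM (IUR) 532 nm |
|---|---|---|---|---|---|---|---|
| Proteins | carbonic anhydrase | 1 | 1 | 1 | 1 | 3 | 1 |
| Proteins | collagen | 1 | 1 | 1 | 1 | 1 | 20 |
| Proteins | cytochrome c | 6 | 3 | 3 | 3 | 1 | 3 |
| Proteins | elastase | 1 | 1 | 1 | 5 | 7 | 7 |
| Proteins | ferritin | 29 | 11 | 25 | 14 | 22 | 1 |
| Proteins | glutathione transferase | 1 | 1 | 1 | 1 | 2 | 5 |
| Proteins | hemoglobin | 5 | 4 | 5 | 3 | 2 | 13 |
| Proteins | horseradish peroxidase | 1 | 1 | 2 | 1 | 1 | 125 |
| Proteins | human albumin | 1 | 8 | 1 | 2 | 2 | 1 |
| Proteins | lactalbumin | 1 | 1 | 1 | 1 | 7 | 1 |
| Proteins | lectin | 3 | 2 | 1 | 1 | 25 | 3 |
| Proteins | major proteinase | 1 | 1 | 1 | 1 | 1 | 1 |
| Proteins | myoglobin | 5 | 2 | 5 | 3 | 3 | 3 |
| Proteins | papain | 1 | 1 | 1 | 1 | 1 | 1 |
| Proteins | pepsin | 1 | 1 | 9 | 1 | 6 | 34 |
| Proteins | pepsinogen | 5 | 22 | 2 | 5 | 1 | 1 |
| Proteins | superoxide dismutases | 1 | 1 | 1 | 1 | 3 | 2 |
| Proteins | thaumatin | 2 | 1 | 8 | 1 | 20 | 11 |
| Proteins | triosephosphate isomerase | 1 | 4 | 1 | 13 | 1 | 2 |
| Proteins | trypsin | 2 | 2 | 2 | 2 | 5 | 5 |
| Proteins | trypsin inhibitor | 1 | 1 | 1 | 1 | 1 | 2 |
| Proteins | trypsinogen | 1 | 1 | 1 | 1 | 4 | 2 |
| Proteins | ubiquitin | 1 | 1 | 1 | 1 | 2 | 2 |
| Proteins | xylanase | 2 | 2 | 1 | 1 | 4 | 3 |
| Proteins | α-chymotrypsinogen a | 1 | 1 | 1 | 1 | 6 | 12 |
| Type | Component | CS 532 nm | CS 785 nm | SLK 532 nm | SLK 785 nm | PM (IUR) 532 nm | PM (IUR) 785 nm |
|---|---|---|---|---|---|---|---|
| Lipids | myristic acid | 2 | 3 | 1 | 3 | 3 | 1 |
| Lipids | oleic acid | 2 | 3 | 2 | 3 | 9 | 1 |
| Lipids | palmitic acid | 8 | 4 | 9 | 11 | 1 | 1 |
| Lipids | stearic acid | 10 | 1 | 12 | 10 | 1 | 1 |
| Type | Component | CS 785 nm | CS 1064 nm | SLK 785 nm | SLK 1064 nm | PM (IUR) 785 nm | PM (IUR) 1064 nm |
|---|---|---|---|---|---|---|---|
| Lipids | trilinolein | 1 | 1 | 1 | 1 | 2 | 2 |
| Lipids | trilinolenin | 1 | 1 | 1 | 1 | 4 | 4 |
| Lipids | triolein | 3 | 3 | 3 | 2 | 2 | 7 |
| Saccharides | amylopectin | 2 | 4 | 1 | 3 | 2 | 2 |
| Saccharides | amylose | 3 | 4 | 1 | 4 | 2 | 4 |
| Saccharides | d-(+)-xylose | 1 | 1 | 1 | 1 | 1 | 1 |
| Saccharides | d-(−)-fructose | 1 | 1 | 1 | 1 | 1 | 1 |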
similar and share many of the peaks. All of them are lipids, including 5 fatty acids and 4 triglycerides. Triglycerides are esters derived from glycerol attached to three fatty acids, so they exhibit Raman peaks similar to those of single fatty acids, which explains this top-ranking result. The main differences between them are the position of weaker peaks or the lack thereof; however, they do share the strongest peaks. In this case, PM performs better because all peaks are equally considered.
We evaluated how different laser wavelengths affect the search performance. We considered the 25 protein components with duplicated spectra in the database that were acquired by Rygula et al. [10], see Table 2, using the same spectrometer operating at both 488 nm and 532 nm.
Fig. 2 shows examples of raw spectra compared to the spectra in the database, after applying the standardization preprocessing defined in section 2.1, for ferritin, horseradish peroxidase and hemoglobin. The results exemplify how Raman spectra, and therefore the search results, can be heavily affected by the laser wavelength. These iron-containing molecules exhibit different Raman resonance modes depending on the laser wavelength [10]. This introduces differences in the spectra in addition to different fluorescence baselines. Moreover, component identification relies heavily on peak extraction. In this example, it is performed by the authors with criteria that can be complex. The different criteria in peak extraction can contribute to differences in the results. An example of this phenomenon can be seen in the case of
Fig. 1. Comparison of query spectrum and the top 4 results of stearic acid (532 nm) search using the methods: PM (left), SLK (center) and CS (right). In the results plot, spectra are ranked by results position, with the highest similarity at the bottom.
Fig. 2. Comparison of the raw spectrum (top row) and final database spectrum after baseline correction (bottom row), of the spectra of ferritin (left), horseradish peroxidase (center) and hemoglobin (right) using 488 nm (in blue) and 532 nm (in red) lasers. Spectra were acquired by Rygula et al. [10]. The vertical dashed lines represent the database peaks, for each laser, while the triangular markers represent the extracted peaks from the query spectrum. The spectrum plot and peak position information is plotted for both acquisition wavelengths for easy comparison. (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
horseradish peroxidase measured with the 532 nm laser. We observed significant shifts when comparing the peak positions we extracted from its spectrum to the published peaks of the same component measured with the 488 nm laser. These shifts reduced the accuracy of the search matching, as shown in Fig. 2. This situation can explain the results obtained in the identification of these molecules.
Table 4 shows the component identification accuracy in the top 1, 5 and 10 and the type identification accuracy for each search algorithm. The results show a top 10 accuracy of at least 87.5 % for every method, but a top 1 accuracy that does not exceed 59.72 %. The difficulty of finding a component in the top 1 can be explained by the high similarity of the multiple spectra in the database, especially within the same molecule type. This similarity makes it harder to find a specific molecule in the top 1. However, component type identification accuracy is larger than 95 %. For this, CS and SLK obtained the best results.
3.2. Search evaluation using Raman spectra from samples with isolated pure biomolecules
In this experiment we aim to analyze the search performance in the case of spectral data measured in the laboratory. We acquired spectra from samples of 6 molecules that exist in the database: collagen, albumin, cytochrome c, DNA, RNA and glycogen. The spectra can be seen in the supplementary materials, Figure SF2; the samples were prepared following the information in the supplementary materials, Table ST3. The spectra were acquired using a Raman microscope (inVia Renishaw, Apply Innovation, Gloucestershire, UK) with a 532 nm diode laser and a 1800 lines/mm grating, with a spectral resolution of approximately 1.8 cm⁻¹. The spectra standardization preprocessing described in section 2.1 was applied to this data. Most of the components were measured in a phosphate-buffered saline solution (PBS). Water in PBS shows a large peak at wavenumbers higher than 1600 cm⁻¹ [43]. To analyze the effect this peak may have on the results, spectra were analyzed in two different sets: one that included wavenumbers up to 1800 cm⁻¹, and the same data considered only up to 1600 cm⁻¹, therefore excluding this water peak.
Table 5 presents the search results of the experiment. Despite differences between the spectra in the database and those measured in the laboratory, the search algorithms can identify the unknown component within the top 5 in most cases. The poorest performance is observed when using CS on the full data range. This can be attributed to the overlap of the characteristic Amide I band of proteins (1600-1690 cm⁻¹) with the main peak of PBS beyond 1600 cm⁻¹ [10]. SLK performs better than the rest at top 1 and matches the performance of PM at top 5 across all cases. For DNA, glycogen and RNA, the challenge of identifying the component in the first position arises from the presence of highly similar components of the same type in the database, as shown in Fig. 3. Table 6 summarizes the identification accuracy at top 1, 5 and 10, along with the k-NN type identification accuracy, which is higher than 83 % in all cases.
3.3. Search evaluation using biomolecules mixtures datasets
In this experiment, we aim to evaluate how the algorithms can identify components within mixtures. To do this, we first searched the mixture spectra directly, then we used MCR to extract the components of the mixture and searched for them in the database. Fig. 4 shows the results of the identification on the mixtures directly. The results show that the identification accuracy is directly related to the concentration of the biomolecule, with top 10 accuracy between 74.7 % and 93.7 % for molecules with high concentrations and between 36.5 % and 50.2 % for molecules with low concentrations. This results in overall top 6 and top 10 accuracies of less than 66.6 % and 73.7 % respectively. SLK shows better performance than the rest, which can be attributed to the inclusion of mixtures analysis in its design and evaluation [36].
Since the previous results depend heavily on the concentration of the molecules in the mixture, we analyzed how the identification performs after unmixing the spectra using MCR, see Supplementary Data S6. We believe that this is a common procedure and reflects the potential real use of our database. Table 7 shows the search results for each extracted component. The worst results, observed in the saccharides dataset, can be explained by the similar chemical composition of the molecules (e.g. maltose is composed of two glucose molecules), and by the difference between the database saccharides, in solid state, and those in the mixture dataset, in aqueous solution [12,40]. Table 8 presents the overall accuracy results of component and type identification. Accuracy at top 6 and top 10 in the MCR case is higher than for the search directly in mixtures, showing the advantages of this method. Biomolecule type identification performs worse than in the other experiments, as amino acid type identification is especially difficult for k-NN due to large differences in spectra between the various amino acids.
3.4. Search evaluation replicating published articles assignation
This experiment aims at replicating results published in the literature. We identified published works where the authors assigned an unknown spectrum to a specific molecule. It is important to note that these works also published the spectra they identified as a specific molecule. This is not very common practice, and the identification of these articles has been useful as a validation of the method presented here.
Feng et al. [44] performed Raman spectral imaging on skin sections and used MCR to extract the spectra of collagen, elastin, keratin, triolein, ceramide, melanin, water and DNA. They published the numerical data of the spectra. All of these components are present in our two databases. The extracted components were identified by measuring synthetic samples of the expected components and visually identifying the relevant Raman bands in the different spectra.
Marro et al. [45] deconvoluted 6 components from retina cultures, but only one was assigned to a specific biomolecule that is present in both of our databases, namely phosphatidylcholine. The unknown spectrum was identified by visual inspection, comparing the Raman bands with published spectra. The spectrum of phosphatidylcholine used as unknown spectrum was extracted from the plot figure of the referenced article, using the method described in section S2.
Fig. 5 illustrates the differences between the unmixed unknown spectra and the database spectra.
Table 9 details the results for each component analyzed in this section. Collagen and DNA are found in the top 1 position for all metrics, despite noticeable differences in peak shapes and intensities between the unmixed unknown spectra and the database spectra. In contrast, in the case of ceramide, the CS and SLK methods fail to identify the unknown component within the top 10. This may be due to peak shape and intensity differences and the presence of numerous similar spectra of the same type in the database. However, the PM method produced better results by relying exclusively on peak wavenumber position information. The phosphatidylcholine results highlight the challenges of peak interference due to MCR unmixing errors, where a protein peak at 1003 cm⁻¹ causes the methods based on the full spectra plot to misidentify phosphatidylcholine as a protein instead of a lipid.
Table 10 shows the accuracy of finding the component assigned in
Table 4
Top 1, 5 and 10 component finding accuracy and type k-NN accuracy, in percentage, for each duplicated spectrum in the database, for each method.
| Method | Component accuracy (%), Top 1 | Component accuracy (%), Top 5 | Component accuracy (%), Top 10 | Type k-NN accuracy (%), k = 5 |
|---|---|---|---|---|
| CS | 55.55 | 90.27 | 95.83 | 97.22 |
| SLK | 59.72 | 87.50 | 93.05 | 97.22 |
| PM (IUR) | 34.72 | 77.77 | 87.50 | 95.83 |
the articles in the top 1, 5 and 10 of the search results. PM obtained the best results in the top 5 and 10. SLK performs slightly better than PM at top 1 but worse in the rest of the cases. CS results are worse compared with the previous methods. Type identification accuracy is higher than 85 % in all cases, with 100 % accuracy for PM. Therefore, it is observed again that type identification is successful even though component identification may not be as accurate.
3.5. Discussion
We p esen a da abase o Raman spec a, as well as a me hod o
iden i y biomolecules based on hei Raman spec a. The me hod was
es ed in ou di e en ways: using he duplica e spec a in he da abase,
gene a ing new labo a o y measu emen s o pu e samples, using public
mix u es da ase s, and eplica ing published a icles assigna ion. These
es s allowed us o assess how he me hods p oposed handle a ia ions in
peak shape and in ensi y caused by di e en acquisi ion condi ions and
unmixing e o s.
Despite a top 1 accuracy below 50 % in most cases, the top 10 accuracy and type identification accuracy exceeded 90 % for at least one search method across all experiments. Type identification alone is often sufficient when studying complex mixtures in cellular and tissue samples, as molecules of the same type typically share biological functions. Additionally, providing a list of the 10 most similar spectra can significantly reduce the time required for identification by narrowing the number of candidates to consider.
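As a worked illustration of how the top-k figures relate to the rank positions, the sketch below recomputes the CS and PM (IUR) component accuracies of Table 10 from the positions listed in Table 9. The helper name `top_k_accuracy` is hypothetical; only the position values are taken from the article.

```python
def top_k_accuracy(positions, k):
    """Percentage of queries whose correct component is ranked at position <= k."""
    hits = sum(1 for p in positions if p <= k)
    return 100.0 * hits / len(positions)


# Rank of the correct component for each of the seven spectra in Table 9
cs_positions = [14, 1, 1, 1, 24, 15, 27]   # CS column
pm_positions = [5, 1, 1, 1, 3, 2, 2]       # PM (IUR) column

for k in (1, 5, 10):
    cs = round(top_k_accuracy(cs_positions, k), 2)
    pm = round(top_k_accuracy(pm_positions, k), 2)
    print(f"top {k}: CS {cs} %, PM (IUR) {pm} %")
```

With seven queries, three hits give 42.86 % and seven hits give 100 %, which is why PM (IUR) moves from 42.86 % at top 1 to 100 % at top 5 while CS stays at 42.86 %.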
When identifying components in mixtures, our algorithms achieved a top 10 accuracy of 93 %, using SLK, when the target biomolecules were present at high concentrations. However, performance declined notably at medium and low concentration. The results suggest that the algorithms are more effective when identifying unknown pure biomolecules or components extracted via MCR unmixing, rather than when applied directly to complex mixtures.
Table 5
Search results for spectra from isolated pure biomolecules using all the algorithms.
Component      Type             Laser   Component position
                                        Max Range = 1600 cm⁻¹        Max Range = 1800 cm⁻¹
                                        CS    PM (IUR)   SLK         CS    PM (IUR)   SLK
Albumin        Proteins         532     2     2          1           1     1          1
Collagen       Proteins         532     1     1          1           1     2          1
Cytochrome C   Proteins         532     2     1          1           2     1          2
DNA            Nucleic Acids    532     1     2          1           5     2          1
Glycogen       Saccharides      532     4     2          4           24    2          4
RNA            Nucleic Acids    532     2     4          1           26    3          1
Fig. 3. Spectra in the top 3 results in DNA search (left) and top 4 results in glycogen search (right), for the DNA and glycogen measurements (bottom).
Table 6
Accuracy of all methods finding the correct component at top 1, 5 and 10 and component type using k-NN, from isolated pure biomolecules spectra.
Method      Max Range = 1600 cm⁻¹                                     Max Range = 1800 cm⁻¹
            Component accuracy (%)      Type (k-NN) accuracy (%)      Component accuracy (%)      Type (k-NN) accuracy (%)
            Top 1    Top 5    Top 10    k = 5                         Top 1    Top 5    Top 10    k = 5
CS          33.33    100      100       100                           33.33    66.67    66.67     50
PM (IUR)    33.33    100      100       83.33                         33.33    100      100       83.33
SLK         83       100      100       100                           66.67    100      100       100
The CS metric achieved its best performance when the query spectrum was free of background interference and could be attributed to a single biomolecule. However, its performance dropped significantly when peak shapes and intensities were substantially altered, as CS was not specifically designed for robust spectral comparison. The SLK score demonstrated robustness against background interferences and variations in peak ratios between two measurements of the same component. Nonetheless, it showed limitations when comparing highly similar spectra when only the least prominent peaks were different, as occurs with components of the same type, such as lipids and proteins. The PM method performed better than the rest when analyzing the MCR-decomposed output spectra from real biological samples. That is because PM can ignore the peak shape and intensity ratio variations caused by the decomposition errors, since it relies only on the peak's position. However, the peak extraction stage is heavily affected by noise and peak position shifts, which needs to be mitigated with a refined tolerance value and peak detection method.
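The contrast between a full-spectrum score and a position-only score can be sketched as follows. This is a minimal illustration, not the implementation in the published library: it assumes both spectra are sampled on the same wavenumber grid for the cosine similarity, and it assumes a greedy one-to-one matching of peaks within a tolerance for the intersection over union. The function names, the 5 cm⁻¹ tolerance and the peak lists are hypothetical. SLK is omitted because its formula is not given in this section.

```python
import numpy as np


def cosine_similarity(a, b):
    """Cosine similarity of two intensity vectors on a shared wavenumber grid."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom > 0 else 0.0


def peak_match_iou(query_peaks, ref_peaks, tolerance=5.0):
    """Intersection over union of matched peak positions (in cm^-1).

    A query peak matches the closest still-unmatched reference peak that lies
    within `tolerance`. Intensities and peak shapes are ignored on purpose.
    """
    remaining = sorted(ref_peaks)
    matched = 0
    for q in sorted(query_peaks):
        candidates = [r for r in remaining if abs(r - q) <= tolerance]
        if candidates:
            closest = min(candidates, key=lambda r: abs(r - q))
            remaining.remove(closest)  # keep the matching one-to-one
            matched += 1
    union = len(query_peaks) + len(ref_peaks) - matched
    return matched / union if union else 0.0


# Hypothetical peak lists: three query peaks, all within 5 cm^-1 of a
# reference peak, and one reference peak left unmatched.
query = [1003, 1450, 1660]
reference = [1004, 1301, 1449, 1655]
print(peak_match_iou(query, reference, tolerance=5.0))

# Scaling all intensities leaves the cosine similarity unchanged, but a change
# in relative peak intensities lowers it.
print(cosine_similarity([1, 2, 3], [2, 4, 6]))
print(cosine_similarity([1, 2, 3], [3, 2, 1]))
```

Because the second function only counts positions, distorted intensity ratios do not change its score, whereas a shifted or spurious peak does, which is why the tolerance and the peak detection step matter.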
4. Conclusions
We constructed an open Raman spectral library for biomolecule identification from the literature. We have made the database available on GitHub. We also present some algorithms for spectral data extraction that we have made openly available on GitHub. The digital database presented can be broadened by the research community including additional Raman spectra from biological molecules that were not considered in the scope of this article. We also present different fast algorithms that identify spectra in the database that may match unknown spectral data. The use of these algorithms and databases can help significantly narrow the matching candidates to the top 10 ranked spectra. The algorithm can also provide type identification, significantly reducing the time required for visual identification and the need to purchase reference component samples. The results highlight the library's potential to ease Raman spectral analysis for biological molecule identification, enhancing wider biomedical applications.
CRediT authorship contribution statement
Marcelo Terán: Writing – original draft, Visualization, Validation, Software, Methodology, Investigation, Formal analysis, Data curation, Conceptualization. José Javier Ruiz: Writing – review & editing, Validation, Methodology, Investigation, Formal analysis, Conceptualization. Pablo Loza-Alvarez: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition, Conceptualization. David Masip: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition, Conceptualization. David Merino: Writing – review & editing, Supervision, Resources, Project administration, Funding acquisition, Conceptualization.
Declaration of competing interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Fig. 4. (a) Average top 6 and top 10 identification accuracy by concentration in saccharides and amino acids mixtures datasets. (b) Overall average top 6 and top 10 identification accuracy by similarity method.
Table 7
Search results for spectra from saccharides and amino acids mixtures datasets after MCR extraction using all the algorithms.
Dataset        Component        Component position
                                CS    PM (IUR)   SLK
Amino Acids    Alanine          1     1          1
               Asparagine       2     1          7
               Aspartic acid    1     1          1
               Glucosamine      1     15         1
               Glutamic acid    1     27         1
               Histidine        1     1          1
Saccharides    Fructose         2     1          1
               Glucose          2     5          2
               Maltose          4     78         5
               Sucrose          9     8          25
Table 8
Accuracy of all methods finding the correct component at top 1, 6 and 10 and component type using k-NN, from saccharides and amino acids mixtures datasets after MCR extraction.
Method      Component accuracy (%)         Type k-NN accuracy (%)
            Top 1    Top 6    Top 10       k = 5
CS          50.0     90.0     100.0        70.0
SLK         60.0     80.0     90.0         70.0
PM (IUR)    50.0     60.0     70.0         50.0
Acknowledgements
The authors acknowledge funding from Fundació CELLEX; Ministerio de Economía y Competitividad - Severo Ochoa programme for Centres of Excellence in R&D (CEX2019-000910-S); CERCA programme (999619436); Laserlab-Europe (871124); Ministerio de Ciencia e Innovación PID2021-122807OB-C31 and PID2022-138721NBI00 projects funded by MCIN/AEI/10.13039/501100011033/FEDER, UE; CARET project. The SLN facility corresponds to a “Grup reconegut” 2021 SGR 01456 Departament de Recerca i Universitats de la Generalitat de Catalunya.
Appendix A. Supplementary data
Supplementary data to this article can be found online at https://doi.org/10.1016/j.chemolab.2025.105476.
Data availability
Data available on GitHub: https://github.com/mteranm/ramanbiolib
References
[1] R.R. Jones, D.C. Hooper, L. Zhang, D. Wolverson, V.K. Valev, Raman techniques: fundamentals and frontiers, Nanoscale Res. Lett. 14 (2019) 231, https://doi.org/10.1186/s11671-019-3039-2.
[2] H. Noothalapati, K. Iwasaki, T. Yamamoto, Biological and medical applications of multivariate curve resolution assisted Raman spectroscopy, Anal. Sci. 33 (2017) 15–22, https://doi.org/10.2116/analsci.33.15.
[3] S. Cui, S. Zhang, S. Yue, Raman spectroscopy and imaging for cancer diagnosis, J. Healthc. Eng. 2018 (2018) e8619342, https://doi.org/10.1155/2018/8619342.
[4] I.P. Santos, E.M. Barroso, T.C.B. Schut, P.J. Caspers, C.G.F. van Lanschot, D.-H. Choi, M.F. van der Kamp, R.W.H. Smits, R. van Doorn, R.M. Verdijk, V.N. Hegt, J.H. von der Thüsen, C.H.M. van Deurzen, L.B. Koppert, G.J.L.H. van Leenders, P.C. Ewing-Graham, H.C. van Doorn, C.M.F. Dirven, M.B. Busstra, J. Hardillo, A. Sewnaik, I. ten Hove, H. Mast, D.A. Monserez, C. Meeuwis, T. Nijsten, E.B. Wolvius, R.J.B. de Jong, G.J. Puppels, S. Koljenović, Raman spectroscopy for cancer detection and cancer surgery guidance: translation to the clinics, Analyst 142 (2017) 3025–3047, https://doi.org/10.1039/C7AN00957G.
[5] N. Kuhar, S. Sil, T. Verma, S. Umapathy, Challenges in application of Raman spectroscopy to biology and materials, RSC Adv. 8 (2018) 25888–25908, https://doi.org/10.1039/C8RA04491K.
[6] R. teja Vulchi, V. Morgunov, R. Junjuri, T. Bocklitz, Artifacts and anomalies in Raman spectroscopy: a review on origins and correction procedures, Molecules 29 (2024) 4748, https://doi.org/10.3390/molecules29194748.
Fig. 5. Comparison between the component spectrum extracted in the evaluated articles (in blue) and the spectrum of the same component in the database (in red). (For interpretation of the references to colour in this figure legend, the reader is referred to the Web version of this article.)
Table 9
Component matching coincidence position of spectra published in selected articles.
Reference            Component              Type             Component position
                                                             CS    SLK   PM (IUR)
Feng et al. [44]     Ceramide               Lipids           14    15    5
                     Triolein               Lipids           1     1     1
                     DNA                    Nucleic Acids    1     1     1
                     Collagen               Proteins         1     1     1
                     Elastin                Proteins         24    1     3
                     Keratin                Proteins         15    2     2
Marro et al. [45]    Phosphatidylcholine    Lipids           27    13    2
Table 10
Accuracy of all methods finding the correct component at top 1, 5 and 10 and component type using k-NN, from spectra published in selected articles.
Method      Component accuracy (%)         Type k-NN accuracy (%)
            Top 1    Top 5    Top 10       k = 5
CS          42.86    42.86    42.86        85.71
SLK         57.14    71.43    71.43        85.71
PM (IUR)    42.86    100.00   100.00       100.00