Scientific Reports | (2023) 13:7815 | https://doi.org/10.1038/s41598-023-34981-4
www.nature.com/scientificreports
From language models to large‑scale food and biomedical knowledge graphs
Gjorgjina Cenikj1,2*, Lidija Strojnik1, Risto Angelski3, Nives Ogrinc1, Barbara Koroušić Seljak1 & Tome Eftimov1
Knowledge about the interactions between dietary and biomedical factors is scattered throughout uncountable research articles in an unstructured form (e.g., text, images, etc.) and requires automatic structuring so that it can be provided to medical professionals in a suitable format. Various biomedical knowledge graphs exist, however, they require further extension with relations between food and biomedical entities. In this study, we evaluate the performance of three state‑of‑the‑art relation‑mining pipelines (FooDis, FoodChem and ChemDis) which extract relations between food, chemical and disease entities from textual data. We perform two case studies, where relations were automatically extracted by the pipelines and validated by domain experts. The results show that the pipelines can extract relations with an average precision around 70%, making new discoveries available to domain experts with reduced human effort, since the domain experts should only evaluate the results, instead of finding, and reading all new scientific papers.
Noncommunicable chronic diseases (NCDs) account for more than 70% of deaths worldwide. Cardiovascular diseases account for most NCD deaths (17.9 M people annually), followed by cancers (9.3 M), respiratory diseases (4.1 M), and diabetes mellitus (1.5 M)1,2. Cardiovascular diseases (CVDs) are the leading cause of death globally, and most CVD deaths are due to heart attacks and strokes3. A lot of scientific evidence indicates that among the most important risk factors for heart disease and stroke are an unhealthy diet, alcohol and tobacco consumption, and physical inactivity. Among all the factors that contribute to the development and progression of CVDs, diet is one of the major ones4,5. It has been shown that eating more fruit and vegetables and decreasing the salt in the diet reduce the risk of CVDs.
Further, although there is a lot of knowledge about dietary effects on CVDs and broadly on NCDs, there are still many unresolved research questions. Such questions are not easy to answer because food and nutrition in relation to diseases are described by various concepts and entities that interact in various ways6. For instance, there are many foods (described by food entities) made up of components (described by chemical entities)7 that may fight NCDs (described by disease entities) while others can be harmful8. These impacts are dependent on the combination of foods and their chemicals, the state of the food (e.g., raw/cooked, fresh/molded, etc.), the cooking method (e.g., steamed, grilled, baked, etc.), the health status of the person consuming food (e.g., healthy, ill, allergic) and others9. As there are many combinations of these factors, collecting and structuring the relations between all the concepts and entities describing the impacts of food on NCDs is a very complex task exceeding human capabilities. Moreover, since research in this field is still progressing, the related knowledge evolves on a daily basis, making it challenging to follow. Such knowledge further opens possibilities to use Artificial Intelligence (AI) methods to aid in the early detection (prediction) of NCDs as well as their progression. However, before developing predictive AI methods, unstructured (textual) data available in cohorts, electronic health records (EHRs), registries, and scientific and grey literature needs to be structured and normalized/linked to domain semantic resources and further included in knowledge bases (KBs), which can be utilized for predictive modeling and integrated into health systems, making the information easily accessible to medical professionals. To this end, user interfaces play a critical role in ensuring that healthcare professionals can effectively utilize AI systems to provide high-quality care to their patients10.
A Knowledge Graph (KG) is a type of KB, where knowledge is stored in the form of entities characterized by some attributes, and relations connecting the entities. Conventional methods for KG construction can be broadly categorized into manual, and automatic, or semi-automatic methods. The benefits of manual creation and curation approaches are their high precision and reliability11, however, due to the high amount of effort
1Jožef Stefan Institute, Ljubljana 1000, Slovenia. 2Jožef Stefan International Postgraduate School, Ljubljana 1000, Slovenia. 3Clinic Doctor 24-hours, Ljubljana 1000, Slovenia. *email: gjorg[email protected]
required by domain experts, they also have lower recall rates, poor scalability and time efficiency12. Automatic and semi-automatic KG construction is enabled by text-mining methods, which are able to extract entities and relations which can be structured as a KG.
In the biomedical domain, automatic and semi-automatic structuring of textual data in the form of KGs is an active research area, which typically involves the use of Information Extraction (IE) pipelines consisting of multiple components. These components include Named Entity Recognition (NER) methods, which extract specific types of entities from raw text, Named Entity Linking (NEL) methods, whose goal is to map entity mentions to entries in a given KB, and Relation Extraction (RE) methods, which aim to automatically detect relations between entities13. Over the past 20 years, significant progress has been made in creating multiple IE pipelines for the biomedical domain. These pipelines primarily concentrate on identifying genotype and phenotype entities, as well as health-related entities such as diseases, treatments, drugs, and others. To allow their development, several collaborative workshops, as part of conference events like BioNLP14, BioCreative15, i2b216, and DDIExtraction17, have been arranged to provide semantic resources (e.g., annotated corpora, ontologies) that will further allow the development of biomedical IE pipelines. The efforts done in the biomedical domain are focused entirely on biomedical concepts and do not investigate relations with food concepts. On the other side, most of the efforts done in IE in the food domain are focused on relations that do not involve health/biomedical concepts, and even more, are developed using static data that is already present in some other resources (e.g., datasets, controlled vocabularies, ontologies), so they need to be updated when new data is available in these resources. In addition, only a few studies have concentrated on traditional text mining techniques that employ sentiment analysis through manual feature extraction18–20. Despite this, the food and nutrition domain is low-resourced in semantic data resources compared to the biomedical domain. There is a lack of annotated food-disease relation corpora that serve as a benchmark and help develop IE pipelines. Even more, food semantic resources such as FoodOn21 and FoodEx222 are still under development (i.e., they are frequently updated with new data) to support IE activities.
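The division of labour between NER, NEL and RE can be illustrated with a deliberately tiny, rule-based sketch. It is not the language-model approach behind FooDis, FoodChem or ChemDis: the lexicon, the trigger words and all function names below are hypothetical and exist only to show what each component consumes and produces.

```python
import re

# Hypothetical toy lexicon: surface form -> (entity type, normalized KB entry).
LEXICON = {
    "fish oil": ("food", "dietary fish oil"),
    "salt": ("food", "salt"),
    "heart failure": ("disease", "heart failure"),
}

# Hypothetical trigger words for the two food-disease relations.
TRIGGERS = {"prevents": "treat", "exacerbates": "cause"}


def recognize(sentence):
    """NER: find lexicon mentions in the sentence (case-insensitive, whole words)."""
    lowered = sentence.lower()
    mentions = []
    for surface in LEXICON:
        for match in re.finditer(r"\b" + re.escape(surface) + r"\b", lowered):
            mentions.append((match.start(), surface))
    return [surface for _, surface in sorted(mentions)]


def link(mention):
    """NEL: map a mention to its (entity type, KB entry) pair."""
    return LEXICON[mention]


def extract(sentence):
    """RE: emit (food, relation, disease) triples when a trigger word is present."""
    linked = [link(m) for m in recognize(sentence)]
    foods = [name for kind, name in linked if kind == "food"]
    diseases = [name for kind, name in linked if kind == "disease"]
    lowered = sentence.lower()
    relations = [rel for word, rel in TRIGGERS.items() if word in lowered]
    return [(f, r, d) for f in foods for r in relations for d in diseases]


triples = extract("Intake of fish oil prevents heart failure.")
```

A real pipeline would replace each of the three functions with a trained model and a proper ontology lookup; the point here is only the hand-off from mentions, to normalized entries, to relation triples.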
To bridge the gap between the food and biomedical domains, we introduce an approach that uses language models to extract the relations that exist between food, chemical, and disease entities and further normalize them to allow the creation of a KG. In our case, we evaluate the approach to trace the new knowledge about CVDs and milk products. The benefit of our approach is that we are not using the information that already exists in some static resources (e.g., databases), but try to catch all relations from textual data related to CVDs and milk products (milk was selected as a case study since it is rich in nutrients, a resource of proteins, vitamins, minerals, and fatty acids, which have an important impact on human metabolism and health) available in scientific abstracts, where new findings are presented. This makes the methodology easy to apply on new corpora of scientific abstracts, where the results of the pipelines can point out areas where the KG should be updated with new entities or relations.
Related work
A recent survey on knowledge-based biomedical data science23 highlights the application of KGs in the biomedical and clinical domain in improving the retrieval of information from large sources of clinical data or literature24–26, providing evidence to support phenomena observed in data27,28, using link prediction to complete missing information and hypothesize previously unknown relationships29, and improving patient data representation30–32. In the biomedical domain, IE pipelines have been developed for the extraction of drug-disease relations33,34 and disease-symptom relations35 from biomedical literature. A Coronavirus KG has been constructed by merging the Analytical Graph, with a collection of published scientific articles36. A PubMed KG has been constructed by extracting biomedical entities from PubMed abstracts and enriching it with funding, author, and affiliation data37. A recent work12 proposes the construction of domain-specific KGs with minimal supervision, which is able to derive open-ended relations from unstructured biomedical articles without the need for extensive labeling. While this study is largely focused on data integration, and only uses NER to extract the biomedical entities from the literature, our study goes a step further in the RE task, to extract the relations between the entities based on the text in the scientific abstracts, so that new relations can be added between entities in existing resources. Apart from using biomedical scientific papers as a source of information, EHRs have also been used for extracting disease-symptom relations38 and constructing a medical KG with nine biomedical entity types39.
In the food domain, FoodKG has been recently developed for representing food recipe data including their ingredients and nutritional content40 by enriching a large amount of recipe data from the Recipe1M dataset with the nutritional information available from USDA’s National Nutrient Database for Standard Reference represented with FoodOn21 semantic meta-data. Additionally, FoodKG41 was developed by using the existing text and graph embedding techniques applied to a controlled vocabulary called AGROVOC, to model the relations that exist in a plethora of datasets related to food, energy and water.
Results
To trace the knowledge about food, chemical, and disease interactions, we demonstrate the creation of a KG from two use cases, one centered around the impact of different foods and chemicals on CVDs, and the other targeting the composition of the selected food item “milk”, as well as its beneficial and detrimental effects on different NCDs. For this purpose, three NLP pipelines, called FooDis, FoodChem, and ChemDis, were combined to extract “food-disease”, “food-chemical”, and “chemical-disease” relations from textual data. Semantically, we distinguish two relations between food-disease and chemical-disease entity pairs, which are “treat” and “cause”. In the case of food-chemical entity pairs, we extracted only one relation which is “contains”. All three pipelines were executed twice, on two different corpora, one that was collected for CVDs and one collected for milk products. In both use cases, the search keywords were selected by domain experts. In the CVDs case, a more general keyword, “heart disease food”, was selected, since we would like to retrieve broader aspects between different cardiovascular events and food products.
This search resulted in 9984 abstracts. In the milk use case, three keywords were selected by the domain experts, i.e., “milk composition”, “milk disease”, and “milk health benefits”.
Table 1a presents the number of abstracts that were retrieved and used in the analysis for both use cases, together with the keywords used to retrieve them, while Table 1b presents the number of relations that were extracted for both use cases.
Figure 1a features the KG constructed by running the three pipelines for the two application use cases. The same nodes are grouped together by normalizing the extracted food, chemical, and disease entities.
To go into more detail on how the KG is constructed, in Fig. 1b we present an example using the relations extracted for the “heart failure” disease entity. The green nodes, “meat products”, “salt” and “dietary fish oil” represent the food entities for which the FooDis pipeline extracted a relation with the “heart failure” disease entity, meaning that they have some effect on its development or treatment. In particular, the red edges connecting the “heart failure” disease entity and the food entities “meat products”, and “salt” indicate that the pipeline identified a “cause” relation, i.e. meat products, and salt can contribute to the occurrence of heart failure. On the other hand, the green edge between the “dietary fish oil” entity and the “heart failure” disease entity indicates a “treat” relation, i.e. the pipeline identified that dietary fish oil has a beneficial effect on heart failure. Similarly, the ChemDis pipeline identified that the chemical entities “DHA”, “ester”, “acid, n-3 fatty”, “antidiabetics canagliflozin”, “omega-3 fatty acid” and “calcium” can be used for treating “heart failure”, while the chemical entities “(-)-cocaine” and “vitamin E” can contribute to the development of “heart failure”. Table 2 presents the supporting sentences from scientific abstracts from which the relations were extracted and further used for constructing the graph presented in Fig. 1b. Next, such graphs are connected based on the same entities to link the information from different abstracts. Further, to validate the extracted information, domain experts were involved to check the extracted relations for both use cases.
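The “heart failure” example can be written down as a list of (head, relation, tail) triples and grouped into an ego network. The sketch below uses the eleven relations named in the text; the function name and the plain-dictionary representation are illustrative assumptions and not the pyvis-based code used to draw the figure.

```python
from collections import defaultdict

# (head entity, relation, tail entity) triples reported for "heart failure".
TRIPLES = [
    ("meat products", "cause", "heart failure"),
    ("salt", "cause", "heart failure"),
    ("dietary fish oil", "treat", "heart failure"),
    ("(-)-cocaine", "cause", "heart failure"),
    ("vitamin E", "cause", "heart failure"),
    ("DHA", "treat", "heart failure"),
    ("ester", "treat", "heart failure"),
    ("acid, n-3 fatty", "treat", "heart failure"),
    ("antidiabetics canagliflozin", "treat", "heart failure"),
    ("omega-3 fatty acid", "treat", "heart failure"),
    ("calcium", "treat", "heart failure"),
]


def ego_network(triples, center):
    """Group the neighbours of `center` by the relation on the connecting edge."""
    neighbours = defaultdict(list)
    for head, relation, tail in triples:
        if tail == center:
            neighbours[relation].append(head)
        elif head == center:
            neighbours[relation].append(tail)
    return {rel: sorted(names) for rel, names in neighbours.items()}


ego = ego_network(TRIPLES, "heart failure")
```

Because every triple stores normalized entity names, ego networks built from different abstracts can be merged simply by concatenating their triple lists, which mirrors how the text describes connecting graphs through shared entities.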
Use case: cardiovascular diseases. For the CVDs use case, a highly-skilled domain expert (an MD with more than 40 years of working experience in cardiology) evaluated the extractions from the three pipelines. The relations that were evaluated are extracted after the “Final relation determination” step from the FooDis, FoodChem and ChemDis pipelines. All three pipelines utilized here follow the same workflow. Each extracted relation is determined by all sentences where information about it is presented. We called them “supporting sentences”. The sentences can be from the same or different abstracts, since information about the same relation can be investigated in different papers.
Domain expert evaluation. Each pipeline provides the result as a 6-tuple, i.e., (name of the first entity, name of the second entity, synonyms for the first entity, synonyms for the second entity, relation, supporting sentences), which is further evaluated by the domain expert. The domain expert was asked to assign a binary indicator of the truthfulness of the relation. The pipelines were then evaluated by taking the mean of the correctness indicators assigned by the annotator for each relation and pipeline, which we refer to as the precision in the remainder of this section. In particular, if a pipeline extracted three relations, and the expert marked two of these as correct (binary indicators 1,0,1), the reported precision would be 2/3 ≈ 0.67.
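The evaluation protocol reduces to a record type for the 6-tuple and a mean over binary indicators. The following is a minimal sketch under that reading; the class and field names are assumptions chosen for illustration, not identifiers from the pipelines.

```python
from typing import List, NamedTuple


class ExtractedRelation(NamedTuple):
    """The 6-tuple handed to the domain expert for each extracted relation."""
    first_entity: str
    second_entity: str
    first_synonyms: List[str]
    second_synonyms: List[str]
    relation: str
    supporting_sentences: List[str]


def precision(indicators: List[int]) -> float:
    """Mean of the binary correctness indicators (1 = correct, 0 = incorrect)."""
    if not indicators:
        raise ValueError("no evaluated relations")
    return sum(indicators) / len(indicators)


example = ExtractedRelation(
    first_entity="salt",
    second_entity="heart failure",
    first_synonyms=[],
    second_synonyms=[],
    relation="cause",
    supporting_sentences=["(supporting sentence text)"],
)

# Three extracted relations, two marked correct by the expert.
p = precision([1, 0, 1])
```

Since only evaluated relations carry an indicator, this quantity is a precision over the evaluated subset; it says nothing about recall.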
Figure 2a presents the number of relations extracted by each of the pipelines for the CVDs study, and the number of relations that the domain expert evaluated. We need to point out here that all extracted relations were provided to the domain expert, however, the evaluation has been performed only on those relations for which the domain expert has expert knowledge. Because of this, the human evaluation process covers 44% of the “contains”
Table 1. Number of processed paper abstracts and number of extracted relations for each case study.
(a) Number of paper abstracts processed for each search keyword
Case study | Keyword | Number of abstracts
Heart disease | Heart disease food | 9984
Milk | Milk composition | 13,500
Milk | Milk disease | 17,268
Milk | Milk health benefits | 2343
(b) Number of relations extracted by each pipeline in each case study
Case study | IE pipeline | Relation | Number of relations extracted
Heart disease | FooDis | Cause | 516
Heart disease | FooDis | Treat | 699
Heart disease | ChemDis | Cause | 635
Heart disease | ChemDis | Treat | 1079
Heart disease | FoodChem | Contains | 981
Milk | FooDis | Cause | 1184
Milk | FooDis | Treat | 789
Milk | ChemDis | Cause | 670
Milk | ChemDis | Treat | 597
Milk | FoodChem | Contains | 1875
relations extracted by the FoodChem pipeline, 33% of the “treat” relations extracted by the FooDis pipeline, 26% of the “cause” relations extracted by the FooDis pipeline, 26% of the “cause” relations extracted by the ChemDis pipeline, and 23% of the “treat” relations extracted by the ChemDis pipeline.
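These coverage figures can be reproduced from the extracted and evaluated counts shown as bar values in Fig. 2a. The sketch below assumes that reading of the bar values and assumes that percentages are truncated, not rounded, to whole numbers, which is what matches the reported figures; the function name is illustrative.

```python
# Counts read off Fig. 2a: (extracted, evaluated) per pipeline and relation.
COUNTS = {
    ("FoodChem", "contains"): (759, 339),
    ("FooDis", "treat"): (444, 147),
    ("FooDis", "cause"): (291, 76),
    ("ChemDis", "cause"): (635, 171),
    ("ChemDis", "treat"): (1079, 254),
}


def coverage_percent(extracted, evaluated):
    """Share of extracted relations that were evaluated, truncated to a whole percent."""
    return 100 * evaluated // extracted


coverage = {key: coverage_percent(*pair) for key, pair in COUNTS.items()}
```

Note that with conventional rounding three of the five values would come out one point higher (45%, 27% and 24%), so the truncation assumption matters when comparing against the text.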
(a) The entire knowledge graph containing all extracted relations
(b) Ego network of the "heart failure" node
Figure 1. Knowledge graph constructed using the FooDis, FoodChem and ChemDis pipelines. The nodes in green represent the food entities, the nodes in blue represent the chemical entities, and the nodes in red represent the disease entities. The red, green, and blue edges represent the “cause”, “treat” and “contains” relations, respectively. The figures have been generated using the pyvis python library42, version 0.1.8.2.
The mean precision for each of the pipelines (FooDis, ChemDis, and FoodChem) in the CVDs use case is presented in Fig. 2b. From it, the FooDis pipeline achieves the highest precision of 0.79 for the “cause” and 0.78 for the “treat” relation. The lowest precision of 0.68 is achieved by the ChemDis pipeline for the extraction of the “cause” relation.
Since the three pipelines extract a relation based on supporting sentences, in the Supplementary Materials, we have presented the distribution of the number of relations versus their number of supporting sentences. All of the pipelines extract more than 74% of the relations based on a single supporting sentence. The ChemDis and FoodChem pipelines can find a larger number of supporting sentences for some relations compared to the FooDis pipeline. In particular, the ChemDis pipeline can find up to five supporting sentences to identify “cause” relations and up to 14 supporting sentences to identify “treat” relations, while the FooDis pipeline uses up to three, and four supporting sentences for the “cause” and “treat” relations, respectively.
Next, to see how the mean precision is affected by the number of supporting sentences, we analyze each semantic relation separately. The results are presented in Supplementary Materials. From the conducted analysis, we can conclude that the mean precision tends to increase with the number of supporting sentences. For almost all relations, a precision of 1.00 is reached when the number of supporting sentences is sufficiently high. This indicates that when the number of supporting sentences for a relation increases, there is an agreement between the domain expert validation and the result provided by our pipelines, with some exceptions listed in the Supplementary Materials.
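The analysis described here amounts to grouping the expert's indicators by the number of supporting sentences and taking the mean in each group. A minimal sketch follows; the function name is an assumption, and the input pairs are made-up values for illustration only, not data from the study.

```python
from collections import defaultdict


def precision_by_support(evaluations):
    """Map number of supporting sentences -> mean correctness indicator.

    `evaluations` is an iterable of (n_supporting_sentences, indicator) pairs,
    where indicator is 1 for a correct relation and 0 for an incorrect one.
    """
    buckets = defaultdict(list)
    for n_sentences, indicator in evaluations:
        buckets[n_sentences].append(indicator)
    return {n: sum(v) / len(v) for n, v in sorted(buckets.items())}


# Made-up evaluations, for illustration only (not data from the study).
toy = [(1, 1), (1, 0), (1, 1), (1, 0), (2, 1), (2, 1), (2, 0), (3, 1)]
result = precision_by_support(toy)
```

Groups with many supporting sentences contain few relations, so a precision of 1.00 in such a group may rest on a handful of evaluations.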
Error analysis. Next, we analyze the types of false discoveries produced by the FooDis, FoodChem, and ChemDis pipelines.
Figure 3 features the relations with the highest number of supporting sentences for four chemical entities: “carbohydrates”, “fatty acid”, “sodium” and “vitamin d”. Here the results for the selected chemical entity from the two pipelines that deal with chemical entities (i.e., ChemDis and FoodChem) are presented. The green bars refer to the number of sentences in which the relation was correctly identified, while the purple bars refer to the number of false positive sentences for that relation, i.e. sentences where the relation was identified, however, it was marked as incorrect by the experts.
For the “carbohydrates” entity, the ChemDis pipeline produced the false positive relation “carbohydrates-treat-cardiomyopathy” when the supporting sentences suggested that a low-carbohydrate diet is recommended for treating cardiomyopathy. In this case, the pipeline fails to identify that a reduction of the chemical entity is required to treat the disease. In addition, the FoodChem pipeline produced a falsely discovered relation “bulk-contains-carbohydrates”, when the supporting sentence was saying that these two entities are contained in another entity, “dry beans”. For the “fatty acid” chemical entity, the ChemDis pipeline produced the false positive relation “fatty acid-cause-dysfunction endothelial”, when the supporting sentence was saying that increased fatty acid levels and endothelial dysfunction were contributing to the development of another disease, “sepsis”. The FoodChem pipeline produced the false relations “wine-contains-fatty acid” and “acid fatty trans-contains-fatty acid”. In the first case, the two entities were co-occurring in the supporting sentence without any relation, while in the second one, the sentence was saying that trans fatty acids are a subcategory of fatty acids. In the case of
Table 2. Supporting sentences for the relations of entity “heart failure” to different food and chemical entities.
Food/chemical name | Relation | Supporting sentences
(−)-Cocaine | Cause | 1) Additionally, cocaine use has been associated with left ventricular hypertrophy, myocarditis, and dilated cardiomyopathy, which can lead to heart failure if drug use is continued
Vitamin E | Cause | 1) Yet, high doses of supplemental vitamin E have been associated with an elevated risk of heart failure and all-cause mortality. 2) Vitamin E supplementation might be associated with an increase in total mortality, heart failure, and hemorrhagic stroke
Salt | Cause | 1) In patients who already have heart failure, a high salt intake aggravates the retention of salt and water, thereby exacerbating heart failure symptoms and progression of the disease
Meat products | Cause | 1) Thermal processing of meat products generates cardiotoxic compounds capable of inducing heart failure in both humans and laboratory animals
Acid, n-3 fatty | Treat | 1) Evidence from epidemiological, clinical and experimental studies indicates a beneficial role of the omega-3 polyunsaturated fatty acids (omega-3 PUFA) found in fish oils in the prevention and management of heart failure. 2) This review summarise the data related to use of omega-3 PUFA supplementation as a potential treatment for heart failure and discussed possible mechanism of action. 3) The 2017 American Heart Association science advisory on omega-3 fatty acid supplements suggested that it is reasonable to use omega-3 fatty acids for secondary prevention in people with coronary heart disease and heart failure
Antidiabetics canagliflozin | Treat | 1) It has been concluded that canagliflozin, dapagliflozin, empagliflozin, or ertugliflozin can be recommended for preventing hospitalization associated with heart failure in patients with type 2 diabetes and established cardiovascular disease or those at high cardiovascular risk
DHA | Treat | 1) Intake of fish oil containing docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA) prevents heart failure; however, the mechanisms are unclear
Ester | Treat | 1) Because L-carnitine and its esters help reduce oxidative stress, they have been proposed as a treatment for many conditions, i.e. heart failure, angina and weight loss
Omega-3 fatty acid | Treat | 1) The 2017 American Heart Association science advisory on omega-3 fatty acid supplements suggested that it is reasonable to use omega-3 fatty acids for secondary prevention in people with coronary heart disease and heart failure
Calcium | Treat | 1) Here we review the key observations, controversies, and discoveries that have led to the development of novel compounds targeting the RyR2/calcium release channel for treating heart failure and for preventing lethal arrhythmias
Dietary fish oil | Treat | 1) Intake of fish oil containing docosahexaenoic acid (DHA) and eicosapentaenoic acid (EPA) prevents heart failure; however, the mechanisms are unclear
the “sodium” chemical entity, most of the sentences extracted by the ChemDis pipeline express the correct relation, however, sodium is incorrectly extracted as a partial match of the entity “Sodium-glucose co-transporter 2 inhibitors (SGLT2is)”. In the case of “vitamin d”, all of the false positive “cause” relations extracted by the ChemDis pipeline are due to the pipeline not recognizing that the deficiency of the vitamin was causing the diseases.
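The “sodium” partial-match error can be reproduced with a toy lexicon matcher. The sketch below is only an illustration of the failure mode and of one common mitigation (preferring the longest match); it does not describe how ChemDis recognizes entities, and the lexicon, function names and example sentence are hypothetical.

```python
import re

# Hypothetical lexicon containing a short name and a longer name that embeds it.
LEXICON = ["sodium", "sodium-glucose co-transporter 2 inhibitors"]


def naive_mentions(text):
    """Report every lexicon entry that occurs anywhere in the text."""
    lowered = text.lower()
    return [name for name in LEXICON if name in lowered]


def longest_mentions(text):
    """Match longer entries first and skip spans already covered by them."""
    lowered = text.lower()
    taken = []  # (start, end) spans already assigned to a longer entity
    found = []
    for name in sorted(LEXICON, key=len, reverse=True):
        for match in re.finditer(re.escape(name), lowered):
            if all(match.end() <= s or match.start() >= e for s, e in taken):
                taken.append(match.span())
                found.append(name)
    return found


# Hypothetical example sentence.
sentence = "Sodium-glucose co-transporter 2 inhibitors (SGLT2is) were studied."
naive = naive_mentions(sentence)
strict = longest_mentions(sentence)
```

The naive matcher reports both entities, including the spurious “sodium”, while the longest-match variant keeps only the drug class. The “vitamin d” deficiency errors are of a different kind and would need a polarity cue, which this sketch does not attempt.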
Figure 4 features the top 10 relations with a maximal number of supporting sentences for three disease entities. Here, we present the results from pipelines that are dealing with disease entities (i.e., FooDis and ChemDis). For
(a) Number of extracted and evaluated relations per pipeline for the heart disease study (extracted/evaluated: ChemDis cause 635/171, ChemDis treat 1079/254, FooDis cause 291/76, FooDis treat 444/147, FoodChem contains 759/339)
(b) Mean precision of the relations extracted by each pipeline for the heart disease study (ChemDis cause 0.68, ChemDis treat 0.75, FooDis cause 0.79, FooDis treat 0.78, FoodChem contains 0.74)
Figure 2. Number of extracted and evaluated relations and mean precision for each pipeline for the heart disease study. The plots have been generated using the plotly python library43, version 5.7.0.
the “general cardiovascular disorders” entity, the pipelines extracted the relations “dietary vegetable-cause-general cardiovascular disorders”, “acid, saturated fatty-treat-general cardiovascular disorders”, “acid fatty polyunsaturated-cause-general cardiovascular disorders”, “cholesterol-treat-general cardiovascular disorders” due to the
[Figure 3 panels, shown for the entity “carbohydrates”: ChemDis “cause”, ChemDis “treat” and FoodChem “contains”; x-axis: number of supporting sentences; bars: correct vs. incorrect.]
Figure 3. Top 10 “cause”, “treat”, and “contains” relations with maximum number of supporting sentences for four chemical entities: “carbohydrates”, “fatty acid”, “sodium” and “vitamin d”. The entities in the rows of the ChemDis pipeline are diseases caused or treated by the chemical, while the entities in the rows of the FoodChem pipeline are food entities in which the chemical is contained.
fact that the pipelines were not able to recognize that the sentences were referring to the reduction of these food or chemical entities affecting the development or treatment of the general cardiovascular disorders. This is also the reason for the false positive relations extracted for the other two disease entities featured in the figure.
Use case: milk. For the use case related to the composition and health effects of milk, two highly-skilled domain experts evaluated the results from all three pipelines: a chemist and a food and nutritional scientist.
Domain expert evaluation. From the 33,111 processed abstracts related to the milk case study, the three pipelines extracted a total of 6792 relations, from which 5139 were evaluated by the two domain experts. We need to point out again that all extracted relations were provided to the domain experts, however, they evaluated only those relations for which they have domain expertise. Figure 5a features the number of relations extracted by each pipeline for the milk case study, and the number of relations the experts evaluated. The highest number of evaluated relations were the “contains” relations extracted by the FoodChem pipeline, and the experts were able to evaluate 96% of them (2754 out of 2849). The experts also evaluated 73% of the “treat” and 78% of the “cause” relations produced by the FooDis pipeline, and 34% of the “cause” relations and 35% of the “treat” relations produced by the ChemDis pipeline.
The mean precision for each of the five semantic relations for both domain experts is presented in Fig. 5b separately. In addition, we have also presented the mean precision for each type of relation by averaging the precision across both domain experts. From the figure, we can see that the first domain expert, who evaluated the relations which were supported by a single sentence, identified more incorrect relations than the second domain expert, who evaluated the relations supported by multiple sentences.
The overall mean precision for each of the five relations, averaged across both domain experts, is as follows:
Figure 4. Top 10 “cause” and “treat” relations with maximal number of supporting sentences related to three disease entities: “general cardiovascular disorders”, “diabetes”, and “obesity”. The entities listed in the rows of the FooDis pipeline are food entities, while the entities listed in the rows of the ChemDis pipeline are chemical entities, that cause or treat the specified disease.
• 0.51 for the “cause” relation extracted by the ChemDis pipeline,
• 0.79 for the “treat” relation extracted by the ChemDis pipeline,
• 0.65 for the “cause” relation extracted by the FooDis pipeline,
• 0.70 for the “treat” relation extracted by the FooDis pipeline,
• 0.70 for the “contains” relation extracted by the FoodChem pipeline.
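As a quick consistency check, the five milk precisions listed above can be averaged together with the five heart disease precisions. The sketch below assumes the heart disease values are the bar values shown in Fig. 2b and uses an unweighted mean over relation types, which is an assumption about how an overall figure could be summarized and not a calculation stated in the text.

```python
# Mean precision per (pipeline, relation); heart disease values read off Fig. 2b.
HEART_DISEASE = {
    ("ChemDis", "cause"): 0.68,
    ("ChemDis", "treat"): 0.75,
    ("FooDis", "cause"): 0.79,
    ("FooDis", "treat"): 0.78,
    ("FoodChem", "contains"): 0.74,
}

# Milk values as listed in the text (averaged across both domain experts).
MILK = {
    ("ChemDis", "cause"): 0.51,
    ("ChemDis", "treat"): 0.79,
    ("FooDis", "cause"): 0.65,
    ("FooDis", "treat"): 0.70,
    ("FoodChem", "contains"): 0.70,
}


def unweighted_mean(values):
    """Plain arithmetic mean; every relation type counts equally."""
    values = list(values)
    return sum(values) / len(values)


heart_mean = unweighted_mean(HEART_DISEASE.values())
milk_mean = unweighted_mean(MILK.values())
overall = unweighted_mean(list(HEART_DISEASE.values()) + list(MILK.values()))
```

Under these assumptions the unweighted means are about 0.75 for the heart disease study, 0.67 for the milk study and 0.71 overall, in line with the average precision of around 70% quoted in the abstract. A mean weighted by the number of evaluated relations could differ.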
(a) Number of extracted and evaluated relations per pipeline for the milk study
(b) Mean precision of the extracted relations for each of the reviewers (in black and green), as well as overall mean precision (purple), for each pipeline for the milk study.
Figure 5. Number of extracted and evaluated relations and mean precision for each pipeline for the milk study. The plots have been generated using the plotly python library43, version 5.7.0.