A Section Identification Tool: Towards HL7 CDA/CCR
Standardization in Spanish Discharge Summaries
Iakes Goenagaᵃ, Xabier Lahuertaᵃ, Aitziber Atutxaᵇ, Koldo Gojenolaᵇ
HiTZ Basque Center for Language Technology
http://www.hitz.eus
University of the Basque Country (UPV/EHU), Spain
ᵃ Faculty of Computer Science, Pº Manuel Lardizabal, 1, 20018 Donostia-San Sebastián
ᵇ School of Engineering, Paseo Rafael Moreno Pitxitxi, 3, 48013 Bilbao
Abstract
Background. Nowadays, with the digitalization of healthcare systems, huge amounts of clinical narratives are available. However, despite the wealth of information contained in them, interoperability and extraction of relevant information from documents remains a challenge.
Objective. This work presents an approach towards automatically standardizing Spanish Electronic Discharge Summaries (EDS) following the HL7 Clinical Document Architecture. We address the task of section annotation in EDSs written in Spanish, experimenting with three different approaches, with the aim of boosting interoperability across healthcare systems and hospitals.
Methods. The paper presents three different methods, ranging from a knowledge-based solution by means of manually constructed rules to supervised Machine Learning approaches, using state of the art algorithms like the Perceptron and transfer learning-based Neural Networks.
Results. The paper presents a detailed evaluation of the three approaches on two different hospitals. Overall, the best system obtains a 93.03% F-score for section identification. It is worth mentioning that this result is not completely homogeneous over all section types and hospitals, showing that cross-hospital variability in certain sections is bigger than in others.
Conclusions. As a main result, this work proves the feasibility of accurate automatic detection and standardization of section blocks in clinical narratives, opening the way to interoperability and secondary use of clinical data.
Keywords: Section Identification, Interoperability, Electronic Discharge Summaries, HL7 Clinical Document Architecture
Email addresses: [email protected] (Iakes Goenaga), [email protected] (Xabier Lahuerta), [email protected] (Aitziber Atutxa), [email protected] (Koldo Gojenola)
This is the accepted manuscript of the article that appeared in final form in Journal of Biomedical Informatics 121 (2021), Article ID 103875, which has been published in final form at https://doi.org/10.1016/j.jbi.2021.103875. © 2021 Elsevier under CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
1. Introduction
The outstanding advancement of Machine Learning (ML) technologies (e.g., Deep Learning) enables us to more efficiently harness the large amounts of data collected through healthcare processes such as clinical narratives in electronic health records (EHR) as well as electronic discharge summaries (EDS). EHRs contain a lifetime record of the patient's complete medical history, diagnoses and treatment, medications, allergies and immunizations, as well as radiology images and laboratory results [1]. EDSs are an essential document to communicate patient journey and care planning regarding a hospitalization episode to the next practitioner [2]¹. In 2016 the proportion of primary care practices using electronic clinical records was about 80% on average across 15 EU countries [3], and in 2020 in the US the percentage is 96% [4]. Digitalization of healthcare systems is contributing to the improvement of clinical and translational studies, and interoperability and information exchange between healthcare systems is more necessary than ever. For that reason, public policies and recommendations are pushing in that direction [5, 6, 7].
¹ Some authors use these two terms interchangeably.
There is an increasing interest in integrating heterogeneous health information for different reasons: to facilitate the cross-border interoperability of information among healthcare systems, federal states and countries, to ensure that citizens can securely access and exchange their health data wherever they are, and also to make digital health information more usable for the bedside and beyond [5, 6]. Several standards such as openEHR [8], HL7-FHIR [9] and HL7 CDA/CCR [10] are examples of this standardization effort.
However, despite the wealth of information contained in the clinical narratives, interoperability and extraction of relevant information from documents remains a challenge. Although the aforementioned standards exist, so far they have not been widely adopted, and even if so, the healthcare system at large still has a huge amount of untapped legacy clinical text.
Healthcare systems provide guidelines for writing clinical documents, which for operative reasons typically follow some minimal principles to ensure the optimal interactions between health professionals and patients like SOAP (Subjective, Objective, Assessment, Plan), or APIE (Assessment, Plan, Implementation, and Evaluation) [11, 12]. Some systems assume that these principles are best reflected by using free text, due to flexibility to express anything that the health-care providers need to record. On the opposite extreme, some impose structured or semi-structured clinical documents in sections, where each section is a main block of information. In all cases, the automated processing of clinical texts is hampered by ambiguity, lexical variety, use of abbreviations, errors due to mistakes, redundancies, etc.
Under this scenario, this work presents a first approach towards automatically standardizing Spanish EDSs following the HL7 Clinical Document Architecture (CDA) R2 template of Discharge Summaries [10] for both helping interoperability and secondary use of Electronic Discharge Summaries. The HL7 CDA R2 template contains a set of clinically relevant sections, and part of this standardization task is known as Section Identification. It is defined in [13] as detecting the boundaries of text sections and adding semantic annotations. They define a section as a text segment that groups together consecutive clauses, phrases or sentences that share the description of one dimension of a patient, patient's interaction or clinical findings. A section can be marked explicitly, through structural demarcations (headings or subheadings), or it can exist implicitly. The main assumption for making this identification is that unstructured texts have an explicit or implicit structure.
Besides its relevance in terms of standardization and interoperability, section identification provides a deeper understanding of EDSs, for instance, by recognizing the section in which a medical entity is located. The same medical condition found in the "past personal medical history" or in the "family medical history" section might lead to different conclusions. Several works on secondary use of EHRs and EDSs have shown that section identification can be helpful for a variety of tasks [14] such as entity recognition [15], cohort retrieval [16] and temporal relation extraction [17], and can help in most automatic medical processing tasks, such as ICD-10 coding [18, 19, 20, 21]. This issue is rapidly becoming an important topic in both academia and industry.
9027431 XX-XX-XXXX
66 años. VARON. MC: REFERENCIADO EN EL INFORME.
INFORME AL ALTA :
Paciente de 66 años. No alergias medicamentosas conocidas.
A. PERSONALES:
Enfermedad de Crohn diagnosticada en 1997 con afectación de íleon terminal
(A3L1B2) por cuadros suboclusivos resueltos con enfermedad de íleon terminal
asociada a mesenteritis fibrosa. Artrosis dorso-lumbar. Cirugía de hernia inguinal. Ci
Tto: Dacortin 5: 1-0-0; Pariet 20: 1-0-0; Pentasa:1-1-1; Kilo 0-1-0, Clinutren: 2/día.
E. ACTUAL:
Acude a Urgencias por dolor abdominal generalizado con febrícula, sin tiritona, sin
naúseas ni vómitos. Sin alteración del ritmo intestinal. Con pauta descendente de
corticoides, después del último ingreso por cuadro suboclusivo.
EXPL. FÍSICA:
Paciente consciente, orientado, colaborador. Buena coloración de piel y mucosas.
Cuello: no adenopatías cervicales. AC: rítmica sin soplos. No roncus ni crepitantes.
Abdomen: distendido, timpánico. Peristaltismo ausente. Blumberg negativo.
EEII: no edemas maleolares. PPP.
RX ABDOMEN:
Sugestivo de suboclusión intestinal. Se objetivan dos asas de delgado con niveles
hidroaéreos incluso en la cámara gástrica.
ANALÍTICA AL INGRESO.
Urea, Creatinina, GPT, Amilasa dentro de límites normales. Leucocitos 14.400.
Segmentados 80 %. TP 100 %. Plaquetas 470.000. Hb 13.5. Hto 41.7 %. PCR 14.2.
ANALÍTICA AL ALTA:
GOT, GPT, Gamma GT, FA, Bilirrubina total, Amilasa, LDH, Colesterol total,
Triglicéridos, Na, K dentro de límites normales. Alfa1 antitripsina 169, Albúmina 2.4.
PCR 11.6. Fe 18. Transferrina 241. IS 5.3 %. Ferritina 24. Vit B12 247. A fólico 6.3.
Hb 11.1. Hto 34.4 %. VCM 78.8. Plaquetas 367.000. Segmentados 75 %. VCM 7.
EVOLUCIÓN Y PROCEDIMIENTOS:
Se trata de paciente con enfermedad de Crohn, con afectación ileal y mesenteritis
retractil que ingresa por cuadro suboclusivo, instaurándose tto. corticoideo siendo
dado de alta con disminución progresiva de dicho tratamiento. Estando en tto. con
5 mg de Dacortin, ingresa de nuevo con cuadro suboclusivo. Se indica colocación
de sonda nasogástrica para aspiración intermitente rechazando el paciente. Se
inicia el tratamiento corticoideo iv a dosis plenas, mejorando la clínica del paciente.
DIAGNÓSTICO:
- CUADRO SUBOCLUSIVO INTESTINAL POR ENFERMEDAD DE CROHN CON
AFECTACIÓN ILEAL (A3L1B2) Y MESENTERITIS RETRACTI
TRATAMIENTO:
- Dacortin 60: 1-0-0 durante 1 semana bajando 10 mg cada 10 días hasta 10 mg
que mantendrá 15 días más y luego 5 mg 15 días más y suspender.
Figure 1: Example EDS and its sections (H: Heading, MH: Medical History, CI: Current Illness, E: Exploration, EC: Complementary Exploration, EV: Evolution, D: Diagnosis, T: Treatment).
Given the difficulty in accurately extracting data from text, most non-research use of EHR and EDS data relies only on structured data. However, clinical notes contain highly valuable information not found in strictly structured fields and, moreover, they give access to volumes of data that are orders of magnitude bigger. Consequently, improving retrieval accuracy from text would have great value.
In this paper, we will explore the task of section annotation in EDSs written in Spanish (see Figure 1). We will experiment with three different approaches, ranging from a knowledge-based solution by means of manually constructed rules to supervised Machine Learning approaches, including the structured Perceptron algorithm and Deep Neural Networks. The paper will present a detailed evaluation of the three approaches and, as a main result, will prove the feasibility of automatically detecting section blocks in EDSs.
The main contributions of this work are:
• We describe an annotation format for EDSs that defines the section structure of a document. We have evaluated its feasibility annotating a dataset comprised of 300 documents and have measured a high inter-annotator agreement.
• We implement three different approaches to automatic section identification, including a rule-based method, the Perceptron online learning algorithm and Neural Networks.
• We conduct exhaustive experiments to explore the contribution of each method, also giving a detailed analysis of the strengths and weaknesses of the proposed approaches.
The remainder of this paper is structured as follows. Section 2 discusses related work. The resources and corpus are presented in Section 3. Section 4 sketches the main results, while Section 5 provides an analysis of the results including a comparison of the different approaches as well as an estimation of the system's ability to generalize across hospital settings and a qualitative evaluation of the encountered errors. Finally, Section 6 summarizes the main conclusions and future work.
2. Background
Pomares-Quimbaya et al. [13] reviewed several studies on clinical section identification, which varied on the kind of narrative, the type of section, and the application. The paper examines the characteristics of systems using a strategy for section identification, the methods used to identify implicit or explicit sections with different degrees of success, and the main application scenarios and contexts that have been used with good performance. From the technical point of view, the methods were classified into rule-based methods (59%), machine learning methods (22%) and a combination of both (19%). According to the authors, hybrid methods showed the best performance. 46% of the studies were able to identify explicit (using headings) and implicit sections. Regarding the language of application, most of the works (78%) were intended for English texts.
Arnold et al. [22] present SECTOR, a model to segment documents into sections, under the hypothesis that topics, learned in an unsupervised way, characterize semantically coherent text segments (sections). Their deep neural network architecture learns a latent topic embedding over a document, in order to classify local topics and to segment a document at topic shifts. They report a 56.7% F-score for segmentation and classification in the domain of diseases. Although the approach seems promising, its main drawback for our task is that, as topics are learned in an unsupervised manner, the topic clusters do not fit well with the nine HL7 section types of our documents, because topic clusters can be either finer or more coarse-grained.
Choi et al. [23] claim that the structure underlying EHR data improves the performance of prediction tasks such as heart failure prediction. As EHR data often do not contain complete structure information, or it is completely unavailable, they experiment with alternatives to the baseline consisting of treating EHR data as a flat-structured bag-of-features. The proposed model outperformed the baseline approach for various prediction tasks such as readmission and mortality prediction, indicating that the detection of EHR structure is beneficial for many tasks.
Rosenthal et al. [24] developed a system to detect sections in EHRs, based on different architectures: an RNN based system and a transfer based system using BERT. To overcome the lack of annotated data they propose to use for training purposes sections learned from medical literature (journals, textbooks, web content). They conclude that out of domain clinical literature is helpful when there is not enough EHR data, but its contribution is not significant with bigger sizes of the in-domain annotated dataset. Their system did not exploit the structure of the document, that is, the fact that sometimes sections are ordered in a canonical order (e.g., first the Chief complaint, then the Antecedents, ...), which we plan to use in our approach, as it can be helpful in deciding section types.
Rush et al. [25] solve the section identification problem using a CRF classifier to mark each token as belonging to a section header, and then they apply a rule-based post-processing module to structure the annotated sections. Compared to our work, they do not perform normalization and therefore the number of sections they identify is not fixed. In their system, similar section headers are considered different, while our aim is to normalize each section into a set of nine HL7 section types.
Apart from the medical domain, other areas like legal decision-support systems also leverage the content structure of documents. For example, Branting et al. [26] exploit structural and semantic regularities in law case corpora to identify textual patterns that have both predictable relationships to case decisions and explanatory value for legal decision support and explainable outcome prediction.
To summarize, we can see that the identification of sections is currently a promising area of active research, especially for languages other than English. Historically, rule-based methods have been the most widely used approach, although the recent emergence of new ML and Deep Learning techniques that have revolutionized the state of the art on many tasks also presents avenues for new developments.
3. Materials and Methods
In this section we will explore all the corpora and tools we have used in order to carry out the experiments. In the first part (section 3.1), we present the annotated corpus, the defined annotation model and the inter-annotator agreement. Section 3.2 gives a description of the large unannotated
Figure 3: Distribution of sections in both train and development splits. Section chief complaint, for instance, is present in 82% of the development split, while slightly less in the training split (79%).
Split        Documents  Sections  Tokens
training     100        744       47,449
development  100        786       48,461
test         100        754       59,119
Table 3: Details of the annotated corpus.
3.1.3. Inter-Annotator Agreement
The annotation process of the corpus was performed by annotation experts. Annotators followed an iterative process of training until a high inter-annotator agreement was reached. The final agreement measure was calculated on a set of 25 EDSs that were doubly annotated by two different annotators, reaching a pairwise agreement of 93.47% Cohen's Kappa, indicating that the agreement is very high. There were differences with respect to each section type, ranging from 86% for Diagnosis (lowest agreement) to 100% for some section types, thus reaching a significant agreement for all section types.
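As an illustration of the agreement measure, the following is a minimal sketch of Cohen's Kappa for two annotators. The unit of comparison is not specified here, so the sketch assumes one section label per line; the function name `cohen_kappa` and the toy label sequences are hypothetical and do not come from the annotated corpus.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement: product of the annotators' label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0
    return (observed - expected) / (1.0 - expected)


# Hypothetical line-level section labels (assumption: one label per line).
annotator_1 = ["H", "H", "MH", "MH", "CI", "CI", "D", "T"]
annotator_2 = ["H", "H", "MH", "CI", "CI", "CI", "D", "T"]
kappa = cohen_kappa(annotator_1, annotator_2)
```

In this toy case the annotators disagree on one line out of eight, so the observed agreement is 0.875 while the chance-corrected value is lower.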
Our annotation strategy requires each section type to be matched exactly while taking into account its content, and additionally returns the first and last lines of each section. While this strategy might seem an overly stringent criterion, the task is well defined as evidenced by the high inter-annotator agreement.
3.2. Textual Corpora
Deep learning techniques usually require huge amounts of data. Although manually annotated data gives the best results, it is very expensive and time consuming. For that reason, the idea of acquiring useful information in an unsupervised manner is very attractive, and efficient and effective methods have been developed. Vectorial representations of words, also known as word embeddings [28, 29], that are learned from textual corpora, have proven useful as an information source for many Natural Language Processing tasks, such as Part-Of-Speech (POS) tagging, Named Entity Recognition or Machine Translation, due to their ability to acquire relevant generalizations. These embeddings are learned through solving an appropriate optimization objective [28] under the assumption that similar words occur in similar contexts. As a result, vectors of similar words derived from such optimization tend to reside in the neighborhood in the vector space. There are two kinds of embeddings, static and contextual. Static embeddings capture in a vectorial representation information of a word form, while contextual embeddings are sensitive to context, representing both a word and its context.
This way, an unsupervised system can utilize the information based on word similarity in a manner that associates unseen words with those already occurring in the annotated corpus, thereby allowing us to cover unseen and misspelled terms. For instance, infarct and stroke are similar terms but one of them may not be in the annotated data set. The resulting word vectors will be fed to the neural network as input during training (see Figure 4), thus providing a model of the language that can help obtain better generalizations and, consequently, increase the recall of the final tool.
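The idea of linking an unseen word to a known one through vector similarity can be sketched as follows. This is only an illustration: the three-dimensional vectors are invented for the example (real embeddings in this work have 300 dimensions), and `cosine` and `nearest_known` are hypothetical helper names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Invented toy vectors; they only mimic "similar words lie close together".
vectors = {
    "infarct": [0.9, 0.1, 0.2],
    "stroke": [0.8, 0.2, 0.3],
    "walk": [0.0, 0.9, 0.1],
}


def nearest_known(word, known_words):
    """Return the known word whose vector is closest to the given word."""
    return max(known_words, key=lambda k: cosine(vectors[word], vectors[k]))


# "stroke" is assumed to be absent from the annotated data, "infarct" present.
closest = nearest_known("stroke", ["infarct", "walk"])
```

Under these assumptions the unseen term is mapped to its clinically related neighbour rather than to an unrelated word.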
For this work we have employed heterogeneous embedding information, both static and contextual, in order to make the system sensitive to different granularity and domain specificity. Regarding the granularity, during the section identification training, adding a character embedding layer allows the system to learn at the character level. Besides the character embeddings learned during the training, we incorporated pre-trained, character-based embeddings based on fastText [30] trained over the Spanish version of Wikipedia. Character-based embeddings are able to generalize over n-grams, enabling the system to take into account prefixes and suffixes as well as to capture information about the different n-gram variations on the section heading words. They also generalize over zero-shot words, that is, words that do not appear in the training corpus, as their building elements are characters and not words.
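To make the n-gram argument concrete, here is a minimal sketch of character n-gram extraction in the style used by subword embedding models. The boundary markers `<` and `>` and the n-gram range are illustrative assumptions, not the configuration used in this work, and `char_ngrams` is a hypothetical helper.

```python
def char_ngrams(word, n_min=3, n_max=5):
    """Character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    return [
        marked[i:i + n]
        for n in range(n_min, n_max + 1)
        for i in range(len(marked) - n + 1)
    ]


# A heading word and a misspelled variant (the variant is invented).
correct = set(char_ngrams("exploracion"))
misspelled = set(char_ngrams("exploracin"))
shared = correct & misspelled
```

Because the misspelled heading shares most of its n-grams with the correct one, a model built on n-grams can still relate the two forms.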
Table 4 presents the details of the different word embeddings we have used for the task. Static embeddings were obtained applying word2vec [30] to Electronic Discharge Summaries (50M words), together with pretrained embeddings that had been calculated with Wikipedia2Vec [31], representative of general domain. Additionally, we also used contextual string embeddings [32] we calculated from Electronic Discharge Summaries and Wikipedia.
Technique      Source text     Embedding type  Details
word2vec       EDS             static          window length = 1, dimensions = 300, algorithm = SkipNgram
Wikipedia2Vec  general domain  static          window length = 5, dimensions = 300, algorithm = Skipgram
FLAIR          EDSs            contextual      layers = 1, hidden size = 2,048, sequence length = 250, mini batch size = 32
FLAIR          general domain  contextual      layers = 1, hidden size = 1,024, sequence length = 250, mini batch size = 100
Table 4: Overview of the different embedding types used in this work (static word embeddings and contextual character embeddings).
3.3. Approaches to Automatic Section Identification
In this section we will explain the different approaches we have tried with the aim of automatically identifying sections in medical records. First of all, in subsection 3.3.1 we will specify the setup we used for the rule-based tool that we have developed. After that, in subsections 3.3.2 and 3.3.3 we will explore the ML algorithms we have employed, the Perceptron and Deep Learning, respectively. For both ML approaches, we have approached the task as a sequential learning process [33, 34], where the text is considered a sequence of tokens, and each token is associated with one tag indicating its corresponding section. We have used an IOB (Inside, Outside, Begin) tag model, where the beginning of each section is marked with a B tag (e.g., B-DIA for the token starting a diagnosis), the tokens inside a section are marked with an I tag (I-DIA will mark a token inside a diagnosis section), and the O tag is used for elements that do not belong to any section (see Figure 5). This way, section identification can be viewed as the detection of extended and long entities. This approach has been successfully used in similar tasks such as the identification of elementary discourse units (text segments consisting of one or several sentences) in Discourse processing [35] or topic segmentation [22]. Figure 4 presents an architecture of the system.
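The IOB encoding described above can be sketched as a small conversion from section spans to token tags. This is a minimal illustration rather than the tool's actual code: `sections_to_iob` is a hypothetical helper, spans are assumed to be given as token offsets with an exclusive end, and the example tokens are made up.

```python
def sections_to_iob(tokens, sections):
    """Turn (label, start, end) token spans (end exclusive) into IOB tags."""
    tags = ["O"] * len(tokens)
    for label, start, end in sections:
        tags[start] = f"B-{label}"          # first token of the section
        for i in range(start + 1, end):
            tags[i] = f"I-{label}"          # remaining tokens of the section
    return tags


# Toy input: the first token is assumed to lie outside any section.
tokens = ["---", "DIAGNÓSTICO", ":", "Cuadro", "suboclusivo", "intestinal"]
tags = sections_to_iob(tokens, [("DIA", 1, 6)])
```

Each section therefore becomes one long entity: a single B tag followed by a run of I tags with the same label.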
[Figure 4 diagram: an input EDS is processed by one of three components: the knowledge-based system (regular expressions such as (EXPL | PRUEB).*COMPL.*), the Perceptron, or the neural network (character embedding layer, BiLSTM layer, CRF layer). The output is a sequence of IOB tags, e.g. B-CC I-CC ... I-CC for "El paciente ingresa por fuerte dolor".]
Figure 4: Architecture of the system. Three different approaches have been used: regular expressions, the Perceptron algorithm and neural networks.
3.3.1. Rule-Based Approach
Manually defined rules have been used since the early years of Artificial Intelligence, and are still a competitive method to achieve acceptable results. Their downside is the effort needed to include knowledge into the automatic system. Another drawback is their lack of generalization, because a change in the domain may imply a complete re-implementation of the rule system. Regarding the identification of sections in medical records, this approach has been used in many systems [13, 36], where acceptable results have been reported, although in several cases the approach has not been general, but rather limited to a reduced set of very specific sections or portions of text.
Table 2 presents several examples of the beginning of different section types. The table shows that there is a high variability that is difficult to capture using rules, especially with implicit sections with no standard title, like in the Chief Complaint and Complementary Exploration. Examples (1) and (2) present two rules that try to capture the start of the Chief Complaint and the Current Illness sections, where the parentheses enclose optional elements. The objective was to cover the different options found in the training set.
(1) MOTIVO(S) (DE(L)(A)) INGRESO|PETICION|EXPLORACION|ESTUDIO|CONSULTA (ACTUAL)
(2) (E.|ENFERMEDAD|SITUACIÓN|EPISODIO|ESTADO) (A.|ACTUAL)|SINTOMATOLOGÍA
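Rules (1) and (2) can be approximated with standard regular expressions. The sketch below is one possible reading of the rule notation, not the rule set of the actual tool: it assumes that "(DE(L)(A))" stands for the optional forms "DE", "DEL" and "DE LA", that both groups of rule (2) must be present, and that matching is case-insensitive and tolerant of missing accents. The names `CHIEF_COMPLAINT`, `CURRENT_ILLNESS` and `classify_heading` are hypothetical.

```python
import re

# Rule (1): start of a Chief Complaint (CC) section.
CHIEF_COMPLAINT = re.compile(
    r"^\s*MOTIVOS?\s+(?:DE(?:L|\s+LA)?\s+)?"
    r"(?:INGRESO|PETICI[OÓ]N|EXPLORACI[OÓ]N|ESTUDIO|CONSULTA)(?:\s+ACTUAL)?",
    re.IGNORECASE,
)

# Rule (2): start of a Current Illness (CI) section.
CURRENT_ILLNESS = re.compile(
    r"^\s*(?:(?:E\.|ENFERMEDAD|SITUACI[OÓ]N|EPISODIO|ESTADO)\s*(?:A\.|ACTUAL)"
    r"|SINTOMATOLOG[IÍ]A)",
    re.IGNORECASE,
)


def classify_heading(line):
    """Return 'CC', 'CI' or None for a candidate heading line."""
    if CHIEF_COMPLAINT.match(line):
        return "CC"
    if CURRENT_ILLNESS.match(line):
        return "CI"
    return None


label = classify_heading("E. ACTUAL:")
```

A heading such as "EXPL. FÍSICA:" matches neither pattern and would need its own rule, which illustrates why covering all variants by hand is costly.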
3.3.2. Machine Learning: Perceptron
For the application of ML to section identification, we modeled the problem as a sequence to sequence problem. The task consists in learning to map from input word sequences $w_1 \ldots w_m$ (with $w_i \in W$) to output tag sequences $t_1 \ldots t_m$ (with $t_i \in T$).
Although some approaches to section identification used sentence sequences as input units [13], we preferred to model this problem using word sequences as input units to capture the fact that individual words in the right context are good signals of sections and also to reduce sparsity, because sentence sequences are more sparse than word sequences. The problem is cast as the assignment of the correct tag to each token. Although the tag assignment is made token by token, the final evaluation will be done on the detection of complete sections.
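The difference between token-level tagging and section-level evaluation can be sketched as follows. This is a simplified illustration that assumes a section counts as correct only when its label and both boundaries match exactly; the evaluation script used in this work may differ in detail. `iob_to_sections` and `section_f_score` are hypothetical helper names.

```python
def iob_to_sections(tags):
    """Collect (label, start, end) spans (end exclusive) from IOB tags."""
    sections, label, start = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel closes last span
        continues = tag.startswith("I-") and tag[2:] == label
        if label is not None and not continues:
            sections.add((label, start, i))
            label = None
        if tag.startswith("B-"):
            label, start = tag[2:], i
    return sections


def section_f_score(gold_tags, predicted_tags):
    """F-score over complete sections (exact label and boundaries)."""
    gold = iob_to_sections(gold_tags)
    pred = iob_to_sections(predicted_tags)
    correct = len(gold & pred)
    precision = correct / len(pred) if pred else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


gold = ["B-CC", "I-CC", "B-CI", "I-CI"]
shifted = ["B-CC", "I-CC", "I-CC", "B-CI"]   # boundary moved by one token
score = section_f_score(gold, shifted)
```

In the toy example three of the four tokens carry the right tag, yet no section is recovered exactly, so the section-level score is zero.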
To do so, we employed the Averaged Structured Perceptron algorithm [37, 38], which combines the Perceptron algorithm for learning linear classifiers with an inference algorithm and converts a classification problem into a ranking problem. The objective of the algorithm is to find, for each sentence, the sequence of tags with the maximum score. This prediction decision process is divided into a sequence of smaller decisions made from left-to-right. Thus, at each step there is a word and its context, called the history, in which the local tagging decision is made, namely to predict the tag given the history. The history can be represented in several ways, using the prefixes of a given number of previous words, and/or the suffixes, or any other features that could be relevant for the task, and then converted into a feature vector where each feature will get a weight through the learning process.
Formally, the problem can be stated as follows. Given:
• A sequence of input words $w_1 \ldots w_m$, for simplicity referred to as $w$.
• The sequence of tags $t_1 \ldots t_m$, referred to as $t$ (this way, the set of possible tags is $T$).
• In our case the context in which a tagging decision is made is represented by the history tuple $h = \langle t_{-2}, t_{-1}, w_{-2}, w_{-1}, w_0, w_{+1}, s^x_0, p^x_0, cap, num, i \rangle$, where $t_{-2}$ and $t_{-1}$ are the previous two tags, and $w_{-2}$ and $w_{-1}$ are the previous two words at a given position $i$ (this way, $H$ corresponds to the set of all possible histories).
$s^x$ and $p^x$ correspond to different sizes of word suffixes and prefixes of $w_0$ (in this work, $x$ varies from 2 to 4).
$cap$ and $num$ correspond to two binary features to account for capitalization and number status at the current word.
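The history tuple can be turned into feature strings along the following lines. This is a minimal sketch under stated assumptions: `cap` is taken to mean "first character is uppercase" and `num` "contains a digit", padding symbols `<S>` and `<PAD>` are invented for positions outside the sequence, and `history_features` is a hypothetical helper rather than the tagger's real feature extractor.

```python
def history_features(words, prev_tags, i):
    """Feature strings for position i given the previously predicted tags."""
    def word_at(j):
        return words[j] if 0 <= j < len(words) else "<PAD>"

    w0 = word_at(i)
    t1 = prev_tags[-1] if len(prev_tags) >= 1 else "<S>"
    t2 = prev_tags[-2] if len(prev_tags) >= 2 else "<S>"
    feats = [
        f"t-2={t2}", f"t-1={t1}",                      # previous two tags
        f"w-2={word_at(i - 2)}", f"w-1={word_at(i - 1)}",
        f"w0={w0}", f"w+1={word_at(i + 1)}",
        f"cap={w0[:1].isupper()}",                     # assumed definition
        f"num={any(c.isdigit() for c in w0)}",         # assumed definition
    ]
    for x in range(2, 5):                              # sizes 2 to 4
        feats.append(f"s{x}={w0[-x:]}")                # suffix of w0
        feats.append(f"p{x}={w0[:x]}")                 # prefix of w0
    return feats


feats = history_features(["DIAGNÓSTICO", ":", "Cuadro"], [], 0)
```

Each string acts as one dimension of the sparse feature vector, and the learner assigns a weight to every (feature, tag) combination it observes.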
The feature mapping function $\Phi : H \times T \rightarrow \mathbb{R}^d$ maps a history-tag pair to the d-dimensional feature vector we mentioned before. The Structured Perceptron models $P(t \mid w)$ as $P(t \mid h; \alpha)$, where $\alpha \in \mathbb{R}^d$ is a parameter vector representing the weight of each feature of $\Phi$. $P(t \mid h; \alpha)$ is calculated as $\alpha \cdot \Phi(h, t)$ and the objective function is:
$$\hat{t} = \arg\max_{t} \sum_{i=1}^{d} \alpha_i \cdot \Phi_i(h, t)$$
Usually the Viterbi algorithm is applied when the model is used on sequence data, in order to efficiently calculate the best tag sequence using dynamic programming. The algorithm is competitive with other options such as maximum-entropy taggers or CRFs [33].
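The dynamic programming search can be sketched with a simplified Viterbi decoder. Note the simplification: the history described above conditions on the two previous tags, while this sketch conditions only on one previous tag to keep the example short. The weights in `alpha`, the feature templates and the function names are invented for illustration.

```python
def viterbi(words, tags, score):
    """Best tag sequence under a first-order model.

    score(words, i, prev_tag, tag) plays the role of alpha . Phi(h, t).
    """
    if not words:
        return []
    # best[i][t] = (score of best path ending in t at position i, previous tag)
    best = [{t: (score(words, 0, "<S>", t), None) for t in tags}]
    for i in range(1, len(words)):
        column = {}
        for t in tags:
            prev, value = max(
                ((p, best[i - 1][p][0] + score(words, i, p, t)) for p in tags),
                key=lambda pair: pair[1],
            )
            column[t] = (value, prev)
        best.append(column)
    # Follow back-pointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Invented weights for a handful of (feature, tag) pairs.
alpha = {
    "w0=DIAGNÓSTICO|t=B-DIA": 2.0,
    "t-1=B-DIA|t=I-DIA": 1.5,
    "t-1=I-DIA|t=I-DIA": 1.0,
    "t-1=<S>|t=I-DIA": -5.0,
}


def score(words, i, prev_tag, tag):
    """Sum of the weights of the features active for this local decision."""
    active = [f"w0={words[i]}|t={tag}", f"t-1={prev_tag}|t={tag}"]
    return sum(alpha.get(f, 0.0) for f in active)


path = viterbi(["DIAGNÓSTICO", ":", "Cuadro"], ["B-DIA", "I-DIA", "O"], score)
```

During training, a structured perceptron would compare such a predicted path with the gold tags and adjust the weights of the features on which they differ.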
Figure 5: Simplified Architecture of the Structured Perceptron (the upper three rows exemplify the use of word features (first row), 3 letter prefixes and 3 letter suffixes (second and third rows)).
We employed our own implementation of this tagger following [37]. We trained 100 iterations and selected the model corresponding to the iteration that achieved the best score on the development set. Although the algorithm achieves a competitive performance compared to state of the art methods, this approach requires a feature engineering effort to identify, select and properly encode relevant features.
3.3.3. Machine Learning: Neural Networks
In addition to a traditional neural network like Perceptron, we explored transfer learning methods. In this case, we used FLAIR [39], a bi-directionally trained Language Model (LM) using Recurrent Neural Networks (RNN), where the basic element is the character and not the word. Based on its characters, FLAIR generates pre-trained contextual embeddings for each word by concatenating the hidden state of the last character of the word in the forward neural network and the first character of the word in the backward neural network, as shown in Figure 6. As described in [39], formally, the objective function of a character-based LM is to maximize the sum of the logs of $P(x_t \mid x_0, \ldots, x_{t-1})$, that is to say, an estimate of the predictive distribution over the next character given past characters. FLAIR allows us to combine different types of embeddings by concatenating each embedding vector to form the final word vector. We employed a combination of embeddings as previously reported in section 3.2. One of the main advantages of these methods is that there is no need for feature engineering.
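The language-model objective and the word-level concatenation described in this subsection can be written compactly as below. The notation is an assumption introduced for illustration: $\theta$ denotes the LM parameters, $T$ the index of the last character of the sequence, $\overrightarrow{\mathbf{h}}$ and $\overleftarrow{\mathbf{h}}$ the hidden states of the forward and backward networks, and $\mathrm{last}(w)$, $\mathrm{first}(w)$ the positions of the last and first characters of word $w$.

```latex
% Character-level LM objective: maximize the summed log-probability
% of each character given the preceding characters.
\[
  \hat{\theta} = \arg\max_{\theta} \sum_{t=0}^{T}
      \log P(x_t \mid x_0, \ldots, x_{t-1}; \theta)
\]
% Contextual embedding of word w: concatenation of the forward state at its
% last character and the backward state at its first character.
\[
  \mathbf{e}_{w} = \left[ \overrightarrow{\mathbf{h}}_{\mathrm{last}(w)} \, ; \,
                          \overleftarrow{\mathbf{h}}_{\mathrm{first}(w)} \right]
\]
```

The final input vector for a word is then the concatenation of $\mathbf{e}_{w}$ with whichever static embeddings are stacked with it.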
Figure 8: Effect of training and testing on the same or a different hospital (H1: Galdakao-Usansolo hospital, H2: Basurto hospital), measured by F-score.
The difference is significant in almost all types of sections. Especially
relevant is the effect of the system trained on hospital H2 and tested on
hospital H1 for the Heading section type (H column), where the F-score is
very low. We examined the results and concluded that this happens mostly
because the headings show a great variation, added to the fact that the
data present in the headings is generated automatically most of the time,
including record numbers or dates, and this implies that they can be different
enough to confuse an automatic system. Surprisingly, this does not happen
in the opposite direction, meaning that the data from hospital H1 shows
more variability and is useful to account for the instance types of hospital
H2. The difference is also significant for the second section type (Chief
Complaint, CC), although less drastic. This was due to a cascade effect
as a result of applying a sequence-to-sequence approach, as the errors in
delimiting the first section of the document are frequently carried over from
one section to the next one. Finally, for some section types, like E(xploration),
EV(olution) and D(iagnostic), we can see how applying a system trained on
a different hospital can outperform the system based on data from the same
hospital. This can be due to the fact that one hospital agrees more with the
conventions of the other hospital for these section types.
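The same-hospital versus cross-hospital comparison discussed above comes down to computing an F-score per section type for each train/test pairing. The following is a minimal sketch under stated assumptions: scoring is done at token level (the paper's exact scoring unit may differ), the label sequences are invented toy data, and `f_score_per_type` is a hypothetical helper.

```python
from typing import Dict, Sequence


def f_score_per_type(gold: Sequence[str],
                     pred: Sequence[str]) -> Dict[str, float]:
    """Token-level F-score for each section type seen in gold or pred."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    scores: Dict[str, float] = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
        fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
        fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        scores[label] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data only: gold labels for a test hospital and the outputs of two
# hypothetical systems, one trained on the same hospital, one on the other.
gold_h1 = ["H", "H", "CC", "CC", "MH", "MH"]
pred_same = ["H", "H", "CC", "CC", "MH", "MH"]
pred_cross = ["CC", "H", "CC", "CC", "CC", "MH"]

same = f_score_per_type(gold_h1, pred_same)
cross = f_score_per_type(gold_h1, pred_cross)
# Per-type drop when moving from same-hospital to cross-hospital training.
drop = {label: same[label] - cross[label] for label in same}
```

A positive entry in `drop` marks a section type where the cross-hospital system is worse, and a negative entry marks a type where it outperforms the same-hospital system, as reported for E, EV and D.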
5.2. Error Analysis
We looked at the errors given by the different systems. For simplicity, we
will only examine the results of the best system based on neural networks.
An examination of the divergences between the output of the system and the
gold standard showed us the main causes of error:
• Errors given by the inherent difficulty of spontaneously written section
  headings. Although explicit headings are an important clue to delimit
  sections, the variability of their writings together with the limited size
  of the training set (100 documents, which means that there are at most
  100 instances of each section type) is a source of errors.
• Implicit sections. Some types of sections have a majority of instances
  without an explicit section heading, which means that the section must
  be detected using its content words (see Table 2).
• Mixed sections. Although the annotators have decided the exact scope
  of each section with a high agreement, the use of unstructured and
  spontaneously written EDSs gives the writers freedom to describe any
  concept in different places. As an example, the section corresponding
  to the Medical History can contain passages related to past diagnoses,
  treatments and explorations, which can pose a challenge to an
  automatic system.
• A special case of mixed sections can be the confusion between two
  related section types:
  – Chief Complaint and Current Illness. These two sections present
    the most diffuse definition [27], and are the cause of several errors.
  – Exploration and Complementary Exploration. Although the
    definition of each section is precise, sometimes physicians mix them
    in the same block or paragraph.
In Section 3.1.2, we mentioned that the ordering of section types shows
a great variability. In order to measure its impact on the results, we split
the test set into two subsets. The first subset corresponds to the documents
that follow the canonical order (26% of the documents), while the rest of the
documents form the second subset (non-canonical order and/or missing
sections, 74% of the documents). Since our sequence learning-based methods
depend on the ordering for predicting the next token, this has an effect on the
IOB-labeling prediction, with an F-score of 95.40 for the canonical documents
and 89.81 for the non-canonical ones.
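The split into canonical and non-canonical documents can be sketched as follows. This is an illustration under assumptions: the canonical order used here is simply the label order from the Figure 9 caption (the actual order is the one defined in Section 3.1.2), the documents are toy data, and `is_canonical` and `split_by_order` are hypothetical helper names.

```python
from typing import Dict, List, Sequence

# Assumed canonical order, taken from the label order in the Figure 9
# caption; the paper defines the actual order in Section 3.1.2.
CANONICAL_ORDER = ["H", "CC", "MH", "CI", "E", "EC", "EV", "D", "T"]


def is_canonical(sections: Sequence[str]) -> bool:
    """True if every section type appears exactly once, in canonical order."""
    return list(sections) == CANONICAL_ORDER


def split_by_order(docs: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Partition document ids into canonical and non-canonical subsets."""
    subsets: Dict[str, List[str]] = {"canonical": [], "non_canonical": []}
    for doc_id, sections in docs.items():
        key = "canonical" if is_canonical(sections) else "non_canonical"
        subsets[key].append(doc_id)
    return subsets


# Toy documents, each given as its sequence of section types.
docs = {
    "doc1": ["H", "CC", "MH", "CI", "E", "EC", "EV", "D", "T"],
    "doc2": ["H", "MH", "CC", "CI", "E", "EC", "EV", "D", "T"],  # reordered
    "doc3": ["H", "CC", "MH", "D", "T"],                         # missing
}
subsets = split_by_order(docs)
```

Documents with reordered or missing sections both fall into the second subset, matching the "non-canonical order and/or missing sections" criterion; each subset can then be scored separately.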
Figure 9 presents the main types of mistakes made by the automatic
tool. It shows how the errors are concentrated in some sections, like Chief
Complaint (CC), Medical History (MH) and Diagnosis (D). Overall, the
distinction of different sections is reflected in the text by means of different clues,
ranging from semantics (the content of each section) to syntax (e.g., use of
section headings and separate paragraphs or text blocks for each section)
but, in most of the errors, these conventions do not hold, which poses an
additional difficulty for the automatic tool.
Figure 9: Confusion matrix, where darker green means a higher frequency, for each instance
(H: Heading, CC: Chief Complaint, MH: Medical History, CI: Current Illness, E:
Exploration, EC: Complementary Exploration, EV: Evolution, D: Diagnosis, T: Treatment).
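A confusion matrix of the kind shown in Figure 9 can be built from paired gold and predicted section labels. The sketch below uses invented toy labels, not the paper's data, and the helper names `confusion_matrix` and `top_confusions` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple

LABELS = ["H", "CC", "MH", "CI", "E", "EC", "EV", "D", "T"]


def confusion_matrix(gold: Sequence[str], pred: Sequence[str],
                     labels: Sequence[str] = LABELS) -> List[List[int]]:
    """Rows are gold section types, columns are predicted section types."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")
    counts = Counter(zip(gold, pred))
    return [[counts[(g, p)] for p in labels] for g in labels]


def top_confusions(matrix: List[List[int]],
                   labels: Sequence[str] = LABELS,
                   k: int = 3) -> List[Tuple[str, str, int]]:
    """Most frequent off-diagonal (gold, predicted, count) triples."""
    errors = [(labels[i], labels[j], matrix[i][j])
              for i in range(len(labels))
              for j in range(len(labels))
              if i != j and matrix[i][j] > 0]
    # sorted() is stable, so ties keep their row-major order.
    return sorted(errors, key=lambda e: -e[2])[:k]


# Toy labels illustrating the two confusions named in the error analysis.
gold = ["CC", "CC", "CI", "CI", "E", "EC", "D"]
pred = ["CC", "CI", "CI", "CC", "EC", "EC", "D"]

matrix = confusion_matrix(gold, pred)
worst = top_confusions(matrix)
```

Off-diagonal cells such as (CC, CI) and (E, EC) correspond to the related section types that the error analysis identifies as frequently mixed.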
6. Conclusion
We present a system for Section Identification in Discharge Summaries
written in Spanish. We have adopted an annotation model based on HL7 CDA
R2 for Electronic Discharge Summaries (EDS) of the Spanish Health System,
and we have applied it to manually annotate a corpus of 300 EDSs, obtaining
a high inter-annotator agreement.
We have evaluated the contribution of different rule-based and Machine
Learning approaches and studied the strengths and weaknesses of each option.
Most previous works have used section identification as an auxiliary module
for carrying out clinical processing, relying on a rule-based approach.
However, our results show that section identification is a task on its own, where
simple methods do not obtain the best results. The Machine Learning
systems obtain results that are good enough for the application of the system
in a production setting. Specifically, we show that Language Model tuning is
a key factor, as Language Model-based transfer learning provides the best
performance. The paper has also studied the generalization ability of
models trained in different hospitals, showing that different section types have
significant differences in some cases.
The developed automatic annotation models and software are freely
available by contacting the authors.
Acknowledgements
We gratefully acknowledge the support of NVIDIA Corporation with the
donation of the Titan X Pascal GPU used for this research. This work was
partially funded by the Spanish Ministry of Science and Innovation
(DOTT-HEALTH/PAT-MED PID2019-106942RB-C31), the European Commission
(FEDER), the Basque Government (IXA IT-1343-19), and the EU
ERA-Net CHIST-ERA and the Spanish Research Agency (ANTIDOTE
PCI2020-120717-2).
References
[1] C. Peterson, C. Hamilton, P. Hasvold, From innovation to implementation
– eHealth in the WHO European Region, World Health Organization,
2016.
[2] M. Adnan, J. Warren, M. Orr, A. Ewens, J. Scott, S. Trubshaw, The
quality of electronic discharge summaries for post-discharge care:
Hospital panel assessment and IT to support improvement, Health Care and
Informatics Review Online 15 (2011).
[3] Health at a Glance: Europe 2018 STATE OF HEALTH IN THE EU CYCLE,
https://ec.europa.eu/health/sites/default/files/state/docs/2018_healthatglance_rep_en.pdf,
2020. Last accessed 31-05-2021.
[4] Health IT Data Summaries,
https://dashboard.healthit.gov/apps/health-information-technology-data-summaries.php,
2021. Last accessed 31-05-2021.
[5] Connecting health and care for the nation: A shared nationwide
interoperability roadmap. Office of the National Coordinator for Health
Information Technology (ONC). Washington, DC: U.S. Department of
Health and Human Services (HHS), 2015.
[6] Recommendation on a European Electronic Health Record exchange
format. European Commission, 2019.
[7] State of Interoperability among U.S. Non-federal Acute Care Hospitals
in 2018,
https://www.healthit.gov/sites/default/files/page/2020-03/State-of-Interoperability-among-US-Non-federal-Acute-Care-Hospitals-in-2018.pdf,
2020. Last accessed 31-05-2021.
[8] openEHR, https://www.openehr.org, 2020. Last accessed 31-05-2021.
[9] Health Level Seven (HL7). FHIR, http://www.hl7.org, 2019. Last
accessed 31-05-2021.
[10] Health Level Seven (HL7). CDA, http://www.hl7.org, 2019. Last
accessed 31-05-2021.
[11] K. Häyrinen, K. Saranto, P. Nykänen, Definition, structure, content, use
and impacts of electronic health records: a review of the research
literature, Int J Med Inform. 77 (5) (2008) 291–304.
[12] L. L. Weed, Medical records that guide and teach, N Engl J Med. 278 (11)
(1968) 593–600.
[13] A. Pomares-Quimbaya, M. Kreuzthaler, S. Schulz, Current approaches
to identify sections within clinical narratives from electronic health
records: a systematic review, BMC Medical Research Methodology 19
(2019).
[14] T. Edinger, D. Demner-Fushman, A. Cohen, S. Bedrick, W. Hersh,
Evaluation of Clinical Text Segmentation to Facilitate Cohort Retrieval, in:
AMIA Annu Symp Proc., pp. 660–669.
[15] J. Lei, B. Tang, X. Lu, K. Gao, M. Jiang, H. Xu, A comprehensive
study of named entity recognition in Chinese clinical text, Journal of the
American Medical Informatics Association 21 (2014).
[16] Y. Wang, L. Wang, M. Rastegar-Mojarad, S. Moon, F. Shen, N. Afzal,
S. Liu, Y. Zeng, S. Mehrabi, S. Sohn, H. Liu, Clinical information
extraction applications: A literature review, Journal of Biomedical Informatics
77 (2018) 34–49.
[17] H.-J. Lee, Y. Zhang, M. Jiang, J. Xu, C. Tao, H. Xu, Identifying direct
temporal relations between time and events from clinical notes, BMC
Medical Informatics and Decision Making 18 (2018).
[18] A. Pérez, K. Gojenola, A. Casillas, M. Oronoz, A. Díaz de Ilarraza,
Computer aided classification of diagnostic terms in Spanish, Expert
Systems with Applications, 42(6), 2949–295 (2015).
[19] A. Atutxa, A. D. de Ilarraza, K. Gojenola, M. Oronoz, O. P. de Viñaspre,
Interpretable deep learning to map diagnostic texts to ICD-10 codes,
International Journal of Medical Informatics 129 (2019) 49–59.
[20] K. Xu, M. Lam, J. Pang, X. Gao, C. Band, P. Mathur, F. Papay, A. K.
Khanna, J. B. Cywinski, K. Maheshwari, P. Xie, E. P. Xing, Multimodal
machine learning for automated ICD coding, in: F. Doshi-Velez,
J. Fackler, K. Jung, D. Kale, R. Ranganath, B. Wallace, J. Wiens (Eds.),
Proceedings of the 4th Machine Learning for Healthcare Conference, volume
106 of Proceedings of Machine Learning Research, PMLR, Ann Arbor,
Michigan, 2019, pp. 197–215.
[21] A. Duque, H. Fabregat, L. Araujo, J. Martinez-Romo, A keyphrase-based
approach for interpretable ICD-10 code classification of Spanish medical
reports, Artificial Intelligence in Medicine (2020).
[22] S. Arnold, R. Schneider, P. Cudré-Mauroux, F. A. Gers, A. Löser,
SECTOR: A neural model for coherent topic segmentation and classification,
Transactions of the Association for Computational Linguistics 7 (2019)
169–184.
[23] E. Choi, Z. Xu, Y. Li, M. W. Dusenberry, G. Flores, E. Xue, A. M.
Dai, Learning the graphical structure of electronic health records with
graph convolutional transformer, in: Association for the Advancement
of Artificial Intelligence (AAAI).
[24] S. Rosenthal, K. Barker, Z. Liang, Leveraging medical literature for
section prediction in electronic health records, in: Proceedings of the
2019 Conference on Empirical Methods in Natural Language Processing
and the 9th International Joint Conference on Natural Language
Processing (EMNLP-IJCNLP), Association for Computational Linguistics,
Hong Kong, China, 2019, pp. 4864–4873.
[25] E. Rush, I. Danciu, G. Ostrouchov, K. Cho, B. Mayer, Y.-L. Ho,
J. Honerlaw, L. Costa, F. Linares, E. Begoli, Jsonize: A scalable machine
learning pipeline to model medical notes as semi-structured documents,
AMIA Joint Summits on Translational Science proceedings. AMIA Joint
Summits on Translational Science 2020 (2020) 533–541.
[26] L. K. Branting, C. Pfeifer, B. Brown, L. Ferro, J. Aberdeen, B. Weiss,
M. Pfaff, B. Liao, Scalable and explainable legal prediction, Artificial
Intelligence and Law (2020).
[27] A. R. Terroba, Mejora de la calidad del informe clínico de alta
hospitalaria desde el punto de vista lingüístico, PhD Thesis, University of La
Rioja (2018).
[28] T. Mikolov, K. Chen, G. S. Corrado, J. Dean, Efficient Estimation of
Word Representations in Vector Space, CoRR abs/1301.3781 (2013).
[29] J. Pennington, R. Socher, C. D. Manning, GloVe: Global vectors for
word representation, in: Empirical Methods in Natural Language
Processing (EMNLP), pp. 1532–1543.
[30] T. Mikolov, E. Grave, P. Bojanowski, C. Puhrsch, A. Joulin, Advances
in pre-training distributed word representations, in: Proceedings of the
International Conference on Language Resources and Evaluation (LREC
2018).
[31] I. Yamada, A. Asai, H. Shindo, H. Takeda, Y. Takefuji, Wikipedia2Vec:
An optimized tool for learning embeddings of words and entities from
Wikipedia, arXiv preprint 1812.06280 (2018).
[32] A. Akbik, D. Blythe, R. Vollgraf, Contextual string embeddings for
sequence labeling, in: Proceedings of the 27th International Conference
on Computational Linguistics, pp. 1638–1649.
[33] A. McCallum, W. Li, Early results for named entity recognition with
conditional random fields, feature induction and web-enhanced lexicons,
in: Proceedings of the Seventh Conference on Natural Language
Learning at HLT-NAACL 2003 - Volume 4, CONLL '03, Association for
Computational Linguistics, Stroudsburg, PA, USA, 2003, pp. 188–191.
[34] A. Jagannatha, H. Yu, Bidirectional recurrent neural networks for
medical event detection in electronic health records, CoRR abs/1606.07953
(2016).
[35] A. Atutxa, K. Bengoetxea, A. D. de Ilarraza, M. Iruskieta, Towards a
top-down approach for an automatic discourse analysis for Basque: Seg-