
Boosting ICD multi-label classification of health records with contextual embeddings and label-granularity

Author: Blanco Garcés, Alberto,Pérez de Viñaspre Garralda, Olatz,Pérez Ramírez, Alicia,Casillas Rubio, Arantza
Publisher: Elsevier
Year: 2020
DOI: 10.1016/j.cmpb.2019.105264
Source: https://addi.ehu.eus/bitstream/10810/70920/3/2019_CMPB__Alberto_TFM_article.pdf
Alberto Blanco (a,*), Olatz Pérez de Viñaspre (a), Alicia Pérez (a),
Arantza Casillas (a)
(a) IXA Taldea. UPV-EHU. Manuel Lardizabal Ibilbidea, 1, Donostia 20018 Spain
Abstract
Background and Objective
This work deals with clinical text mining, a field of Natural Language
Processing applied to biomedical informatics. The aim is to classify
Electronic Health Records with respect to the International Classification of
Diseases, which is the foundation for the identification of international
health statistics, and the standard for reporting diseases and health
conditions. Within the framework of data mining, the goal is the multi-label
classification, as each health record has assigned multiple International
Classification of Diseases codes. We investigate five Deep Learning
architectures with a dataset obtained from the Basque Country Health System,
and six different perspectives derived from shifts in the input and the
output.
Methods
We evaluate a Feed Forward Neural Network as the baseline and several
Recurrent models based on the Bidirectional GRU architecture, putting our
research focus on the text representation layer and testing three variants,
from standard word embeddings to meta word embeddings techniques and
contextual embeddings.
Results
The results showed that the recurrent models outperform the non-recurrent
model. The meta word embeddings techniques are capable of beating the
standard word embeddings, but the contextual embeddings prove the most robust
for the downstream task overall. Additionally, the label-granularity alone
has an impact on the classification performance.
Conclusions
The contributions of this work are a) a comparison among five classification
approaches based on Deep Learning on a Spanish dataset to cope with the
multi-label health text classification problem; b) the study of the impact of
document length and label-set size and granularity in the multi-label
context; and c) the study of measures to mitigate multi-label text
classification problems related to label-set size and sparseness.
Keywords: Electronic Health Record, International Classification of
Diseases, Multi-label classification, Recurrent Neural Networks, Contextual
embeddings, Label-granularity

* Corresponding author: Alberto Blanco. IXA Taldea. UPV-EHU. Manuel
Lardizabal Ibilbidea, 1, Donostia 20018 Spain
This is the accepted manuscript of the article that appeared in final form in
Computer Methods and Programs in Biomedicine 188 : (2020) // Article ID
105264, which has been published in final form at
https://doi.org/10.1016/j.cmpb.2019.105264 © 2019 Elsevier under CC BY-NC-ND
license (http://creativecommons.org/licenses/by-nc-nd/4.0/)
1. Introduction
Methodical documentation of healthcare data is fundamental to public health.
The International Classification of Diseases (ICD) is the standard diagnoses
coding system for Electronic Health Records (EHR) classification. ICD serves,
worldwide, for epidemiology, health management and documentation purposes.
Over time, several versions have been developed, with ICD-10 being the
current version. Regarding the hospital network associated with the Spanish
"Ministerio de Sanidad, Servicios Sociales e Igualdad", from January the 1st
2016, the clinical modification of the ICD-10 is the reference version,
adopting the Spanish translated CIE-10-ES variant as the coding standard. The
ICD-10 is designed as an alphanumeric code and it is arranged hierarchically
[1]. Each code is built from a set of 3 to 7 alphanumeric characters as shown
in figure 1.
Figure 1: ICD-10 code structure.
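As a rough illustration of the hierarchical structure described above, the following sketch truncates a code to coarser prefixes. The helper name `icd10_levels` is hypothetical, and treating the first character as a chapter-like group and the first three characters as the category is a simplifying assumption made here for illustration; it is not the official ICD-10 chapter or block definition.

```python
def icd10_levels(code: str) -> dict:
    """Split an ICD-10 style code into coarser prefixes (illustrative only)."""
    # Drop the decimal point some notations use, e.g. "K22.0" -> "K220".
    normalized = code.replace(".", "").upper()
    if not 3 <= len(normalized) <= 7 or not normalized.isalnum():
        raise ValueError(f"expected 3 to 7 alphanumeric characters, got {code!r}")
    return {
        "letter": normalized[0],     # assumed chapter-like group (first character)
        "category": normalized[:3],  # first three characters
        "full": normalized,          # fully-specified code
    }


print(icd10_levels("K22.0"))
```

Coarser prefixes of this kind are one simple way to think about moving from a fully-specified code towards less granular labels.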
In this paper we tackle the task of automatically coding the diagnostic terms
present in a free-text medical record according to the ICD coding system. The
task is framed within the Natural Language Processing (NLP) field. The
purpose is to determine which classes are present in the input text. Our
approach rests on machine learning, specifically on supervised multi-label
classification.
Classification based solely on text is an open challenge in artificial
intelligence [2,3,4]. We aim to solve a text classification problem on
medical free-text, EHRs that present medical jargon and clinical-specific
language. Furthermore, EHRs often contain abbreviations (frequently
non-standard), and misspellings are also common. The length of the texts
plays an important role; here we face a broad spectrum, ranging from a few
words to several tens of lines. EHRs seldom express clinical diagnoses as in
the standard ICD.
An EHR could entail many diagnostics; hence, multiple ICD labels should be
assigned. This task, Multi-label Classification, can be seen as a multi-class
classification (not binary) task in which the classes are not mutually
exclusive. Multi-label classification tends to be, by far, more challenging
than mere multi-class classification. Its complexity lies in the exponential
growth of label combinations. Note, as well, that the number of labels
associated with each EHR is variable.
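To make the exponential growth concrete, the sketch below encodes a variable-size label set as a fixed-length multi-hot vector and counts the possible label combinations, 2^n for n labels. The function names and the four-letter toy label set are hypothetical; the sizes 14, 16 and 19 are the label-set cardinalities mentioned elsewhere in this paper.

```python
def multi_hot(assigned, label_set):
    """Encode the labels assigned to one record as a 0/1 vector over label_set."""
    return [1 if label in assigned else 0 for label in label_set]


def label_combinations(n_labels):
    """Number of distinct label subsets a multi-label classifier can output."""
    return 2 ** n_labels


labels = ["C", "I", "J", "K"]          # hypothetical toy label set
print(multi_hot({"I", "K"}, labels))   # [0, 1, 0, 1]

for n in (14, 16, 19):
    print(n, label_combinations(n))    # 16384, 65536, 524288
```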
Multi-label classification can be tackled with the so-called binary relevance
approach. This simplistic approach consists of using as many binary
classifiers as ICD codes to determine if each ICD code is present in or
absent from the EHR. The drawback of this approach rests on the fact that the
model is not able to capture label-dependencies: some diagnostics are prone
to co-appear, while others are incompatible. Learning label-dependencies is
crucial for this task. To this end, we explore approaches based on Deep
Learning [5,6,7].
The contribution of this work is to explore the impact of dataset
characteristics, such as the portion of the input text considered (either the
full document or a part of it), on the predictive ability of the multi-label
neural models, and also to assess the performance with respect to label-set
cardinality and granularity. We deal with real EHRs from Osakidetza (the
Basque Public Health System) written in Spanish (see footnote 1).
2. Related work
Text classification of EHRs is a demanding task, hence most works have
focused on short English texts, though in this work we deal with novel
challenges including long EHRs written in Spanish with thousands of words.
Multi-label classification is a challenging task, especially when the number
of labels is high [8,9,10]. The binary relevance approach transforms the
multi-label problem into multiple binary classification problems [11], but
disregards the dependencies among labels. Several works have addressed the
EHRs classification according to the ICD [12,13,14,15]. Yet, little attention
was paid to dense features and to the approaches that could take advantage of
them. Furthermore, much uncertainty still exists about the inter-dependency
of labels, which could enhance the prediction performance by avoiding
incongruities such as, for example, assigning an adult-specific disease
simultaneously with a childhood condition. In this work, we tackle the
modelling and capture of label dependencies through Deep Learning models,
leveraging the dense output layer with Sigmoid activation function.
The text classification field has leapt forward, from linear and
probabilistic models over hand-crafted engineered features [16,17] to
non-linear Neural Network models that learn inherent high-level text
representations end-to-end. Good performance has been shown with NN [18],
such as Convolutional Neural Networks [19], Recurrent Neural Networks [20]
and Bidirectional Long Short-Term Memory [21].
Methods for meta-embeddings aim to conduct a complementary combination of
information from an ensemble of distinct word embeddings to yield an
embedding set with enhanced quality and characteristics of the semantics
captured. Yin and Schütze [22] presented, among others, the "concatenation"
method, wherein the meta-embedding is the concatenation of several
embeddings. Coates and Bollegala [23] argued that direct averaging of
embeddings can provide an approximation of the efficiency of concatenation
without increasing the dimension of the embeddings.
Context representations are vital to NLP tasks such as text classification.
Generic word embeddings do not capture context; to alleviate this weakness,
contextual embeddings emerged. Melamud et al. [24] presented an unsupervised
model for learning context embedding of wide contexts of sentences using
bidirectional LSTMs. These embeddings are dependent on the entire corpus from
which they were inferred and carry reinforced contextual meaning. ELMo [25]
and BERT [26] have become state-of-the-art in contextual word
representations. Much uncertainty still exists about the advantages of
applying meta and contextual embeddings over the standard options for
clinical text classification tasks, and we have found that the contextual
embeddings may give an extra edge on the ICD classification.

Footnote 1: The dataset contains sensitive, confidential data, and therefore
cannot be released.
In automatic ICD coding, there are also works that point towards the Neural
Network trend but seem to fall short in the field. These models manage to
handle large amounts of text through a dense representation of words. Nigam
[27] took advantage of Recurrent Neural Networks to perform multi-label
classification. Both works were carried out with discharge summaries from the
MIMIC-III [28] corpus. Recently, this task has gained more attention through
the CLEF eHealth evaluation labs. Suominen et al. [29] presented an overview
of the sixth annual edition. The goal of one of the tasks is to automatically
assign ICD-10 codes to texts a few words long, taken from free-text
descriptions of causes of death as reported by physicians [30,31]. The task
is similar to what we have presented in this work with the Diagnostic input
perspective, and the finding is that the performance of the classifiers could
be improved by employing the full documents.
Spanish NLP is under strong growth, driven, among others, by the Plan de
Tecnologías del Lenguaje (see footnote 2). EHRs in Spanish are currently
being collected [32,33], as well as complementary corpora including abstracts
[34]. These data sets enable the development of several tasks, e.g., Negation
Extraction [35], Extraction of Adverse Drug Reactions [36], Text
Classification [30,37,31], and Negation Cue Detection [38].
3. Methods
We explored four unique RNN model instances plus the baseline model, a Feed
Forward Neural Network with Neural-Net Language Model (NNLM) as the text
representation layer. The core architecture is a Bidirectional Recurrent
Neural Network with GRU units and pooling techniques [39] (explained in
section 3.1). The cornerstone of the model is the word embedding layer, as it
is responsible for the expressiveness of the input. Thus, we explored three
variants: standard embeddings, meta-embeddings and contextual embeddings
(explained in depth in section 3.2). Together with this work, in an attempt
to promote reproducibility, we released the software package that we
implemented (see footnote 3).

Footnote 2: https://www.plantl.gob.es/tecnologias-lenguaje/actividades/infraestructuras/Paginas/infraestructuras-linguisticas.aspx
3.1. Bidirectional Recurrent Neural Network with GRU units and pooling
We applied a Bidirectional layer with GRU units, which leverages sequences of
text in forward and reverse order with separate hidden states, and whose
mathematical formulation of the forward and backward hidden state and its
combination is shown in (1).
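\begin{equation}
\begin{aligned}
\overrightarrow{h}^{(t)} &= \sigma\left(\overrightarrow{W} x^{(t)} + \overrightarrow{V}\,\overrightarrow{h}^{(t-1)} + \overrightarrow{b}\right) \\
\overleftarrow{h}^{(t)} &= \sigma\left(\overleftarrow{W} x^{(t)} + \overleftarrow{V}\,\overleftarrow{h}^{(t-1)} + \overleftarrow{b}\right) \\
h^{(t)} &= \left[\overrightarrow{h}^{(t)}, \overleftarrow{h}^{(t)}\right]
\end{aligned}
\tag{1}
\end{equation}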
The parameters are the weight matrices $[\overrightarrow{W}, \overleftarrow{W}]$
and $[\overrightarrow{V}, \overleftarrow{V}]$, and the bias terms
$[\overrightarrow{b}, \overleftarrow{b}]$. The hidden-states are computed
through the non-linear activation ($\sigma$) applied to the weighted sum
between previous hidden-states
$[\overrightarrow{h}^{(t-1)}, \overleftarrow{h}^{(t-1)}]$ and current input
($x^{(t)}$) with their corresponding matrices. Then, both hidden states are
combined with concatenation to provide the resulting hidden state
($h^{(t)}$).
The output of the Bidirectional RNN layer could be fed to the dense layer.
However, this can be computationally challenging, due to the high number of
parameters. Learning a classifier with too many parameters can be unwieldy,
and can also be prone to over-fitting. A popular technique to deal with the
high dimensionality of the Bidirectional RNN layer output is Pooling [40]. We
applied average and max-pooling, known as 1-dimensional global pooling. The
pooled features are concatenated and fed into a final fully-connected layer.
This layer is responsible for computing the probability estimation of the
labels, i.e. ICD codes. Figure 2 shows the full architecture of the
Bidirectional Recurrent Neural Network with GRU units and pooling techniques,
i.e. BiGru. The figure shows a forward pass of an example text. The output of
the Sigmoid function is the probability estimation of each label. The depth
of every layer indicates the batch size. The Recurrent layer is unrolled, so
$s_i \;\forall i \in s$ brings the embedded representation of the input token
{s1 = emb("patient"), s2 = emb("had"), s3 = emb("achalasia")}.

Footnote 3: The software is available at
http://ixa2.si.ehu.es/prosamed/cmpICD_soft and can be downloaded with user
CMPB and password IXAcmpb. If the software is used in any way, this article
should be cited.
[Figure 2 diagram: INPUT tokens ["patient", "had", "achalasia"] → EMBEDDING
LAYER (embed_size) → EMBEDDED INPUT (s1, s2, s3) → BIDIRECTIONAL GRU-RNN
(hidden_size) → MAX POOLING and AVERAGE POOLING → CONCAT (2 * hidden_size) →
FULLY CONNECTED → SIGMOID (num_classes).]
Figure 2: Architecture: Bidirectional RNN with GRU units and pooling model.
The BiGru model can handle all the labels at once, instead of following a
binary relevance approach, training independent classifiers for each label.
The final dense layer is able to capture and model the label dependencies,
producing a non-mutually exclusive probability estimation of each label with
the Sigmoid activation function [41].
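The forward pass described in this section can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the released software: it uses the simplified recurrence of equation (1) rather than full GRU gating, random untrained weights, zero biases, and `hidden_size` counted per direction (so the pooled vector has 4 * hidden_size entries). All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(M, v):
    """Matrix-vector product on nested lists."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def recurrent_pass(xs, W, V, b):
    """h(t) = sigmoid(W x(t) + V h(t-1) + b), the simplified recurrence of (1)."""
    h = [0.0] * len(b)
    states = []
    for x in xs:
        pre = [wx + vh + bi for wx, vh, bi in zip(matvec(W, x), matvec(V, h), b)]
        h = [sigmoid(p) for p in pre]
        states.append(h)
    return states  # one hidden state per step


def forward(xs, fwd_params, bwd_params, W_out, b_out):
    fwd = recurrent_pass(xs, *fwd_params)
    # The backward direction reads the tokens in reverse; re-align afterwards.
    bwd = recurrent_pass(xs[::-1], *bwd_params)[::-1]
    hidden = [f + r for f, r in zip(fwd, bwd)]    # concatenation per step
    columns = list(zip(*hidden))                  # one column per hidden feature
    avg_pool = [sum(c) / len(c) for c in columns]
    max_pool = [max(c) for c in columns]
    pooled = avg_pool + max_pool                  # concatenated pooled features
    logits = [z + bi for z, bi in zip(matvec(W_out, pooled), b_out)]
    return [sigmoid(z) for z in logits]           # independent per-label scores


def random_matrix(rows, cols, rng):
    return [[rng.uniform(-1, 1) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
embed_size, hidden_size, num_classes = 4, 3, 3


def direction():
    return (random_matrix(hidden_size, embed_size, rng),
            random_matrix(hidden_size, hidden_size, rng),
            [0.0] * hidden_size)


# Random stand-ins for emb("patient"), emb("had"), emb("achalasia").
tokens = random_matrix(3, embed_size, rng)
probs = forward(tokens, direction(), direction(),
                random_matrix(num_classes, 4 * hidden_size, rng),
                [0.0] * num_classes)
print(probs)
```

Because each output passes through its own Sigmoid, the scores do not need to sum to one, which is what allows several labels to be predicted for the same record.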
3.2. Comprehensive input characterization: embedding layer variations
A comprehensive input characterization is crucial to attaining competitive
performance. In the training stage, the embedding layer holds more than 90%
of the model's complexity in terms of parameter count. What is more, the
predictive capacity rests on the ability of the model to extract knowledge
from the source provided in the input stage. Thus, we paid special attention
to this layer. The embedding layer in figure 2 shows just a vanilla embedding
layer that we enhanced later. Indeed, in this work we explored three
variations of the embedding layer: i) Standard embeddings. ii) Meta
embeddings (sections 3.2.1-3.2.2). iii) Contextual embeddings (section 3.2.3).
Moreover, according to Yin et al. [42] and Coates and Bollegala [23],
different pre-trained word embeddings have substantial differences in quality
and characteristics of the word representations. As a consequence, some word
embeddings perform better on some tasks than on others. Bearing all this in
mind, in addition to a standard pre-trained embedding, we tried
meta-embeddings, which are ensemble approaches (embedding concatenation and
blending), with the hope of getting an embedding set with improved overall
quality.
We turned to embeddings derived from FastText [43] as the standard embeddings
setup. As for the meta-embeddings setup, we employed FastText, Word2Vec [44]
and GloVe [45]. Every embedding set is trained on the same corpus, the
Spanish Billion Word Corpus [46].
3.2.1. Embedding Concatenation
The meta-embedding is computed as the concatenation of word embeddings, based
on the work by [22]. Before the concatenation, each embedding set must be
L2-normalized [6], so that all the values are in the range [−1, 1] and,
therefore, every set contributes equally.
The dimensionality of the resulting meta-embeddings is
$\hat{d}_{s_k} = d_{s_1} + \cdots + d_{s_i} + \cdots + d_{s_n}$, with
$d_{s_i}$ being the dimension of the i-th set concatenated. It is important
to note that the model's complexity increases with each added embedding set,
as it increases the dimension of the features of the embedding layer.
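A minimal sketch of the concatenation step follows, assuming each embedding set is a plain dictionary from word to vector. The helper names and the toy two- and three-dimensional tables are hypothetical and stand in for real pre-trained sets.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm, so every component lies in [-1, 1]."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def concat_meta_embedding(word, embedding_sets):
    """Concatenate the L2-normalized vectors of `word` from every set."""
    meta = []
    for table in embedding_sets:
        meta.extend(l2_normalize(table[word]))
    return meta


# Toy tables with different dimensions (2 and 3).
set_a = {"gout": [3.0, 4.0]}
set_b = {"gout": [1.0, 0.0, 0.0]}

meta = concat_meta_embedding("gout", [set_a, set_b])
print(len(meta), meta)  # the resulting dimension is 2 + 3 = 5
```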
3.2.2. Embeddings blending
The meta-embedding variant is computed as the average of the embeddings
involved, based on the work by Coates and Bollegala [23]. Note that even
having embedding sets with matching number of dimensions
($d_{s_i} = d_{s_j} \;\forall i, j$), each dimension among embeddings is not
related. In any case, averaging can provide an approximation of the
performance of concatenation without the expense of increasing the dimension
[23].
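The averaging variant can be sketched in the same style. The sketch assumes that all sets share the same dimension and that vectors are L2-normalized before averaging; the normalization step is an assumption carried over from the concatenation setup, and the helper names and toy tables are hypothetical.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def average_meta_embedding(word, embedding_sets):
    """Average the normalized vectors of `word`; the dimension is unchanged."""
    vectors = [l2_normalize(table[word]) for table in embedding_sets]
    if len({len(v) for v in vectors}) != 1:
        raise ValueError("all embedding sets must share the same dimension")
    return [sum(values) / len(values) for values in zip(*vectors)]


set_a = {"gout": [3.0, 4.0]}
set_b = {"gout": [0.0, 2.0]}

blended = average_meta_embedding("gout", [set_a, set_b])
print(len(blended), blended)  # still 2-dimensional
```

Unlike concatenation, the output keeps the dimension of the source sets, which is the trade-off discussed above.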
3.2.3. Contextual embeddings
Recently, approaches that improve the semantic word representation by
leveraging the context to encode syntactical meaning and handle polysemy are
pushing the state-of-the-art. Regular word embedding techniques use all the
occurrences of a word to extract a joint representation. However, depending
on the context, words could have different meanings. Recent models exploit
this reasoning and propose contextual word embeddings. There is no longer a
lookup table between words and dense representations. Instead, the word
embedding is computed on the fly, taking advantage of the context.
Embeddings from Language Models (ELMo) [25] representations are obtained from
a bidirectional Language Model (biLM) that has recently produced
state-of-the-art results in several NLP tasks like Coreference Resolution
[47] or Natural Language Inference and Sentiment Analysis [25]. The embedding
of a given word varies from one sentence or document to another with its
context. As it cannot be pre-computed, the embedding is obtained by running a
forward propagation of the model for each token of each input sequence [48].
4. Experimental framework
4.1. Data
The datasets used in our experiments consist of EHRs written in Spanish from
the Basque public health system (Osakidetza), specifically emergency services
discharge summaries from hospitals. The EHRs are not structured and were not
written using templates with sections. Table 1 introduces the details of the
dataset used. There are 10,707 EHRs. As revealed by the table, we considered
several perspectives of the dataset by varying two factors, the input and the
output, explained in what follows.
Focusing on the document input, we can observe that the behaviour of every
model is also similar, improving results as the granularity decreases. One
key finding is that the granularity alone has an impact. With less
granularity, the performance increases, even with a larger number of labels.
This finding is illustrated by the comparison between the full labels
(n = 16) and the block labels (n = 19), where with the block labels the
performance improves despite having 3 more labels. This suggests that it is
possible to get models performing better with the same number of labels by
just decreasing the label granularity.
4.2.3. Discussion
With this work we gained the following insights: despite being a difficult
task, Deep Learning recurrent models exhibit strong predictive capabilities
and can be enhanced by more robust text representation techniques such as the
meta or contextual embeddings.
We argue that our experimental results yield one key finding: the granularity
of the labels alone has an impact on performance. The significance lies in
the possibility of performance improvement by reducing the granularity
without reducing the label-set size.
BiGru powered by ELMo is the dominant model in practically every situation
from both the input and output perspectives (shown in table 2 and figure 5).
Accordingly, a per-class evaluation on the best-performing dataset
perspective is shown in figure 6.
[Figure 6 plot: top panel "Class frequency" (y-axis: number of samples, 0 to
6000); bottom panel F1 score per class (y-axis: 0 to 100). Classes on the
x-axis with the F1 scores read from the plot labels: I 88.3%, E 73.5%,
Z 65.1%, J 85.5%, N 80.5%, F 77.0%, K 75.7%, R 51.4%, C 91.0%, G 60.5%,
D 67.2%, M 54.5%, T 41.7%, B 54.3%.]
Figure 6: Per-class evaluation of BiGru ELMo based on F-score and class
frequency for the {Inp: Document, Out: Chapter} subtask.
Note that the worst-performing label T ("Injury, poisoning and certain other
consequences of external causes") gets a 41.7% F-score, while the
best-performing label C ("Neoplasms") reaches an outstanding 91%. Half of the
labels are above 70% and ≈30% of the labels are above 80%.
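The proportions quoted above can be checked with a short calculation. The per-class values below are the F-scores printed in figure 6; pairing them with the class letters in reading order is an assumption, although it agrees with the values stated in the text for T and C. The helper name `share_above` is hypothetical.

```python
# F-scores (%) per class as read from figure 6 (pairing by order is assumed).
f1_by_class = {
    "I": 88.3, "E": 73.5, "Z": 65.1, "J": 85.5, "N": 80.5, "F": 77.0, "K": 75.7,
    "R": 51.4, "C": 91.0, "G": 60.5, "D": 67.2, "M": 54.5, "T": 41.7, "B": 54.3,
}


def share_above(scores, threshold):
    """Fraction of classes whose score is strictly above the threshold."""
    return sum(1 for s in scores.values() if s > threshold) / len(scores)


best = max(f1_by_class, key=f1_by_class.get)
worst = min(f1_by_class, key=f1_by_class.get)
print(best, worst)                              # C T
print(share_above(f1_by_class, 70))             # 0.5, i.e. 7 of 14 classes
print(round(share_above(f1_by_class, 80), 3))   # 0.286, i.e. 4 of 14 classes
```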
To assess the stability of the models and the statistical significance of the
results, we performed five runs repeating the experimental setup with random
seeds and found that the standard deviation among runs remained under 0.5 for
precision and recall and under 0.25 for F-score for every model and setup,
which means that the given experimental results are both reproducible and
representative.
5. Conclusions
We presented a set of Deep Learning methods to tackle the NLP challenge of
multi-label text classification with medical free-text: EHRs written in
Spanish with datasets from the Basque Country Health System and classified
according to the ICD. Each EHR is assigned multiple ICD codes, leading to
multi-label classification of text.
In this work we turned to deep neural models and we found that contextual
information conveyed by the BiGru ELMo achieved competitive results. BiGru,
in contrast to the main approaches seen in the literature, has a mechanism to
cope with label co-occurrences and regard diseases as related.
We wondered if the neural models were able to extract the information from
entire EHRs of nearly a thousand words or could be boosted by selecting a
small though representative section (diagnoses). Experimental results showed
that it is worth providing the model with the full document, as it might
convey meaningful information. Particularly, BiGru powered with contextual
embeddings from ELMo (BiGru+ELMo) outperformed the rest of the models
explored. In fact, BiGru+ELMo outperformed every model in all the setups. The
difficulty of correctly predicting a label is not the same across labels. A
per-class evaluation revealed the competitive performance of this approach on
minority classes. That is, BiGru+ELMo proved robust regarding the class
imbalance and, as expected, leveraged frequent ICDs.
Finally, we explored the performance attained when varying the output label
granularity (fully-specified code, block, chapter) and label-set cardinality
(from 14 to 19). This is of interest to decide whether to create a fully
automatic ICD classification engine or, depending on the performance
required, to let the model just predict a higher level in the hierarchy.
There are several open directions for future work. First, our models leverage
ELMo based contextual embeddings, but there are other novel approaches for
contextual embeddings based on Language Models, like BERT [26]. Second, the
core architecture of this work is the Recurrent Neural Network, but there are
other intriguing architectures like Convolutional Neural Networks, especially
Capsule Network [52], or the architecture behind BERT, the promising
alternative to RNNs called Transformer [53]. Third, the methods to address
the relation among labels, such as statistically driven approaches (e.g.,
correlation analysis [54]) and strategies leveraging the hierarchically
structured ICD and related ontologies (e.g., Hierarchical Multi-label
Classification [55] and SNOMED-CT [56]).
References
[1] W. H. Organization, International statistical classification of diseases
and related health problems, volume 1, World Health Organization, 2004.
[2] H. T. Madabushi, M. Lee, High accuracy rule-based question classification
using question syntax and semantics, in: Proceedings of COLING 2016, the 26th
International Conference on Computational Linguistics: Technical Papers, pp.
1220–1230.
[3] D. Cer, Y. Yang, S.-y. Kong, N. Hua, N. Limtiaco, R. St. John, N.
Constant, M. Guajardo-Cespedes, S. Yuan, C. Tar, B. Strope, R. Kurzweil,
Universal sentence encoder for English, in: Proceedings of the 2018
Conference on Empirical Methods in Natural Language Processing: System
Demonstrations, Association for Computational Linguistics, Brussels, Belgium,
2018, pp. 169–174.
[4] J. Howard, S. Ruder, Universal language model fine-tuning for text
classification, in: Proceedings of the 56th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), volume 1, pp. 328–339.
[5] I. Goodfellow, Y. Bengio, A. Courville, Deep Learning, MIT Press, 2016.
http://www.deeplearningbook.org.
[6] Y. Goldberg, A primer on neural network models for natural language
processing, Journal of Artificial Intelligence Research 57 (2016) 345–420.
[7] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature 521 (2015) 436–444.
[8] K. Bhatia, H. Jain, P. Kar, M. Varma, P. Jain, Sparse local embeddings
for extreme multi-label classification, in: Advances in neural information
processing systems, pp. 730–738.
[9] H. Jain, Y. Prabhu, M. Varma, Extreme multi-label loss functions for
recommendation, tagging, ranking & other missing label applications, in:
Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge
Discovery and Data Mining, ACM, pp. 935–944.
[10] K. Jasinska, K. Dembczynski, R. Busa-Fekete, K. Pfannschmidt, T. Klerx,
E. Hullermeier, Extreme F-measure maximization using sparse probability
estimates, in: International Conference on Machine Learning, pp. 1435–1444.
[11] M.-L. Zhang, Y.-K. Li, X.-Y. Liu, X. Geng, Binary relevance for
multi-label learning: an overview, Frontiers of Computer Science 12 (2018)
191–202.
[12] P. Franz, A. Zaiss, S. Schulz, U. Hahn, R. Klar, Automated coding of
diagnoses–three methods compared., in: Proceedings of the AMIA Symposium,
American Medical Informatics Association, p. 250.
[13] A. Perotte, R. Pivovarov, K. Natarajan, N. Weiskopf, F. Wood, N.
Elhadad, Diagnosis code assignment: models and evaluation metrics, Journal of
the American Medical Informatics Association 21 (2013) 231–237.
[14] M. Saeed, M. Villarroel, A. T. Reisner, G. Clifford, L.-W. Lehman, G.
Moody, T. Heldt, T. H. Kyaw, B. Moody, R. G. Mark, Multiparameter Intelligent
Monitoring in Intensive Care II (MIMIC-II): a public-access intensive care
unit database, Critical care medicine 39 (2011) 952.
[15] J. Pérez, A. Pérez, A. Casillas, K. Gojenola, Cardiology record
multi-label classification using Latent Dirichlet Allocation, Computer
methods and programs in biomedicine 164 (2018) 111–119.
[16] T. Joachims, Text categorization with support vector machines: Learning
with many relevant features, in: European conference on machine learning,
Springer, pp. 137–142.
[17] A. McCallum, K. Nigam, et al., A comparison of event models for naive
bayes text classification, in: AAAI-98 workshop on learning for text
categorization, volume 752, Citeseer, pp. 41–48.
[18] J. Nam, J. Kim, E. L. Mencía, I. Gurevych, J. Fürnkranz, Large-scale
multi-label text classification – revisiting neural networks, in: Joint
european conference on machine learning and knowledge discovery in databases,
Springer, pp. 437–452.
[19] Y. Kim, Convolutional neural networks for sentence classification, in:
Proceedings of the 2014 Conference on Empirical Methods in Natural Language
Processing (EMNLP), Association for Computational Linguistics, Doha, Qatar,
2014, pp. 1746–1751.
[20] D. Tang, B. Qin, X. Feng, T. Liu, Target-dependent sentiment
classification with long short term memory, CoRR, abs/1512.01100 (2015).
[21] P. Zhou, Z. Qi, S. Zheng, J. Xu, H. Bao, B. Xu, Text classification
improved by integrating bidirectional LSTM with two-dimensional max pooling,
in: Proceedings of COLING 2016, the 26th International Conference on
Computational Linguistics: Technical Papers, The COLING 2016 Organizing
Committee, Osaka, Japan, 2016, pp. 3485–3495.
[22] W. Yin, H. Schütze, Learning word meta-embeddings, in: Proceedings of
the 54th Annual Meeting of the Association for Computational Linguistics
(Volume 1: Long Papers), Association for Computational Linguistics, Berlin,
Germany, 2016, pp. 1351–1360.
[23] J. Coates, D. Bollegala, Frustratingly easy meta-embedding – computing
meta-embeddings by averaging source word embeddings, in: Proceedings of the
2018 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 2 (Short
Papers), Association for Computational Linguistics, New Orleans, Louisiana,
2018, pp. 194–198.
[24] O. Melamud, J. Goldberger, I. Dagan, Context2Vec: Learning generic
context embedding with bidirectional LSTM, in: Proceedings of The 20th SIGNLL
Conference on Computational Natural Language Learning, pp. 51–61.
[25] M. Peters, M. Neumann, M. Iyyer, M. Gardner, C. Clark, K. Lee, L.
Zettlemoyer, Deep contextualized word representations, in: Proceedings of the
2018 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long
Papers), Association for Computational Linguistics, New Orleans, Louisiana,
2018, pp. 2227–2237.
[26] J. Devlin, M.-W. Chang, K. Lee, K. Toutanova, Bert: Pre-training of deep
bidirectional transformers for language understanding, CoRR abs/1810.04805
(2018).
[27] P. Nigam, Applying deep learning to ICD-9 multi-label classification
from medical records, 2016.
[28] A. E. Johnson, T. J. Pollard, L. Shen, H. L. Li-wei, M. Feng, M.
Ghassemi, B. Moody, P. Szolovits, L. A. Celi, R. G. Mark, MIMIC-III, a freely
accessible critical care database, Scientific data 3 (2016) 160035.
[29] H. Suominen, L. Kelly, L. Goeuriot, A. Névéol, L. Ramadier, A. Robert,
E. Kanoulas, R. Spijker, L. Azzopardi, D. Li, et al., Overview of the CLEF
eHealth Evaluation Lab 2018, in: International Conference of the
Cross-Language Evaluation Forum for European Languages, Springer, pp.
286–301.
[30] A. Atutxa, A. Casillas, N. Ezeiza, V. Fresno, I. Goenaga, K. Gojenola,
R. Martínez, M. O. Anchordoqui, O. Perez-de Viñaspre, IxaMed at CLEF eHealth
2018 Task 1: ICD10 coding with a Sequence-to-Sequence Approach., in: CLEF
(Working Notes), p. 1.
[31] M. Almagro, S. Montalvo, A. D. de Ilarraza, A. Pérez, MAMTRA-MED at CLEF
eHealth 2018: A Combination of Information Retrieval Techniques and Neural
Networks for ICD-10 Coding of Death Certificates., in: CLEF (Working Notes),
p. 1.
[32] M. Oronoz, K. Gojenola, A. Pérez, A. D. de Ilarraza, A. Casillas, On the
creation of a clinical gold standard corpus in Spanish: Mining adverse
drug reactions, Journal of biomedical informatics 56 (2015) 318–332.
[33] M. Marimon, B. Fisas, N. Bel, J. Vivaldi, S. Torner, M. Lorente,
S. Vázquez, M. Villegas, The IULA Treebank, in: LREC,
pp. 1920–1926.
[34] A. Duque, M. Stevenson, J. Martinez-Romo, L. Araujo, Co-occurrence
graphs for word sense disambiguation in the biomedical domain,
Artificial intelligence in medicine 87 (2018) 9–19.
[35] S. M. Jiménez-Zafra, M. Taulé, M. T. Martín-Valdivia,
L. A. Ureña-López, M. A. Martí, SFU Review SP-NEG: a Spanish corpus
annotated with negation for sentiment analysis. A typology of negation
patterns, Language Resources and Evaluation 52 (2018) 533–569.
[36] S. Santiso, A. Pérez, A. Casillas, Exploring Joint AB-LSTM with
embedded lemmas for Adverse Drug Reaction discovery, IEEE journal of
biomedical and health informatics (2018).
[37] M. Almagro, R. Martínez Unanue, V. Fresno Fernández,
S. Montalvo Herranz, Estudio preliminar de la anotación automática de
códigos CIE-10 en informes de alta hospitalarios, SEPLN (2018).
[38] H. Fabregat, A. Duque, J. Martinez-Romo, L. Araujo, Extending a Deep
Learning Approach to Negation Cues Detection in Spanish, in:
Proceedings of the Iberian Languages Evaluation Forum (IberLEF 2019).
CEUR Workshop Proceedings, CEUR-WS, Bilbao, Spain, p. 1.
[39] H. Sak, A. Senior, F. Beaufays, Long short-term memory recurrent
neural network architectures for large scale acoustic modeling, in:
Fifteenth annual conference of the international speech communication
association, p. 1.
[40] Y.-T. Zhou, R. Chellappa, Computation of optical flow using a
neural network, in: IEEE International Conference on Neural Networks,
volume 1998, pp. 71–78.
[41] J. Liu, W.-C. Chang, Y. Wu, Y. Yang, Deep learning for extreme
multi-label text classification, in: Proceedings of the 40th International
ACM SIGIR Conference on Research and Development in Information
Retrieval, ACM, pp. 115–124.
[42] Y. Yin, Y. Song, M. Zhang, NNEMBs at SemEval-2017 Task 4: Neural
Twitter sentiment classification: a simple ensemble method with different
embeddings, in: Proceedings of the 11th International Workshop on
Semantic Evaluation (SemEval-2017), pp. 621–625.
[43] P. Bojanowski, E. Grave, A. Joulin, T. Mikolov, Enriching word vectors
with subword information, Transactions of the Association for
Computational Linguistics 5 (2017) 135–146.
[44] T. Mikolov, I. Sutskever, K. Chen, G. S. Corrado, J. Dean, Distributed
representations of words and phrases and their compositionality, in:
Advances in neural information processing systems, pp. 3111–3119.
[45] J. Pennington, R. Socher, C. Manning, GloVe: Global vectors for word
representation, in: Proceedings of the 2014 conference on empirical
methods in natural language processing (EMNLP), pp. 1532–1543.
[46] C. Cardellino, Spanish Billion Words Corpus and Embeddings, 2016.
[47] K. Lee, L. He, L. Zettlemoyer, Higher-order coreference resolution with
coarse-to-fine inference, in: Proceedings of the 2018 Conference of the
North American Chapter of the Association for Computational Linguistics:
Human Language Technologies, Volume 2 (Short Papers), Association
for Computational Linguistics, New Orleans, Louisiana, 2018, pp.
687–692.
[48] M. Fares, A. Kutuzov, S. Oepen, E. Velldal, Word vectors, reuse, and
replicability: Towards a community repository of large-text resources,
in: Proceedings of the 21st Nordic Conference on Computational
Linguistics, Association for Computational Linguistics, Gothenburg,
Sweden, 2017, pp. 271–276.
[49] M. Dermouche, J. Velcin, R. Flicoteaux, S. Chevret, N. Taright,
Supervised topic models for diagnosis code assignment to discharge
summaries, in: International Conference on Intelligent Text Processing and
Computational Linguistics, Springer, pp. 485–497.
[50] C. Manning, P. Raghavan, H. Schütze, Introduction to information
retrieval, Natural Language Engineering 16 (2010) 100–103.
[51] R. Pascanu, T. Mikolov, Y. Bengio, On the difficulty of training
recurrent neural networks, in: International conference on machine learning,
pp. 1310–1318.
[52] S. Sabour, N. Frosst, G. E. Hinton, Dynamic routing between capsules,
in: Advances in neural information processing systems, pp. 3856–3866.
[53] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
L. Kaiser, I. Polosukhin, Attention is all you need, in: Advances in
Neural Information Processing Systems, pp. 5998–6008.
[54] Y. Zhang, J. Schneider, Multi-label output codes using canonical
correlation analysis, in: Proceedings of the fourteenth international
conference on artificial intelligence and statistics, pp. 873–882.
[55] J. Wehrmann, R. Cerri, R. Barros, Hierarchical multi-label classification
networks, in: International Conference on Machine Learning, pp.
5225–5234.
[56] K. Donnelly, SNOMED-CT: The advanced terminology and coding system
for eHealth, Studies in health technology and informatics 121 (2006)
279.