Translation Artifacts in Cross-lingual Transfer Learning

Author: Artetxe Zurutuza, Mikel; Labaka Intxauspe, Gorka; Agirre Bengoa, Eneko

Publisher: ACL

Year: 2020

DOI: 10.18653/v1/2020.emnlp-main.618

Source: https://addi.ehu.eus/bitstream/10810/69978/1/2020.emnlp-main.618.pdf

Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, pages 7674–7684, November 16–20, 2020. ©2020 Association for Computational Linguistics
Translation Artifacts in Cross-lingual Transfer Learning
Mikel Artetxe, Gorka Labaka, Eneko Agirre
HiTZ Center
University of the Basque Country (UPV/EHU)
{mikel.artetxe,gorka.labaka,e.agirre}@ehu.eus
Abstract
Both human and machine translation play a central role in cross-lingual transfer learning: many multilingual datasets have been created through professional translation services, and using machine translation to translate either the test set or the training set is a widely used transfer technique. In this paper, we show that such translation process can introduce subtle artifacts that have a notable impact in existing cross-lingual models. For instance, in natural language inference, translating the premise and the hypothesis independently can reduce the lexical overlap between them, which current models are highly sensitive to. We show that some previous findings in cross-lingual transfer learning need to be reconsidered in the light of this phenomenon. Based on the gained insights, we also improve the state-of-the-art in XNLI for the translate-test and zero-shot approaches by 4.3 and 2.8 points, respectively.
1 Introduction
While most NLP resources are English-specific, there have been several recent efforts to build multilingual benchmarks. One possibility is to collect and annotate data in multiple languages separately (Clark et al., 2020), but most existing datasets have been created through translation (Conneau et al., 2018; Artetxe et al., 2020). This approach has two desirable properties: it relies on existing professional translation services rather than requiring expertise in multiple languages, and it results in parallel evaluation sets that offer a meaningful measure of the cross-lingual transfer gap of different models. The resulting multilingual datasets are generally used for evaluation only, relying on existing English datasets for training.

Closely related to that, cross-lingual transfer learning aims to leverage large datasets available in one language, typically English, to build multilingual models that can generalize to other languages. Previous work has explored 3 main approaches to that end: machine translating the test set into English and using a monolingual English model (TRANSLATE-TEST), machine translating the training set into each target language and training the models on their respective languages (TRANSLATE-TRAIN), or using English data to fine-tune a multilingual model that is then transferred to the rest of languages (ZERO-SHOT).
The dataset creation and transfer procedures described above result in a mixture of original,¹ human translated and machine translated data when dealing with cross-lingual models. In fact, the type of text a system is trained on does not typically match the type of text it is exposed to at test time: TRANSLATE-TEST systems are trained on original data and evaluated on machine translated test sets, ZERO-SHOT systems are trained on original data and evaluated on human translated test sets, and TRANSLATE-TRAIN systems are trained on machine translated data and evaluated on human translated test sets.

¹ We use the term original to refer to non-translated text.

Despite overlooked to date, we show that such mismatch has a notable impact in the performance of existing cross-lingual models. By using back-translation (Sennrich et al., 2016) to paraphrase each training instance, we obtain another English version of the training set that better resembles the test set, obtaining substantial improvements for the TRANSLATE-TEST and ZERO-SHOT approaches in cross-lingual Natural Language Inference (NLI). While improvements brought by machine translation have previously been attributed to data augmentation (Singh et al., 2019), we reject this hypothesis and show that the phenomenon is only present in translated test sets, but not in original ones. Instead, our analysis reveals that this behavior is caused by subtle artifacts arising from the translation process itself. In particular, we show that translating different parts of each instance separately (e.g., the premise and the hypothesis in NLI) can alter superficial patterns in the data (e.g., the degree of lexical overlap between them), which severely affects the generalization ability of current models. Based on the gained insights, we improve the state-of-the-art in XNLI, and show that some previous findings need to be reconsidered in the light of this phenomenon.
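
To make the notion of lexical overlap between a premise and a hypothesis concrete, the following is a minimal sketch rather than the measure used in the paper. It assumes lowercased whitespace tokenization; the function name `lexical_overlap` and the example sentences are hypothetical and chosen only for illustration.

```python
def lexical_overlap(premise: str, hypothesis: str) -> float:
    """Fraction of hypothesis tokens that also occur in the premise.

    Assumption: lowercased whitespace tokens; a real analysis would
    likely use a proper tokenizer.
    """
    premise_tokens = set(premise.lower().split())
    hypothesis_tokens = hypothesis.lower().split()
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for tok in hypothesis_tokens if tok in premise_tokens)
    return shared / len(hypothesis_tokens)


premise = "the man is playing a guitar"

# Hypothesis written while looking at the premise: wording is reused.
jointly_written = lexical_overlap(premise, "the man is playing an instrument")

# Hypothetical independently translated hypothesis: same meaning,
# but different lexical choices, so fewer tokens are shared.
independently_translated = lexical_overlap(premise, "a guy plays an instrument")
```

In this toy case the overlap drops from 4 of 6 hypothesis tokens to 1 of 5, which is the kind of shift in a superficial pattern that the paragraph above describes.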
2 Related work
Cross-lingual transfer learning. Current cross-lingual models work by pre-training multilingual representations using some form of language modeling, which are then fine-tuned on the relevant task and transferred to different languages. Some authors leverage parallel data to that end (Conneau and Lample, 2019; Huang et al., 2019), but training a model akin to BERT (Devlin et al., 2019) on the combination of monolingual corpora in multiple languages is also effective (Conneau et al., 2020). Closely related to our work, Singh et al. (2019) showed that replacing segments of the training data with their translation during fine-tuning is helpful. However, they attribute this behavior to a data augmentation effect, which we believe should be reconsidered given the new evidence we provide.

Multilingual benchmarks. Most benchmarks covering a wide set of languages have been created through translation, as it is the case of XNLI (Conneau et al., 2018) for NLI, PAWS-X (Yang et al., 2019) for adversarial paraphrase identification, and XQuAD (Artetxe et al., 2020) and MLQA (Lewis et al., 2020) for Question Answering (QA). A notable exception is TyDi QA (Clark et al., 2020), a contemporaneous QA dataset that was separately annotated in 11 languages. Other cross-lingual datasets leverage existing multilingual resources, as it is the case of MLDoc (Schwenk and Li, 2018) for document classification and Wikiann (Pan et al., 2017) for named entity recognition. Concurrent to our work, Hu et al. (2020) combine some of these datasets into a single multilingual benchmark, and evaluate some well-known methods on it.

Annotation artifacts. Several studies have shown that NLI datasets like SNLI (Bowman et al., 2015) and MultiNLI (Williams et al., 2018) contain spurious patterns that can be exploited to obtain strong results without making real inferential decisions. For instance, Gururangan et al. (2018) and Poliak et al. (2018) showed that a hypothesis-only baseline performs better than chance due to cues on their lexical choice and sentence length. Similarly, McCoy et al. (2019) showed that NLI models tend to predict entailment for sentence pairs with a high lexical overlap. Several authors have worked on adversarial datasets to diagnose these issues and provide a more challenging benchmark (Naik et al., 2018; Glockner et al., 2018; Nie et al., 2020). Besides NLI, other tasks like QA have also been found to be susceptible to annotation artifacts (Jia and Liang, 2017; Kaushik and Lipton, 2018). While previous work has focused on the monolingual scenario, we show that translation can interfere with these artifacts in multilingual settings.

Translationese. Translated texts are known to have unique features like simplification, explicitation, normalization and interference, which are referred to as translationese (Volansky et al., 2013). This phenomenon has been reported to have a notable impact in machine translation evaluation (Zhang and Toral, 2019; Graham et al., 2019). For instance, back-translation brings large BLEU gains for reversed test sets (i.e., when translationese is on the source side and original text is used as reference), but its effect diminishes in the natural direction (Edunov et al., 2020). While connected, the phenomenon we analyze is different in that it arises from translation inconsistencies due to the lack of context, and affects cross-lingual transfer learning rather than machine translation.
3 Experimental design
Our goal is to analyze the effect of both human and machine translation in cross-lingual models. For that purpose, the core idea of our work is to (i) use machine translation to either translate the training set into other languages, or generate English paraphrases of it through back-translation, and (ii) evaluate the resulting systems on original, human translated and machine translated test sets in comparison with systems trained on original data. We next describe the models used in our experiments (§3.1), the specific training variants explored (§3.2), and the evaluation procedure followed (§3.3).
3.1 Models and transfer methods
We experiment with two models that are representative of the state-of-the-art in monolingual and cross-lingual pre-training: (i) ROBERTA (Liu et al., 2019), which is an improved version of BERT that uses masked language modeling to pre-train an English Transformer model, and (ii) XLM-R (Conneau et al., 2020), which is a multilingual extension of the former pre-trained on 100 languages. In both cases, we use the large models released by the authors under the fairseq repository.² As discussed next, we explore different variants of the training set to fine-tune each model on different tasks. At test time, we try both machine translating the test set into English (TRANSLATE-TEST) and, in the case of XLM-R, using the actual test set in the target language (ZERO-SHOT).
3.2 Training variants
We try 3 variants of each training set to fine-tune our models: (i) the original one in English (ORIG), (ii) an English paraphrase of it generated through back-translation using Spanish or Finnish as pivot (BT-ES and BT-FI), and (iii) a machine translated version in Spanish or Finnish (MT-ES and MT-FI). For sentences occurring multiple times in the training set (e.g., premises repeated for multiple hypotheses), we use the exact same translation for all occurrences, as our goal is to understand the inherent effect of translation rather than its potential application as a data augmentation method.
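
The consistency constraint on repeated sentences can be sketched as follows. This is an illustration under stated assumptions, not the authors' pipeline: `to_pivot` and `from_pivot` are hypothetical callables standing in for the English-to-pivot and pivot-to-English translation systems, and the dictionary cache is just one simple way to guarantee that a repeated sentence always receives the same translation, even when decoding is stochastic.

```python
from typing import Callable, Dict, List, Tuple

Translator = Callable[[str], str]


def back_translate_pairs(
    pairs: List[Tuple[str, str]],
    to_pivot: Translator,
    from_pivot: Translator,
) -> List[Tuple[str, str]]:
    """Paraphrase (premise, hypothesis) pairs through a pivot language.

    Each side is translated independently, and every distinct sentence
    is translated exactly once so that all of its occurrences share the
    same paraphrase.
    """
    cache: Dict[str, str] = {}

    def paraphrase(sentence: str) -> str:
        if sentence not in cache:
            # Round trip: English -> pivot -> English.
            cache[sentence] = from_pivot(to_pivot(sentence))
        return cache[sentence]

    return [(paraphrase(p), paraphrase(h)) for p, h in pairs]
```

Because the cache is keyed on the source sentence, the number of unique sentences in the paraphrased training set never exceeds that of the original one, which is what rules out a data augmentation effect.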
In order to train the machine translation systems for MT-XX and BT-XX, we use the big Transformer model (Vaswani et al., 2017) with the same settings as Ott et al. (2018) and SentencePiece tokenization (Kudo and Richardson, 2018) with a joint vocabulary of 32k subwords. For English-Spanish, we train for 10 epochs on all parallel data from WMT 2013 (Bojar et al., 2013) and ParaCrawl 5.0 (Esplà et al., 2019). For English-Finnish, we train for 40 epochs on Europarl and Wiki Titles from WMT 2019 (Barrault et al., 2019), ParaCrawl 5.0, and DGT, EUbookshop and TildeMODEL from OPUS (Tiedemann, 2012). In both cases, we remove sentences longer than 250 tokens, with a source/target ratio exceeding 1.5, or for which langid.py (Lui and Baldwin, 2012) predicts a different language, resulting in a final corpus size of 48M and 7M sentence pairs, respectively. We use sampling decoding with a temperature of 0.5 for inference, which produces more diverse translations than beam search (Edunov et al., 2018) and performed better in our preliminary experiments.
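
The three corpus filters described above could be sketched roughly as below. Several details are assumptions made for illustration only: tokens are counted by whitespace splitting, the length ratio is applied symmetrically (longer side over shorter side), pairs with an empty side are dropped, and `detect_lang` is a hypothetical pluggable callable that returns a language code for a sentence.

```python
from typing import Callable, Iterable, Iterator, Tuple


def filter_parallel(
    pairs: Iterable[Tuple[str, str]],
    src_lang: str,
    tgt_lang: str,
    detect_lang: Callable[[str], str],
    max_len: int = 250,
    max_ratio: float = 1.5,
) -> Iterator[Tuple[str, str]]:
    """Yield only the sentence pairs that pass all three filters."""
    for src, tgt in pairs:
        n_src, n_tgt = len(src.split()), len(tgt.split())
        if n_src == 0 or n_tgt == 0:
            continue  # assumption: empty sides are discarded
        if n_src > max_len or n_tgt > max_len:
            continue  # too long
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue  # lengths too dissimilar (symmetric ratio is an assumption)
        if detect_lang(src) != src_lang or detect_lang(tgt) != tgt_lang:
            continue  # language identification disagrees with the expected language
        yield src, tgt
```

With langid.py installed, `detect_lang` could plausibly be `lambda s: langid.classify(s)[0]`, since `classify` returns a (language, score) pair; the sketch keeps the detector as a parameter so that it runs without that dependency.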
² https://github.com/pytorch/fairseq
3.3 Tasks and evaluation procedure
We use the following tasks for our experiments:

Natural Language Inference (NLI). Given a premise and a hypothesis, the task is to determine whether there is an entailment, neutral or contradiction relation between them. We fine-tune our models on MultiNLI (Williams et al., 2018) for 10 epochs using the same settings as Liu et al. (2019). In most of our experiments, we evaluate on XNLI (Conneau et al., 2018), which comprises 2490 development and 5010 test instances in 15 languages. These were originally annotated in English, and the resulting premises and hypotheses were independently translated into the rest of the languages by professional translators. For the TRANSLATE-TEST approach, we use the machine translated versions from the authors. Following Conneau et al. (2020), we select the best epoch checkpoint according to the average accuracy in the development set.

Question Answering (QA). Given a context paragraph and a question, the task is to identify the span answering the question in the context. We fine-tune our models on SQuAD 1.1 (Rajpurkar et al., 2016) for 2 epochs using the same settings as Liu et al. (2019), and report test results for the last epoch. We use two datasets for evaluation: XQuAD (Artetxe et al., 2020), a subset of the SQuAD development set translated into 10 other languages, and MLQA (Lewis et al., 2020), a dataset consisting of parallel context paragraphs plus the corresponding questions annotated in English and translated into 6 other languages. In both cases, the translation was done by professional translators at the document level (i.e., when translating a question, the text answering it was also shown). For our BT-XX and MT-XX variants, we translate the context paragraph and the questions independently, and map the answer spans using the same procedure as Carrino et al. (2020).³ For the TRANSLATE-TEST approach, we use the official machine translated versions of MLQA, run inference over them, and map the predicted answer spans back to the target language.⁴

³ We use FastAlign (Dyer et al., 2013) for word alignment, and discard the few questions for which the mapping method fails (when none of the tokens in the answer span are aligned).
⁴ We use the same procedure as for the training set except that (i) given the small size of the test set, we combine it with WikiMatrix (Schwenk et al., 2019) to aid word alignment, (ii) we use Jieba for Chinese segmentation instead of the Moses tokenizer, and (iii) for the few unaligned spans, we return the English answer.
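
As a rough illustration of mapping an answer span through a word alignment, consider the sketch below. It is not the exact procedure of Carrino et al. (2020): it assumes alignments in the `i-j` source-target index format printed by fast_align, and it assumes the mapped answer is the smallest target span covering every token aligned to the source span. The helper names are hypothetical.

```python
from typing import List, Optional, Tuple

Alignment = List[Tuple[int, int]]


def parse_alignment(line: str) -> Alignment:
    """Parse a line such as '0-0 1-2 2-1' into (source, target) index pairs."""
    links = []
    for link in line.split():
        src, tgt = link.split("-")
        links.append((int(src), int(tgt)))
    return links


def map_answer_span(
    alignment: Alignment, src_start: int, src_end: int
) -> Optional[Tuple[int, int]]:
    """Map an inclusive source token span to an inclusive target token span.

    Returns None when no token in the source span is aligned, which is
    the failure case where the example is discarded (training) or the
    English answer is returned (test).
    """
    targets = [t for s, t in alignment if src_start <= s <= src_end]
    if not targets:
        return None
    return min(targets), max(targets)
```

For example, with the alignment `0-0 1-2 2-1 3-3`, the source span covering tokens 1 to 2 maps to target tokens 1 to 2, while a span over unaligned tokens 4 to 5 yields `None`.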
Model    Train  en    fr    es    de    el    bg    ru    tr    ar    vi    th    zh    hi    sw    ur    avg
Test set machine translated into English (TRANSLATE-TEST)
ROBERTA  ORIG   91.2  82.2  84.6  82.4  82.1  82.1  79.2  76.5  77.4  73.8  73.4  76.7  70.5  67.2  66.8  77.7 ±0.6
ROBERTA  BT-ES  91.6  85.7  87.4  85.4  85.1  85.1  83.6  81.3  81.5  78.7  78.2  81.1  76.3  72.7  71.5  81.7 ±0.2
ROBERTA  BT-FI  91.4  86.0  87.4  85.7  85.7  85.4  84.4  82.3  82.1  79.0  79.3  81.8  77.6  73.5  73.6  82.3 ±0.2
XLM-R    ORIG   90.3  82.2  84.2  82.6  81.9  82.0  79.3  76.7  77.5  75.0  73.7  77.5  70.9  67.8  67.2  77.9 ±0.3
XLM-R    BT-ES  90.2  84.1  86.3  84.5  84.5  84.1  82.2  79.6  80.7  78.5  77.3  80.8  75.2  72.5  71.2  80.8 ±0.3
XLM-R    BT-FI  89.5  84.9  85.5  84.5  84.5  84.6  82.9  80.6  81.4  78.9  78.1  81.5  76.3  73.3  72.5  81.3 ±0.2
XLM-R    MT-ES  89.8  83.2  85.6  84.2  84.0  83.6  81.6  78.4  79.3  77.6  76.7  80.0  74.3  71.3  70.1  80.0 ±0.6
XLM-R    MT-FI  89.8  84.4  85.3  84.7  84.1  84.0  82.0  79.8  80.3  77.4  77.7  80.6  74.7  71.8  71.3  80.5 ±0.3
Test set in target language (ZERO-SHOT)
XLM-R    ORIG   90.4  84.4  85.5  84.3  81.9  83.6  80.1  80.1  79.8  81.8  78.3  80.3  77.7  72.8  74.5  81.0 ±0.2
XLM-R    BT-ES  90.2  86.0  86.9  86.5  84.0  85.3  83.2  82.5  82.7  83.7  80.7  83.0  79.7  75.6  77.1  83.1 ±0.2
XLM-R    BT-FI  89.5  86.0  86.2  86.2  83.9  85.1  83.4  82.2  83.0  83.9  81.2  83.9  80.1  75.2  78.1  83.2 ±0.1
XLM-R    MT-ES  89.9  85.7  87.3  85.6  83.9  85.4  82.9  82.0  82.3  83.6  80.0  82.6  79.9  75.5  76.8  82.9 ±0.4
XLM-R    MT-FI  90.2  85.9  86.9  86.5  84.4  85.5  83.4  83.0  82.4  83.6  80.5  83.6  80.4  76.5  77.9  83.4 ±0.2
Table 1: XNLI dev results (acc). BT-XX and MT-XX consistently outperform ORIG in all cases.
Both for NLI and QA, we run each system 5 times with different random seeds and report the average results. Space permitting, we also report the standard deviation across the 5 runs. In our result tables, we use an underline to highlight the best result within each block, and boldface to highlight the best overall result.
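
The reporting convention above amounts to a mean and a standard deviation over the 5 seeds. The sketch below assumes the sample standard deviation (the text does not say whether the sample or population estimator is used), and the scores are made-up placeholders rather than results from the paper.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def summarize_runs(scores: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) over per-seed scores."""
    return mean(scores), stdev(scores)


# Hypothetical accuracies from 5 random seeds (placeholder values).
avg, std = summarize_runs([80.0, 81.0, 82.0, 83.0, 84.0])
print(f"{avg:.1f} ±{std:.1f}")
```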
4 NLI experiments
We next discuss our main results in the XNLI development set (§4.1, §4.2), run additional experiments to better understand the behavior of our different variants (§4.3, §4.4, §4.5), and compare our results to previous work in the XNLI test set (§4.6).
4.1 TRANSLATE-TEST results
We start by analyzing XNLI development results for TRANSLATE-TEST. Recall that, in this approach, the test set is machine translated into English, but training is typically done on original English data. Our BT-ES and BT-FI variants close this gap by training on a machine translated English version of the training set generated through back-translation. As shown in Table 1, this brings substantial gains for both ROBERTA and XLM-R, with an average improvement of 4.6 points in the best case. Quite remarkably, MT-ES and MT-FI also outperform ORIG by a substantial margin, and are only 0.8 points below their BT-ES and BT-FI counterparts. Recall that, for these two systems, training is done in machine translated Spanish or Finnish, while inference is done in machine translated English. This shows that the loss of performance when generalizing from original data to machine translated data is substantially larger than the loss of performance when generalizing from one language to another.
4.2 ZERO-SHOT results
We next analyze the results for the ZERO-SHOT approach. In this case, inference is done in the test set in each target language which, in the case of XNLI, was human translated from English. As such, different from the TRANSLATE-TEST approach, neither training on original data (ORIG) nor training on machine translated data (BT-XX and MT-XX) makes use of the exact same type of text that the system is exposed to at test time. However, as shown in Table 1, both BT-XX and MT-XX outperform ORIG by approximately 2 points, which suggests that our (back-)translated versions of the training set are more similar to the human translated test sets than the original one. This also provides a new perspective on the TRANSLATE-TRAIN approach, which was reported to outperform ORIG in previous work (Conneau and Lample, 2019): while the original motivation was to train the model on the same language that it is tested on, our results show that machine translating the training set is beneficial even when the target language is different.
4.3 Original vs. translated test sets
So as to understand whether the improvements observed so far are limited to translated test sets or apply more generally, we conduct additional experiments comparing translated test sets to original ones. However, to the best of our knowledge, all existing non-English NLI benchmarks were created through translation. For that reason, we build a new test set that mimics XNLI, but is annotated in Spanish rather than English. We first collect the premises from a filtered version of CommonCrawl (Buck et al., 2014), taking a subset of 5 websites that represent a diverse set of genres: a newspaper, an economy forum, a celebrity magazine, a literature blog, and a consumer magazine. We then ask native Spanish annotators to generate an entailment, a neutral and a contradiction hypothesis for each premise.⁵ We collect a total of 2490 examples using this procedure, which is the same size as the XNLI development set. Finally, we create a human translated and a machine translated English version of the dataset using professional translators from Gengo and our machine translation system described in §3.2,⁶ respectively. We report results for the best epoch checkpoint on each set.

As shown in Table 2, both BT-XX and MT-XX clearly outperform ORIG in all test sets created through translation, which is consistent with our previous results. In contrast, the best results on the original English set are obtained by ORIG, and neither BT-XX nor MT-XX obtain any clear improvement on the one in Spanish either.⁷ This confirms that the underlying phenomenon is limited to translated test sets. In addition, it is worth mentioning that the results for the machine translated test set in English are slightly better than those for the human translated one, which suggests that the difficulty of the task does not only depend on the translation quality. Finally, it is also interesting that MT-ES is only marginally better than MT-FI in both Spanish test sets, even if it corresponds to the TRANSLATE-TRAIN approach, whereas MT-FI needs to ZERO-SHOT transfer from Finnish into Spanish. This reinforces the idea that it is training on translated data rather than training on the target language that is key in TRANSLATE-TRAIN.

Test set        XNLI dev          Our dataset
Model    Train  OR (en)  HT (es)  OR (es)  HT (en)  MT (en)
ROBERTA  ORIG   92.1     -        -        78.7     79.0
ROBERTA  BT-ES  91.9     -        -        80.3     80.5
ROBERTA  BT-FI  91.4     -        -        80.5     80.5
XLM-R    ORIG   90.5     85.5     81.0     77.5     78.5
XLM-R    BT-ES  90.3     87.1     81.4     78.6     79.4
XLM-R    BT-FI  89.7     86.5     80.8     78.8     79.2
XLM-R    MT-ES  90.2     87.5     81.3     78.4     78.9
XLM-R    MT-FI  90.4     87.1     81.1     78.3     78.9
Table 2: NLI results on original (OR), human translated (HT) and machine translated (MT) sets (acc). BT-XX and MT-XX outperform ORIG in translated sets, but do not get any clear improvement in original ones.

⁵ Unlike XNLI, we do not collect 4 additional labels for each example. Note, however, that XNLI kept the original label as the gold standard, so the additional labels are irrelevant for the actual evaluation. This is not entirely clear in Conneau et al. (2018), but can be verified by inspecting the dataset.
⁶ We use beam search instead of sampling decoding.
⁷ Note that the standard deviations are around 0.3.

Test            Competence  Distraction       Noise
Model    Train  AT    NR    WO    NG    LN    SE
ROBERTA  ORIG   72.9  65.7  64.9  59.1  88.4  86.5
ROBERTA  BT-FI  56.6  57.2  80.6  67.8  87.7  86.6
XLM-R    ORIG   78.4  56.8  67.3  61.2  86.8  85.3
XLM-R    BT-FI  60.6  51.7  76.7  64.6  86.2  85.4
XLM-R    MT-FI  64.3  50.3  77.8  68.5  86.4  85.3
Table 3: NLI Stress Test results (combined matched & mismatched acc). AT = antonymy, NR = numerical reasoning, WO = word overlap, NG = negation, LN = length mismatch, SE = spelling error. BT-FI and MT-FI are considerably weaker than ORIG in the competence test, but substantially stronger in the distraction test.
4.4 Stress tests
In order to better understand how systems trained on original and translated data differ, we run additional experiments on the NLI Stress Tests (Naik et al., 2018), which were designed to test the robustness of NLI models to specific linguistic phenomena in English. The benchmark consists of a competence test, which evaluates the ability to understand antonymy relation and perform numerical reasoning, a distraction test, which evaluates the robustness to shallow patterns like lexical overlap and the presence of negation words, and a noise test, which evaluates robustness to spelling errors. Just as with previous experiments, we report results for the best epoch checkpoint in each test set.

As shown in Table 3, ORIG outperforms BT-FI and MT-FI on the competence test by a large margin, but the opposite is true on the distraction test.⁸ In particular, our results show that BT-FI and MT-FI are less reliant on lexical overlap and the presence of negative words. This feels intuitive, as translating the premise and hypothesis independently, as BT-FI and MT-FI do, is likely to reduce the lexical overlap between them. More generally, the translation process can alter similar superficial patterns in the data, which NLI models are sensitive to (§2). This would explain why the resulting models have a different behavior on different stress tests.

⁸ We observe similar trends for BT-ES and MT-ES, but omit these results for conciseness.
4.5 Output class distribution
With the aim to understand the effect of the previous phenomenon in cross-lingual settings, we look at the output class distribution of our different models in the XNLI development set. As shown in Table 4, the predictions of all systems are close to the true class distribution in the case of English. Nevertheless, ORIG is strongly biased for the rest of languages, and tends to underpredict entailment and overpredict neutral. This can again be attributed to the fact that the English test set is original, whereas the rest are human translated. In particular, it is well-known that NLI models tend to predict entailment when there is a high lexical overlap between the premise and the hypothesis (§2). However, the degree of overlap will be smaller in the human translated test sets given that the premise and the hypothesis were translated independently, which explains why entailment is underpredicted. In contrast, BT-FI and MT-FI are exposed to the exact same phenomenon during training, which explains why they are not that heavily affected.

So as to measure the impact of this phenomenon, we explore a simple approach to correct this bias: having fine-tuned each model, we adjust the bias term added to the logit of each class so the model predictions match the true class distribution for each language.⁹ As shown in Table 5, this brings large improvements for ORIG, but is less effective for BT-FI and MT-FI.¹⁰ This shows that the performance of ORIG was considerably hindered by this bias, which BT-FI and MT-FI effectively mitigate.
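
One way to read the bias adjustment described above is as a coordinate-wise search over per-class bias terms. The sketch below is an interpretation under stated assumptions, not the authors' code: the function names, the fixed number of sweeps, and the choice of placing each bias halfway between two neighbouring thresholds are all hypothetical, and the toy logits are invented for illustration. Convergence is not guaranteed in general, and exact ties between thresholds are not handled.

```python
from typing import List, Sequence


def predict(logits: Sequence[Sequence[float]], bias: Sequence[float]) -> List[int]:
    """Argmax class per example after adding the per-class bias."""
    return [max(range(len(bias)), key=lambda j: row[j] + bias[j]) for row in logits]


def adjust_biases(
    logits: Sequence[Sequence[float]],
    target: Sequence[float],
    sweeps: int = 10,
) -> List[float]:
    """Pick one class at a time and set its bias so that it is predicted
    for (approximately) its target fraction of the examples."""
    n, num_classes = len(logits), len(target)
    bias = [0.0] * num_classes
    for _ in range(sweeps):
        for k in range(num_classes):
            # Bias that class k would need to beat the best competitor on each example.
            needed = sorted(
                max(row[j] + bias[j] for j in range(num_classes) if j != k) - row[k]
                for row in logits
            )
            want = round(target[k] * n)  # number of examples class k should win
            if want <= 0:
                bias[k] = needed[0] - 1.0
            elif want >= n:
                bias[k] = needed[-1] + 1.0
            else:
                # Any value between the two thresholds selects exactly `want` examples.
                bias[k] = (needed[want - 1] + needed[want]) / 2.0
    return bias


# Toy logits for 6 examples and 3 classes (entailment, neutral, contradiction),
# constructed so that entailment is underpredicted before the adjustment.
toy_logits = [
    [1.0, 0.0, 1.5],
    [2.0, 0.5, 0.0],
    [0.0, 3.0, 1.0],
    [-1.0, 2.0, 0.5],
    [0.5, 1.0, 3.0],
    [-0.5, 0.0, 2.0],
]
before = predict(toy_logits, [0.0, 0.0, 0.0])
fitted_bias = adjust_biases(toy_logits, [1 / 3, 1 / 3, 1 / 3])
after = predict(toy_logits, fitted_bias)
```

On the toy data, the unadjusted predictions assign only one example to the first class, whereas the adjusted ones assign two examples to each class. To avoid fitting on the evaluation data, the biases would be fitted on a development set and then applied unchanged to the test set.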
⁹ We achieve this using an iterative procedure where, at each step, we select one class and set its bias term so the class is selected for the right percentage of examples.
¹⁰ Note that we are adjusting the bias term in the evaluation set itself, which requires knowing its class distribution and is thus a form of cheating. While useful for analysis, a fair comparison would require adjusting the bias term in a separate validation set. This is what we do for our final results in §4.6, where we adjust the bias term in the XNLI development set and report results on the XNLI test set.

Test set                         EN                EN→XX (avg)
Model                     Train  ent   neu   con   ent   neu   con
ROBERTA (translate-test)  ORIG   33.4  32.8  33.8  23.2  40.7  36.1
ROBERTA (translate-test)  BT-FI  34.5  31.9  33.6  30.2  35.7  34.1
XLM-R (zero-shot)         ORIG   32.4  33.2  34.4  27.0  37.8  35.2
XLM-R (zero-shot)         BT-FI  34.3  31.6  34.1  33.1  32.9  34.0
XLM-R (zero-shot)         MT-FI  33.6  32.6  33.9  30.8  35.3  33.9
Gold Standard                    33.3  33.3  33.3  33.3  33.3  33.3
Table 4: Output class distribution on XNLI dev. All systems are close to the true distribution in English, but ORIG is biased toward neu and con in the transfer languages. BT-FI and MT-FI alleviate this issue.

Model                     Train  Base       Unbias     +∆
ROBERTA (translate-test)  ORIG   77.7 ±0.6  80.6 ±0.2  2.9 ±0.5
ROBERTA (translate-test)  BT-FI  82.3 ±0.2  82.8 ±0.1  0.4 ±0.2
XLM-R (zero-shot)         ORIG   81.0 ±0.2  82.4 ±0.2  1.4 ±0.3
XLM-R (zero-shot)         BT-FI  83.2 ±0.1  83.3 ±0.1  0.1 ±0.1
XLM-R (zero-shot)         MT-FI  83.4 ±0.2  83.8 ±0.1  0.4 ±0.2
Table 5: XNLI dev results with class distribution unbiasing (average acc across all languages). Adjusting the bias term of the classifier to match the true class distribution brings large improvements for ORIG, but is less effective for BT-FI and MT-FI.

4.6 Comparison with the state-of-the-art
So as to put our results into perspective, we compare our best variant to previous work on the XNLI test set. As shown in Table 6, our method improves the state-of-the-art for both the TRANSLATE-TEST and the ZERO-SHOT approaches by 4.3 and 2.8 points, respectively. It also obtains the best overall results published to date, with the additional advantage that the previous state-of-the-art required a machine translation system between English and each of the 14 target languages, whereas our method uses a single machine translation system between English and Finnish (which is not one of the target languages). While the main goal of our work is not to design better cross-lingual models, but to analyze their behavior in connection to translation, this shows that the phenomenon under study is highly relevant, to the extent that it can be exploited to improve the state-of-the-art.
Model                           en    fr    es    de    el    bg    ru    tr    ar    vi    th    zh    hi    sw    ur    avg
Fine-tune an English model and machine translate the test set into English (TRANSLATE-TEST)
BERT (Devlin et al., 2019)      88.8  81.4  82.3  80.1  80.3  80.9  76.2  76.0  75.4  72.0  71.9  75.6  70.0  65.8  65.8  76.2
Roberta (Liu et al., 2019)      91.3  82.9  84.3  81.2  81.7  83.1  78.3  76.8  76.6  74.2  74.1  77.5  70.9  66.7  66.8  77.8
Proposed (ROBERTA – BT-FI)      90.6  85.4  86.3  84.3  85.2  85.7  82.3  80.6  81.5  77.8  78.6  81.2  77.1  73.5  72.3  81.5
+ Unbiasing (tuned in dev)      90.5  85.8  86.6  84.6  85.5  85.8  82.9  81.2  82.3  78.7  79.7  82.3  77.6  74.4  72.9  82.1
Fine-tune a multilingual model on all machine translated training sets (TRANSLATE-TRAIN-ALL)
Unicoder (Huang et al., 2019)   85.6  81.1  82.3  80.9  79.5  81.4  79.7  76.8  78.2  77.9  77.1  80.5  73.4  73.8  69.6  78.5
XLM-R (Conneau et al., 2020)    88.7  85.2  85.6  84.6  83.6  85.5  82.4  81.6  80.9  83.4  80.9  83.3  79.8  75.9  74.3  82.4
Fine-tune a multilingual model on the English training set (ZERO-SHOT)
mBERT (Devlin et al., 2019)     82.1  73.8  74.3  71.1  66.4  68.9  69.0  61.6  64.9  69.5  55.8  69.3  60.0  50.4  58.0  66.3
XLM (Conneau and Lample, 2019)  85.0  78.7  78.9  77.8  76.6  77.4  75.3  72.5  73.1  76.1  73.2  76.5  69.6  68.4  67.3  75.1
Unicoder (Huang et al., 2019)   85.1  79.0  79.4  77.8  77.2  77.2  76.3  72.8  73.5  76.4  73.6  76.2  69.4  69.7  66.7  75.4
XLM-R (Conneau et al., 2020)    88.8  83.6  84.2  82.7  82.3  83.1  80.1  79.0  78.8  79.7  78.6  80.2  75.8  72.0  71.7  80.1
Proposed (XLM-R – MT-FI)        88.8  84.8  85.7  84.6  84.2  85.7  82.9  81.8  82.0  82.1  79.9  81.8  79.8  75.9  76.7  82.4
+ Unbiasing (tuned in dev)      88.7  85.0  86.1  84.8  84.8  86.1  83.5  82.2  82.4  83.0  80.8  82.6  80.3  76.0  77.3  82.9
Table 6: XNLI test results (acc). Results for other methods are taken from their respective papers or, if not provided, from Conneau et al. (2020). For those with multiple variants, we select the one with the best results.

5 QA experiments
So as to understand whether our previous findings apply to other tasks besides NLI, we run additional experiments on QA. As shown in Table 7, BT-FI and BT-ES do indeed outperform ORIG for the TRANSLATE-TEST approach on MLQA. The improvement is modest, but very consistent across different languages, models and runs. The results for MT-ES and MT-FI are less conclusive, presumably because mapping the answer spans across languages might introduce some noise. In contrast, we do not observe any clear improvement for the ZERO-SHOT approach on this dataset. Our XQuAD results in Table 8 are more positive, but still inconclusive.

These results can partly be explained by the translation procedure used to create the different benchmarks: the premises and hypotheses of XNLI were translated independently, whereas the questions and context paragraphs of XQuAD were translated together. Similarly, MLQA made use of parallel contexts, and translators were shown the sentence containing each answer when translating the corresponding question. As a result, one can expect both QA benchmarks to have more consistent translations than XNLI, which would in turn diminish this phenomenon. In contrast, the questions and context paragraphs are independently translated when using machine translation, which explains why BT-ES and BT-FI outperform ORIG for the TRANSLATE-TEST approach. We conclude that the translation artifacts revealed by our analysis are not exclusive to NLI, as they also show up on QA for the TRANSLATE-TEST approach, but their actual impact can be highly dependent on the translation procedure used and the nature of the task.
6 Discussion
Our analysis prompts to reconsider previous findings in cross-lingual transfer learning as follows:

The cross-lingual transfer gap on XNLI was overestimated. Given the parallel nature of XNLI, accuracy differences across languages are commonly interpreted as the loss of performance when generalizing from English to the rest of languages. However, our work shows that there is another factor that can have a much larger impact: the loss of performance when generalizing from original to translated data. Our results suggest that the real cross-lingual generalization ability of XLM-R is considerably better than what the accuracy numbers in XNLI reflect.

Overcoming the cross-lingual gap is not what makes TRANSLATE-TRAIN work. The original motivation for TRANSLATE-TRAIN was to train the model on the same language it is tested on. However, we show that it is training on translated data, rather than training on the target language, that is key for this approach to outperform ZERO-SHOT as reported by previous authors.

Improvements previously attributed to data augmentation should be reconsidered. The method by Singh et al. (2019) combines machine translated premises and hypotheses in different languages (§2), resulting in an effect similar to BT-XX and MT-XX. As such, we believe that this method should be analyzed from the point of view of dataset artifacts rather than data augmentation, as the authors do.¹¹ From this perspective, having the premise and the hypotheses in different languages can reduce the superficial patterns between them, which would explain why this approach is better than using examples in a single language.

¹¹ Recall that our experimental design prevents a data augmentation effect, in that the number of unique sentences and examples used for training is always the same (§3.2).
Model    Train  en           es           de           ar           vi           zh           hi           avg
Test set machine translated into English (TRANSLATE-TEST)
ROBERTA  ORIG   84.7 / 71.4  70.1 / 49.7  60.5 / 41.2  55.7 / 32.5  65.6 / 40.8  53.5 / 26.0  42.7 / 20.7  61.8 ±0.1 / 40.3 ±0.2
ROBERTA  BT-ES  84.4 / 71.2  70.9 / 50.7  61.0 / 41.6  56.5 / 33.3  66.7 / 41.8  54.4 / 27.1  43.0 / 21.1  62.4 ±0.1 / 41.0 ±0.2
ROBERTA  BT-FI  83.8 / 70.4  70.3 / 50.1  61.1 / 41.9  56.5 / 33.4  66.8 / 42.1  54.9 / 27.5  42.8 / 21.3  62.3 ±0.1 / 40.9 ±0.2
XLM-R    ORIG   84.1 / 71.0  69.9 / 49.2  60.8 / 42.5  55.2 / 31.8  65.4 / 40.6  54.3 / 27.8  43.6 / 21.3  61.9 ±0.1 / 40.6 ±0.1
XLM-R    BT-ES  83.8 / 70.8  70.5 / 50.0  61.4 / 43.5  56.1 / 33.1  66.5 / 41.6  55.4 / 29.0  44.0 / 22.2  62.5 ±0.2 / 41.5 ±0.2
XLM-R    BT-FI  82.7 / 69.6  70.0 / 49.7  61.1 / 43.3  56.0 / 33.1  66.2 / 41.5  55.6 / 29.2  43.7 / 22.0  62.2 ±0.1 / 41.2 ±0.2
XLM-R    MT-ES  83.4 / 69.7  70.0 / 49.1  61.0 / 42.7  55.6 / 32.2  65.9 / 40.9  54.9 / 28.1  43.9 / 21.6  62.1 ±0.3 / 40.6 ±0.2
XLM-R    MT-FI  82.6 / 69.0  69.7 / 48.6  61.0 / 42.8  55.7 / 32.3  65.8 / 40.9  54.8 / 27.9  43.9 / 21.6  61.9 ±0.3 / 40.4 ±0.2
Test set in target language (ZERO-SHOT)
XLM-R    ORIG   84.1 / 71.0  74.5 / 56.3  70.3 / 55.1  66.5 / 45.9  74.3 / 53.1  67.8 / 43.4  71.6 / 53.4  72.7 ±0.1 / 54.0 ±0.1
XLM-R    BT-ES  83.8 / 70.8  74.7 / 56.8  70.3 / 55.2  66.9 / 46.5  74.3 / 53.0  68.2 / 43.8  71.4 / 53.6  72.8 ±0.2 / 54.3 ±0.2
XLM-R    BT-FI  82.7 / 69.6  74.1 / 56.3  69.8 / 54.5  66.6 / 46.0  73.3 / 52.3  67.9 / 43.4  71.0 / 53.2  72.2 ±0.2 / 53.6 ±0.2
XLM-R    MT-ES  83.4 / 69.7  75.2 / 57.3  70.5 / 55.1  67.5 / 46.5  74.5 / 53.2  67.5 / 42.5  71.7 / 52.7  72.9 ±0.3 / 53.9 ±0.4
XLM-R    MT-FI  82.6 / 69.0  74.1 / 56.0  70.2 / 54.6  66.9 / 46.0  73.7 / 52.6  67.2 / 41.5  71.9 / 53.4  72.4 ±0.2 / 53.3 ±0.4
Table 7: MLQA test results (F1 / exact match).

Model              Train  en    es    de    el    ru    tr    ar    vi    th    zh    hi    avg
XLM-R (zero-shot)  ORIG   88.2  82.7  80.8  80.9  80.1  76.1  76.0  80.1  75.4  71.9  76.4  79.0 ±0.2
                   BT-ES  87.9  83.5  80.5  81.2  80.7  76.8  77.4  80.2  76.4  73.0  76.9  79.5 ±0.3
                   BT-FI  87.1  82.5  80.2  80.7  79.8  75.7  76.6  79.4  75.7  71.5  76.8  78.7 ±0.3
                   MT-ES  87.1  84.1  80.3  81.2  80.1  76.0  77.4  80.9  76.7  72.7  77.1  79.4 ±0.3
                   MT-FI  86.3  81.4  80.2  80.5  80.2  76.6  77.0  80.3  77.6  74.5  77.8  79.3 ±0.2

Table 8: XQuAD results (F1). Results for the exact match metric are similar.

The potential of TRANSLATE-TEST was underestimated. The previous best results for TRANSLATE-TEST on XNLI lagged behind the state-of-the-art by 4.6 points. Our work reduces this gap to only 0.8 points by addressing the underlying translation artifacts. The reason why TRANSLATE-TEST is more severely affected by this phenomenon is twofold: (i) the effect is doubled by first using human translation to create the test set and then machine translation to translate it back to English, and (ii) TRANSLATE-TRAIN was inadvertently mitigating this issue (see above), but equivalent techniques were never applied to TRANSLATE-TEST.
Future evaluation should better account for translation artifacts. The evaluation issues raised by our analysis do not have a simple solution. In fact, while we use the term translation artifacts to highlight that they are an unintended effect of translation that impacts final evaluation, one could also argue that it is the original datasets that contain the artifacts, which translation simply alters or even mitigates.¹² In any case, this is a more general issue that falls beyond the scope of cross-lingual transfer learning, so we argue that it should be carefully controlled when evaluating cross-lingual models. In the absence of more robust datasets, we recommend that future multilingual benchmarks should at least provide consistent test sets for English and the rest of the languages. This can be achieved by (i) using original annotations in all languages, (ii) using original annotations in a non-English language and translating them into English and other languages, or (iii) if translating from English, doing so at the document level to minimize translation inconsistencies.

¹² For instance, the high lexical overlap observed for the entailment class is usually regarded as a spurious pattern, so reducing it could be considered a positive effect of translation.
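
As a rough illustration of the lexical overlap between premise and hypothesis discussed above, the following minimal sketch computes the fraction of hypothesis tokens that also occur in the premise. The function names, the regular-expression tokenizer, and the example sentences are assumptions made for illustration only; this is not necessarily the measure used in our experiments.

```python
import re


def tokenize(text):
    # Simplistic assumed tokenizer: lowercase, then extract word characters.
    return re.findall(r"\w+", text.lower())


def lexical_overlap(premise, hypothesis):
    # Fraction of hypothesis tokens that also appear in the premise.
    premise_tokens = set(tokenize(premise))
    hypothesis_tokens = tokenize(hypothesis)
    if not hypothesis_tokens:
        return 0.0
    shared = sum(1 for token in hypothesis_tokens if token in premise_tokens)
    return shared / len(hypothesis_tokens)


# Hypothetical examples: a hypothesis that copies words from the premise,
# and a paraphrase of the kind an independent translation might produce.
premise = "A man is playing a guitar on stage"
copied_hypothesis = "A man is playing a guitar"
paraphrased_hypothesis = "Someone performs music with an instrument"

print(lexical_overlap(premise, copied_hypothesis))
print(lexical_overlap(premise, paraphrased_hypothesis))
```

Comparing this statistic on an original test set and on its translated counterpart would be one simple way to check whether a translation procedure has altered the overlap pattern.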
7 Conclusions
In this paper, we have shown that both human and machine translation can alter superficial patterns in data, which requires reconsidering previous findings in cross-lingual transfer learning. Based on the gained insights, we have improved the state-of-the-art in XNLI for the TRANSLATE-TEST and ZERO-SHOT approaches by a substantial margin. Finally, we have shown that the phenomenon is not specific to NLI but also affects QA, although it is less pronounced there thanks to the translation procedure used in the corresponding benchmarks. So as to facilitate similar studies in the future, we release our NLI dataset,¹³ which, unlike previous benchmarks, was annotated in a non-English language and human translated into English.

¹³ https://github.com/artetxem/esxnli

Acknowledgments
We thank Nora Aranberri and Uxoa Iñurrieta for helpful discussion during the development of this work, as well as the rest of our colleagues from the IXA group that worked as annotators for our NLI dataset.
This research was partially funded by a Facebook Fellowship, the Basque Government excellence research group (IT1343-19), the Spanish MINECO (UnsupMT TIN2017-91692-EXP MCIU/AEI/FEDER, UE), Project BigKnowledge (Ayudas Fundación BBVA a equipos de investigación científica 2018), and the NVIDIA GPU grant program.
This research is supported via the BETTER Program contract #2019-19051600006 (ODNI, IARPA activity). The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies, either expressed or implied, of ODNI, IARPA, or the U.S. Government. The U.S. Government is authorized to reproduce and distribute reprints for governmental purposes notwithstanding any copyright annotation therein.

References
Mikel Artetxe, Sebastian Ruder, and Dani Yogatama. 2020. On the cross-lingual transferability of monolingual representations. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 4623–4637. Association for Computational Linguistics.

Loïc Barrault, Ondřej Bojar, Marta R. Costa-jussà, Christian Federmann, Mark Fishel, Yvette Graham, Barry Haddow, Matthias Huck, Philipp Koehn, Shervin Malmasi, Christof Monz, Mathias Müller, Santanu Pal, Matt Post, and Marcos Zampieri. 2019. Findings of the 2019 Conference on Machine Translation (WMT19). In Proceedings of the Fourth Conference on Machine Translation (Volume 2: Shared Task Papers, Day 1), pages 1–61, Florence, Italy. Association for Computational Linguistics.

Ondřej Bojar, Christian Buck, Chris Callison-Burch, Christian Federmann, Barry Haddow, Philipp Koehn, Christof Monz, Matt Post, Radu Soricut, and Lucia Specia. 2013. Findings of the 2013 Workshop on Statistical Machine Translation. In Proceedings of the Eighth Workshop on Statistical Machine Translation, pages 1–44, Sofia, Bulgaria. Association for Computational Linguistics.

Samuel R. Bowman, Gabor Angeli, Christopher Potts, and Christopher D. Manning. 2015. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, pages 632–642, Lisbon, Portugal. Association for Computational Linguistics.

Christian Buck, Kenneth Heafield, and Bas van Ooyen. 2014. N-gram counts and language models from the Common Crawl. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC'14), pages 3579–3584, Reykjavik, Iceland. European Language Resources Association (ELRA).

Casimiro Pio Carrino, Marta R. Costa-jussà, and José A. R. Fonollosa. 2020. Automatic Spanish translation of the SQuAD dataset for multi-lingual question answering. In Proceedings of the 12th Language Resources and Evaluation Conference, pages 5515–5523, Marseille, France. European Language Resources Association.

Jonathan H. Clark, Eunsol Choi, Michael Collins, Dan Garrette, Tom Kwiatkowski, Vitaly Nikolaev, and Jennimaria Palomaki. 2020. TyDi QA: A benchmark for information-seeking question answering in typologically diverse languages. Transactions of the Association for Computational Linguistics, 8:454–470.

Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, and Veselin Stoyanov. 2020. Unsupervised cross-lingual representation learning at scale. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pages 8440–8451. Association for Computational Linguistics.

Alexis Conneau and Guillaume Lample. 2019. Cross-lingual language model pretraining. In Advances in Neural Information Processing Systems 32, pages 7059–7069.

Alexis Conneau, Ruty Rinott, Guillaume Lample, Adina Williams, Samuel Bowman, Holger Schwenk, and Veselin Stoyanov. 2018. XNLI: Evaluating cross-lingual sentence representations. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2475–2485, Brussels, Belgium. Association for Computational Linguistics.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers),
