Proceedings of the 40th Conference Translating and the Computer, pages 50–59,
London, UK, November 15-16, 2018. © 2018 AsLing
Machine Translation Markers in Post-Edited Machine Translation Output
Michael Farrell
IULM University
Milan, Italy
Abstract
The author has conducted an experiment for two consecutive years with postgraduate university students in which half do an unaided human translation (HT) and the other half post-edit machine translation output (PEMT). Comparison of the texts produced shows, rather unsurprisingly, that post-editors faced with an acceptable solution tend not to edit it, even when often more than 60% of translators tackling the same text prefer an array of other different solutions. As a consequence, certain turns of phrase, expressions and choices of words occur with greater frequency in PEMT than in HT, making it theoretically possible to design tests to tell them apart. To verify this, the author successfully carried out one such test on a small group of professional translators. This implies that PEMT may lack the variety and inventiveness of HT, and consequently may not actually reach the same standard. It is evident that the additional post-editing effort required to eliminate what are effectively MT markers is likely to nullify a great deal, if not all, of the time and cost-saving advantages of PEMT. However, the author argues that failure to eradicate these markers may eventually lead to lexical impoverishment of the target language.
1 Introduction
To meet the growing demand for translation, the post-editing of machine translation output (PEMT) is being increasingly adopted as a mainstream alternative working method (Koponen, 2016). The compelling reason behind this trend is the widely reported increase in productivity compared to human translation (Aranberri et al. 2014; Plitt and Masselot, 2010) together with a comparable and sometimes higher quality level (Fiederer and O'Brien, 2009; Daems et al., 2017b; O'Curran, 2014; Plitt and Masselot, 2010; Carl et al., 2011). PEMT has been seen to be faster than human translation (HT) for various kinds of text, including non-technical (Daems et al., 2017b), although the increase in productivity in this case is not always statistically significant (Carl et al., 2011).
However, despite the favourable findings regarding PEMT quality, some authors report that readers prefer human translated texts (Fiederer and O'Brien, 2009; Bowker and Buitrago Ciro, 2015). On the other hand, others report that evaluators are not actually able to tell the difference between HT and PEMT (Daems et al., 2017a).
Given the mixed results concerning whether there are any appreciable differences between PEMT and HT, this paper sets out to see if it is possible to identify machine translation (MT) markers in PEMT and therefore design tests to tell them apart.
The primary experiment reported herein was conducted by myself along with 51 postgraduate university students during two consecutive academic years (2016-2017 and 2017-2018) as a classroom exercise designed essentially to reveal:
• The increase in productivity stemming from the use of post-editing.
• The differences between statistical and neural MT output (SMT vs. NMT).
• The existence or otherwise of MT markers in post-edited MT output.
Naturally several other exercises were carried out during the course to analyse other aspects of post-editing and MT, including the building of a custom MT engine (Farrell, 2017).
I checked all the data the students reported and added several others before analysing them and presenting the results in class. The students involved study the use of machine translation and post-editing at the International University of Languages and Media (IULM) as part of a Master's Degree in Specialist Translation and Conference Interpreting¹.
Besides the much reported increase in productivity, students were expected to find that NMT is better than SMT (Wu et al., 2016), by noting a decrease in post-editing effort (Bentivogli et al., 2016) and therefore time required.
In a comparison between the terminology used in MT, PEMT and HT from English to German, Čulo and Nitzke (2016) observed that HT is more diverse than PEMT in terms of lexical variation, and their results indicated that the MT output shines through in PEMT. Students were therefore expected to identify n-grams in the source text which gave rise to a greater variety of translation solutions (TSs) in HT than in PEMT. They were also expected to identify potential MT markers, i.e. TSs which occurred with a statistically significantly higher frequency in PEMT than in HT.
Assuming that they were successful in this, it would then be possible to design tests to distinguish one from the other.
2 Methods
All texts were human translated or machine translated from English into Italian, and the MT outputs were consequently post-edited in Italian. The post-editors were allowed to refer to the source text.
The primary experiment was carried out two years running with groups of postgraduate university students. Approximately half did unaided HT and the other half post-edited the MT output obtained from the same texts (total of 51 students). Unaided here means the students were not allowed to use translation memory tools, but they could use any dictionaries and web resources they wished.
The experiment was conducted using extracts from the English-language Wikipedia entries describing Venice (153 words) and Verona (168 words), lightly edited to make them consistent as free-standing texts. They were machine translated using Microsoft Translator in November 2016, both in its SMT and NMT versions². The students who translated the text on Venice post-edited one of the two machine translated texts on Verona, and vice versa. They were told to do full post-editing to bring the output up to the same standard as HT, and did not know if they had been given raw SMT or NMT output.
In the first part of the experiment the students measured the time they took for their task. In the second, they compared their translations with the source text to identify n-grams that had been translated in a wide variety of different ways, and counted the number of ways the same n-gram had been rendered in PEMT. They also checked whether the TS found in the raw MT output was the same as the most commonly chosen TS in HT (top human choice = THC). Moreover they compared the frequency of occurrence of the THCs in the various texts produced.
For reasons explained later, when the raw SMT and NMT outputs proposed the same TS for the n-gram under analysis, the comparison was also made with a combined PEMT group. This is meaningful because the students are faced with essentially the same post-editing choice (leave or change the same raw output TS).
¹ Machine Translation and Post-Editing, Course Module Syllabus, International University of Languages and Media (IULM), Milan, Italy: https://bit.ly/2ND WY2
² Try & Compare Microsoft's Neural Machine Translation system (no longer available for Italian): https://translator.microsoft.com/neural
The correctness of the TSs chosen was evaluated by classifying them as acceptable, debatable or mistranslations. A mistranslation is a TS declared wrong by agreement. A debatable choice is one which sparked off a potentially endless debate without clear agreement.
Moreover the relative frequency of the THCs was analysed using Fisher's exact two-tailed test. Two-by-two contingency tables were used (row = THC/all other n-grams chosen; column = HT/PEMT). Debatable choices and mistranslations were omitted from the tables.
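As a minimal sketch of the test described above, a two-tailed Fisher's exact p-value for one 2×2 table can be computed from hypergeometric probabilities using only the Python standard library. The function names are hypothetical; the rule "sum every table with the same margins that is no more probable than the observed one" is one common two-tailed convention, assumed here because the paper does not say which software was used. The verbal labels reuse the scale of Table 3 (not quite, yes, very, extremely), but the p-value cut-offs are likewise an assumption, and the counts in the usage line are invented for illustration.

```python
from math import comb


def fisher_exact_two_tailed(a: int, b: int, c: int, d: int) -> float:
    """Two-tailed Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = THC / all other solutions, columns = HT / PEMT,
    so a = THC in HT, b = THC in PEMT, c = other in HT, d = other in PEMT.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def table_prob(x: int) -> float:
        # Hypergeometric probability of the table whose top-left cell is x,
        # with all row and column totals held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Two-tailed p: add up every table no more probable than the observed one
    # (a small relative tolerance absorbs floating-point ties).
    p = sum(
        prob
        for prob in map(table_prob, range(lowest, highest + 1))
        if prob <= p_observed * (1 + 1e-9)
    )
    return min(p, 1.0)


def significance_label(p: float) -> str:
    """Verbal label for a p-value; these cut-offs are an assumed convention."""
    if p < 0.001:
        return "Extremely"
    if p < 0.01:
        return "Very"
    if p < 0.05:
        return "Yes"
    if p < 0.10:
        return "Not quite"
    return "No"


# Invented counts: THC chosen by 3 of 4 HT students and by 1 of 4 post-editors.
p = fisher_exact_two_tailed(3, 1, 1, 3)
print(round(p, 4), significance_label(p))  # prints: 0.4857 No
```

With groups of four, even a 75% vs. 25% split is far from significant, which is why the test is applied to the pooled data of both years.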
The same texts and raw MT outputs were used each year, but the tasks were carried out using different tools. During the first year, the students used Microsoft Word and timed themselves by taking note of the start and finishing times. They also used Microsoft Word tables to compare the various texts, identify n-grams, and write notes. This proved to be a clumsy way of completing the experiment, which spurred me to design a simple software tool, called Raw Output Evaluator (ROE), for the second year (Farrell, 2018). ROE splits the text into segments and displays it in a similar way to a typical Translation Environment Tool, but without the other common CAT tool/TM system functions. Moreover, unlike classic CAT tools, it includes a built-in task timer. It was also used by the post-editors as a simple post-editing interface.
In preparation for this paper, I conducted two additional experiments using the n-grams identified during the course module. In the first of these, I put together texts containing 20 occurrences of the same n-gram using blocks of sentences taken from Wikipedia, and fed them into three free online MT engines (Google Translate³ and Microsoft Translator⁴ in June 2018, and DeepL⁵ in August 2018) to get a measure of the variety of different solutions produced in raw MT output for the chosen n-grams. Wikipedia was chosen again for consistency with the primary experiment. The Wikipedia entries were selected using Google (n-gram site:wikipedia.org). Blocks normally consisted of whole paragraphs, sometimes shortened a little. Since even neural MT systems seem to choose one of the most statistically frequent HT solutions repeatedly, I expected variety to be low and the THC to occur with a very high frequency.
In the second, I designed a test using a 273-word text extracted from the Wikipedia entry on Venice (lightly edited to make it consistent as a free-standing text) containing five occurrences of the English n-gram whose translation was a candidate MT marker. I then recruited six professional translators through the Internet (Langit⁶ and It-En⁷) and split them into two groups strictly in the order in which they volunteered. One group provided a HT and the other post-edited the Google translation of the same text (June 2018). The volunteers were told their work was for publication, and that they should therefore aim for an appropriate quality level.
I expected the THC identified by the students to be the most frequently occurring solution in the raw MT output, and this TS to occur with a much greater frequency in the post-edited texts. If the test worked, I expected the three texts with the lowest THC frequency to be the HT ones, and the three with the highest frequency to be the post-edited ones. I did not know what degree of variety to expect among the translators but, since the goal of post-editing is to get the job done faster and not waste time making unnecessary edits, I expected any lexical variety observed to be in the translations rather than in the PEMT outputs.
³ https://translate.google.com
⁴ www.bing.com/Translator
⁵ www.deepl.com/translator
⁶ www.turner.it/T-Langit.htm
⁷ https://groups.yahoo.com/neo/groups/it-en/info
3 Results
3.1 Primary Experiment – HT Time vs. PEMT Time
Tables 1 and 2 only show the results of the first academic year since a bug in the timer function of the software tool used (now fixed) made the second year data unreliable.
Task | Students | Mean time (minutes) | Standard deviation (minutes) | Productivity increase
Human Translation | 14 | 19.07 | ± 5.06 | -
Post-editing of SMT | 7 | 18.43 | ± 7.28 | 3.47%
Post-editing of NMT | 6 | 18.00 | ± 9.14 | 5.94%
Table 1: Time taken to translate or post-edit the Venice text
Task | Students | Mean time (minutes) | Standard deviation (minutes) | Productivity increase
Human Translation | 13 | 20.69 | ± 4.68 | -
Post-editing of SMT | 7 | 19.00 | ± 8.43 | 8.89%
Post-editing of NMT | 7 | 18.00 | ± 4.32 | 14.94%
Table 2: Time taken to translate or post-edit the Verona text
PEMT was faster on average than HT in every case and the post-editing of NMT was faster on average than that of SMT. However the small differences suggest no clear advantage of either MT technology, and the productivity gains are not particularly high. This may depend on the kind of text chosen (see also Carl et al. 2011).
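The productivity increases in Tables 1 and 2 are reproduced exactly if the gain is taken as the increase in throughput (words per minute) of post-editing over human translation of the same text, i.e. T_HT / T_PE - 1. That formula is inferred from the published figures rather than stated in the paper; the short sketch below (hypothetical function name) recomputes all four percentages from the mean times.

```python
def productivity_increase(ht_minutes: float, pe_minutes: float) -> float:
    """Percentage gain in throughput of post-editing over human translation.

    For the same text, words per minute is proportional to 1 / time, so the
    gain is T_HT / T_PE - 1 (formula inferred from Tables 1 and 2).
    """
    return (ht_minutes / pe_minutes - 1) * 100


# Mean times in minutes from Table 1 (Venice) and Table 2 (Verona).
tables = {
    "Venice": {"HT": 19.07, "SMT": 18.43, "NMT": 18.00},
    "Verona": {"HT": 20.69, "SMT": 19.00, "NMT": 18.00},
}

for text, times in tables.items():
    for system in ("SMT", "NMT"):
        gain = productivity_increase(times["HT"], times[system])
        print(f"{text} {system}: {gain:.2f}%")
```

This prints 3.47% and 5.94% for Venice and 8.89% and 14.94% for Verona. Note that this is not the same as the percentage reduction in time, (T_HT - T_PE) / T_HT, which would give slightly smaller numbers.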
3.2 Primary Experiment – MT Markers
For reasons of time and abundance of data, only the Venice text was analysed for MT markers. To maximize the reliability of the results, the data from both years were put together (total of 50 students – one HT was left out due to an oversight).
The students and I identified 41 n-grams which, on rapid observation, appeared to have been translated in a greater variety of ways in the HT texts than in the PEMT texts.
There were 26 students in the HT group, 12 in the SMTPE group and 12 in the NMTPE group (a total of 24 students in the combined PEMT group). The first analysis consisted of simply counting the number of different correct TSs used for each n-gram in each group, excluding translation errors. The HT group was compared to the combined PEMT group to have more evenly sized samples (only 25 n-grams were translated in the same way in both raw MT outputs). This comparison was not made between HT and the non-combined PEMT groups because the number of TSs per student (NTS/S) is artificially higher in smaller groups. This is explained by noting that the maximum value of the NTS/S is always one (each student chooses a different solution), but the minimum value (all students choose the same solution) is inversely proportional to the number of students, thus making the smaller group look artificially more inventive than the larger one as we approach the minimum. In more mathematical terms, the assumption that the relationship between the number of TSs and group size is linear is false, but it may be a useful approximation when the groups are more or less the same size, hence the need to put the two PEMT groups together.
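For a group of N students who produce k distinct correct solutions for an n-gram, NTS/S = k/N, so 1/N ≤ NTS/S ≤ 1: the ceiling is the same for every group but the floor is not. The sketch below (hypothetical helper names, invented solution lists) makes this concrete: two unanimous groups of 12 and 24 students show no variety at all, yet the smaller one scores about 0.083 against about 0.042. The last line reuses the near-tie reported below for "the fact that" (4 solutions among 26 HT students vs. 4 among 24 post-editors).

```python
def nts_per_student(solutions: list[str]) -> float:
    """Distinct translation solutions divided by number of students.

    `solutions` holds one correct TS per student; mistranslations are
    assumed to have been removed beforehand, as in the study.
    """
    if not solutions:
        raise ValueError("at least one solution is needed")
    return len(set(solutions)) / len(solutions)


def nts_bounds(group_size: int) -> tuple[float, float]:
    """Minimum (everyone agrees) and maximum (everyone differs) NTS/S."""
    return 1 / group_size, 1.0


# Invented unanimous groups: identical behaviour, different NTS/S floors.
for size in (12, 24):
    unanimous = ["ci sono"] * size
    print(size, round(nts_per_student(unanimous), 3), nts_bounds(size))

# Near-tie for "the fact that": 4 solutions / 26 HT vs. 4 solutions / 24 PEMT.
print(round(4 / 26, 3), round(4 / 24, 3))
```

Because the floor moves with group size, comparing two SMTPE and NMTPE groups of 12 with an HT group of 26 would favour the smaller groups, which is why the combined PEMT group of 24 is used.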
Of the 25 n-grams therefore considered, the NTS/S was higher in the HT group in 22 cases (88%) and higher in the PEMT group in only 3 ("luxury", "the fact that" and "the most notable"). Of the latter three cases, only "luxury" looks significant (2 HT solutions vs. 4 PEMT solutions). The second is virtually a tie (4 solutions/26 students vs. 4 solutions/24 students), and the third is caused by 5 PEMT solutions being disqualified as mistranslations, thus reducing the PEMT group from 24 to 19 students. The highly uneven group sizes in this case may have distorted the result.
In the 22 cases with a greater variety of solutions in the HT group, the NTS/S was more than five times greater in one case ("However"), more than quadruple in another 2 cases ("numerous attractions" and "mainly"), more than triple in another case ("destination") and more than double in another 4 ("there are", "people", "several problems" and "by some"). This therefore confirms our expectation of a much greater variety of TSs in the HT group than in the combined PEMT group.
We also checked whether the TS found in the raw MT output was the THC. This was true in 14 cases (56%) in the combined PEMT group. Of the other 11 cases, three were the second top human choice (STHC), one was a different inflection of the THC, two were mistranslations, and one was a solution which all except one of the post-editors chose to change, although strictly not a mistranslation (an unappealing solution). The other 4 were correct solutions that did not rate among the top human choices (16%). Analysis of the 16 cases where the two raw MT outputs contained different TSs revealed that the top plus second top human choices predominate. In brief, the raw MT outputs more often than not propose the most commonly chosen TSs found in HT.
Fisher's exact two-tailed test was then carried out to see if there were significant differences in the frequency of the THC in the texts produced. This test is able to compensate to some extent for unevenness in group sizes. Considering the combined PEMT group first, in all 9 cases (9/14 = 64%) where the use of the THC was statistically significantly higher in PEMT, the raw MT output contained the THC, which is hardly surprising. In the 5 cases where the use of the THC was statistically significantly lower in PEMT, the raw output contained a mistranslation in one case, the STHC in two, the joint STHC in one ("numerous inhabitants") and a not particularly highly rated alternative solution in only one case. The lower use of the THC is clearly due to the proposal of a highly valid alternative (STHC), except in two cases.
Turning to the remaining n-grams and starting with the SMTPE group, there were 2 cases where the use of the THC was statistically significantly higher: the raw output contained the THC in one and a mistranslation in the other. It is not clear why correcting a mistranslation should lead to using the THC more often than usual, particularly as the opposite was seen in one case in the combined PEMT group. In the SMTPE group there were also 4 cases where the use of the THC was statistically significantly lower. They were all cases where the raw output contained the STHC, which can be explained as before. Concluding with the NMTPE group, in all 3 cases where the use of the THC was statistically significantly higher, the raw output contained the THC. In the only case where the use of the THC was statistically significantly lower ("has caused"), the raw output contained the joint STHC.
In short, there are two predominant cases when there was a statistically significant difference in the frequency of the THC: when the raw MT output contained the THC, in which case it was higher, and when the raw output contained the second top human choice (STHC), in which case it was lower. This is perfectly in line with expectations and the principle that if a post-editor finds a highly appealing TS (THC or STHC), they tend to leave it and not waste time looking for alternatives.
N-gram | Raw MT output | Statistically significant difference in frequency of THC (SMT group, NMT group, Combined MT group) | Greater NTS/S (x greater) | Frequency of THC in HT (%)
Today | THC | Very>, Not quite>, Very> | HT | 42.31
there are | THC | Extremely>, Very> | HT (x2) | 38.46
numerous attractions | THC | Very>, Very>, Extremely> | HT (x4) | 34.62
such as | THC | Very>, Yes>, Extremely> | HT | 38.46
popular | JTHC/- |  | n/a | 24.00
luxury | THC |  | PEMT | 86.96
destination | THC | Yes>, Yes> | HT | 50.00
attracting | BT | Not quite<, Yes< | HT (x3) | 38.46
thousands | THC |  | HT | 76.92
mainly | STHC | Yes<, Yes<, Extremely< | HT (x4) | 41.18
people | THC | Not quite>, Extremely>, Very> | HT (x2) | 26.92
movie industry | -/THC | Not quite<, Extremely> | n/a | 44.00
relies | -/BT |  | n/a | 28.57
heavily | STHC/- | Yes< | n/a | 65.22
cruise business | (*)/BT |  | n/a | 25.00
Cruise Venice Committee | BT/BT |  | n/a | 100.00
has estimated | THC | Not quite> | HT | 73.08
cruise ship passengers | DI/BT |  | n/a | 52.17
annually | STHC/- |  | n/a | 42.31
in the city | STHC | Very<, Yes< | HT | 48.00
However | THC | Yes> | HT (x5) | 76.92
major | -/THC |  | n/a | 22.73
worldwide | - | Yes<, Yes< | HT | 30.77
tourist destination | - | Not quite< | HT | 23.08
has caused | THC/- | Very< | n/a | 76.92
several problems | THC | Not quite>, Very>, Very> | HT (x2) | 56.00
including | THC/- | Yes> | n/a | 52.00
the fact that | THC |  | PEMT | 65.38
very overcrowded | - |  | HT | 23.08
at some points of the year | (**) |  | HT | 48.00
is regarded | DI |  | HT | 42.31
by some | THC | Not quite>, Yes> | HT (x2) | 48.00
tourist trap | THC/STHC | Not quite> | n/a | 70.83
competition | STHC/THC | Yes< | n/a | 46.15
foreigners | THC |  | HT | 84.62
has made prices rise | BT/JTHC | Extremely>, Yes> | n/a | 11.54
numerous inhabitants | - | Yes<, Yes< | HT | 37.50
for more | STHC/THC | Extremely<, Not quite> | n/a | 73.08
more affordable | STHC | Not quite<, Not quite< | HT | 26.92
areas | STHC/THC | Extremely<, Yes> | n/a | 65.38
the most notable | BT |  | PEMT | 15.38
* Although not strictly a mistranslation, all post-editors chose to change it.
** Although not strictly a mistranslation, all but one post-editor chose to change it.
> / < = THC frequency higher / lower in PEMT than in HT
DI = Different inflection of THC
JTHC = Joint top human choice
STHC = Second top human choice
BT = Mistranslation (bad translation)
Table 3: Analysis of the 41 n-grams identified
It was decided that an MT marker which might be used to design a test able to distinguish HT from PEMT was one where:
• The THC was found in both kinds of raw MT output,
• The THC occurred very or extremely statistically significantly more often in PEMT, and
• There was a two or more times greater NTS/S in HT, so it was likely that a greater variety of solutions would also be seen in the test HT.
Four n-grams met these conditions ("there are", "numerous attractions", "people" and "several problems").
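The three selection criteria can be expressed as a simple filter. The sketch below is illustrative only: the record type and field names are hypothetical, it assumes the second criterion refers to the combined PEMT group, and it encodes a bare "HT" in the NTS/S column of Table 3 as a ratio of 1 (greater in HT, but less than double). It is applied to the six rows of Table 3 whose three significance cells are all filled in, so that the combined-group value is unambiguous; it selects "numerous attractions", "people" and "several problems", three of the four n-grams listed above.

```python
from dataclasses import dataclass


@dataclass
class NgramResult:
    ngram: str
    raw_output: str  # a single "THC" means both raw MT outputs contained the THC
    combined: str    # combined-PEMT significance, e.g. "Very>" ("" if none)
    nts_ratio: int   # HT/PEMT NTS/S multiple; a bare "HT" is encoded as 1


def is_candidate_marker(r: NgramResult) -> bool:
    """Apply the three selection criteria for a usable MT marker."""
    thc_in_both_outputs = r.raw_output == "THC"
    higher_in_pemt = r.combined.endswith(">")
    strong = r.combined.rstrip("<>") in ("Very", "Extremely")
    return thc_in_both_outputs and higher_in_pemt and strong and r.nts_ratio >= 2


# Rows of Table 3 with all three significance cells present.
rows = [
    NgramResult("Today", "THC", "Very>", 1),
    NgramResult("numerous attractions", "THC", "Extremely>", 4),
    NgramResult("such as", "THC", "Extremely>", 1),
    NgramResult("mainly", "STHC", "Extremely<", 4),
    NgramResult("people", "THC", "Very>", 2),
    NgramResult("several problems", "THC", "Very>", 2),
]

print([r.ngram for r in rows if is_candidate_marker(r)])
```

"Today" and "such as" pass the first two criteria but fail the variety criterion, while "mainly" fails because its raw output contained the STHC.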
3.3 Translation Errors
Errors were only counted for the n-grams analysed, which however amounted to a large proportion of the text (75/153 words = 49%).
 | HT | PEMT
Debatable choices | 18 | 12
Mistranslations | 35 | 42
Total | 53 | 54
Errors per translator | 2.04 | 2.25
Table 4: Errors found in texts
The PEMT texts were taken together regardless of what the TSs in the raw MT outputs were. The difference between the two groups is not statistically significant whether we count the two kinds of error as separate categories (chi-squared: p=0.35) or lump them together (Fisher's exact two-tailed test: p=0.62). This substantially confirms our expectation that the quality of the two kinds of work is comparable if we evaluate it purely in terms of translation errors.
3.4 First Additional Experiment
The texts analysed contained 20 occurrences each of three of the four MT markers considered ideal for use in the second additional experiment. "People" was excluded because virtually all the top Google hits from Wikipedia used the word in its highly specific meaning of ethnic group or nation (pl. peoples), rather than as the plural of the word person.
N-gram | Most frequent translation found in raw MT output | Microsoft Translator | Google Translate | DeepL
There are | Ci sono | 20/20 (100%) | 18/20 (90%) | 18/20 (90%)
Numerous attractions | Numerose attrazioni | 19/20 (95%)* | 20/20 (100%) | 20/20 (100%)
Several problems | Diversi problemi | 17/20 (85%) | 15/20 (75%)* | 18/20 (90%)*
* One of the solutions was a mistranslation
Table 5: Variety of solutions found in raw MT output
Google Translate provided three correct alternatives for "several problems". In all the other cases, only one correct alternative was found. As expected, the variety of TSs for the n-grams studied was low.
The frequency of the THC was extremely statistically significantly higher than in the HTs produced in the primary experiment in the cases of "there are" and "numerous attractions". In the case of "several problems", the difference was only very statistically significant for DeepL, not quite statistically significant for Microsoft Translator and not statistically significant for Google Translate. "There are" and "numerous attractions" are therefore the best candidate MT markers for the second additional experiment. "There are" was chosen for its ubiquity, which makes it easily repeatable in a relatively short text without it seeming artificial.
Interestingly, although DeepL is reported by some to give better quality raw MT output than Google Translate (Isabelle and Kuhn, 2018), it would seem to suffer from the same lack of TS variety as the others, if not more so.
3.5 Second Additional Experiment
A 273-word text containing five occurrences of "there are" was given to three professional translators for translation, and Google-translated and given to another three for full post-editing. As was predictable, the raw MT output contained the same TS ("ci sono") for each occurrence.
Translator | Professional experience (years) | Time (minutes) | Number of occurrences of "ci sono" | Number of different solutions chosen | HT/PEMT
SC | 8 | 51 | 0 | 5 | HT
LZ | 11 | 32 | 0 | 4 | HT
MLD | 25 | 64 | 0 | 3 | HT
CP | 16 | 47 | 1 | 5 | PEMT
PV | 28 | 45 | 1 | 4 | PEMT
DG | 26 | 16 | 4 | 2 | PEMT
Table 6: Results of the "there are" test
The average time taken was 49.00 ± 16.09 minutes for translation and 36.00 ± 17.35 minutes for post-editing, again confirming expectations. None of the volunteers who did the HT translated "there are" with "ci sono", whereas all the post-editors left at least one occurrence of "ci sono". Therefore, on this occasion, the test was 100% accurate in distinguishing PEMT from HT. Surprisingly, despite this result, the variety of different TSs chosen in the two groups seems to be comparable, contrary to expectations.
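The summary statistics and the outcome of the "there are" test can be recomputed from Table 6. The sketch below assumes the ± figures are sample standard deviations (n - 1 denominator), which reproduces 49.00 ± 16.09 and 36.00 ± 17.35 minutes, and implements the test as set out in the Methods: rank the six texts by the number of occurrences of "ci sono" and label the three lowest as HT. The variable names are illustrative.

```python
from statistics import mean, stdev

# Rows of Table 6: (translator, minutes, occurrences of "ci sono", actual task).
results = [
    ("SC", 51, 0, "HT"),
    ("LZ", 32, 0, "HT"),
    ("MLD", 64, 0, "HT"),
    ("CP", 47, 1, "PEMT"),
    ("PV", 45, 1, "PEMT"),
    ("DG", 16, 4, "PEMT"),
]

# Mean ± sample standard deviation (stdev uses the n - 1 denominator).
for task in ("HT", "PEMT"):
    times = [minutes for _, minutes, _, t in results if t == task]
    print(f"{task}: {mean(times):.2f} ± {stdev(times):.2f} minutes")

# Marker test: the three texts with the fewest "ci sono" are labelled HT.
ranked = sorted(results, key=lambda row: row[2])
predicted_ht = {name for name, *_ in ranked[:3]}
correct = sum((name in predicted_ht) == (task == "HT") for name, _, _, task in results)
print(f"{correct}/{len(results)} texts classified correctly")
```

With only three texts per group, a single post-editor who happened to replace every "ci sono" would already break the perfect score, which is one reason to repeat the test at a larger scale.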
4 Discussion
The primary experiment was not designed solely to identify MT markers. Consequently, result analysis proved quite complex, particularly due to the uneven group sizes. However the results confirm what would be expected from simple reasoning:
• When a post-editor is faced with an acceptable solution in raw MT output they tend to leave it unedited, even if it is only one of many possible valid solutions.
• Due to the way it works, MT tends to choose one of the solutions most frequently chosen by translators (THC or JTHC).
• Therefore the statistically most frequent solutions in HT occur with a higher than natural frequency in PEMT (MT markers).
• MT markers may be used to design tests to distinguish HT from PEMT.
This experiment says nothing about the range of solutions used by a single translator or post-editor for a repeated n-gram, only about the variety chosen by a group of translators or post-editors. It would seem reasonable to assume that freedom from a suggested MT solution would allow translators to give free rein to a wide variety of solutions, and this is in line with the result of the translators in the second additional experiment (the "there are" test). However, despite the evident influence of the proposed MT solution, the post-editors in the test appear to have come up with a comparably wide range of solutions. This seems rather hard to explain since it means that they deliberately altered several correct n-grams, contrary to the aims of post-editing. In this case, however, a different factor may have come into play. Italians are taught that good writers should avoid unnecessary lexical repetition. Five occurrences of the same expression in four paragraphs may have triggered a repetitiveness alarm, turning an otherwise correct solution into an unacceptable one. Alternatively, it may more simply be argued that the second additional experiment was not big enough to give reliable results.
It would therefore be advisable to repeat the experiment on a larger cohort using much longer texts with more numerous and sparsely repeated MT markers.
Variety and inventiveness are not always desirable features in every kind of text. For example, excessive lexical variation might make a smartphone user's guide more difficult to follow. Nevertheless, there are various other kinds where lexical uniformity would make the text less interesting to read and less intellectually stimulating (marketing, advertising, literature, journalism, education, entertainment, and creative writing in general). In these cases, counting errors and measuring fluency and adequacy are not sufficient to judge translation quality.
What the findings of the primary experiment do show, however, is an apparent normalization and homogenization of the choices made by post-editors as a whole. This may explain why some authors report that HT is judged to be better in terms of style (Fiederer and O'Brien, 2009). One solution might be to program NMT engines to sometimes randomly pick the second or third best-fit translated sentence vectors.
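The suggestion of occasionally picking a lower-ranked candidate could be prototyped outside the engine by sampling from an n-best list. This is only a sketch of the idea under stated assumptions: it presumes access to a best-first list of candidate translations (for example from beam search), the function name and the probability parameter are hypothetical, and the Italian candidates are a toy example rather than real MT output.

```python
import random


def pick_translation(candidates: list[str], p_alternative: float, rng: random.Random) -> str:
    """Return the top candidate, or occasionally the second- or third-best one.

    `candidates` is assumed to be an n-best list ordered best-first; this is
    a hypothetical post-processing hook, not an existing MT engine API.
    """
    if not candidates:
        raise ValueError("no candidates")
    alternatives = candidates[1:3]  # second and third best, if present
    if alternatives and rng.random() < p_alternative:
        return rng.choice(alternatives)
    return candidates[0]


rng = random.Random(42)  # seeded so that runs are reproducible
n_best = ["Ci sono", "Esistono", "Vi sono"]  # toy n-best list for "There are"
picks = [pick_translation(n_best, 0.3, rng) for _ in range(1000)]
print({c: picks.count(c) for c in n_best})
```

With p_alternative = 0.3, roughly 70% of the picks stay on the top candidate and the rest are split between the two alternatives; in practice the alternatives would also need a quality threshold so that variety is not bought at the cost of errors.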
Failure to remedy this homogenization may eventually lead to lexical impoverishment of the target language, particularly in cultures where English has become the primary working language in which new written material is created. Obviously it would be possible to train post-editors to add originality and inventiveness to their work by purposely editing parts where there are no formal errors, but this clearly defeats the object of post-editing.
5 Conclusions
There is clear evidence of a homogenization and normalization phenomenon in connection with post-editing. There is also evidence of a decrease in the variety of different solutions chosen, when considering post-editors together as a group, although it was not possible to confirm this when observing the behaviour of post-editors individually.
As MT systems improve (if this means getting better at homing in on the most frequently occurring expressions), the homogenization effect will probably be aggravated.
On account of the findings reported herein, the use of PEMT for texts where variety, originality and inventiveness are quality factors would appear to be inadvisable with the MT technology currently available.
Acknowledgements
All trademarks and trade names are the property of their respective owners.