Tuning of language models in Eastern European languages on Twitter/X

Author: Filip, Tomáš; Pavlíček, Martin; Sosik, Petr
Publisher: Zenodo
DOI: 10.5281/zenodo.17609598
Source: https://zenodo.org/records/17609598/files/paper19.pdf
Tomáš Filip1,†, Martin Pavlíček1,† and Petr Sosík1,2,∗,†
1Institute for Research and Applications of Fuzzy Modeling, University of Ostrava, 30. dubna 22, Ostrava, 70200, Czech Republic
2Institute of Computer Science, Faculty of Philosophy and Science, Silesian University in Opava, Bezručovo náměstí 1150/13, Opava, 746 01, Czech Republic
Abstract
We address the problem of fine-tuning large language models (LLMs) for sentiment analysis on Twitter/X in underrepresented Eastern European languages (Czech, Slovak, Polish, and Hungarian). We study the influence of a number of experimental settings on the efficiency of fine-tuning in two groups of LLMs: transfer-learning models (BERT, BERTweet or XLM-T, the latter two pre-trained on a Twitter corpus) and popular mid-sized universal models (Llama, Mistral). We show that adapter fine-tuning with as few as ≈600 tweets improved the scores of our universal models to the level previously reported for Twitter/X-specialised models on popular datasets, while our transfer-learning models performed worse. We also show that, despite previous successful experiments with multilingual models, translating from underrepresented languages into English still improves the results of all models tested. Several other factors that influence the success of fine-tuning are also included in the study.
Keywords
Large language model, Sentiment analysis, Twitter, Eastern-European language, Russo-Ukraine conflict, Llama, Mistral, BERTweet, BERT, XLM-T, GPT-4
1. Introduction
Sentiment analysis is one of the most common topics in natural language processing, with rapidly emerging techniques [1]. Recently, machine learning methods, especially large language models (LLMs), have been considered the state of the art on sufficiently large training datasets. As end-user deployment of language models is now common and affordable, their performance in underrepresented languages is becoming important.
This paper focusses on fine-tuning LLMs for sentiment analysis in Eastern European languages (Czech, Slovak, Polish, and Hungarian) belonging to the so-called Visegrád (V4) group. As a case study, we chose the topic of the Ukraine war crisis on Twitter/X, which provides a large textual corpus with rich sentiment polarity. This topic is also the target of intensive cyberbullying attacks and, simultaneously, a crucial source of Open Source Intelligence (OSINT), further underlining its relevance. The novel aspects of this study are as follows:
• Twitter/X studies in Eastern European (EE) languages are rare in LLM-based sentiment analysis, and we are not aware of any studies focussing on the Russo-Ukraine conflict.
• The tunability of various LLMs on Twitter/X (or similar) EE data has not been adequately researched.
• The performance of mid-sized to large models (Llama, Mistral, or GPT-4) versus transfer learning models (BERT, BERTweet, RoBERTa) in Twitter/X-based tasks has been poorly studied, with very few exceptions, such as [2, 3].
25th Conference on Information Technologies – Applications and Theory (ITAT)
∗Corresponding author.
†These authors contributed equally.
ORCID: 0009-0001-4386-0620 (T. Filip); 0000-0003-1429-2668 (M. Pavlíček); 0000-0001-7624-3816 (P. Sosík)
© 2025 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
published 2025-11-13
Figure 1: Experimental pipeline overview. The downloaded dataset was split into three language-specific parts. Three versions of translation (Helsinki, DeepL, none) were prepared, yielding 9 individual datasets. The tested models were fine-tuned in four variants, combining classification into two/three classes and training with/without reference tweets.
We downloaded and annotated three monolingual datasets (CS/SK, PL, HU) from Twitter/X. The datasets were used to fine-tune three transfer learning models (BERT, BERTweet, XLM-T) and three mid-sized LLMs (Llama 2, Llama 3, Mistral) in a number of experimental settings illustrated in Fig. 1. The training objective was the sentiment polarity towards either Ukraine or Russia. We evaluated the influence of various settings, such as the size of the dataset, the translation into English, or the presence of the reference tweet (the one to which the tweet reacted), on the efficiency of fine-tuning. The key findings are as follows:
• Fine-tuning with as few as ≈600 tweets in underrepresented Eastern European languages improved the F1 score of the Llama and Mistral models by 30–40%, reaching the level of specialised models on Twitter/X benchmarks.
• Fine-tuned general mid-sized LLMs such as Llama or Mistral significantly outperformed equally fine-tuned transfer learning models (BERTweet, XLM-T) pre-trained on a large Twitter/X corpus.
• All models (including the multilingual XLM-T and GPT-4) performed best on data translated into English by DeepL.
• Unsurprisingly, in-context learning did not help the small- and mid-sized models; however, adding the context of the reference tweets did not improve the fine-tuning either.
The rest of the paper is organised as follows. Section 2 briefly reviews sentiment analysis in texts, with a focus on Twitter/X datasets. Section 3 describes the construction of our dataset, followed by Sec. 4, which outlines the experimental settings. Section 5 contains an overview of the results, which are then discussed in more detail in Sec. 6. Section 7 provides an ablation study that focusses on the impact of selected experimental variables. Finally, Section 8 summarizes the results.
2. Background
With the rapid growth of social networks and e-commerce, sentiment analysis has emerged as one of the fastest-growing research areas in computer science. To capture sentiment with greater granularity, Hu and Liu [4] introduced the concept of aspect-based sentiment analysis (ABSA) in their foundational work, which has since inspired numerous follow-up studies. A comprehensive review of recent developments in NLP-based sentiment analysis is provided by Jim et al. [1].
Recent progress in ABSA has been significantly driven by the integration of large language models (LLMs)¹. For example, Zhang et al. [5] proposed a generative framework that formulates ABSA as a text generation problem, offering a flexible alternative to traditional classification approaches. Building on the strengths of instruction-based learning in LLMs, Scaria et al. [6] introduced the InstructABSA model, which leverages task instructions to improve performance. Periodic survey studies, such as that by Brauwers and Frasincar [7], continue to provide structured overviews of the evolving ABSA landscape.
Sentiment classification can be challenging in Twitter/X data due to the lack of explicit context and the specific style of tweets. The TweetEval benchmark [8] evaluated tweet classification models on the detection of emotion, irony, hate speech, offensive language and stance, on emoji prediction, and on sentiment analysis. The TweetEval leaderboard on GitHub lists BERTweet [9] as the current SoTA model, closely followed by TimeLM-21. The family of TimeLM models [10] addresses the problem of a changing context through periodic updates with tweet datasets, and outperformed BERTweet in many tasks.
Barbieri et al. [11] expanded the focus to multilingual tweet analysis and presented a unified tweet benchmark in eight languages (UMSAB). The paper also introduced the XLM-Twitter model (XLM-T), developed by pre-training XLM-R [12] on 198M multilingual tweets. XLM-T was further fine-tuned on UMSAB, and the resulting model was named XLM-T Sentiment. Barreto et al. [13] studied, among other topics, the performance of BERT, RoBERTa and BERTweet in Twitter ABSC tasks.
Krugmann et al. [2] compared the performance of established transfer learning models (BERT, BERTweet, RoBERTa) with recent LLMs (GPT-3.5, GPT-4, and Llama 2) on Twitter/X data and found the latter superior. In contrast to these results, Stigall et al. [3] presented EmoBERTTiny, a fine-tuned model for emotion and sentiment classification tasks, and reported its superiority over non-tuned Llama-2-7B-chat and Mistral-7B-Instruct across all metrics. These and other authors also reported on the domain sensitivity of the models.
Finally, of the many existing sentiment studies on the Russia–Ukraine war on social networks, we mention two. An evaluation of traditional ML models (logistic regression, decision trees, random forests, SVMs, etc.) on Twitter data was provided in [14]. A deep learning approach combining a multi-feature CNN with BiLSTM was applied in [15] to an analogous task. Both studies relied on monolingual English datasets.
3. Dataset construction
Our data were collected using the academic Twitter/X API during the period 4/2/2023 to 20/5/2023. Filtering by language (Czech/Slovak, Polish, Hungarian) and keywords (Ukraine, Russia, Zelensky, Putin) resulted in 34,124 relevant tweets, split into three monolingual parts according to the language. There was no filter available for Slovak, so it was mixed with Czech. In every monolingual dataset, we manually annotated a random subset of tweets by their sentiment towards Ukraine or Russia, keeping the classes roughly balanced. A certain class imbalance resulted from the lack of relevant tweets neutral to a given aspect. To avoid annotation bias, the annotators followed the principles of the CAMEO² conflict topic codebook, and the annotated tweets were cross-validated among the annotators. The annotated datasets differ in size (see Table 1) so that we could study the impact of dataset size on the models' performance.
¹ https://paperswithcode.com/task/aspect-based-sentiment-analysis
² http://data.gdeltproject.org/documentation/CAMEO.Manual.1.1b3.pdf
Table 1
Size of language-specific subdatasets (number of tweets)
Lang.   Aspect    Total   Pos.   Neutral   Neg.
cs/sk   Ukraine   1638    632    447       559
cs/sk   Russia    1716    579    537       600
pl      Ukraine    640    205    263       172
pl      Russia     570    202    164       204
hu      Ukraine    628    202    203       223
hu      Russia     556    181    145       230
Table 2
Language models used in the experiments.
Model             Total params   Tuned params   Paper   Web page
BERT base         110M           110M           [18]    https://huggingface.co/google-bert/bert-base-uncased
BERTweet large    355M           355M           [9]     https://huggingface.co/vinai/bertweet-large
XLM-T Sentiment   279M           279M           [11]    https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment
Llama-2 7B        6.7B           4M             [19]    https://huggingface.co/meta-llama/Llama-2-7b-hf
Llama-3 8B        8B             3.4M           [20]    https://huggingface.co/meta-llama/Meta-Llama-3-8B
Mistral 7B        7.2B           3.4M           [21]    https://huggingface.co/mistralai/Mistral-7B-v0.1
Each annotated dataset was split into a training set (75 %) and a testing set (25 %). The datasets are available in the supplementary data on GitHub; see the link in Conclusions.
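A minimal sketch of the 75 %/25 % split described in this section, in Python. It assumes a plain random split with a fixed seed; the paper does not say whether the split was stratified by class or how it was seeded, and the helper `split_75_25` is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_75_25(items: Sequence[T], seed: int = 42) -> Tuple[List[T], List[T]]:
    """Split items randomly into a 75 % training part and a 25 % testing part.

    Hypothetical helper: plain (non-stratified) shuffling with an assumed seed.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # local generator, reproducible order
    n_train = round(0.75 * len(shuffled))  # size of the training part
    return shuffled[:n_train], shuffled[n_train:]


# Example: the Polish "Ukraine" subset in Table 1 contains 640 annotated tweets.
tweets = [f"tweet_{i}" for i in range(640)]
train, test = split_75_25(tweets)
print(len(train), len(test))  # 480 160
```

A stratified variant would shuffle and split each sentiment class separately so that Table 1's class proportions are preserved in both parts; whether the authors did this is not stated.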
4. Methods
Language models
The models we tested (Table 2) belong to two categories. (i) Transfer learning models popular in the ABSA literature and in the TweetEval and UMSAB benchmarks: BERT, BERTweet, and XLM-T. The latter two have been pre-trained on large Twitter/X corpora. As we intended to study the tunability of universal models, we did not use language-specific variants such as PolBERT³, huBERT⁴ or SlovakBERT⁵. (ii) Mid-sized open-source models (up to 10B parameters) that are fine-tunable on limited end-user GPU hardware: Llama-2 7B, Llama-3 8B, and Mistral 7B. Recent studies such as [2, 3, 16] point out the lack of studies on ABSA using these and similar models. Furthermore, ChatGPT-4 [17] was used as a reference model for tweet classification.
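Table 2 lists only 3.4 to 4 million tuned parameters for the mid-sized models, which is what makes them trainable on end-user GPUs. The paper does not state the adapter configuration. As an assumption, the sketch below counts the trainable parameters of LoRA adapters of rank 8 on the query and value projections of every layer (rank and target modules are our guess, not the authors' stated setting). A LoRA adapter on a d_in × d_out weight adds r·(d_in + d_out) parameters (its matrices A and B); the layer sizes are the published architecture dimensions of the three models, where Llama-3 8B and Mistral 7B use grouped-query attention with 8 key/value heads.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DecoderDims:
    n_layers: int
    hidden: int      # model (residual stream) width
    n_heads: int     # number of query heads
    n_kv_heads: int  # key/value heads (fewer than n_heads with grouped-query attention)

    @property
    def head_dim(self) -> int:
        return self.hidden // self.n_heads


def lora_params(d: DecoderDims, rank: int = 8) -> int:
    """Trainable LoRA parameters for adapters on q_proj and v_proj in every layer.

    Assumption: rank 8 on the query and value projections only.
    """
    q_out = d.n_heads * d.head_dim     # output width of the query projection
    v_out = d.n_kv_heads * d.head_dim  # output width of the value projection
    per_layer = rank * (d.hidden + q_out) + rank * (d.hidden + v_out)
    return d.n_layers * per_layer


MODELS = {
    "Llama-2 7B": DecoderDims(n_layers=32, hidden=4096, n_heads=32, n_kv_heads=32),
    "Llama-3 8B": DecoderDims(n_layers=32, hidden=4096, n_heads=32, n_kv_heads=8),
    "Mistral 7B": DecoderDims(n_layers=32, hidden=4096, n_heads=32, n_kv_heads=8),
}

for name, dims in MODELS.items():
    print(f"{name}: {lora_params(dims) / 1e6:.2f}M trainable parameters")
```

This gives about 4.19M for Llama-2 7B and about 3.41M for Llama-3 8B and Mistral 7B, close to the 4M and 3.4M in Table 2. The agreement suggests, but does not prove, a configuration of this kind.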
Translation
When applying pre-trained LLMs to datasets in underrepresented languages, some sources such as [22, 23] report better results with machine translation to English, while others rely on follow-up training or fine-tuning in the original languages [11, 12]. To compare the effectiveness of both approaches, the annotated datasets were used for both training and testing in three different language modes:
• translated to English using the Helsinki Neural Machine Translation System⁶;
• translated to English using the DeepL API⁷;
• no translation, original languages (CS/SK, PL, HU).
³ https://github.com/kldarek/polbert
⁴ https://huggingface.co/SZTAKI-HLT/hubert-base-cc
⁵ https://huggingface.co/gerulata/slovakbert
⁶ https://huggingface.co/Helsinki-NLP
Training
We trained each decoder model by using a tweet as input and generating a single output token. The loss function was the cross-entropy between the generated token and the ground-truth label. Each model in Table 2, in combination with each translation mode, was fine-tuned on each language-specific training set separately (not on their combination). For Llama 2/3 and Mistral, we used the adapter-based PEFT technique [24] via the Python PEFT library⁸. The number of tuned parameters varied between 3.5 and 4 million. The training was run for 10 epochs on all models. The learning rate was set to 3 × 10⁻⁴ and the batch size to 4. The learning rate grew linearly to its maximum during warm-up (the first 100 iterations) and then decreased linearly towards zero. The remaining hyperparameters were library defaults. All metrics were calculated at the best checkpoint of the model. Both training and inference were run on a server with 2 × RTX 2060 (8GB) for the smaller BERT-derived models, and on another server with 2 × NVIDIA V100 (32GB) for the larger models.
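The learning-rate schedule described above (linear warm-up over the first 100 iterations to the peak rate 3 × 10⁻⁴, then linear decay towards zero) can be written as a small function. This is a plain-Python sketch with the same shape as the `linear` scheduler in Hugging Face `transformers` (`get_linear_schedule_with_warmup`); the total number of optimizer steps depends on dataset size, batch size and epochs, and the value in the example is hypothetical.

```python
def linear_warmup_decay(step: int, peak_lr: float = 3e-4,
                        warmup_steps: int = 100, total_steps: int = 1200) -> float:
    """Learning rate at a given optimizer step.

    Grows linearly from 0 to peak_lr during the first warmup_steps steps,
    then decreases linearly to 0 at total_steps.
    """
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    remaining = max(0, total_steps - step)
    return peak_lr * remaining / max(1, total_steps - warmup_steps)


# Hypothetical run: 480 training tweets, batch size 4, 10 epochs and
# no gradient accumulation (assumed) -> 120 steps per epoch, 1200 steps in total.
steps_per_epoch = 480 // 4
total = 10 * steps_per_epoch
for s in (0, 50, 100, 650, total):
    print(s, linear_warmup_decay(s, total_steps=total))
```

The rate peaks exactly at step 100 and reaches half of the peak midway through the decay phase (step 650 in this hypothetical run).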
Inference
After fine-tuning in a specific language, all models in Table 2 were prompted in the same way using the testing set in the same language. The experiments were carried out with and without the use of the reference tweet (the tweet to which the classified tweet reacted). We used a simple English prompt in all experiments:
tweet: {tweet}
The sentiment of the tweet towards {aspect} is…
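The prompt template above can be filled programmatically. The sketch below is hypothetical: the helper names, the optional `reference tweet:` line (the paper does not show the wording of the reference-tweet variant) and the mapping of the single generated token back to a class label are assumptions for illustration.

```python
from typing import Optional, Sequence

LABELS = ("positive", "neutral", "negative")  # three-class setting


def build_prompt(tweet: str, aspect: str, reference: Optional[str] = None) -> str:
    """Fill the English prompt template; the model continues after 'is'.

    The 'reference tweet:' line is a hypothetical format for the
    with-reference-tweet variant.
    """
    lines = []
    if reference is not None:
        lines.append(f"reference tweet: {reference}")
    lines.append(f"tweet: {tweet}")
    lines.append(f"The sentiment of the tweet towards {aspect} is")
    return "\n".join(lines)


def parse_label(generated: str, labels: Sequence[str] = LABELS) -> Optional[str]:
    """Map the generated continuation to a class label, or None if unrecognised."""
    words = generated.strip().split()
    if not words:
        return None
    word = words[0].strip(".,!?…").lower()  # drop trailing punctuation, normalise case
    return word if word in labels else None


print(build_prompt("Slava Ukraini!", "Ukraine"))
print(parse_label(" Positive."))
```

Restricting the answer to a fixed label set keeps the single-token output of the fine-tuned models easy to score; unrecognised outputs are returned as `None` rather than guessed.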
For GPT-4, we did not use fine-tuning but instead applied in-context instruction learning (ICL), that is, we expanded the prompt with context information related to the question asked. The expanded prompt can be found in the online Appendices to the paper; please follow the link in the “Supplementary material” section.
5. Results
We conducted an extensive series of tweet sentiment classification experiments that varied in the following settings:
• sentiment aspect (Russia/Ukraine)
• language of the tweet (CS/SK, HU, PL)
• language model (BERT, BERTweet, XLM-T, Llama 2, Llama 3, Mistral, GPT-4)
• tweet translation (DeepL, Helsinki translator, none)
• positive/neutral/negative classification, or only positive/negative
• the presence of a reference tweet
Standard metrics were used to evaluate the results: accuracy and macro-averaged recall, precision, and F1 score [25]. The macro-averaged F1 was chosen as our primary evaluation measure because it weights model performance equally across the classes (negative, neutral, positive). Unless stated otherwise, tables and graphs show results for positive/neutral/negative sentiment classification. With the exception of ChatGPT-4, all results were obtained without using reference tweets. The complete results are contained in the supplementary data on GitHub; see the link in Conclusions.
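For reference, a self-contained sketch of the macro-averaged metrics in plain Python. It computes precision, recall and F1 per class (one-vs-rest) and averages them with equal weight, the same averaging as scikit-learn's `f1_score` with `average="macro"`; a class with no predictions or no support contributes 0 to that metric. The function name and the toy labels are illustrative only.

```python
from collections import Counter
from typing import Sequence, Tuple


def macro_scores(y_true: Sequence[str], y_pred: Sequence[str],
                 labels: Sequence[str]) -> Tuple[float, float, float]:
    """Return macro-averaged (precision, recall, F1) over the given labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p wrongly
            fn[t] += 1  # missed true class t
    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Toy example with six tweets (illustrative labels, not data from the paper).
y_true = ["positive", "positive", "neutral", "negative", "negative", "negative"]
y_pred = ["positive", "negative", "neutral", "negative", "negative", "positive"]
precision, recall, f1 = macro_scores(y_true, y_pred, ["positive", "neutral", "negative"])
print(f"macro P={precision:.3f} R={recall:.3f} F1={f1:.3f}")
```

Because every class counts equally, the single correctly classified neutral tweet moves the macro F1 as much as the three negative ones; this is why the metric is sensitive to the weaker neutral-class performance discussed in Section 6.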
⁷ https://www.deepl.com/translator
⁸ https://huggingface.co/docs/peft

Figure 2: Macro-averaged F1 score by language model and translation.
Results by translation
Figure 2 summarises the main results organised by language model and type of translation. Concerning the performance of individual models, surprisingly, Llama 3 scored approx. 6% F1 worse than Llama 2, and BERTweet large performed worse than BERT base; perhaps pre-training on older tweets could have affected the tunability of BERTweet to a newer context. Neither did XLM-T reach the level of the larger models, although it was pre-trained on a large multilingual tweet corpus. The order-of-magnitude larger model size seems to be the prevailing factor. Finally, all models benefited from the DeepL translation. Therefore, the remaining results included in the paper are restricted to DeepL-translated datasets.
Results by languages
Figure 3 summarises experiments on individual languages using DeepL translation. Quite surprisingly, almost all models performed worse on the Polish language. These results correlate neither with the support of the individual language datasets (see Table 1) nor with the type of translation, and cannot be attributed to pre-training either (e.g., GPT-4 performed well in Polish on the MMLU benchmark [17]). The results were similar for the vanilla models tested; see Table 3. A detailed analysis showed that many positive Polish tweets were classified as negative by the models. These tweets contained more complex thoughts about the historical interconnection of Poland with Ukraine. Some examples of misclassified tweets can be found in the appendices in the supplementary material; please see the link in Conclusions.
6. Discussion
Relation to the SoTA
Our focus on underrepresented EE languages does not allow direct comparison with popular Twitter/X benchmarks, and the following figures provide only an approximate picture. The TweetEval leaderboard [8] marks TimeLM-21 as the SoTA model with a macro-averaged recall of 73.7 for three-valued sentiment classification, followed by BERTweet with a recall of 73.4. Our best macro-averaged result (Llama 2, translation by DeepL, averaged over all aspects and languages) was an F1 score of 73.7. On the one hand, our task is much narrower than TweetEval. On the other hand, TweetEval is monolingual and BERTweet was trained on 850M English tweets, while we fine-tuned our models using three datasets with a few hundred tweets in underrepresented languages.
Figure 3: F1 score by model and language for positive/neutral/negative classification of tweets, macro-averaged over both aspects UA/RU.
Figure 4: F1 score by model and language for positive/negative classification of tweets, macro-averaged over both aspects UA/RU.
The UMSAB Twitter benchmark [11] reports XLM-Tw Multi as the best model with an F1 score of 69.4, macro-averaged over eight languages. Again, this task is wider than ours, but XLM-Tw Multi used a much larger fine-tuning dataset; therefore, we cannot provide an exact comparison.
Size of the training sets
The support of the CS/SK training set was approximately three times that of HU or PL, which were almost equal. This imbalance allowed for some interesting observations. In the simple task of two-valued classification, almost all fine-tuned models returned scores independent of the language, implying that training sets with about 600 tweets were sufficient to bridge the language differences. However, in the case of three-valued classification, the CS/SK dataset was favoured by all fine-tuned models. Hence, for this harder task, the smaller HU/PL training sets were insufficient. The effect was stronger for smaller models (BERT, BERTweet), confirming the multiplicative joint scaling law for LLM fine-tuning [26].
Model and human bias
In the context of the current situation, where Russia is described as the aggressor, human annotators who know more about the context may tend to see the situation in terms of cause and effect, and therefore their sentiment determination is usually biased differently from that of the models [27]. In particular, LLMs struggled with tweets that were neutral (or positive) towards a given aspect but generally negative in tone, for example, tweets addressing bombing, war, or attacks. Models such as Llama 2 or Mistral showed significantly lower precision and recall for the neutral class than for the negative or positive ones.
Scores of individual classes
All experiments in Section 5 used macro-averaged recall, precision and F1 scores, since the scores were mostly similar for all classes, with a few exceptions. In particular, in Hungarian, the recall of the positive class was often approximately 10% lower than that of the negative class, and the trend was opposite in precision, meaning that the models tended to classify Hungarian tweets more negatively than human annotators. This might be due to the fact that the overall ratio of negative samples in the Hungarian dataset was a bit higher than in the other languages.
Ineffective in-context learning
When employing small, computationally inexpensive models, in-context learning (ICL) often entails notable trade-offs. Due to their more limited representational capacity, these models may be unable to leverage ICL effectively. Another contributing factor may be insufficient pre-training alignment with the target domain or topic. Furthermore, the additional complexity introduced by ICL can increase task ambiguity in aspect-based sentiment analysis (ABSA). Fine-tuning may also override any marginal gains that ICL might provide. To rigorously identify the primary factors underlying the lack of ICL effectiveness, further fine-grained experimental analyses are required.
7. Ablation study
In this section, we discuss the contribution of several components of the experimental pipeline to the classification performance.
Reference tweet use
The reference tweet was always used in the in-context prompt for GPT-4, as it improved its performance (data not shown). For all other models, reference tweets slightly worsened the macro-averaged F1 score (e.g., BERT by 4%, XLM-T by 2%, Llama 2 by 0.5%, Llama 3 by 0.8%, and Mistral by 2.5% in the case of positive/neutral/negative classification). Therefore, we agree with [28] that, while smaller models rely substantially on semantic priors from pre-training, larger models can override these priors when the prompt contains exemplars that contradict them.
Fine-tuning and in-context learning
To compare these two approaches to Twitter/X task adaptation, we evaluated the vanilla versions of Llama 2, Llama 3 and Mistral (without fine-tuning) and of GPT-4 (without in-context learning). The study was restricted to the case of DeepL translation and positive/neutral/negative classification. Table 3 shows that fine-tuning improved the F1 score of Llama 2/3 and Mistral mostly by 20–40 percentage points over the vanilla versions, while GPT-4 benefited from ICL by up to about 10 percentage points.
Table 3
Macro-averaged F1 scores of vanilla (no fine-tuning or in-context learning) and adapted versions (fine-tuned; for GPT-4, in-context learning) of selected models. Setting: DeepL translation, no reference tweets.
                Llama 2          Llama 3          Mistral          GPT-4
Lang.  Target   Vanilla  Tuned   Vanilla  Tuned   Vanilla  Tuned   Vanilla  ICL
cs     ua       38.5     76.9    37.8     72.2    52.5     72.3    48.8     57.9
cs     ru       40.2     79.2    47.1     77.1    40.4     79.4    49.8     51.8
hu     ua       41.7     70.3    44.7     58.9    50.7     73.4    55.9     66.7
hu     ru       53.9     74.6    47.4     72.6    43.3     75.8    60.5     68.8
pl     ua       33.7     71.1    24.3     62.7    48.2     68.3    39.8     45.3
pl     ru       34.8     70.0    35.7     64.3    35.6     68.9    46.1     55.4
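The gains discussed in this subsection can be recomputed directly from Table 3. The short script below is a convenience sketch, not part of the authors' pipeline: it subtracts each vanilla score from the corresponding fine-tuned (or, for GPT-4, ICL) score, so the differences are absolute percentage points of macro-averaged F1.

```python
from statistics import mean

# Row order of Table 3: (language, target aspect).
ROWS = [("cs", "ua"), ("cs", "ru"), ("hu", "ua"), ("hu", "ru"), ("pl", "ua"), ("pl", "ru")]

# Macro-averaged F1 copied from Table 3 as (vanilla, adapted) pairs, where
# "adapted" is the fine-tuned model for Llama 2/3 and Mistral and ICL for GPT-4.
TABLE3 = {
    "Llama 2": [(38.5, 76.9), (40.2, 79.2), (41.7, 70.3), (53.9, 74.6), (33.7, 71.1), (34.8, 70.0)],
    "Llama 3": [(37.8, 72.2), (47.1, 77.1), (44.7, 58.9), (47.4, 72.6), (24.3, 62.7), (35.7, 64.3)],
    "Mistral": [(52.5, 72.3), (40.4, 79.4), (50.7, 73.4), (43.3, 75.8), (48.2, 68.3), (35.6, 68.9)],
    "GPT-4": [(48.8, 57.9), (49.8, 51.8), (55.9, 66.7), (60.5, 68.8), (39.8, 45.3), (46.1, 55.4)],
}


def gains(pairs):
    """Absolute F1 gain of the adapted over the vanilla model, in percentage points."""
    return [round(adapted - vanilla, 1) for vanilla, adapted in pairs]


for model, pairs in TABLE3.items():
    g = gains(pairs)
    best = ROWS[g.index(max(g))]  # setting with the largest gain
    print(f"{model:8s} min {min(g):4.1f}  max {max(g):4.1f} {best}  mean {mean(g):5.2f}")
```

For the three fine-tuned open models the per-setting gains range from 14.2 to 39.0 points (means of roughly 28 to 33 points), while GPT-4 with ICL gains between 2.0 and 10.8 points (mean 7.5).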
Translation into English
In the overwhelming majority of settings (see Fig. 2 and the supplementary material), all LLMs performed better when fine-tuned and tested on English-translated datasets, and the DeepL translator gave better results than the Helsinki translator. The improvement in the macro-averaged F1 score across all models was 0.8% for the Helsinki translator and 3.1% for DeepL. DeepL translation improved the F1 score by 1.2% even for the multilingual XLM-T sentiment model. In the supplementary material, we also provide a comparison of the original tweets with both translated versions, to ensure that the classification differences were caused by the quality of the translation and not by a systematic sentiment bias introduced by the translator.
8. Conclusion
We addressed the fine-tuning of large language models for sentiment analysis tasks on Twitter/X in underrepresented Eastern European languages. We manually annotated a Twitter/X-based dataset related to the Russo-Ukrainian conflict, narrowed to the V4 (Czech Republic, Slovakia, Poland, Hungary) language space. The dataset was used to fine-tune six language models (BERT, BERTweet, XLM-T, Llama 2/3, Mistral) frequently used for sentiment analysis. The tuning was done separately for each language in several variants, using either the original tweets or their English translation by the Helsinki or DeepL translator. Furthermore, GPT-4 (with or without in-context learning) was used as a reference model. The results were evaluated using standard metrics, mostly F1.
We demonstrated that adapter fine-tuning, even with only hundreds of samples in underrepresented languages, was able to draw the model's attention to the desired aspects and also to balance language and cultural differences (at least for most models). Experiments have shown that, despite previous successful experiments with multilingual models [11, 12], translating from underrepresented languages into English still improves the fine-tuning of all models tested in a wide variety of experimental settings. However, neither instruction-based in-context learning (for the small and mid-sized models) nor enriching the fine-tuning with the context of reference tweets improved the results. Finally, our experiments also confirmed that the success of fine-tuning depends on the model and the task, as reported by other studies such as [26].
Acknowledgments
This article was produced with the financial support of the European Union under the REFRESH – Research Excellence For REgion Sustainability and High-tech Industries project number CZ.10.03.01/00/22_003/0000048 via the Operational Programme Just Transition, and under the project Biography of Fake News with a Touch of AI: Dangerous Phenomenon through the Prism of Modern Human Sciences, project no. CZ.02.01.01/00/23_025/0008724, via the Operational Programme Jan Ámos