
Introducing Background Temperature to Characterise Hidden Randomness in Large Language Models

Author: Messina, Alberto; Scotta, Stefano
Publisher: Zenodo
DOI: 10.5281/zenodo.17279584
Source: https://zenodo.org/records/17279584/files/Background_Temperature_in_LLMs___arxiV.pdf
Alberto Messina¹ and Stefano Scotta¹
¹ RAI - Radiotelevisione Italiana, Centre for Research, Technological Innovation and Experimentation (CRITS)
October 6, 2025
Abstract
Even when decoding with temperature $T = 0$, large language models (LLMs) can produce divergent outputs for identical inputs. Recent work by Thinking Machines Lab highlights implementation-level sources of nondeterminism, including batch-size variation, kernel non-invariance, and floating-point non-associativity. In this short note we formalize this behavior by introducing the notion of background temperature $T_{\mathrm{bg}}$: the effective temperature induced by an implementation-dependent perturbation process that is observed even when the nominal temperature is $T = 0$. We provide clean definitions, show how $T_{\mathrm{bg}}$ relates to a stochastic perturbation governed by the inference environment $I$, and propose an empirical protocol to estimate $T_{\mathrm{bg}}$ via the equivalent temperature $T_n(I)$ of an ideal reference system. We conclude with a set of pilot experiments, run on a representative pool of models from the major LLM providers, that demonstrate the idea, and we outline implications for reproducibility, evaluation, and deployment.
1 Introduction
A common assumption in LLM deployment is that setting the decoding temperature to $T = 0$ (greedy decoding) ensures determinism. However, empirical evidence shows that output variability persists under nominally deterministic settings. The recent work in [3] argues that nondeterminism in LLM inference often arises from practical systems issues, such as varying batch sizes and the lack of batch-invariant kernels, along with floating-point non-associativity and reduction-order effects. This paper proposes a rigorous framing of such effects via the notion of a background temperature.
Contributions. (i) A concise formal model that treats the phenomenon of nondeterminism as a stochastic effect on the output probability; (ii) a formal definition of background temperature $T_{\mathrm{bg}}$; (iii) the outline of a practical protocol to estimate $T_{\mathrm{bg}}$; (iv) a set of pilot studies illustrating the concept.
2 Related Work
The recent work by Thinking Machines Lab provides a systems-first analysis of LLM nondeterminism, emphasizing batch-size variation and batch-invariant kernels for inference; it also explains how floating-point non-associativity and reduction ordering contribute to variability [3].
In addition to this work, several recent studies have quantified non-determinism in large language model outputs even under settings intended to be deterministic (e.g. temperature $T = 0$, fixed seeds). For example:
• Atil et al. (2025) [1] systematically evaluate multiple LLMs configured under deterministic settings across zero-shot and few-shot tasks. They observe large accuracy variations (up to 15%) across runs with the same input, and show that even the string outputs are often not identical.
• Song et al. (2024) [10] explore how evaluation practices often ignore variability arising from different decoding configurations (greedy vs. sampling). They show that evaluation metrics vary even for greedy decoding, and that alignment methods can help reduce sampling variance.
• Ouyang et al. (2023) [6] analyze code generation benchmarks and show that many coding tasks produce different code outputs across repeated prompt invocations, even when using $T = 0$. This confirms that deterministic temperature settings do not guarantee output consistency.
These works align closely with the observations in Thinking Machines Lab's blog [3] about system-level implementation factors (batch size, kernel non-invariance, floating-point non-associativity, etc.) causing output variation even under nominally deterministic decoding.
While prior work largely documents the existence and magnitude of non-determinism, there remains a gap in formalizing this behavior in terms of an equivalent temperature transformation functional and in proposing standard protocols to measure the effective background randomness. Our work addresses this gap by introducing the notion of an equivalent temperature $T_n(I)$ and its expectation $T_{\mathrm{bg}}$. In the next sections, we move from formal definitions to a concrete empirical protocol aimed at estimating the equivalent temperature $T_n(I)$ induced by implementation noise, and ultimately the background temperature.
To give a concrete description of what an overall measurement protocol for $T_{\mathrm{bg}}$ would look like, we first describe general criteria for selecting prompts and datasets that are sensitive to small perturbations in model behaviour (including general, task-oriented, and adversarial/synthetic prompts). We then introduce the actual measurement protocol, made up of reference runs under known nonzero temperature settings that calibrate output variability. Next, based on a suite of quantitative metrics (such as exact-match frequency, first-divergence token index, edit distance or string similarity, distributional divergence (e.g. JS or KL) over next-token or top-$k$ probabilistic outputs, and entropy/confidence measures), we outline a fitting procedure that infers $T_n(I)$ by minimizing the divergence between outputs of noisy $T = 0$ runs and reference nonzero-$T$ runs. Finally, we describe how to aggregate over $I$ to compute $T_{\mathrm{bg}}$ with statistical confidence.
3 Preliminaries and Notation
Let $D$ denote the token vocabulary, with size $|D|$. At generation step $i$, the model produces logits $z \in \mathbb{R}^{|D|}$ and associated probabilities $P(r) \in [0, 1]$ that the $i$-th token in the sequence is the $r$-th token in $D$, such that $\sum_{r=1}^{|D|} P(r) = 1$, via softmax:
$$P(r) = P(\tau_r \mid \tau_{<i}) = \frac{\exp(z_r)}{\sum_{s \in D} \exp(z_s)}, \qquad r = 1, \dots, |D|, \tag{1}$$
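where $\tau_r$ denotes the $r$-th token in $D$ and $P(\tau_r \mid \tau_{<i})$ is the probability of generating $\tau_r$ given the sequence $\tau_{<i}$ of tokens generated before step $i$. At $T = 0$, the conventional model is greedy decoding by argmax:
$$\tau_i = \arg\max_{\tau \in D} P(\tau \mid \tau_{<i}). \tag{2}$$
Decoding at temperature $T > 0$ is equivalent to performing the same operation, but with modified logits $\hat{z} \in \mathbb{R}^{|D|}$:
$$\hat{P}_T(\tau_r \mid \tau_{<i}) = \frac{\exp(\hat{z}_r)}{\sum_{s \in D} \exp(\hat{z}_s)}. \tag{3}$$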
Then the $i$-th token is distributed as a Categorical random variable parameterized by the probability distribution above, i.e.
$$\tau_i \sim \mathrm{Categorical}\big(\hat{P}_T(\tau \mid \tau_{<i})\big). \tag{4}$$
Logits are modified through the randomization effects that the specific LLM implementation includes in the decoding process. In standard autoregressive language models, the decoding temperature parameter modifies the randomness of next-token selection by adjusting the probability distribution derived from the logits. Typically, one scales or transforms the raw (pre-softmax) logits via a temperature parameter and then passes them through softmax to obtain the final distribution for sampling or greedy selection. In general, and as a sufficient assumption for the purposes of this work, lower temperatures concentrate probability mass on the most likely tokens, making the output more deterministic, while higher temperatures flatten the distribution and increase variability.
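The temperature mechanism of (1) to (4) can be sketched in a few lines of Python. This is a minimal sketch, not the implementation of any particular LLM: it assumes the common scaling $\hat{z} = z/T$ (the text only requires that lower temperatures concentrate probability mass), treats $T = 0$ as the greedy limit (2), and otherwise draws from the Categorical distribution (4). The function names `softmax` and `decode_step` are hypothetical.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Eqs. (1)/(3): probabilities from logits, assuming z_hat = z / T with T > 0."""
    scaled = [z / temperature for z in logits]
    top = max(scaled)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def decode_step(logits, temperature, rng):
    """One decoding step: argmax at T = 0 (eq. 2), Categorical draw otherwise (eq. 4)."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda r: logits[r])
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5]
for T in (0.1, 1.0, 10.0):
    print(T, [round(p, 3) for p in softmax(logits, T)])
print(decode_step(logits, 0, random.Random(0)))  # 0: greedy picks the largest logit
```

Lowering $T$ moves probability mass toward the largest logit, which is exactly the qualitative assumption stated above.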
Equivalently, this can be seen as the result of applying a suitable temperature transformation functional $F_T$:
$$F_T : \mathbb{R}^{|D|} \to \mathbb{R}^{|D|}, \qquad \hat{P} = F_T(P), \tag{5}$$
with the ideal identity limit $F_0(P) = P$. Many implementations use the temperature $T$ so that the model effectively computes something like $F_T(P)$, a functional transformation of the original token probability vector $P$, where $T = 0$ corresponds (ideally) to purely greedy decoding and $T > 0$ allows stochastic sampling.
4 Modelling Intrinsic Nondeterminism at $T = 0$
As noted by the authors of [3], real systems exhibit implementation-dependent perturbations even under $T = 0$. Let $I \in \mathcal{I}$ denote the inference environment (batch size and composition, concurrency/load, hardware/backends, kernel choices, numeric precision, reduction ordering, etc.) and let $F'_T$ be the temperature transformation functional of the real system. We model a perturbation $\epsilon_I$, mapping probability distributions over the set $D$ to probability distributions over the same set, that alters the effective distribution as:
$$F'_0(P) = \epsilon_I\big(F_0(P)\big) \approx \epsilon_I(P). \tag{6}$$
While $\epsilon_I(P)$ may differ only slightly from $F_0(P)$, in regions where multiple tokens have similar probability mass even slight changes can flip the argmax in (2), and thus the emitted token sequence.
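The argmax-flip argument can be illustrated with a toy experiment. In this sketch, which is an illustration and not part of the paper's method, the perturbation $\epsilon_I$ is assumed to act as small independent uniform noise added to the logits; real implementation noise (for example from reduction order) is structured, so only the qualitative contrast between a near-tie and a clear winner should be read from it. `flip_rate` is a hypothetical helper.

```python
import random

def greedy(logits):
    """Index of the largest logit, i.e. greedy decoding as in eq. (2)."""
    return max(range(len(logits)), key=lambda r: logits[r])

def flip_rate(logits, noise_scale, trials=10_000, seed=0):
    """Fraction of trials in which the perturbed argmax differs from the unperturbed one.

    Uniform additive noise is a hypothetical stand-in for epsilon_I.
    """
    rng = random.Random(seed)
    reference = greedy(logits)
    flips = 0
    for _ in range(trials):
        noisy = [z + rng.uniform(-noise_scale, noise_scale) for z in logits]
        flips += greedy(noisy) != reference
    return flips / trials

near_tie = [3.0000001, 3.0, -1.0]   # top-2 logits almost equal
clear_win = [5.0, 3.0, -1.0]        # wide margin between the top-2 logits
print(flip_rate(near_tie, 1e-5))    # close to 0.5
print(flip_rate(clear_win, 1e-5))   # 0.0: the perturbation never flips the argmax
```

The same tiny perturbation is harmless when one token clearly dominates and decisive when two tokens are nearly tied, which is why nondeterminism at $T = 0$ concentrates on such positions.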
5 Equivalent Temperature and Background Temperature
We posit that the perturbation in (6) behaves as if decoding were performed by an inference-environment-free (ideal) system at a nonzero equivalent temperature $T_n(I)$:
$$F'_0(P) \approx \epsilon_I(P) \approx F_{T_n(I)}(P). \tag{7}$$
This motivates the following definition.
Definition (Background temperature). The background temperature of an LLM implementation is the expected equivalent temperature induced by the inference environment under nominal $T = 0$:
$$T_{\mathrm{bg}} \triangleq \mathbb{E}_{I \in \mathcal{I}}\big[T_n(I)\big]. \tag{8}$$
Intuitively, $T_{\mathrm{bg}}$ captures the implicit randomness in a deployment stack even when the user selects $T = 0$.
6 Estimating $T_n(I)$ and $T_{\mathrm{bg}}$ Empirically
The problem with the definition given in (8) is that the inference-environment-free (ideal) system may not be at hand. In fact, the key challenge in estimating $T_n(I)$ is that it requires a comparison with a perfect, deterministic reference, which may not exist in practice. To make the calibration of $T_n(I)$ feasible without an unattainable ideal, one can first identify a quasi-ideal environment: for example, by using inference pipelines with batch-invariant kernels (in normalization, matrix multiplication, attention), fixed numeric precision, minimal or single-request concurrency, and deterministic configuration flags. Thinking Machines Lab demonstrates that replacing standard kernels with batch-invariant ones drastically reduces output divergence under zero temperature [3].
Similarly, [9] show that floating-point non-associativity and asynchronous parallel reductions are major sources of run-to-run variability, and that enforcing deterministic alternatives significantly stabilizes inference and scientific computing pipelines. Based on this evidence, one can anchor measurements of $T_n(I)$ to such quasi-ideal baselines, or employ multiple such baselines (differing in hardware, kernel implementation, or precision) to absorb uncertainty. Further, measuring statistical distributions of various output properties (rather than only output strings) allows matching environments $I$ to baselines via statistical divergence metrics, reducing sensitivity to rare argmax flips. Reporting $T_n$ together with such baseline variances yields operationally meaningful estimates even in the absence of a perfect oracle.
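Floating-point non-associativity, one of the sources cited above, can be reproduced with the Python standard library alone. This sketch only shows why the order of a reduction matters; it does not model any specific GPU kernel. `math.fsum`, which returns a correctly rounded sum, is used here as an example of an order-independent summation, loosely analogous to the deterministic alternatives discussed in [9].

```python
import math
import random

# Float addition is not associative: the grouping changes the rounded result.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left, right, left == right)  # 0.6000000000000001 0.6 False

# The same values summed in two orders, as two reduction schedules might do.
rng = random.Random(42)
values = [rng.uniform(-1e6, 1e6) for _ in range(10_000)]
shuffled = values[:]
rng.shuffle(shuffled)
print(sum(values) - sum(shuffled))  # may be a small nonzero difference

# A correctly rounded sum does not depend on the order of its inputs.
print(math.fsum(values) == math.fsum(shuffled))  # True
```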
Another practical way to assess the background temperature of an online model (e.g. ChatGPT) is to use a local installation of another model (e.g. Llama) as a benchmark reference. The local model must be configured to be as deterministic and stable as possible: fixed precision, consistent batch sizes, kernel implementations that do not alter their behavior when the batch shape changes, deterministic reduction orders, and disabled non-deterministic/autotuned operations. This reference becomes a baseline environment that approximates "ideal behavior". Then, by comparing output distributions from the online model with those from the stable local model, one can compute how far the online model's behavior diverges, for example via measures such as the Jensen-Shannon or KL divergence. By finding which temperature setting of the local model makes its distribution match the diverged distribution of the online model, it is possible to infer an equivalent temperature of the online model in that environment. Repeated across many prompts and local configurations, this yields an estimate of the online model's background temperature, together with uncertainty bounds. This method avoids relying on an unattainable perfect system by using the best stable reference one can build.
With these considerations in mind, we can outline a practical protocol to estimate $T_n(I)$ and $T_{\mathrm{bg}}$, which is depicted in Figure 1.
6.1 Prompt Sets and Datasets
The first element of the protocol is a relevant prompt set $\Pi$, an element of the theoretical set $\mathcal{P}$ of all possible combinations of prompts. To explore the full range of behavior of the system under test, we suggest using a diverse evaluation suite, e.g.:
• General generation prompts (short/long, common/rare vocabulary).
• Task benchmarks: QA (e.g., SQuAD [7] / TriviaQA [4]), summarization, cloze, and short-form classification; code-generation prompts if applicable.
• Edge/adversarial prompts: long contexts, rare tokens, near-ties among the top-$k$ token probabilities.
• Synthetic prompts engineered to create finely balanced next-token choices.
Figure 1: Measuring protocol.
6.2 Controlling the Inference Environment $I$
Run repeated inference (e.g., $M \ge 50$ runs per prompt) at $T = 0$ while varying $I$ along axes known to influence nondeterminism:
• Batch structure: batch size, e.g. $\in \{1, 2, 4, 8, 16, 32, \dots\}$; co-batching with other prompts vs. serial execution.
• Concurrency/load: single request vs. many simultaneous requests.
• Hardware/backends: GPU types, CPU vs. GPU, precision (fp16/bf16/fp32), kernel implementations (batch-invariant vs. standard).
• Numerics: reduction order, deterministic flags in frameworks, fused vs. unfused kernels.
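A controlled sweep over the axes above can be enumerated as a Cartesian product of environment settings. The axis names and values below are hypothetical placeholders mirroring the list; dispatching each planned run to an actual serving stack is outside the scope of this sketch.

```python
from itertools import product

# Hypothetical axes of the inference environment I; values mirror the list above.
AXES = {
    "batch_size": [1, 2, 4, 8, 16, 32],
    "co_batching": ["serial", "co-batched"],
    "precision": ["fp16", "bf16", "fp32"],
    "kernels": ["batch-invariant", "standard"],
}

def environment_grid(axes):
    """Yield one dict per environment I, covering every combination of axis values."""
    names = list(axes)
    for combo in product(*(axes[n] for n in names)):
        yield dict(zip(names, combo))

def plan_runs(prompts, axes, m=50):
    """List (environment, prompt, repetition) triples: M runs per prompt at T = 0."""
    return [
        (env, prompt, rep)
        for env in environment_grid(axes)
        for prompt in prompts
        for rep in range(m)
    ]

grid = list(environment_grid(AXES))
print(len(grid))  # 6 * 2 * 3 * 2 = 72 environments
runs = plan_runs(["Q1", "Q2"], AXES, m=50)
print(len(runs))  # 72 * 2 * 50 = 7200 runs
```

A full factorial design grows quickly; in practice one may prefer to vary one axis at a time around a quasi-ideal baseline.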
For remote systems, where it may be impossible or impractical to govern the inference environment, one can assume that prolonged and repeated operation is a good way to sample the statistical distribution of the inference environment.
6.3 Reference Runs at Known Temperatures
Under a stable environment $I_{\mathrm{stable}}$, e.g. a local anchor system, run the same prompts, e.g. at a grid of temperatures $T \in \{0, \dots, 1, \dots\}$, to build a mapping between $T$ and output-variability statistics. As noted earlier, this stable environment can be either a specific configuration of the system under test or another anchor used as reference. Since the anchor configuration is assumed to be stable with respect to the inference environment, a lower number $K$ of runs for each prompt in the prompt set should suffice.
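The reference runs can be illustrated with a toy anchor model whose per-step logits are fixed. This is a sketch under explicit assumptions: the logits are invented, sampling uses $\mathrm{softmax}(z/T)$, and the statistic is the fraction of distinct answers among $K$ runs, chosen here only for illustration (any of the metrics of Section 6.4 could take its place). Names such as `reference_curve` are hypothetical.

```python
import math
import random

def sample_answer(step_logits, temperature, rng):
    """Toy answer: one token per step, greedy at T = 0, else sampled from softmax(z / T)."""
    answer = []
    for logits in step_logits:
        if temperature == 0:
            answer.append(max(range(len(logits)), key=lambda r: logits[r]))
        else:
            top = max(logits)
            weights = [math.exp((z - top) / temperature) for z in logits]
            answer.append(rng.choices(range(len(logits)), weights=weights, k=1)[0])
    return tuple(answer)

def distinct_fraction(answers):
    """Illustrative variability statistic: distinct answers / K (1/K when all runs agree)."""
    return len(set(answers)) / len(answers)

def reference_curve(step_logits, temperatures, k=8, seed=0):
    """Map each known temperature T to the variability statistic over K runs."""
    rng = random.Random(seed)
    return {
        t: distinct_fraction([sample_answer(step_logits, t, rng) for _ in range(k)])
        for t in temperatures
    }

# Hypothetical per-step logits standing in for the anchor model on one prompt.
STEP_LOGITS = [[2.0, 1.5, 0.0], [1.0, 0.9, 0.2], [0.5, 0.4, 0.3], [2.5, 2.4, -2.0]]
curve = reference_curve(STEP_LOGITS, [0, 0.1, 0.5, 1.0], k=8)
print(curve)  # the value at T = 0 is 1/8 = 0.125; it tends to grow with T
```

Repeating this for every prompt in $\Pi$ yields, for each $T$, a sample of the statistic, i.e. an empirical variability distribution for the anchor at that temperature.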
6.4 Variability Metrics
The key element of the protocol is the set of metrics used to associate the sought-for background temperature of the system under test with the reference measurements on the anchor system. Since $T_{\mathrm{bg}}$ is intended as a generic, high-level account of the system's nondeterminism, the metrics should be content-agnostic. This matters because different systems are trained independently, so it is practically certain that the same prompt would produce different outputs on the two systems even under strictly deterministic configurations. For example, for each prompt, and across the $M$ (or $K$) runs of Figure 1, compute process parameters such as:
• Exact-match rate: fraction of runs producing identical strings for the same prompt.
• First-divergence index: position of the first token mismatch across pairs of runs.
• Edit distance: first-order and second-order statistics between different outputs.
• Distributional divergence: e.g., symmetrized KL or JS divergence between empirical next-token distributions (top-$k$) across runs.
• Entropy of next-token distributions.
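Several of the metrics listed above are straightforward to compute on tokenized outputs. The following sketch assumes each run is available as a list of tokens. For a pair of identical runs, where no mismatch exists, the first-divergence index is taken to be the length of the shorter run; this is a convention chosen here, not one specified in the text. "First and second order statistics" are interpreted as the mean and the population standard deviation over all pairs of runs.

```python
import math
from itertools import combinations
from statistics import mean, pstdev

def first_divergence_index(a, b):
    """Position of the first mismatching token; length of the shorter run if none."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return min(len(a), len(b))

def edit_distance(a, b):
    """Levenshtein distance between two token sequences (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (x != y)))   # substitution
        prev = curr
    return prev[-1]

def pairwise_stats(runs, metric):
    """First-order (mean) and second-order (population std) statistics over run pairs."""
    values = [metric(a, b) for a, b in combinations(runs, 2)]
    return mean(values), pstdev(values)

def entropy(probs):
    """Shannon entropy, in nats, of a next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

runs = [["The", "cat", "sat"], ["The", "cat", "sat"], ["The", "dog", "sat"]]
print(pairwise_stats(runs, first_divergence_index))
print(pairwise_stats(runs, edit_distance))
print(entropy([0.5, 0.5]))  # log(2), about 0.693
```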
Then, from the variability metrics computed across the runs, construct a multidimensional distribution that captures the values of the variability metrics for the system considered. In particular, we denote by $f_T(I_{\mathrm{stable}})$ and $f_g(I)$, respectively, the distribution of the variability metrics of the reference system at temperature $T$ and that of the system under test set at temperature 0. Note that these distributions depend on multiple factors, including the specific LLMs used; for notational simplicity, we omit these dependencies.
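6.5 Estimators of $T_n$ and $T_{\mathrm{bg}}$
As explained in Section 6, the ideal reference system may not exist in practice. However, it is possible to estimate $T_n$ using a reference model running in an environment $I_{\mathrm{stable}}$ that is as stable as possible. In particular, for a reference LLM $\ell$, it is possible to compute an estimator $\hat{T}^{\ell}_n = \hat{T}^{\ell}_n(I, \Pi)$ of $T_n$ for each $I$ in the set of environments considered, $\tilde{\mathcal{I}} \subseteq \mathcal{I}$, and each $\Pi$ in the set of prompt collections considered, $\tilde{\mathcal{P}} \subseteq \mathcal{P}$, as
$$\hat{T}^{\ell}_n = \arg\min_{T \ge 0} \mathcal{D}\big(f_T(I_{\mathrm{stable}}),\, f_g(I)\big), \tag{9}$$
where $\mathcal{D}$ is a chosen divergence (e.g., JS or KL divergence, or a weighted combination) applied to the variability distributions $f_g(I)$ and $f_T(I_{\mathrm{stable}})$, corresponding respectively to the system under test and to the reference system based on $\ell$ (see Section 6.4). From these estimates, it is possible to compute an estimate $\hat{T}_{\mathrm{bg}} = \hat{T}_{\mathrm{bg}}(\ell)$ of $T_{\mathrm{bg}}$ as
$$\hat{T}_{\mathrm{bg}}(\ell) = \frac{1}{|\tilde{\mathcal{I}}|}\,\frac{1}{|\tilde{\mathcal{P}}|} \sum_{I \in \tilde{\mathcal{I}}} \sum_{\Pi \in \tilde{\mathcal{P}}} \hat{T}^{\ell}_n(I, \Pi), \tag{10}$$
where $|\tilde{\mathcal{I}}|$ and $|\tilde{\mathcal{P}}|$ denote, respectively, the number of environments $I$ and prompt sets $\Pi$ considered. To further improve robustness, we repeat the same process across a set $\mathcal{L}$ of different reference LLMs and take the average¹
$$\hat{T}_{\mathrm{bg}} = \frac{1}{|\mathcal{L}|} \sum_{\ell \in \mathcal{L}} \hat{T}_{\mathrm{bg}}(\ell), \tag{11}$$
where $|\mathcal{L}|$ denotes the number of different reference LLMs $\ell$ used. Theoretically, as the set of reference LLMs $\mathcal{L}$, the prompts, the environments, and the variability metrics grow, we can expect $\hat{T}_{\mathrm{bg}}$ to converge to the true $T_{\mathrm{bg}}$ as defined in (8).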
¹ Beyond the average estimate, the availability of multiple reference LLMs and configurations also allows the computation of higher-order moments and confidence intervals, providing a more precise characterization of the uncertainty associated with this kind of estimate.
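Equations (9) to (11) can be sketched as a grid search followed by averaging. Assumptions: the minimization in (9) is restricted to a finite grid of reference temperatures, as in the pilot experiment of Section 7.1; each distribution is represented by a sample of a scalar metric in $[0, 1]$ that is binned into a histogram; and $\mathcal{D}$ is the Jensen-Shannon divergence, one of the options mentioned above. All names and example numbers are hypothetical.

```python
import math
from collections import Counter
from statistics import mean

def histogram(values, bins=10):
    """Normalized histogram of metric values assumed to lie in [0, 1]."""
    counts = Counter(min(int(v * bins), bins - 1) for v in values)
    return [counts.get(b, 0) / len(values) for b in range(bins)]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(x + y) / 2 for x, y in zip(p, q)]
    def kl(a, b):
        # Terms with a_k = 0 contribute nothing; m_k > 0 whenever a_k > 0.
        return sum(x * math.log(x / y) for x, y in zip(a, b) if x > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def estimate_tn(reference, observed, bins=10):
    """Eq. (9) on a grid: the T whose reference distribution is closest to f_g(I).

    reference: {T: metric values of the anchor at temperature T}
    observed:  metric values of the system under test at nominal T = 0
    """
    f_g = histogram(observed, bins)
    return min(reference, key=lambda t: js_divergence(histogram(reference[t], bins), f_g))

def estimate_tbg(tn_by_llm):
    """Eqs. (10)-(11): average over (I, Pi) pairs per reference LLM, then over LLMs."""
    return mean(mean(tns) for tns in tn_by_llm.values())

# Hypothetical exact-match values for one (I, Pi) pair.
reference = {0.0: [1.0] * 10, 0.1: [0.9] * 5 + [0.7] * 5, 0.5: [0.3] * 10}
observed = [0.9] * 6 + [0.7] * 4
print(estimate_tn(reference, observed))  # 0.1
print(estimate_tbg({"llm_a": [0.1, 0.2], "llm_b": [0.1, 0.1]}))
```

The grid resolution bounds the precision of each $\hat{T}^{\ell}_n$; averaging in (10) and (11) reduces variance but cannot recover resolution the grid lacks.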
6.6 Engineering to Reduce $T_{\mathrm{bg}}$
Once the background temperature $T_{\mathrm{bg}}$ of a given system is available, several mechanisms can be put in place to mitigate its effect. For example, empirical and systems work suggests several interventions:
• Batch-invariant kernels for core ops (matmul, attention, RMSNorm) to prevent batch-shape-dependent numerics [3].
• Deterministic reductions and stable accumulation orders where feasible [9].
• Consistent pipelines: fix kernel configurations across shapes; avoid opportunistic algorithm switching that alters reduction paths [8].
• Deterministic flags in frameworks and careful precision selection [9].
• Operational controls: cap concurrency or use shape buckets to reduce co-batching variability [3].
Ablation studies can further determine which intervention has the greatest impact on the background temperature. This turns the outlined protocol into an iterative practice aimed at controlling the nondeterministic characteristics of the system in use, as opposed to a mere observation of an empirical phenomenon.
7 Pilot Experiments
In this section, we present some experiments to validate the theory presented in this work. In particular, in Section 7.1, we present a simple pipeline for estimating the background temperature of a given model. After that, we present additional experiments that could clarify and add elements to the analysis of the background temperature.
7.1 Basic pipeline for estimating $T_{\mathrm{bg}}$
Here we perform a pilot experiment to estimate $T_{\mathrm{bg}}$ of the OpenAI model gpt-4.1-nano, accessed via the Microsoft Azure AI services with temperature $T = 0$ (i.e., considering it as System B in Figure 1). Note that, since the model is used via a third-party service, we cannot control the inference environment $I$, only the temperature. The prompt set $\Pi$ used is composed of the first 200 questions of the dataset truthful_qa² (see [5]).
The reference LLM $\ell$, playing the role of System A in Figure 1, is the Hugging Face LLM SmolLM3-3B³ (see [2]). As outlined in previous sections, we selected representative temperature values $\Theta$, sampled in increments of 0.01 from 0 to 0.2, in increments of 0.05 from 0.2 to 0.5, and in increments of 0.1 from 0.5 to 1, i.e.
$$\Theta = \{0, 0.01, \dots, 0.19, 0.2, 0.25, \dots, 0.45, 0.5, 0.6, \dots, 0.9, 1\}.$$
For each $T \in \Theta$, we generated $K = 32$ responses, each limited to 32 tokens, with the reference LLM for each of the 200 prompts in $\Pi$. As variability metric (see Section 6.4), we used the exact-match fraction, i.e. for each temperature considered and each prompt in $\Pi$, we computed the maximum fraction of identical answers among the 32 generated. In this way, for each $T \in \Theta$ we obtained 200 values in the interval $[1/32, 1]$, which constitute the discrete distribution $f_T$ of the exact-match fraction at that temperature in the answers given by the reference LLM.
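The temperature grid $\Theta$ and the per-prompt exact-match values described above can be reproduced as follows. The grid construction follows the increments stated in the text; `max_identical_fraction` and `exact_match_distribution` are hypothetical helper names, and answers are compared as exact strings, which is the simplest reading of "identical answers".

```python
from collections import Counter

# Theta: steps of 0.01 up to 0.2, 0.05 up to 0.5, 0.1 up to 1.
# Rounding avoids float drift such as 0.30000000000000004.
THETA = (
    [round(0.01 * k, 2) for k in range(21)]
    + [round(0.2 + 0.05 * k, 2) for k in range(1, 7)]
    + [round(0.5 + 0.1 * k, 1) for k in range(1, 6)]
)

def max_identical_fraction(answers):
    """Largest share of identical answers among the runs for one prompt, in [1/K, 1]."""
    return Counter(answers).most_common(1)[0][1] / len(answers)

def exact_match_distribution(answers_by_prompt):
    """One value per prompt: the sample behind f_T (reference) or f_g (system under test)."""
    return [max_identical_fraction(a) for a in answers_by_prompt.values()]

print(len(THETA), THETA[:3], THETA[-3:])  # 32 [0.0, 0.01, 0.02] [0.8, 0.9, 1.0]
print(max_identical_fraction(["Paris"] * 24 + ["Paris, France"] * 8))  # 0.75
```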
In Figure 2, these distributions are represented graphically, showing how the density estimate shifts from a delta concentrated at 1 when the temperature is 0 (indicating that all answers are identical) to a distribution with most of its mass near 0, indicating that the answers tend to be unique.
² https://huggingface.co/datasets/truthfulqa/truthful_qa
³ https://huggingface.co/HuggingFaceTB/SmolLM3-3B
Figure 2: Distribution of exact-match fractions obtained from the reference LLM answers. Top row (from left to right): histograms representing the distributions $f_0$, $f_{0.2}$ and $f_1$. Bottom row: kernel density estimates of the exact-match fraction for all sampled temperatures in $\Theta$. Note that for $T = 0$ the density is represented as a vertical line: all answers are identical, so the density is entirely concentrated at 1, forming a Dirac delta.
After computing the reference distributions $f_T$ for $T \in \Theta$ for the chosen variability measure, we computed the same for the model whose $T_{\mathrm{bg}}$ we want to estimate, i.e., gpt-4.1-nano accessed via the Microsoft Azure AI services. To do this, we prompted the model 100 times for each of the 200 prompts in $\Pi$, this time setting the temperature to $T = 0$ and limiting the answers to 32 tokens, as done for the reference system. Then, analogously to the procedure for the reference system, for each prompt in $\Pi$ we computed the maximum fraction of identical answers provided by gpt-4.1-nano. These 200 values, in $[1/100, 1]$, form the discrete distribution $f_g$ (see Figure 3) that we need to compare with the reference distributions computed on System A (see (9)).
Figure 3: Discrete distribution $f_g$ of the fraction of identical answers given by the LLM under test, gpt-4.1-nano, for the prompts in $\Pi$. The distribution is shown both as a histogram (with the y-axis on the left) and as a kernel density estimate (with the y-axis on the right).
To compare the discrete distributions of observations, $f_T$ for $T \in \Theta$ and $f_g$, we chose to use the Kolmogorov-Smirnov (K-S) distance, which is equal to 0 for identical distributions and 1 for completely different ones. The computed values of the K-S distance are reported in Figure 4.
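The two-sample K-S distance is the largest vertical gap between the two empirical cumulative distribution functions. The sketch below computes it in pure Python; `scipy.stats.ks_2samp` reports the same statistic if SciPy is available. The helper `closest_temperature` and the sample values are hypothetical illustrations, not the data of this experiment.

```python
from bisect import bisect_right

def ks_distance(sample_a, sample_b):
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs.

    The empirical CDFs only jump at observed values, so evaluating them at
    every pooled observation is enough to find the supremum.
    """
    a, b = sorted(sample_a), sorted(sample_b)
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )

def closest_temperature(reference, observed):
    """Grid temperature whose reference sample is closest to f_g in K-S distance."""
    return min(reference, key=lambda t: ks_distance(reference[t], observed))

print(ks_distance([1.0, 1.0, 0.5], [0.5, 1.0, 1.0]))  # 0.0: same sample, different order
print(ks_distance([1.0, 1.0], [0.1, 0.2]))            # 1.0: non-overlapping samples

# Hypothetical exact-match samples, not the data of the pilot experiment.
reference = {0.0: [1.0] * 4, 0.05: [1.0, 1.0, 0.5, 0.5], 0.5: [0.1, 0.1, 0.2, 0.2]}
observed = [1.0, 0.5, 0.5, 0.5]
print(closest_temperature(reference, observed))  # 0.05
```

On ties, `min` returns the first grid temperature in insertion order, i.e. the lowest one when the grid is sorted.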
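From the values in Figure 4 (b), we can conclude that the estimate of $T_{\mathrm{bg}}$ found in this experiment is $\hat{T}_{\mathrm{bg}}(\ell) = 0.05$ (which, in this simple case with a single reference LLM, environment and prompt set, coincides with $\hat{T}_{\mathrm{bg}}$), as this is the temperature at which $f_T$ is closest to $f_g$, considering only the reference distributions computed for $T \in \Theta$.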
Figure 5 shows the two matching histograms. Ideally, this experiment should be repeated using a wider range of $T$ values (especially lower ones), more prompts, lower token limits, and different variability metrics (see Sections 6.4 and 6.5). However, the purpose of this pilot experiment was simply to demonstrate the full procedure for estimating $T_{\mathrm{bg}}$.
7.2 Extending the reference model set $\mathcal{L}$
One way to make the estimate of $T_{\mathrm{bg}}$ more robust is to add reference models, i.e. to extend the set $\mathcal{L}$ introduced in Section 6.5. In particular, we used the LLM Llama-3.2-3B-Instruct⁴ and had it answer each of the same 200 prompts 32 times (the same set
⁴ https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct