Received 6 April 2025, accepted 24 April 2025, date of publication 8 May 2025, date of current version 19 May 2025.
Digital Object Identifier 10.1109/ACCESS.2025.3568028
Agora: A Distributed Language Model Framework With API-Call Support for Integrated Climate Forecasting
ALEXANDRA UDRESCU AND DAN-MATEI POPOVICI
Computer Science Department, National University of Science and Technology POLITEHNICA Bucharest, 060042 Bucharest, Romania
Corresponding author: Dan-Matei Popovici (matei.popovici@upb.ro)
This work was supported in part by the European Union through the FUTURAL Project-Empowering the FUTure through innovative Smart Solutions for rURAL areas (HORIZON EUROPE) under Project 101083958, and in part by Unitatea Executivă pentru Finanțarea Învățământului Superior, a Cercetării, Dezvoltării Si Inovării (UEFISCDI) through the Project FUTURAL-Soluţii inteligente inovatoare pentru zonele rurale (Orizont Europa Institutii) under Project 020234823. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Executive Agency. Neither the European Union nor the granting authority can be held responsible for them.
ABSTRACT We introduce Agora, a Generative AI-driven system that delivers expert answers and recommendations on climate and agriculture, transforming complex data into clear, natural language explanations. While built for the rural domain, Agora is highly adaptable and can be deployed across various domain applications. It operates as a "mixture-of-experts" language model system, selectively utilizing multiple fine-tuned large language models for inference. By dynamically integrating external data through API calls, Agora ensures real-time, contextually relevant responses. Agora is built for extensibility: it seamlessly integrates new APIs and domains without requiring a full system retrain. Developed entirely with open-source large language models from the LLaMA family, Agora remains open and adaptable, allowing anyone to extend and enhance its capabilities. Optimized for accessibility, Agora runs efficiently on commodity GPUs without compromising performance. By eliminating the need for expensive hardware like NVIDIA's A100, it makes text generation more affordable and widely accessible. Agora outperforms closed-source models, achieving 78% accuracy on our question-answering benchmark. This result is achieved via dynamic API integration, which pulls in real-time external data, making responses more adaptive, precise, and context-aware.
INDEX TERMS API-call orchestration, API-call support, forecast generation, large language model, model fine-tuning, natural language processing.
I. INTRODUCTION
In many Eastern European countries, particularly Romania, a significant gap exists between technological advancements and the practices prevalent in rural areas. Despite the widespread availability of Internet connectivity (Romania ranks within the top 54 of 140 countries for mobile Internet speed [1]), access to expert data, smart services, and cutting-edge information remains low in these regions.
A report [2] from the Black Sea Basin Programme revealed that approximately one million people in Romania, along with their families, are disconnected from modern advancements. This is further supported by the statistics shown in Figure 1. The same report highlights that 97% of Romanian farms are micro- and subsistence enterprises, typically family owned, covering up to 10 ha. These farms employ at least half the agricultural workforce available in Romania. Although specialized weather forecasts, seasonal cropping information, and insights into the effects of climate change are readily available [3], [4], [5], many potential beneficiaries in rural Romania are not utilizing them.
The associate editor coordinating the review of this manuscript and approving it for publication was Arianna D'Ulizia.
This disconnect arises because many rural users are unfamiliar with the technologies underpinning these services. The process of installing and navigating apps, datasets and API clients, coupled with the growing complexity of technical interfaces such as satellite and radar projections or multimodel forecasts, can be daunting. Therefore, converting complex information into a clear, accurate, and specific text-based format is a key obstacle to the widespread adoption of modern smart farming practices in the near future.
FIGURE 1. Awareness about smart farming applications [2].
2025 The Authors. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 13, 2025
A. Udrescu, D.-M. Popovici: Agora: A Distributed Language Model Framework With API-Call Support
As part of a broader effort to support rural communities [6], our objective was to develop a system capable of accessing expert data from diverse third-party sources, reasoning about such data, and delivering actionable conclusions and recommendations. By leveraging the power of Large Language Models (LLMs), which excel in processing, understanding, and generating human language, we aim to provide rural communities with easy access to expert knowledge and insights without requiring deep technical expertise.
LLMs, built on the groundbreaking Transformer architecture [7], have become state-of-the-art tools, attracting unprecedented attention and research effort, and are now widely adopted across diverse applications. They power chatbots capable of human-like conversations and improve programming tools with features such as code generation and explanation. The latest language models, with hundreds of billions of parameters, demonstrate advanced reasoning abilities and can perform logical reasoning to some extent.
However, language models are not out-of-the-box solutions for many applications, including ours. LLMs suffer from: (i) train-data dependency - they are constrained by the data they have been trained on, which limits their reasoning abilities, as illustrated in [8]; (ii) in-context reasoning limitations - they struggle with fundamental reasoning tasks such as calculating minimum, maximum, or average values across data series, and often fail to decide when to perform factual checks instead of performing text generation; (iii) prohibitive training costs - they demand extensive memory and GPU resources, with fine-tuning costs starting at hundreds of dollars and escalating to hundreds of thousands for training from scratch.
In this paper, we introduce Agora, an advanced system powered by language models specifically designed to address queries related to weather forecasts, climate, and crop management, while effectively overcoming the challenges mentioned earlier. Agora can invoke third-party APIs to improve its answers. Moreover, it can efficiently orchestrate multiple API calls, allowing it to trigger additional requests when needed to retrieve supplementary data, thereby ensuring more comprehensive and accurate answers to user queries.
For instance, to answer a question such as:
Can the cultivation of tomatoes thrive in the climatic conditions around the city of Piteşti, Romania?
Agora recognizes that simply retrieving data on optimal sowing conditions for tomatoes is insufficient. To deliver a precise response, it also initiates a secondary call to obtain historical climate data for Piteşti. In another example:
What is the coolest month in the city of Sibiu?
Agora retrieves monthly temperature averages for the given location and sends them to an aggregation API to perform the min operation over the given interval.
In addition to answering questions accurately, Agora is highly modular. Instead of relying on a single large-scale model, it utilizes smaller domain-specific language models, each tailored to handle specific types of data. These models are coordinated by a larger interviewer model, which integrates and processes their outputs.
The domain-specific models that we trained focus on agriculture, weather, and climate data. However, Agora is flexible and can be deployed for a wide range of tasks, with the ability to scale and incorporate additional domains as needed.
Most importantly, by distributing expertise across multiple models, Agora is scalable. Using multiple models with a relatively low number of parameters, it supports training and inference that can be efficiently performed on commodity GPUs.
Agora addresses a key gap in current research, which is largely focused on training massive, general-purpose models with broad knowledge and billions of parameters. These large-scale systems, often proprietary, are out of reach for academia, small businesses, and other resource-constrained users. They come with usage costs and lock users into closed ecosystems. In contrast, we believe the future lies in smaller, high-expertise models tailored to specific domains. These models can be trained and deployed on modest hardware, are easier to evaluate, and rely on expert data to produce reliable, targeted responses. Agora is built with this vision in mind.
In Section II we outline the challenges encountered during the development of Agora and explain how its design effectively addresses each of these issues. We explain Agora's implementation in Section III. In Section IV we discuss model training and evaluation. In Section V we provide an in-depth review of the existing approaches that follow similar directions. In Section VI we discuss limitations and finally, in Section VII we present conclusions and future work.
II. DESIGN CHALLENGES WHEN BUILDING AGORA
To begin designing Agora, we first explored the existing domain-specific models relevant to our areas of interest. The development of large language models (LLMs) has made significant progress in fields such as agriculture [9] and climate science [4], [10]. Such models were created by opting for a base language model, which may be either open-source (e.g., the LLaMA [11] family) or proprietary (e.g., the GPT-4 family). Next, two main approaches are employed. The first relies on (i) prompt engineering, where the model is guided by instructions and/or examples to answer domain-specific questions, as seen in [4]. This may involve augmenting prompts with relevant dataset fragments and explanatory information to improve response accuracy. Alternatively, (ii) the model can be fine-tuned using a dataset of example queries and the corresponding target answers.
Our experience has shown that prompt engineering tends to perform well on large models (with more than 100 billion parameters). Fine-tuning can yield good results with much smaller models (e.g., the LLaMA2 7B model [11]). However, both methods are inherently constrained by their reliance on data encountered during training, which limits their reasoning abilities to that information.
Retrieval-Augmented Generation (RAG) [12] is a common approach to addressing this limitation. RAG operates in two phases: first, a retriever analyzes the user query and retrieves relevant data from a database or external source. Second, these data are appended to the user query as context, which the LLM then uses to generate its response.
We found that typical retrieval components in RAG are often simple, typically employing basic search methods such as BM25 [13], which use keyword matching and do not scale well to complex numerical datasets. For example, considering the query from the previous section: "What is the coolest month in the city of Sibiu?", a BM25-based retriever struggled to identify the specific dataset slice necessary to answer this query. At best, the entire dataset can be appended to a user query. However, this approach is impractical because language models have a limited context window (e.g., LLaMA 2's 4096-token limit is equivalent to approximately 3000 words).
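To make the context-window constraint concrete, the sketch below turns the 4096-token / 3000-word rule of thumb quoted above into a rough size check. The function names and the example dataset (ten years of daily records at about five words each) are illustrative assumptions, not figures from this work.

```python
# Rule of thumb quoted in the text: a 4096-token window holds about 3000 words.
REF_TOKENS = 4096
REF_WORDS = 3000


def estimated_tokens(num_words: int) -> int:
    """Estimate the token count of a text containing `num_words` words."""
    return round(num_words * REF_TOKENS / REF_WORDS)


def fits_in_context(num_words: int, context_limit: int = REF_TOKENS) -> bool:
    """Return True if the estimated token count fits in the context window."""
    return estimated_tokens(num_words) <= context_limit


# Hypothetical dataset: 10 years of daily records, about 5 words per record.
dataset_words = 10 * 365 * 5
print(estimated_tokens(dataset_words), fits_in_context(dataset_words))
```

Under these assumptions, even a single location's daily history is several times larger than the window, which is why appending a whole dataset to the query is not a workable retrieval strategy.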
To address these issues, we examined methods that support invoking external APIs during text generation, such as [8]. The authors proposed a language model that has been trained to selectively pause text generation, invoke an API and incorporate the response into ongoing text generation. This methodology, which includes dataset creation and the resulting model, is termed Toolformer. Toolformer leverages the inherent reasoning abilities of the small-size 6B-parameter GPT-J model [14] to automatically annotate an existing dataset with API calls. Subsequently, the dataset is used to fine-tune the same model to teach it where to perform API calls and how to select the appropriate parameters. Further details are presented in Section III.
By experimenting with Toolformer we identified five criteria that were essential in creating Agora:
1) API-call support: Our system needs to learn when and how to perform external API invocations in order to support its answer-generation process.
2) API-call orchestration: In many cases, API responses may influence the construction of subsequent API calls. For instance, for many user queries, we first gather data regarding monthly precipitation, then average it, and based on the resulting value search for crops whose optimal conditions match those precipitation averages. The system needs to support such dependencies between API calls.
3) API scalability: New data sources exposed via new APIs should be easy to integrate without requiring an entire retraining of the system.
4) System scalability: Training the entire system should be achievable on commodity GPUs, with minimal costs.
5) Open-source reliance: The system should exclusively utilize open-source models for inference, allowing for easy adaptation and implementation in various environments.
FIGURE 2. Agora architecture.
Of these five features, Toolformer satisfies only criteria 1, 4 and 5. In addition, we found that criteria 2 and 3 conflict when attempting to create a single language model. As the type of orchestration between API calls becomes more elaborate, smaller models tend to under-perform, and the data size required for training increases exponentially with the number of APIs. Therefore, in order to build Agora, we experimented with the idea of creating a multi-model system that separates the tasks of properly addressing API calls and responses from the task of API-call orchestration.
III. AGORA ARCHITECTURE
The Agora was a central public space in ancient Greece where citizens gathered to discuss public matters and make decisions. Our system is inspired by Agora's concept. It consists of a collection of models $E_1, \ldots, E_n$, henceforth called experts, together with a special model $M_I$, which we term the interviewer model.
Whenever a user query is received, $M_I$, as well as all the experts, generate one token at a time simultaneously on the same input prompt. At specific points during this process, when necessary, $M_I$ delegates control to an expert model $E_i$ to continue the response. This process is illustrated in Figure 2, which shows the Interviewer model coordinating with three expert models. Each expert is trained to invoke API calls to retrieve data from third-party repositories, databases, or other external sources. For complex APIs, we train a dedicated model to handle those interactions. In contrast, for simple APIs that can be learned from examples, a single expert is trained to manage multiple call types.
FIGURE 3. Agora architecture.
Once an expert model $E_i$ completes its task, control returns to $M_I$, which resumes text generation until another expert is required or the response is complete. To clarify this text generation workflow, we provide an example in Figure 4. The user question is shown in gray. The text generated by $M_I$ is illustrated in blue, whereas that generated with the aid of expert $E_1$ is shown in orange. $E_1$ is the weather forecast and climate model. To properly address this question, $M_I$ notices that temperature information is necessary and uses $E_1$ to generate the appropriate text.
Figure 3 illustrates how the Interviewer model orchestrates API interactions between two expert models. In Step 1, the Interviewer delegates the initial part of the response to the first expert. This expert issues an API call and uses the retrieved data to generate its contribution (Step 2). In Step 4, the Interviewer invokes a second expert to continue the response. This second expert relies on the output of the first to construct a valid API call, highlighting a dependency between the two experts. Such dependencies can span multiple experts and occur in arbitrary sequences.
A concrete example of this interaction is shown in Figure 5, which involves two experts, $E_1$ and $E_2$. $E_1$ specializes in climate-related data, while $E_2$ focuses on crop-related knowledge. As before, the question is shown in grey. Since climate data is needed first, the Interviewer starts text generation with $E_1$. When crop-specific information becomes relevant, $E_2$ is engaged to continue the response. Finally, $M_I$ synthesizes the complete answer based on the outputs from both experts.
Next we discuss how each type of model (interviewer/experts) has been built.
A. EXPERT MODELS
Expert models are trained independently and their task is to perform accurate calls to a designated API (or APIs). In this work we consider four APIs: (i) current date - which is necessary when queries use time expressions relative to the current moment in time, such as now, today, tomorrow, (ii) weather & climate - which retrieves weather forecasts and answers queries related to precipitation, wind and temperature records for up to 10 years in the past, (iii) crops - which retrieves agricultural data regarding optimal planting periods and precipitation requirements from a collection of data sources, and (iv) aggregation - which computes maximal, minimal and average values over lists of integers. We trained three experts to handle these four APIs. Each expert model was trained via fine-tuning, starting from the same base LLaMA-3-8B-Instruct model [15].
FIGURE 4. Generating answers with Agora.
LLaMA-3-8B-Instruct is already fine-tuned for instruction-following [15], making it a strong foundation for task-specific dialogue and assistant behavior. This means better alignment and usability out of the box, especially for structured, guided outputs like ours. Despite its smaller size, it matches or outperforms larger models such as LLaMA 2 13B and GPT-3.5, while remaining lightweight enough to run on a single commodity GPU [15]. Just as importantly, LLaMA-3 is open-source, giving us the flexibility to fine-tune and deploy the model privately, ensuring compliance with sensitive data requirements.
The API call representation is inspired by [8]. For a given API $a$, a call is a tuple $(a, i_1, \ldots, i_n)$ where $a$ designates the call name, while $i_1, \ldots, i_n$ designate the $n$ parameters of the call. API calls are encoded as arbitrary word sequences identified only by special tokens marking the start and end of a call, as illustrated below:
<API_CALL_a> a(i_1, ..., i_n) </API_CALL_a>
Thus, during text generation, whenever the expert model predicts <API_CALL_a> as the next token to be generated, the following steps occur:
• The model continues generation internally until the complete sequence of the call has been generated, ending when </API_CALL_a> is produced;
• The actual call a(i_1, ..., i_n) to API a is performed, and the response sequence is retrieved and appended to the current context.
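The two steps above can be sketched as a post-processing pass over generated text. This is a minimal illustration under stated assumptions, not the system's implementation: it uses a single simplified `<API_CALL> ... </API_CALL>` marker without the per-API subscript, a hypothetical `min` aggregation function, and a hypothetical `->` separator before the appended response. The actual call grammar is the one defined in Appendix A.

```python
import re
from typing import Callable, Dict, List

# Simplified, hypothetical marker syntax (no per-API subscript).
CALL_PATTERN = re.compile(r"<API_CALL>\s*(\w+)\((.*?)\)\s*</API_CALL>")


def run_api_calls(text: str, apis: Dict[str, Callable[..., str]]) -> str:
    """Execute every marked call and append its response right after the call."""

    def execute(match) -> str:
        name, raw_args = match.group(1), match.group(2)
        args: List[str] = [a.strip() for a in raw_args.split(",") if a.strip()]
        response = apis[name](*args)  # perform the actual call
        return f"{match.group(0)} -> {response}"  # keep the call, add the response

    return CALL_PATTERN.sub(execute, text)


def aggregation_min(*values: str) -> str:
    """Hypothetical aggregation endpoint: minimum over a list of integers."""
    return str(min(int(v) for v in values))


apis = {"min": aggregation_min}
generated = "The lowest value is <API_CALL> min(4, -2, 7) </API_CALL>"
print(run_api_calls(generated, apis))
```

In a real decoding loop the response would be appended to the model's context before generation resumes; the string substitution here only mimics that effect on finished text.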
Through fine-tuning, expert models $E_i$ learn how to construct valid API calls, and by carefully designing the training data, they also learn when API calls should take place. In Section IV we go into more detail on our approach to training experts as well as dataset instrumentation. More details regarding the API syntax can be found in Appendix A.
FIGURE 5. Agora with two experts.
B. LIMITATIONS WHEN BUILDING ONE EXPERT ACROSS MULTIPLE APIS
Using careful instrumentation of the training dataset [8], a model can be trained to perform calls to multiple APIs. However, this approach does not scale well when the number $n$ and the complexity of the different supported APIs increase. Suppose $a$¹ is the number of fine-tuning examples that a model needs in order to accurately learn one API type (e.g. the Weather API). If a completely new and independent API type (say Crops) is to be added, another $b$ examples would suffice. However, if we would like to capture possible dependencies between the relative position of one API call with respect to the other in the text (e.g. many Crop calls are likely to have a Weather call prior to them, and Crop answers may influence how the Weather calls are being performed), then the training data needs to have entries where one call occurs in relation to another (on the order of $O(a \cdot b)$ entries), as well as entries where calls occur independently. Hence, the size of the complete training dataset for learning the two APIs with dependencies between them is as follows:
$$O(a \cdot b + a + b) \tag{1}$$
This illustrates the API-call scalability problem highlighted in Section II. As the number $n$ of APIs increases, the dataset size required to learn them increases exponentially with respect to $n$.
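As a rough numerical illustration of Eq. (1), the sketch below evaluates the expression inside the bound for two APIs. Setting both $a$ and $b$ to 10,000 is an assumption made only for this example (the text reports $a \le 10000$ and gives no value for $b$), and the function names are hypothetical.

```python
def independent_size(a: int, b: int) -> int:
    """Examples needed when the two APIs are learned independently."""
    return a + b


def dependent_size(a: int, b: int) -> int:
    """Examples needed when cross-API dependencies must also be covered, as in Eq. (1)."""
    return a * b + a + b


a = b = 10_000  # assumed values, chosen only for illustration
print(independent_size(a, b))
print(dependent_size(a, b))
print(dependent_size(a, b) // independent_size(a, b))
```

With these assumed values the dependent case needs 100,020,000 entries against 20,000 for the independent case, a factor of about 5,000 for only two APIs.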
Moreover, the dependencies between API calls are not solely positional. Consider the example in Fig. 5. To assess whether peach trees are suitable for planting in the area of Constanta, we need to have knowledge about precipitation averages in Constanta from the Weather API. More generally, a response from an API call may influence the manner in which another subsequent call is performed. Training data size is not the only limitation - as the number of fine-tuning examples increases, small models such as LLaMA-3-8b are no longer capable of sustaining such intricate correlations, and their association performance degrades.
¹ Our experience shows that $a \le 10000$. More details in Section IV.
Thus, our solution replaces the "single model" scenario, as well as the massive dataset required for fine-tuning, with several smaller expert models, each adapted to a given domain via a straightforward fine-tuning task. To correlate the text generated by such experts, we require another dedicated model.
C. THE INTERVIEWER MODEL
The interviewer model ($M_I$) was trained specifically to handle the task of expert model moderation. More specifically, $M_I$ is responsible for deciding when an expert LLM should be used to generate parts of the answer, as well as for integrating these parts in the overall answer, whenever necessary. We implemented several options to achieve this moderation. The first, termed sequential control, is shown in Fig. 6. Informally, in this setup, "the interviewer asks an expert". $M_I$ generates the token sequence $x_1, \ldots, x_n$. Next, it decides to use expert $E_1$ to continue the answer. This is achieved by generating a special token $\mathrm{CTRL}_i$ ($\mathrm{CTRL}_1$ in Fig. 6). Expert model $E_1$ uses $x_1, \ldots, x_n$ as the initial context (i.e. the sequence of previously-generated tokens). It generates tokens $y_{n+2}, \ldots, y_{n+m}$, followed by a STOP token that returns control to $M_I$.
FIGURE 6. Sequential control passing between models.
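The sequential control scheme can be mimicked with a toy simulation in which the models are replaced by pre-scripted token lists. This is a deliberately simplified sketch: real models condition each token on the shared context, whereas the scripts here are fixed, and the token names (`x1`, `y1`, `CTRL_1`, `STOP`) are placeholders chosen to mirror the notation above.

```python
from typing import Dict, List

STOP = "STOP"


def sequential_generation(interviewer_script: List[str],
                          expert_scripts: Dict[str, List[str]]) -> List[str]:
    """Replay scripted tokens; a CTRL_i token hands control to expert i until STOP."""
    context: List[str] = []
    for token in interviewer_script:
        if token in expert_scripts:  # control token such as "CTRL_1"
            for expert_token in expert_scripts[token]:
                if expert_token == STOP:  # the expert returns control
                    break
                context.append(expert_token)
        else:
            context.append(token)  # ordinary interviewer token
    return context


interviewer = ["x1", "x2", "x3", "CTRL_1", "x4"]
experts = {"CTRL_1": ["y1", "y2", STOP]}
print(sequential_generation(interviewer, experts))
```

The control and STOP tokens steer the hand-over but never reach the final sequence, which contains only the interviewer's and the expert's content tokens in the order they were produced.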
When the user formulates a question related to plant cultivation, $M_I$ might accurately decide to allow the Crops expert model to continue generation. However, our initial experiments showed that $M_I$ does not always exhibit context sensitivity. Oftentimes, the expert is more capable of assessing the context and determining when to start "talking" by generating its own $\mathrm{CTRL}_i$ token. Therefore, we also included another scenario, parallel generation, illustrated in Figure 7.
Here, the expert model steps in and interrupts the Interviewer. To achieve this, we have all the expert models continuously generate tokens. As before, token $y_k$ is generated by an expert based on the history of tokens $x_1, \ldots, x_{k-1}$, which represent the current context. When an expert generates a token $\mathrm{CTRL}_i$, it preempts the Interviewer. All subsequent tokens up until STOP are part of the user's answer. In Figure 7, all tokens that the user does not see are indicated by dashed lines.
This situation often occurs when an expert decides to perform an API call. We observed that in almost all scenarios such a decision is contextually correct and should be prioritized over the text generated by the Interviewer.
FIGURE 7. Parallel generation with multiple models.
Finally, our experience has also shown that the expert-generated text needs to be processed before output. For this reason we introduced Parallel generation with moderation (see Figure 8). Here, we apply a text transformation function $g$ to the sequence of tokens $y_{n+2}, \ldots, y_{n+m}$ generated by an expert.
FIGURE 8. Parallel generation with moderation.
If $g$ is the identity function ($g(s) = s$), then the text generated by the expert is unmodified. If $g(\cdot) = \epsilon$ (the empty string), then the entire sequence generated by the expert is effectively hidden from the user. However, this sequence will still be part of the context which $M_I$ uses to generate text. In Figure 8, note that token $x_{n+m+2}$ is generated based on the sequence of tokens $x_1, x_2, \ldots, x_n, y_{n+2}, \ldots, y_{n+m}$, to which the expert $E_1$ contributed $y_{n+2}, \ldots, y_{n+m}$. This form of hiding tokens is particularly helpful in dealing with API calls that produce tabular data as a response. For example, the Weather expert is trained to generate such API calls. We want this type of information to be in the context for the Interviewer to draw a conclusion, but not to be explicit to the user. We illustrate this situation using the example shown in Fig. 9.
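The effect of the moderation function $g$ can be sketched by tracking two sequences side by side: the context that the models condition on, and the text the user actually sees. The helper names below (`identity`, `hide`, `moderate`) are hypothetical and the tokens are placeholders; only the two choices of $g$ come from the text.

```python
from typing import Callable, List, Tuple

Tokens = List[str]


def identity(tokens: Tokens) -> Tokens:
    """g(s) = s: the expert's text is shown unmodified."""
    return tokens


def hide(tokens: Tokens) -> Tokens:
    """g(s) = empty string: the expert's text is hidden from the user."""
    return []


def moderate(context: Tokens, visible: Tokens, expert_tokens: Tokens,
             g: Callable[[Tokens], Tokens]) -> Tuple[Tokens, Tokens]:
    """Expert tokens always extend the context; the user only sees g(expert_tokens)."""
    return context + expert_tokens, visible + g(expert_tokens)


context, visible = ["x1", "x2"], ["x1", "x2"]
expert_tokens = ["y1", "y2"]  # e.g. an API call plus its tabular response

print(moderate(context, visible, expert_tokens, identity))
print(moderate(context, visible, expert_tokens, hide))
```

With `hide`, the context still grows by the expert's tokens, so later interviewer tokens can reason over them, while the visible answer is left unchanged.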
After starting a sentence, $M_I$ decides to yield the context to the Current date expert. The control tokens were omitted for brevity. The expert performs a call to retrieve the current date. This call takes no parameters. The actual call will be hidden from the user (illustrated with white boxes in Figure 9), but kept in the context window. After the expert has finished the call, control is resumed by $M_I$, which switches to the Weather expert. At this point, the current context contains the location as well as the date, which will be used by the second expert model to construct its weather-related API call. Once the call has finished, text generation control returns to $M_I$, which uses the time and forecast information to produce its conclusion.
This example highlights several key traits of our approach:
• We use API calls not only to generate text answers, but also to add contextual information flexibly. By using $g$, we keep this information in the context so that the Interviewer can reason about it and at the same time hide it from the user. This is akin to dynamic generation of query-dependent prompts.
• The example in Figure 9 also illustrates the dependency between the answer given by the Current date API and the construction of the subsequent Weather call. In practice we find many such dependencies, sometimes cascading over three or more calls. For instance, we might need to fetch the current date, based on it identify weather-related data, then perform an average over the result and finally fetch crop-related information based on that average.
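The cascading dependency described in the last item (date, then weather, then average, then crops) can be written out as a plain chain of function calls, where each call consumes the result of the previous one. Everything below is a stand-in: the stub functions, their signatures, the fixed date, the precipitation values and the crop thresholds are placeholders invented for the sketch, not real data or the real APIs.

```python
from datetime import date
from statistics import mean
from typing import Dict, List


def current_date() -> date:
    """Stub for the Current date API (fixed so the example is reproducible)."""
    return date(2025, 5, 19)


def monthly_precipitation(location: str, year: int) -> List[float]:
    """Stub for the Weather API: placeholder values, not real measurements."""
    return [30.0, 40.0, 50.0]


def crops_for_precipitation(avg_mm: float) -> List[str]:
    """Stub for the Crops API: placeholder thresholds, not agronomic advice."""
    required: Dict[str, float] = {"crop_a": 35.0, "crop_b": 60.0}
    return [name for name, needed in required.items() if needed <= avg_mm]


def recommend(location: str) -> List[str]:
    today = current_date()                                      # call 1
    rainfall = monthly_precipitation(location, today.year - 1)  # call 2 uses call 1
    average = mean(rainfall)                                    # call 3: aggregation
    return crops_for_precipitation(average)                     # call 4 uses call 3


print(recommend("Sibiu"))
```

In Agora these hand-offs are not hard-coded: the interviewer decides the order at generation time, and each expert builds its call from whatever earlier responses are already in the context.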
IV. TRAINING AND EVALUATING AGORA
Agora was built as a result of an iterative refinement process, in which we explored different designs to achieve our goals. As mentioned in Section II, these were: (i) API-call support, (ii) the ability to support dependencies between calls, i.e. API-call orchestration, (iii) the ability to easily integrate new APIs (API scalability), (iv) the ability to train the entire system on commodity GPUs (system scalability) and (v) open-source reliance.
A. FINE-TUNING A SINGLE MODEL FOR API-CALL SUPPORT
Our first step was to apply the Toolformer methodology [8], which consists of creating a single model fine-tuned to address our different types of API calls. Toolformer selectively inserts API calls into a large dataset, enabling the model to learn when and how to generate them.
1) DATASET
Our methodology diverged from that in [8] because of the unavailability of such an existing dataset. Agricultural texts and datasets generally lack the temporal and spatial data required for accurate weather and climate API calls, with relevant examples occurring too infrequently for queries on optimal sowing conditions.
FIGURE 9. Illustrating the usage of the function g.
To build training data for each of our APIs (Current Date, Weather & Climate, Crops, and Aggregation), we used GPT-4o to generate separate datasets of question-answer pairs annotated with API calls. Ensuring diversity in these datasets was essential for the expert models to generalize effectively. We began by writing hand-crafted question templates that varied in complexity, involving between one and three experts, and capturing different types of dependencies across domains. GPT-4o was then used to produce semantic variations of these templates, using a range of prompts to encourage diverse language styles and phrasings. Finally, GPT-4o instantiated each template by filling in specific details (such as locations, crop types, or climate conditions) to create fully concrete questions. Using the selected parameters, we constructed a correct set of API calls, queried the relevant data and asked GPT-4o to generate an answer, together with the necessary API calls, resulting in a complete question-answer pair.
The prompts used for each API are provided in Appendix B-A, along with more details on the additional processing performed on the synthetically generated examples.
2) TRAINING
We chose a relatively small, state-of-the-art open-source LLM, specifically LLaMA-3-8B-Instruct, because the LLaMA-3 family consistently outperforms other models of similar size. We performed fine-tuning using LoRA [16] and QLoRA [17] adapters to fit within GPU memory limitations. The exact hyper-parameters that we used are presented in Appendix B-C. To achieve objective (iv), we selected the NVIDIA AD102 GeForce RTX 4090 GPU, a high-performance, cost-effective, and readily available piece of hardware on the market. We utilized three such GPUs, each with 24,576 MB of available VRAM, accessed via the CUDA API.
3) EVALUATION
We began by identifying the main categories of user-relevant questions, with a focus on complex queries that require integrating information across multiple domains. This analysis included mapping out all possible dependency relationships between API calls. Our findings show that weather and climate data often serve as foundational inputs, with crop-related queries typically depending on both of these, as well as the current date. Aggregation API calls tend to rely on the results of prior API responses, such as those from weather, climate, or crop services. Based on these insights, we designed a comprehensive set of question templates that systematically reflect the full range of possible dependencies. Using an approach similar to that described in Section IV-A1, we generated diverse questions grounded in these templates. We then used GPT-4o, guided by the prompts described in the Appendix, to produce user queries evenly distributed across the identified categories, including examples involving only a single API call. These queries are distinct from those used in training (see Section IV-A1) and were not seen by the model during fine-tuning.
We subjected each of the systems under scrutiny to these questions and manually graded the answers on a scale of 1 to 5. Grades 3 to 5 are assigned to answers in which all or only some of the API calls are correct, but the overall answer and the underlying reasoning are valid. Grades 1 and 2 refer to answers in which API calls are invalid and dependencies are misidentified.
For each query included in our evaluation, we have deterministic knowledge of both the specific API call(s) that need to be invoked and the dependencies between them. This predefined structure eliminates any ambiguity in the evaluation process, allowing us to assess each response with complete certainty and ensuring a rigorous and objective accuracy analysis.
The results are shown in Figure 10. Accuracy refers to the percentage of answers graded from three to five out of the total, while Percentage of perfect answers refers to the percentage of answers graded five out of the total.
FIGURE 10. Agora performance compared to other approaches.
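The two reported metrics follow directly from the manual grades. The sketch below computes both from a list of grades on the 1 to 5 scale; the function name and the sample grades are made up for illustration and are not the evaluation data behind Figure 10.

```python
from typing import List, Tuple


def score(grades: List[int]) -> Tuple[float, float]:
    """Return (accuracy %, perfect-answer %) for grades on the 1-5 scale."""
    total = len(grades)
    accuracy = 100.0 * sum(1 for g in grades if g >= 3) / total  # grades 3, 4, 5
    perfect = 100.0 * sum(1 for g in grades if g == 5) / total   # grade 5 only
    return accuracy, perfect


sample_grades = [5, 4, 2, 5, 1, 3, 5, 3, 2, 4]  # made-up grades, not the paper's data
print(score(sample_grades))
```

Since every perfect answer also counts towards accuracy, the second value can never exceed the first.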
Our first experiments consisted of applying the Toolformer [8] methodology to our training set, with our choice of language model. Although the results were promising (first row in Figure 10), and better results could be obtained by increasing the size of the dataset or that of the model, our observation was that the single model lacked the ability to associate multiple API calls, even though it had knowledge of each available API. Our attempts at instrumenting the dataset to capture API call dependencies revealed the API-call scalability problem discussed in Section III-B.
B. AGORA - INTERVIEWER WITH MULTIPLE EXPERTS
To enhance performance and add modularity (i.e. our objective (iii)), we introduced the Agora system, which distributes the API-call generation tasks across expert models, and assigns dependency handling and the generation of conclusions to the Interviewer model. We used the same strategy as before to generate the datasets for the training of experts.
In our first Agora iteration, we used the same 8B base model for experts and the Interviewer. The training procedure for each expert model follows the details in Section IV-A2. Table 1 outlines the APIs managed by each expert.
TABLE 1. The number of examples used to train each expert model.
For the interviewer model we used a system prompt
that contained a short description of each API, a grammar
showcasing their syntax and few-shot examples of their use.
This system prompt, which can be found in the Appendix,
enables the interviewer to understand how different APIs can
be linked or combined when a user query lacks sufficient
information for a single API call. Moreover, using a system
prompt ensures the modularity of the Agora system because
adding or removing one or more APIs requires only updating
this prompt.
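To illustrate why this design is modular, the sketch below assembles a system prompt from a registry of API entries. It is only an illustration: the `ApiSpec` record, the `build_system_prompt` function, the prompt wording and the two placeholder APIs are hypothetical, and the actual prompt is the one given in the Appendix.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ApiSpec:
    """Hypothetical record describing one API for the Interviewer prompt."""
    description: str
    grammar: str
    examples: List[str] = field(default_factory=list)


def build_system_prompt(apis: Dict[str, ApiSpec]) -> str:
    """Concatenate the description, syntax and few-shot examples of each API."""
    parts = ["You can call the following APIs:"]
    for name in sorted(apis):
        spec = apis[name]
        parts.append(f"## {name}\n{spec.description}\nSyntax: {spec.grammar}")
        parts.extend(f"Example: {ex}" for ex in spec.examples)
    return "\n".join(parts)


# Placeholder entries, not the APIs or wording used in the paper.
registry = {
    "crops": ApiSpec("Sowing conditions for a crop.", "crops(<crop>)",
                     ["crops(wheat)"]),
    "weather": ApiSpec("Weather data for a city and period.",
                       "weather(<city>, <start>, <end>)"),
}
prompt_before = build_system_prompt(registry)

# Removing an API only changes the prompt text; no model is retrained.
del registry["crops"]
prompt_after = build_system_prompt(registry)
```

In this sketch, adding or removing an API is a change to the registry followed by a rebuild of the prompt string, which mirrors the property described above.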
We observed a significant improvement in performance
and Agora successfully chained multiple API calls, thus
achieving objective (ii). The results are shown in Figure 10
(second column).
Our subsequent objective was to further increase the
system performance. We experimented with a 70B
interviewer model along with 8B expert models. This led to
strong performance results, as shown in Figure 10 (third
column), because the larger model’s enhanced reasoning
abilities enabled it to better understand the available APIs and
combine them. Simultaneously, new expert models can be
trained independently, with no intervention required for the
existing ones, and with minimal API descriptions that need to
be added to the Interviewer’s system prompt, thus achieving
objective (iii).
However, running inference on the 70B interviewer
model with an NVIDIA AD102 is not feasible because of
insufficient GPU memory, requiring us to switch to a more
capable A100 with 80GB of memory.
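A back-of-the-envelope estimate shows the scale of the problem. The sketch below counts only the memory needed to hold the weights and ignores activations and the key-value cache. The 24 GiB capacity assigned to the AD102 card and the precisions tried are assumptions made for illustration; the text above does not state them.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


def fits(n_params: float, bits_per_param: int, gpu_gib: float) -> bool:
    """True if the weights alone fit; real usage needs extra headroom."""
    return weight_memory_gib(n_params, bits_per_param) <= gpu_gib


ASSUMED_AD102_GIB = 24  # assumption: capacity of a typical AD102-based card
A100_GIB = 80           # capacity stated in the text

for bits in (16, 4):
    need = weight_memory_gib(70e9, bits)
    print(f"70B at {bits}-bit: {need:.0f} GiB, "
          f"fits AD102: {fits(70e9, bits, ASSUMED_AD102_GIB)}, "
          f"fits A100: {fits(70e9, bits, A100_GIB)}")
```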
1) MEMORY CONSTRAINTS DURING INFERENCE
Using LoRA [16] and QLoRA [17] adapters during
fine-tuning, each expert can be formed by combining a base model
with a small adapter, which can be easily plugged in or
removed as needed (see Figure 11).
This strategy significantly enhances memory efficiency
and reduces GPU memory requirements by up to three
times compared with traditional fine-tuning methods, as
reported in [16]. This approach offers several advantages for
Agora’s architecture, which, in principle, is designed for
n+1 models, where n represents the number of experts
and the additional model is the Interviewer. Because all n+1 models must
run for each generated token, we can achieve this by loading
the n+1 base models on a sufficiently large GPU (or multiple
GPUs).
However, we can achieve better memory utilization by
loading one base model on a GPU, together with k adapters,
one for each expert. This means that on each GPU, we can run
the inference from k different experts sequentially by simply
swapping out the corresponding LoRA adapters, as illustrated
in Figure 11. This method reduces the number of base
models that need to be loaded simultaneously, thus saving
computational resources at the expense of increased time
owing to sequential loading.
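The idea of one shared base model with swappable adapters can be shown on a toy example. The sketch below is not our implementation and uses no deep-learning library: a single weight matrix stands in for the base model, each adapter is a low-rank pair (B, A) applied as W x + (alpha / r) B A x following the LoRA formulation in [16], and the class and method names are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain matrix product for small nested lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


class SharedBase:
    """One frozen base weight matrix shared by k swappable LoRA adapters."""

    def __init__(self, weight: Matrix, alpha: float, rank: int):
        self.weight = weight
        self.scale = alpha / rank
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None

    def add_adapter(self, name: str, b: Matrix, a: Matrix) -> None:
        self.adapters[name] = (b, a)  # B is d x r, A is r x d

    def set_adapter(self, name: str) -> None:
        if name not in self.adapters:
            raise KeyError(name)
        self.active = name  # the base weights stay loaded; only the adapter changes

    def forward(self, x: List[float]) -> List[float]:
        col = [[v] for v in x]
        out = matmul(self.weight, col)
        if self.active is not None:
            b, a = self.adapters[self.active]
            delta = matmul(b, matmul(a, col))
            out = [[o[0] + self.scale * d[0]] for o, d in zip(out, delta)]
        return [o[0] for o in out]


base = SharedBase([[1.0, 0.0], [0.0, 1.0]], alpha=1.0, rank=1)
base.add_adapter("E2", b=[[1.0], [0.0]], a=[[0.0, 1.0]])
base.add_adapter("E3", b=[[0.0], [1.0]], a=[[1.0, 0.0]])

outputs = {}
for expert in ("E2", "E3"):  # experts run one after another on the same base
    base.set_adapter(expert)
    outputs[expert] = base.forward([2.0, 3.0])
```

The same input yields a different output for each expert, while only one copy of the base weights is held in memory.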
Additionally, when the interviewer model is significantly
larger than the expert models, we can load the interviewer
onto a dedicated GPU. The experts, which are much smaller
than the Interviewer, can be evenly distributed across the
remaining GPUs.
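A placement of this kind can be sketched as a round-robin assignment. The function below is a hypothetical illustration: it reserves GPU 0 for the Interviewer and spreads the experts over the remaining devices, without modelling actual memory sizes.

```python
from typing import Dict, List


def place_models(experts: List[str], n_gpus: int) -> Dict[int, List[str]]:
    """Reserve GPU 0 for the Interviewer; assign experts round-robin to the rest."""
    if n_gpus < 2:
        raise ValueError("need one GPU for the Interviewer and one for experts")
    plan: Dict[int, List[str]] = {gpu: [] for gpu in range(n_gpus)}
    plan[0].append("Interviewer")
    for i, expert in enumerate(experts):
        plan[1 + i % (n_gpus - 1)].append(expert)
    return plan


plan = place_models(["E1", "E2", "E3"], n_gpus=3)
print(plan)  # {0: ['Interviewer'], 1: ['E1', 'E3'], 2: ['E2']}
```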
FIGURE 11. Agora performance compared to other approaches.
Figure 11 showcases several scenarios used during our
evaluation. In Scenario 1, our smaller GPUs run inference
in parallel, an arrangement we used to assess the initial
version of Agora. As the evaluation moved to the larger
70B Interviewer model, we transitioned to more powerful
80GB A100 GPUs. Scenarios 2 and 3 demonstrate how base
models and adapters are distributed across available memory.
In Scenario 3, for example, a single GPU holds one base
model and two adapters, enabling inference for two expert
models, E2 and E3, which generate tokens sequentially. The
adapter-switching time is approximately 20 times shorter
than the time needed to generate a token. Despite hardware
limitations, Scenarios 2 and 3 illustrate that our multi-model
system can still operate efficiently, though with reduced
performance due to sequential inference.
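The cost of sequential inference can be estimated with simple arithmetic. The sketch below assumes that every expert on a GPU takes the same time to generate a token and that one adapter switch is needed before each expert's turn when more than one expert shares the GPU. Both are simplifying assumptions, and the function name is illustrative.

```python
def per_step_latency(token_time: float, experts_on_gpu: int,
                     switch_ratio: float = 1 / 20) -> float:
    """Time for one GPU to produce one token from each of its experts.

    switch_ratio is the adapter-switching time relative to token_time;
    the text reports a value of roughly 1/20.
    """
    if experts_on_gpu < 1:
        raise ValueError("at least one expert is required")
    switches = experts_on_gpu if experts_on_gpu > 1 else 0
    return experts_on_gpu * token_time + switches * switch_ratio * token_time


# Scenario 3: two experts (E2 and E3) share one GPU.
single = per_step_latency(1.0, 1)
shared = per_step_latency(1.0, 2)
print(shared / single)  # about 2.1 under these assumptions
```

Under these assumptions the slowdown is dominated by running the experts one after another, while adapter switching adds only about 5% on top.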
C. AGORA COMPRESSION
The primary limitation of the Agora system, as described in
Section IV-B, is the substantial resource demands necessary
to achieve optimal performance. This is mainly because of
the need to load both the LLaMA-3-70B-Instruct model
(Interviewer) and the experts’ base model,
LLaMA-3-8B-Instruct, resulting in excessive memory consumption. The
memory overhead surpassed the capabilities of NVIDIA
AD102 GPUs alone. To mitigate this, we developed a new,
standalone model, which we trained using the responses
provided by Agora. We call this model a ‘‘compression
model’’, because unlike Agora, it is a single language model,
but is able to reproduce, and even enhance, the performance
of Agora. To achieve this, we proceeded as follows:
• We started with a dataset of questions and used Agora
to generate answers. We employed a 70B interviewer
and 8B expert models. This dataset includes examples
demonstrating individual API usage as well as examples
of chained API calls.
• Agora has great but not perfect accuracy, hence we
carefully filtered the dataset to retain only question-answer pairs
with highly accurate multi-API call examples. The final
dataset contains 17,000 such entries.
• We fine-tuned a single LLaMA-3-8B-Instruct model with
this dataset, resulting in our compression model.
The pipeline for training the Agora compression model,
relying on the previous stages we have described, is illustrated
in Figure 12.
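The data-generation part of this pipeline can be outlined as follows. This is a schematic sketch only: `ask_agora` and `is_accurate` are hypothetical callables standing in for the Agora system and for our filtering step, and the JSON Lines output is an assumed format rather than the one we used.

```python
import json
from typing import Callable, Dict, Iterable, List


def build_compression_dataset(
    questions: Iterable[str],
    ask_agora: Callable[[str], str],
    is_accurate: Callable[[str, str], bool],
) -> List[Dict[str, str]]:
    """Query the teacher system, then keep only pairs that pass the filter."""
    dataset = []
    for question in questions:
        answer = ask_agora(question)
        if is_accurate(question, answer):
            dataset.append({"question": question, "answer": answer})
    return dataset


def to_jsonl(dataset: List[Dict[str, str]]) -> str:
    """Serialize one example per line for a fine-tuning job."""
    return "\n".join(json.dumps(row, ensure_ascii=False) for row in dataset)


# Stand-ins so the sketch runs without any model.
def fake_agora(question: str) -> str:
    return "" if "bad" in question else f"answer to: {question}"


def non_empty(question: str, answer: str) -> bool:
    return bool(answer)


pairs = build_compression_dataset(["q1", "bad q2", "q3"], fake_agora, non_empty)
print(to_jsonl(pairs))
```

The resulting file would then serve as the training set for fine-tuning the single 8B model.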
FIGURE 12. All steps required to build the Agora compression model.
While this model demonstrated the best performance
overall, even slightly surpassing Agora with the 70B
interviewer (Fig. 10), there were some trade-offs. The compression
model:
• cannot be fine-tuned independently of Agora, as there
is no existing dataset suitable for this task. Instead, the
required dataset must be generated directly using Agora,
or a similar tool.
• sacrifices modularity (i.e. objective (v)), meaning it
cannot, by itself, accommodate new APIs without
resorting to Agora as previously described.
However, we found that fine-tuning newer versions of the
compression model is an acceptable compromise for various
applications. The fine-tuning, performed on an NVIDIA
A100, takes under two hours.
APPENDIX B
PROMPTS FOR BUILDING AGORA
A. PROMPTS FOR BUILDING TRAINING DATASETS FOR
EXPERTS
Below, we present the prompts used to generate datasets for
each API, along with sample outputs from the generated data
for clarity.
Crops and Aggregation: The question-answer pairs in
the datasets for the crops and aggregation APIs were
generated directly using GPT-4, without any additional
filtering, by employing the prompts shown in Figures 18,
19 and 20. Each API is linked to a corresponding back-end
function, which either returns the current date or queries an
internal database for sowing conditions.
For handling the current date and weather & climate APIs,
we generated the question and the parameters of the calls
using GPT-4o. We began by taking a list of Romanian cities
along with their latitude and longitude, which were later
needed when interacting with the weather and climate API.
FIGURE 18. Prompt for generating dataset for Crops expert dataset.
FIGURE 19. Prompt for generating dataset for Aggregation expert
dataset.
We wrote by hand 190 different time expressions that we
considered plausible to be used in a query. Our API provides,
by design, hourly data for intervals of up to three days, daily
data for intervals of up to two months, and monthly data
for longer time spans. If the user’s query includes the name
of a country, it is passed as a parameter in the call. If no
country is specified, the parameter value defaults to ‘‘???’’,
in which case the system assumes the query refers to the
largest city with the given name. In this situation, the country
name is retrieved automatically from a database. We filtered
out examples where data retrieval from our sources was
unsuccessful and then tasked GPT-4o-mini with adjusting
the current date and all related dates in each example. This
ensured that the final dataset does not contain examples
tied exclusively to the original creation date. In addition to
fully annotated examples, we realized the need for more
examples focusing solely on the weather & climate API call
parameters. To boost the model’s ability to select the correct
parameters for this API, we generated additional examples
that contained only the question and the call parameters,
omitting the final result and text reasoning about the retrieved
data.
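The rule that maps a requested interval to a data granularity can be sketched as below. The thresholds follow the description above, with two assumptions made for illustration: ‘‘two months’’ is approximated as 60 days, and the span is measured as the difference between the end and start dates. The function name is hypothetical.

```python
from datetime import date

MAX_HOURLY_DAYS = 3   # hourly data for intervals of up to three days
MAX_DAILY_DAYS = 60   # assumption: "two months" approximated as 60 days


def granularity(start: date, end: date) -> str:
    """Pick the data resolution the API would serve for a given interval."""
    if end < start:
        raise ValueError("end date must not precede start date")
    span_days = (end - start).days
    if span_days <= MAX_HOURLY_DAYS:
        return "hourly"
    if span_days <= MAX_DAILY_DAYS:
        return "daily"
    return "monthly"


print(granularity(date(2025, 1, 1), date(2025, 1, 3)))   # hourly
print(granularity(date(2025, 1, 1), date(2025, 2, 15)))  # daily
print(granularity(date(2025, 1, 1), date(2025, 6, 1)))   # monthly
```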
FIGURE 20. Prompt for generating dataset for Weather & climate expert
dataset.
Table 4 outlines the number of examples generated for
training each expert.
TABLE 4. The number of examples used to train each expert model.
B. PROMPTS FOR CROSS-DOMAIN QUESTIONS
To generate questions that require API calls from both the
Weather & Climate and Aggregation experts, we used the
prompt shown in Figure 21. We used similar prompts for all
other cross-domain questions.
FIGURE 21. Prompt for generating dataset for Aggregation expert
dataset.
Table 5 shows the number of examples generated for each
combination of API calls that we deemed necessary and that
involve at least two experts.
TABLE 5. The number of examples with combinations of API calls.
C. FINE-TUNING HYPER-PARAMETERS
We applied the same training settings consistently across all
our fine-tuning experiments.
A constant learning rate (LR) was suboptimal, so we
adopted a cosine LR schedule with a maximum value of
1e-4, which provided better convergence. Due to restricted
memory, we used a batch size of 1 and trained the model
for 2 epochs with a maximum input sequence length of
2048 tokens. We applied gradient clipping at 0.3 and set the
weight decay to 0.1 to stabilize the training process.
For the LoRA hyperparameters, we have set both alpha and
rank (r) to 32.
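For reference, the settings above can be collected in a single configuration, together with a plain cosine schedule. This is a sketch under stated assumptions: the key names are illustrative, the schedule has no warm-up and decays to zero, and the clipping value is interpreted as a maximum gradient norm. None of these details are specified above.

```python
import math

# Values taken from the text; key names are illustrative.
HPARAMS = {
    "max_learning_rate": 1e-4,
    "lr_schedule": "cosine",
    "batch_size": 1,
    "epochs": 2,
    "max_seq_length": 2048,
    "max_grad_norm": 0.3,   # assumption: clipping applied to the gradient norm
    "weight_decay": 0.1,
    "lora_alpha": 32,
    "lora_rank": 32,
}


def cosine_lr(step: int, total_steps: int,
              max_lr: float = HPARAMS["max_learning_rate"],
              min_lr: float = 0.0) -> float:
    """Cosine decay from max_lr at step 0 to min_lr at total_steps."""
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_lr + 0.5 * (max_lr - min_lr) * (1.0 + math.cos(math.pi * progress))


# With alpha equal to the rank, the LoRA scaling factor alpha / r is 1.
lora_scale = HPARAMS["lora_alpha"] / HPARAMS["lora_rank"]
```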
ACKNOWLEDGMENT
Views and opinions expressed are however those of the
author(s) only and do not necessarily reflect those of
the European Union or the European Research Executive
Agency. Neither the European Union nor the granting
authority can be held responsible for them.
REFERENCES
[1] Invest Romania. (2022). Internet Infrastructure. Accessed: Jul. 22, 2024.
[Online]. Available: https://investromania.gov.ro/web/internet-infrastructure/
[2] Business Agency Association (BAA), ‘‘Jointly preparing the conditions
in the agricultural and connected sectors in the BSB area for the digital
transformation (BSB smart farming),’’ Dunarea de Jos, Univ. Galati,
Galati, Romania, Tech. Rep., 2021. Accessed: Dec. 12, 2024.
[3] S. Rezayi, Z. Liu, Z. Wu, C. Dhakal, B. Ge, C. Zhen, T. Liu, and
S. Li, ‘‘AgriBERT: Knowledge-infused agricultural language models for
matching food and nutrition,’’ in Proc. 31st Int. Joint Conf. Artif. Intell.,
Jul. 2022, pp. 5150–5156, doi: 10.24963/ijcai.2022/715.
[4] N. Koldunov and T. Jung, ‘‘Local climate services for all, courtesy of large
language models,’’ Commun. Earth Environ., vol. 5, no. 1, p. 13, Jan. 2024,
doi: 10.1038/s43247-023-01199-1.
[5] T. T. Nguyen, J. Brandstetter, A. Kapoor, J. K. Gupta, and A. Grover,
‘‘ClimaX: A foundation model for weather and climate,’’ in Proc. 40th
Int. Conf. Mach. Learn., Jan. 2023, pp. 1–14.
[6] Futural Project. (2024). Futural Project–agriculture and Climate.
Accessed: Sep. 14, 2024. [Online]. Available: https://futural-project.eu/
[7] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez,
Ł. Kaiser, and I. Polosukhin, ‘‘Attention is all you need,’’ Adv. Neural Inf.
Process. Syst., vol. 30, pp. 5998–6008, Jun. 2017.
[8] T. Schick, J. Dwivedi-Yu, R. Dessì, R. Raileanu, M. Lomelí,
L. Zettlemoyer, N. Cancedda, and T. Scialom, ‘‘Toolformer: Language models can
teach themselves to use tools,’’ in Proc. Adv. Neural Inf. Process. Syst.,
vol. 36, Jan. 2024, pp. 1–16.
[9] (2024). KissanAI. Accessed: Jul. 22, 2024. [Online]. Available:
https://kissan.ai/
[10] B. Silva, L. Nunes, R. Estevão, V. Aski, and R. Chandra, ‘‘GPT-4 as an
agronomist assistant? Answering agriculture exams using large language
models,’’ 2023, arXiv:2310.06225.
[11] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T.
Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, A. Rodriguez, A.
Joulin, E. Grave, and G. Lample, ‘‘LLaMA: Open and efficient foundation
language models,’’ 2023, arXiv:2302.13971.
[12] P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler,
M. Lewis, W.-T. Yih, T. Rocktäschel, S. Riedel, and D. Kiela,
‘‘Retrieval-augmented generation for knowledge-intensive NLP tasks,’’ in Proc. Adv.
Neural Inf. Process. Syst., Jan. 2020, pp. 9459–9474.
[13] S. Robertson and H. Zaragoza, ‘‘The probabilistic relevance
framework: BM25 and beyond,’’ Found. Trends Inf. Retr., vol. 3, no. 4,
pp. 333–389, 2009.
[14] B. Wang and A. Komatsuzaki. (2021). GPT-J-6B: A 6 Billion Parameter
Autoregressive Language Model. Accessed: Aug. 12, 2023. [Online].
Available: https://github.com/kingoflolz/mesh-transformer-jax
[15] A. Alford. (2024). Meta Releases Llama 3 Open-Source LLM. Accessed:
May 7, 2024. [Online]. Available:
https://www.infoq.com/news/2024/05/meta-llama-3/
[16] J. E. Hu, Y. Shen, P. Wallis, Z. Allen-Zhu, Y. Li, S. Wang, and W. Chen,
‘‘LoRA: Low-rank adaptation of large language models,’’ in Proc. Int.
Conf. Learn. Represent., Jan. 2021, pp. 1–16.
[17] T. Dettmers, A. Pagnoni, A. Holtzman, and L. Zettlemoyer, ‘‘QLoRA:
Efficient finetuning of quantized LLMs,’’ in Proc. Adv. Neural Inf. Process.
Syst., Jan. 2023, pp. 10088–10115.
[18] (2024). Agri1. Accessed: Sep. 4, 2024. [Online]. Available:
https://www.agri1.ai/en/
[19] S. Chen, G. Long, J. Jiang, D. Liu, and C. Zhang, ‘‘Foundation models for
weather and climate data understanding: A comprehensive survey,’’ 2023,
arXiv:2312.03014.
[20] J. Devlin, M. Chang, K. Lee, and K. Toutanova, ‘‘BERT: Pre-training
of deep bidirectional transformers for language understanding,’’ in
Proc. NAACL-HLT, Minneapolis, MN, USA, Jan. 2019, vol. 1, no. 2,
pp. 4171–4186.
[21] Y. Gao, Y. Xiong, X. Gao, K. Jia, J. Pan, Y. Bi, Y. Dai, J. Sun, M. Wang, and
H. Wang, ‘‘Retrieval-augmented generation for large language models: A
survey,’’ 2023, arXiv:2312.10997.
[22] L. Yuan, Y. Chen, X. Wang, Y. Fung, H. Peng, and H. Ji, ‘‘CRAFT:
Customizing LLMs by creating and retrieving from specialized toolsets,’’
in Proc. 12th Int. Conf. Learn. Represent., Jan. 2023, pp. 1–16.
[23] Y. Qin, S. Liang, Y. Ye, K. Zhu, L. Yan, Y. Lu, Y. Lin, X. Cong, X. Tang,
B. Qian, S. Zhao, R. Tian, R. Xie, J. Zhou, M. Gerstein, D. Li, Z. Liu,
and M. Sun, ‘‘ToolLLM: Facilitating large language models to master
16000+ real-world APIs,’’ in Proc. 12th Int. Conf. Learn. Represent.,
Jan. 2024, pp. 1–19. [Online]. Available:
https://openreview.net/forum?id=dHng2O0Jjr
[24] S. Gao, Z. Shi, M. Zhu, B. Fang, X. Xin, P. Ren, Z. Chen, J. Ma, and Z. Ren,
‘‘Confucius: Iterative tool learning from introspection feedback by
easy-to-difficult curriculum,’’ in Proc. AAAI Conf. Artif. Intell., Mar. 2024, vol. 38,
no. 16, pp. 18030–18038.
[25] S. G. Patil, T. Zhang, X. Wang, and J. E. Gonzalez, ‘‘Gorilla: Large
language model connected with massive APIs,’’ 2023, arXiv:2305.15334.
[26] R. Yang, S. Lin, Y. Li, S. Zhao, Y. Ge, X. Li, and Y. Shan, ‘‘GPT4Tools:
Teaching large language model to use tools via self-instruction,’’ in Proc.
Adv. Neural Inf. Process. Syst., vol. 36, Jan. 2023, pp. 1–17.
[27] W.-L. Chiang, Z. Li, Z. Lin, Y. Sheng, Z. Wu, H. Zhang, L. Zheng,
S. Zhuang, Y. Zhuang, J. E. Gonzalez, I. Stoica, and E. P. Xing. (2023).
Vicuna: An Open-Source Chatbot Impressing GPT-4 With 90% ChatGPT
Quality. Accessed: Sep. 12, 2023. [Online]. Available:
https://lmsys.org/blog/2023-03-30-vicuna/
[28] Q. Tang, Z. Deng, H. Lin, X. Han, Q. Liang, B. Cao, and L.
Sun, ‘‘ToolAlpaca: Generalized tool learning for language models with
3000 simulated cases,’’ 2023, arXiv:2306.05301.
[29] C. Qian, C. Han, Y. R. Fung, Y. Qin, Z. Liu, and H. Ji, ‘‘CREATOR:
Tool creation for disentangling abstract and concrete reasoning of large
language models,’’ 2023, arXiv:2305.14318.
[30] T. Cai, X. Wang, T. Ma, X. Chen, and D. Zhou, ‘‘Large language models
as tool makers,’’ in Proc. 12th Int. Conf. Learn. Represent., Jan. 2023,
pp. 1–14.
[31] S. Hao, T. Liu, Z. Wang, and Z. Hu, ‘‘ToolkenGPT: Augmenting frozen
language models with massive tools via tool embeddings,’’ in Proc. 37th
Conf. Neural Inf. Process. Syst., 2023, pp. 1–13. [Online]. Available:
https://openreview.net/forum?id=BHXsb69bSx
[32] A. Srivastava et al., ‘‘Beyond the imitation game: Quantifying and
extrapolating the capabilities of language models,’’ in Proc. Trans.
Mach. Learn. Res., Jan. 2022, pp. 1–95.
[33] G. Mialon, R. Dessì, M. Lomelí, C. Nalmpantis, R. Pasunuru, R. Raileanu,
B. Rozière, T. Schick, J. Dwivedi-Yu, A. Çelikyilmaz, É. Grave, Y. LeCun,
and T. Scialom, ‘‘Augmented language models: A survey,’’ Trans. Mach.
Learn. Res., Jan. 2023, pp. 1–33.
[34] Y. Talebirad and A. Nadiri, ‘‘Multi-agent collaboration: Harnessing the
power of intelligent LLM agents,’’ 2023, arXiv:2306.03314.
[35] S. Zejiang Shen, H. Lang, B. Wang, Y. Kim, and D. Sontag,
‘‘Learning to decode collaboratively with multiple language models,’’ 2024,
arXiv:2403.03870.
[36] Z. Chai, G. Wang, J. Su, T. Zhang, X. Huang, X. Wang, J. Xu, J. Yuan, H.
Yang, F. Wu, and Y. Yang, ‘‘An expert is worth one token: Synergizing
multiple expert LLMs as generalist via expert token routing,’’ 2024,
arXiv:2403.16854.
[37] X. Xu, M. Li, C. Tao, T. Shen, R. Cheng, J. Li, C. Xu, D. Tao, and T. Zhou,
‘‘A survey on knowledge distillation of large language models,’’ 2024,
arXiv:2402.13116.
[38] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, L. Gao, S. Wiegreffe,
U. Alon, N. Dziri, S. Prabhumoye, Y. Yang, S. Welleck, B. P. Majumder,
S. Gupta, A. Yazdanbakhsh, and P. Clark, ‘‘Self-refine: Iterative refinement
with self-feedback,’’ in Proc. Adv. Neural Inf. Process. Syst., vol. 36,
Jan. 2023, pp. 1–24.
[39] Y. Dong, R. Mu, G. Jin, Y. Qi, J. Hu, X. Zhao, J. Meng, W. Ruan,
and X. Huang, ‘‘Building guardrails for large language models,’’ 2024,
arXiv:2402.01822.
ALEXANDRA UDRESCU received the engineering degree in computer
science from the Faculty of Automatic Control and Computers, National
University of Science and Technology POLITEHNICA Bucharest, where
she is currently pursuing the master’s degree. She is a Computer Scientist
specializing in algorithms, formal methods, and machine learning.
DAN-MATEI POPOVICI received the Ph.D. degree from the National
University of Science and Technology POLITEHNICA Bucharest, in 2012.
He was a Research Fellow with the Clausthal University of Technology
and ICUB (University of Bucharest’s Research Institute). He is currently
an Associate Professor with the Computer Science Department, National
University of Science and Technology POLITEHNICA Bucharest. His
research interests include formal verification techniques for computer
networks, computing research education, and NLP using language models.