Learned Representations Enhance Multi Agent Path Planning
Marius Captari 1, Herke van Hoof 1
Abstract
Multi-Agent Pathfinding (MAPF) involves coordinating multiple agents to find collision-free paths in a shared environment. For large-scale instances, sub-optimal heuristics can be used that are either hand-crafted or learned from data. In this paper, we attempt to combine these approaches by training a neural network to modify problem representations such that Prioritized Planning (PP), a conventional heuristic solver, will produce closer-to-optimal solutions. Thereby, we can leverage the strong performance of existing heuristics with the flexibility of data-driven algorithms. Training the neural network requires propagating learning signals through prioritized planning. This is achieved by calculating gradients of a relaxation of the algorithm using a black-box differentiation approach. Experiments on standard MAPF benchmarks demonstrate that our approach reduces PP's optimality gap without significantly compromising computational efficiency.
1. Introduction
Integrating planning algorithms with learning-based methods offers a promising strategy for addressing complex decision-making challenges. Classical planners, such as constraint solvers, graph search algorithms, and symbolic planners, offer strong theoretical guarantees, interpretability, and robustness derived from their structured, rule-based representations. However, these methods often require detailed problem formulations, which can limit their adaptability in dynamic environments, and they fail to scale to more realistic scenarios. On the other hand, learning-based methods have demonstrated remarkable flexibility and generalization capabilities by extracting latent structures directly from data (Silver et al., 2016). Yet, they commonly lack the interpretability, compositionality, and robustness offered by classical symbolic or programmatic representations, making their decisions difficult to verify or safely deploy in real-world scenarios.

1 University of Amsterdam. Correspondence to: Marius Captari <m.captari@uva.nl>.
ICML 2025 Workshop on Programmatic Representations for Agent Learning, Vancouver, Canada. Copyright 2025 by the author(s).
Motivated by these complementary strengths, recent work explores hybrid planning–learning systems (Bengio et al., 2021). Rather than replacing symbolic solvers, these methods inject learning into targeted components such as heuristics, reward shaping, program synthesis, or symbolic cost functions, to blend structured reasoning with data-driven flexibility.
In this work, we focus on Multi-Agent Pathfinding (MAPF), a sequential decision-making task that requires coordinating multiple agents on a shared graph to reach goals without collisions. MAPF lies at the core of many real-world systems, ranging from automated warehouse fleets and airport ground vehicle coordination to drone swarms and multi-robot exploration, where efficient, collision-free routing translates directly into higher throughput, lower energy consumption, and improved safety. Classical MAPF methods offer clear guarantees but scale poorly with increasing complexity, whereas heuristic-based planners scale better but yield suboptimal solutions. Previous efforts to enhance MAPF solvers through learning have predominantly modified local planner decisions, such as agent prioritization (Zhang et al., 2022) or conflict resolution (Huang et al., 2021). However, these local modifications do not fully leverage the global structure of the underlying representation, potentially limiting solution quality.
To address this, we propose a global, differentiable representation learning method: we train a neural network to adjust graph edge weights such that a fast, heuristic planner, Prioritized Planning (PP) (Silver, 2005), produces solutions close to optimality. Our learned graph representation remains structured and interpretable, aligning with the broader goal of programmatic representation. We use black-box differentiation (Pogančić et al., 2019) to propagate gradients through the non-differentiable PP algorithm, enabling end-to-end learning of the graph representation without compromising solver efficiency.
Our primary contributions are twofold. First, we introduce a differentiable representation-learning framework that globally reshapes MAPF instances by learning edge-cost adjustments rather than focusing on local planner heuristics or priority decisions. Second, we demonstrate empirically that this approach shrinks the optimality gap of PP while preserving its computational efficiency.
2. Related Work
Integrating planning with machine learning often involves embedding symbolic solvers into neural architectures. For example, Value Iteration Networks (Tamar et al., 2016) insert a differentiable value-iteration module into CNNs to perform implicit planning, and Neural A* Search (Yonetani et al., 2021) learns heuristic functions for a differentiable A* algorithm, improving interpretability and generalization over reactive policies.
Predict-then-Optimize trains models to predict problem parameters (e.g., costs, demands) that are passed unchanged into classical solvers, with learning guided by downstream decision loss rather than raw accuracy (Elmachtoub & Grigas, 2022). In contrast, our method learns to alter the problem representation itself so that a fast, suboptimal planner produces higher-quality solutions.
To enable end-to-end gradient flow through inherently discrete solvers such as shortest-path, constraint, and combinatorial optimizers, researchers employ continuous relaxations, perturbation-based approximations, or black-box differentiation (Berthet et al., 2020; Pogančić et al., 2019), successfully integrating solvers into learning pipelines for ranking (Rolínek et al., 2020) and graph optimization (Karalias & Loukas, 2020) without fundamentally altering the solver itself.
Within Multi-Agent Path Finding (MAPF), most research integrating learning has focused on enhancing local solver components. For optimal solvers, such as Conflict-Based Search (CBS) (Sharon et al., 2015), learning-based methods have improved efficiency by guiding conflict resolution or node selection (Huang et al., 2021). For heuristic solvers like PP, neural models have been successfully used to learn effective agent prioritizations (Zhang et al., 2022). Similarly, Large Neighbourhood Search (LNS) approaches leverage learning to intelligently select subsets of agents for iterative replanning or to embed local sub-problems into learned architectures (Li et al., 2021a; Huang et al., 2022; Yan & Wu, 2024). Decentralized reinforcement learning methods, such as PRIMAL (Sartoretti et al., 2019), further scale to large scenarios but often compromise optimality and completeness guarantees for scalability.
Despite these advancements, existing learning-augmented MAPF approaches predominantly target local decision components such as heuristics or priority ordering, rather than altering the underlying global representation of the planning problem itself. To our knowledge, no prior work has leveraged black-box differentiable optimization techniques to globally reshape graph-based representations explicitly to guide heuristic MAPF planners towards more optimal solutions.
3. Method
Our work aims to bridge this gap by applying differentiable optimization to learn structured graph representations, thus globally influencing heuristic solvers to enhance the quality of their solutions without sacrificing computational efficiency.
We propose a differentiable learning framework for guiding PP toward near-optimal solutions by modifying the edge weights of the planning graph. The proposed system is illustrated in Figure 1. A neural network predicts instance-specific edge costs, which are used by PP to generate plans. The network is trained to minimize the deviation from optimal solutions produced by EECBS (Li et al., 2021b), combining PP's efficiency with data-driven adaptability. Since the mapping from edge weights to solver output is piecewise constant, standard backpropagation yields zero gradients. To address this, we apply the black-box differentiation technique from Pogančić et al. (2019), which enables end-to-end training by approximating informative gradients via perturbed planner evaluations. The remainder of this section formalizes the MAPF setting and PP, introduces the surrogate gradient, describes the neural cost model, and outlines the training procedure.
3.1. MAPF Problem Definition
A MAPF instance is defined on an undirected graph $G = (V, E)$ with $|V| = N$ vertices (grid cells) and $|E| = M$ edges between neighbouring cells. For every vertex $v \in V$, we also include a self-loop $\{v, v\} \in E$ to explicitly encode wait actions.
We consider a set of $n$ agents $A = \{a_1, \dots, a_n\}$, each with a start vertex $s_i \in V$ and a goal vertex $g_i \in V$. Each agent moves over discrete time steps. Let $w \in \mathbb{R}^M_{\ge 0}$ denote a non-negative vector of edge costs, indexed according to the edges in $E$.
A joint plan is feasible if it avoids any vertex conflicts (two agents occupying the same vertex at the same timestep) and edge conflicts (two agents traversing the same undirected edge in opposite directions simultaneously).
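As a minimal sketch of the two feasibility conditions, the following Python function scans a joint plan for vertex and edge (swap) conflicts. The data layout (one list of vertices per agent, indexed by timestep, with agents resting at their final vertex after arrival) and the name `find_conflicts` are assumptions made for illustration; they are not taken from the paper.

```python
from itertools import combinations

def find_conflicts(paths):
    """Return a list of (kind, agent_i, agent_j, timestep) conflicts.

    `paths[i][t]` is the vertex of agent i at timestep t (hypothetical layout).
    Agents are assumed to stay at their last vertex once their path ends.
    """
    horizon = max(len(p) for p in paths)

    def at(path, t):
        # Pad finished agents by holding their final vertex.
        return path[min(t, len(path) - 1)]

    conflicts = []
    for i, j in combinations(range(len(paths)), 2):
        for t in range(horizon):
            if at(paths[i], t) == at(paths[j], t):
                conflicts.append(("vertex", i, j, t))
            if t + 1 < horizon:
                # Edge conflict: the two agents swap places between t and t + 1.
                a0, a1 = at(paths[i], t), at(paths[i], t + 1)
                b0, b1 = at(paths[j], t), at(paths[j], t + 1)
                if a0 != a1 and a0 == b1 and a1 == b0:
                    conflicts.append(("edge", i, j, t))
    return conflicts

# Two agents swapping across the edge {A, B} collide; a detour via C does not.
swap = [["A", "B"], ["B", "A"]]
safe = [["A", "B", "B"], ["B", "C", "C"]]
```

Here `find_conflicts(swap)` reports a single edge conflict at timestep 0, while `find_conflicts(safe)` is empty because the second agent vacates B as the first one enters it.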
We further define the edge-usage vector $y \in \mathbb{N}^M$, where

$$ y_e = \sum_{i,t} \mathbb{1}\big[(v_i^t, v_i^{t+1}) = e\big] \tag{1} $$

counts how many times edge $e$ is traversed (including self-loops for waits), with $v_i^t$ denoting the vertex occupied by agent $i$ at timestep $t$. Given the edge costs $w$, the total plan cost is the sum of each edge's cost multiplied by its usage:

$$ c(w, y) = \sum_{e \in E} w_e y_e. \tag{2} $$

Let $\mathcal{Y}$ be the set of all feasible edge-usage vectors (i.e. collision-free plans). Then the planner solves the following discrete optimization problem:

$$ y^\star(w) = \arg\min_{y \in \mathcal{Y}} c(w, y). \tag{3} $$

Figure 1. Differentiable MAPF training framework that learns edge-cost adjustments via black-box gradients from expert plan comparisons.
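To make Equations (1) and (2) concrete, the sketch below builds an edge-usage vector from per-agent paths and evaluates the plan cost. Indexing edges through a dictionary keyed by `frozenset` (so that a self-loop becomes a one-element set) and the helper names `edge_usage` and `plan_cost` are illustrative assumptions, not part of the original formulation.

```python
def edge_usage(paths, edge_index):
    """Count traversals per edge as in Eq. (1); waits count on self-loops.

    `edge_index` maps frozenset({u, v}) (or frozenset({v}) for a self-loop)
    to a position in the usage vector; this indexing is an assumption.
    """
    y = [0] * len(edge_index)
    for path in paths:
        for u, v in zip(path, path[1:]):
            y[edge_index[frozenset((u, v))]] += 1
    return y

def plan_cost(w, y):
    """Total plan cost c(w, y) = sum_e w_e * y_e, as in Eq. (2)."""
    return sum(we * ye for we, ye in zip(w, y))

# Tiny line graph A - B - C with self-loops for waiting.
edges = [("A", "B"), ("B", "C"), ("A", "A"), ("B", "B"), ("C", "C")]
edge_index = {frozenset(e): k for k, e in enumerate(edges)}
w = [2.0, 1.0, 1.0, 1.0, 0.5]      # one cost per edge, same order as `edges`
paths = [["A", "B"], ["C", "C"]]   # agent 0 moves A -> B, agent 1 waits at C
y = edge_usage(paths, edge_index)
cost = plan_cost(w, y)
```

In this toy instance the usage vector is `[1, 0, 0, 0, 1]` and the cost is `2.0 + 0.5 = 2.5`: one traversal of {A, B} plus one wait on the self-loop at C.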
Prioritized planning solves this problem heuristically by assigning a fixed priority order to agents and planning their paths sequentially. Each agent computes its shortest path using A* on a space–time graph, treating reserved vertices and edges from higher-priority agents as dynamic obstacles (Silver, 2005). This defines a deterministic mapping from edge costs $w$ to a feasible joint plan $y(w)$. While PP is fast and scales well with the number of agents, its greedy structure often results in suboptimal global solutions. The goal of this work is to learn cost vectors $w$ such that $y(w)$ more closely approximates a globally optimal plan and thus minimizes the total sum-of-costs.
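The sequential structure of prioritized planning can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the authors' implementation: the single-agent search is Dijkstra on the space–time graph (A* with a zero heuristic), every move or wait takes one timestep, agents are planned in list order, and the names `plan_agent`, `prioritized_planning`, `reserved_v`, and `reserved_e` are hypothetical.

```python
import heapq

def plan_agent(adj, cost, start, goal, reserved_v, reserved_e, horizon):
    """Cheapest space-time path for one agent (Dijkstra, i.e. A* with h = 0)."""
    frontier = [(0.0, 0, start, (start,))]  # (cost so far, time, vertex, path)
    seen = set()
    while frontier:
        g, t, v, path = heapq.heappop(frontier)
        if (v, t) in seen:
            continue
        seen.add((v, t))
        # Stop at the goal only if no higher-priority agent occupies it later.
        if v == goal and all((goal, s) not in reserved_v for s in range(t, horizon + 1)):
            return list(path)
        if t == horizon:
            continue
        for u in adj[v] + [v]:  # neighbours plus the wait action (self-loop)
            if (u, t + 1) in reserved_v:  # vertex conflict
                continue
            if (u, v, t) in reserved_e:   # edge conflict: someone moves u -> v now
                continue
            heapq.heappush(frontier, (g + cost[frozenset((u, v))], t + 1, u, path + (u,)))
    return None

def prioritized_planning(adj, cost, starts, goals, horizon=20):
    """Plan agents one by one in the given order; return paths or None."""
    reserved_v, reserved_e, paths = set(), set(), []
    for s, g in zip(starts, goals):
        path = plan_agent(adj, cost, s, g, reserved_v, reserved_e, horizon)
        if path is None:
            return None  # a fixed priority order can fail
        path += [path[-1]] * (horizon + 1 - len(path))  # rest at the goal
        for t, v in enumerate(path):
            reserved_v.add((v, t))
        for t, (a, b) in enumerate(zip(path, path[1:])):
            reserved_e.add((a, b, t))
        paths.append(path)
    return paths

# Corridor A - B - C with a side pocket D attached to B, unit costs everywhere.
adj = {"A": ["B"], "B": ["A", "C", "D"], "C": ["B"], "D": ["B"]}
cost = {frozenset((u, v)): 1.0 for u in adj for v in adj[u] + [u]}
paths = prioritized_planning(adj, cost, starts=["A", "D"], goals=["C", "A"], horizon=6)
```

In this example the first agent takes A, B, C, and the second agent waits one step in the pocket D before following D, B, A. A head-on instance in the same corridor (starts A and C, goals C and A) returns `None`, which illustrates why a fallback ordering may be needed when PP fails.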
3.2. Black-Box Differentiation through PP
Intuitively, we can think of PP as a mapping $w \mapsto y(w)$ which is piecewise constant, yielding zero gradients almost everywhere. To obtain meaningful learning signals, we apply the continuous perturbation method of Pogančić et al. (2019). Specifically, we define the task loss $L(\hat{y}, y^\star)$ as the mean squared error between the predicted ($\hat{y}$) and optimal ($y^\star$) directed edge-usage vectors:

$$ L(\hat{y}, y^\star) = \frac{1}{M} \sum_{e \in E} (\hat{y}_e - y^\star_e)^2, \tag{4} $$

where each directed edge $e = (u, v)$ is treated distinctly from its reverse edge $(v, u)$. Given a scalar $\lambda > 0$, a perturbed cost vector is constructed as:

$$ w' = w + \lambda \frac{\partial L}{\partial \hat{y}}. \tag{5} $$

Evaluating PP with $w'$ yields a perturbed solution $y_\lambda = y(w')$, from which we compute the surrogate gradient:

$$ \nabla_w f_\lambda(w) = -\frac{1}{\lambda} (\hat{y} - y_\lambda). \tag{6} $$

This approximation enables backpropagation through the planner with exactly two calls to PP and without modifying its internals.
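Equations (4) to (6) can be traced by hand on a toy problem. In the sketch below, PP is replaced by a stand-in solver that picks the cheapest of a few candidate edge-usage vectors; this substitution, together with the names `solve`, `mse_grad`, and `surrogate_gradient`, is an assumption made purely to keep the example self-contained.

```python
def solve(w, candidates):
    """Stand-in for PP: return the candidate usage vector with minimal cost."""
    return min(candidates, key=lambda y: sum(we * ye for we, ye in zip(w, y)))

def mse_grad(y_hat, y_star):
    """Gradient dL/dy_hat of the mean squared error in Eq. (4)."""
    m = len(y_hat)
    return [2.0 * (a - b) / m for a, b in zip(y_hat, y_star)]

def surrogate_gradient(w, y_star, candidates, lam):
    y_hat = solve(w, candidates)                          # first solver call
    dl_dy = mse_grad(y_hat, y_star)
    w_pert = [we + lam * g for we, g in zip(w, dl_dy)]    # Eq. (5)
    y_lam = solve(w_pert, candidates)                     # second solver call
    grad = [-(a - b) / lam for a, b in zip(y_hat, y_lam)] # Eq. (6)
    return y_hat, grad

# Route a uses edges 0 and 1, route b uses edge 2; the expert prefers route b.
candidates = [[1, 1, 0], [0, 0, 1]]
w = [1.0, 1.0, 3.0]
y_star = [0, 0, 1]
y_hat, grad = surrogate_gradient(w, y_star, candidates, lam=1.0)
_, grad_small = surrogate_gradient(w, y_star, candidates, lam=0.1)
```

With `lam=1.0` the solver first returns route a, the perturbed costs flip it to route b, and the surrogate gradient is `[-1.0, -1.0, 1.0]`, so a gradient-descent step raises the cost of edges 0 and 1 and lowers the cost of edge 2. With `lam=0.1` the perturbation is too small to change the solver output and the gradient is zero, which is why the choice of $\lambda$ matters in this kind of scheme.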
3.3. Neural Cost Shaping
We now introduce a neural network $N_\theta$ which maps instance features $x$ (the map, static obstacles, and agent start/goal pairs) into a vector of edge costs $w = N_\theta(x)$.
Gradients obtained via (6) are propagated through $N_\theta$ to update the parameters $\theta$. Our aim is that this process encourages the network to inflate costs along edges that lead to downstream conflicts and to discount those that promote globally efficient paths.
During planning, the predicted edge weights $w$ are used as cost values in the single-agent A* searches performed by PP. However, to preserve the validity of the reservation table, which encodes blocked edges and vertices over time, we retain the original graph costs to determine traversal durations. We employ a true distance heuristic in A*, computed as the shortest-path distance from each agent's start to its goal on the obstacle-free graph; this heuristic is admissible (it never overestimates the true cost). During training, we recompute this heuristic under the current learned weights to ensure it remains admissible relative to the modified cost function. This maintains consistency with the underlying environment dynamics while allowing the learned weights to influence the planner's route preferences.
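One way to obtain such a heuristic is a single Dijkstra pass outward from the goal, which yields the cheapest remaining cost from every vertex while ignoring other agents. The sketch below is an illustration under that assumption (the paper does not spell out the computation), and `true_distance` is a hypothetical helper name. Re-running it with the learned costs shows how the heuristic changes when an edge is made more expensive.

```python
import heapq

def true_distance(adj, cost, goal):
    """Dijkstra from the goal: h[v] is the cheapest cost from v to the goal."""
    h = {goal: 0.0}
    frontier = [(0.0, goal)]
    while frontier:
        d, v = heapq.heappop(frontier)
        if d > h.get(v, float("inf")):
            continue  # stale queue entry
        for u in adj[v]:
            nd = d + cost[frozenset((u, v))]
            if nd < h.get(u, float("inf")):
                h[u] = nd
                heapq.heappush(frontier, (nd, u))
    return h

# Four-cycle A - B - C - D - A with the goal at C.
adj = {"A": ["B", "D"], "B": ["A", "C"], "C": ["B", "D"], "D": ["A", "C"]}
unit = {frozenset((u, v)): 1.0 for u in adj for v in adj[u]}
h_unit = true_distance(adj, unit, "C")

# Hypothetical learned costs that make the edge {B, C} expensive.
learned = dict(unit)
learned[frozenset(("B", "C"))] = 5.0
h_learned = true_distance(adj, learned, "C")
```

Under unit costs B is one step from the goal; under the modified costs the cheapest route from B goes around through A and D, so `h_learned["B"]` becomes 3.0. Because each value is an exact shortest-path cost under the weights actually used by the search, it never overestimates the remaining cost.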
3.4. Training
Training is performed on scenarios (sets of $n$ agent start–goal pairs defined on the same map), as outlined in Algorithm 1. Each mini-batch consists of $B$ independent scenarios. For each scenario, edge costs are predicted, the plan is computed via PP, a perturbed plan is obtained, and the surrogate gradient is applied.
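Algorithm 1 One training epoch with differentiable PP
Input: Mini-batch $\{x_j, y^\star_j\}_{j=1}^{B}$, smoothing parameter $\lambda$
Output: Updated network parameters $\theta$
for $j = 1$ to $B$ do
    $w_j \leftarrow N_\theta(x_j)$  {predict edge costs}
    $\hat{y}_j \leftarrow \mathrm{PP}(w_j)$  {forward pass}
    $w'_j \leftarrow w_j + \lambda \, \partial L / \partial \hat{y}_j$
    $y_{\lambda,j} \leftarrow \mathrm{PP}(w'_j)$  {perturbed pass}
    $\nabla w_j \leftarrow -(\hat{y}_j - y_{\lambda,j}) / \lambda$
end for
Back-propagate $\{\nabla w_j\}$ and update $\theta$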
The model is trained for $T$ epochs using randomly sampled training scenarios. Evaluation is performed on held-out start/goal configurations drawn from the same map. Generalization to new maps is left to future work.
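The loop in Algorithm 1 can be sketched in runnable form for the simplest case, where the learnable parameters are the edge costs themselves and no neural network is involved. Everything below is a toy illustration under stated assumptions: PP is replaced by a stand-in solver over a fixed candidate set, the update is plain SGD on the batch-mean surrogate gradient with clipping at zero, and `solve` and `train_step` are hypothetical names.

```python
def solve(w, candidates):
    """Toy stand-in for PP: cheapest candidate edge-usage vector under w."""
    return min(candidates, key=lambda y: sum(a * b for a, b in zip(w, y)))

def train_step(w, batch, lam=1.0, lr=0.1):
    """One mini-batch update of per-edge costs w (no network, plain SGD)."""
    m = len(w)
    total = [0.0] * m
    for candidates, y_star in batch:
        y_hat = solve(w, candidates)                              # forward pass
        dl = [2.0 * (a - b) / m for a, b in zip(y_hat, y_star)]   # dL/dy_hat
        w_pert = [x + lam * g for x, g in zip(w, dl)]
        y_lam = solve(w_pert, candidates)                         # perturbed pass
        for e in range(m):
            total[e] += -(y_hat[e] - y_lam[e]) / lam              # surrogate gradient
    # Average over the batch and clip so that costs stay non-negative.
    return [max(0.0, x - lr * g / len(batch)) for x, g in zip(w, total)]

# One scenario: route a uses edges 0 and 1, route b uses edge 2 (the expert plan).
candidates = [[1, 1, 0], [0, 0, 1]]
batch = [(candidates, [0, 0, 1])]
w = [1.0, 1.0, 3.0]
for _ in range(10):
    w = train_step(w, batch)
final_plan = solve(w, candidates)
```

After a few updates the costs on edges 0 and 1 have risen and the cost on edge 2 has fallen far enough that the stand-in solver returns the expert route, at which point the surrogate gradient vanishes and the costs stop changing.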
4. Experiments
We evaluate our method on scenarios from the random-32-32-20 map, which is part of a standard benchmark from the MAPF definitions and variants suite (Stern et al., 2019). We consider instances with $n \in \{50, 75, 100\}$ agents to assess performance across varying levels of congestion and complexity.
Training Setup. We use Explicit Estimation Conflict-Based Search (EECBS) (Li et al., 2021b), a bounded-suboptimal CBS variant with edge constraints and focal search, to generate expert plans and their edge-usage vectors $y^\star$ for loss computation. We employ a suboptimality bound of 1.0 for 50-agent scenarios and 1.05 for more challenging ones, trading off optimality and runtime. On a 36-core machine, we were able to generate solutions for all 500 training scenarios within a total time budget of approximately 1 hour.
We use 500 training and 25 test scenarios (20% of training for validation). The edge predictor is trained with our earlier loss, matching PP's directed edge-usage to EECBS, starting from uniform costs of 1. Agents are ordered by ascending true distance so shorter-path agents plan first; if PP fails, we fix a single random permutation. The same ordering is used throughout training (for $\hat{y}$) and testing for a fair comparison.
Model Architectures and Optimization. We compare two variants of our differentiable MAPF framework. The Edge model maintains a learnable scalar cost for each graph edge (initialized to the original unit weight values of 1.0) and ignores start/goal inputs, directly predicting the full edge-weight vector. The GNN model instead computes edge weights via a two-layer GraphSAGE network (Hamilton et al., 2017) over the grid: node features encode a simple demand signal (starts vs. goals), sinusoidal positional embeddings, and a small agent ID embedding; after message passing, each edge's weight is produced by a lightweight MLP over its two endpoint embeddings. Both models are trained end-to-end using the described loss with Adam (learning rate 5e−4) and $\lambda = 1.0$.
Evaluation. Model selection uses validation delay (PP's sum-of-costs minus the collision-free lower bound). The model checkpoint that achieves the lowest validation delay is selected as the final model. The selected model is then evaluated on 25 held-out test scenarios, reporting delay and optimality gap relative to EECBS.
Table 1. Average delay and optimality gap (relative to EECBS) on 25 unseen test scenarios. Lower is better.

                 AVERAGE DELAY (% GAP)
MAP     AGENTS   PP (ORIG.)        PP (EDGE)         PP (GNN)          EECBS
RANDOM  50       56.20 (134.6%)    52.96 (121.0%)    51.36 (114.3%)    23.96
RANDOM  75       163.08 (141.8%)   151.68 (124.9%)   149.64 (121.9%)   67.44
RANDOM  100      354.20 (118.9%)   333.44 (106.1%)   317.12 (95.9%)    161.80
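The percentage gaps in Table 1 are consistent with reading the gap as the relative excess of a planner's average delay over the EECBS delay. The short sketch below checks this reading against the PP (ORIG.) column; the formula is inferred from the reported numbers rather than stated in the text, and `optimality_gap` is a hypothetical helper name.

```python
def optimality_gap(delay, expert_delay):
    """Relative excess delay over the expert, in percent (inferred definition)."""
    return 100.0 * (delay - expert_delay) / expert_delay

# Average delays taken from Table 1: (agents, PP with original costs, EECBS).
rows = [(50, 56.20, 23.96), (75, 163.08, 67.44), (100, 354.20, 161.80)]
gaps = {n: round(optimality_gap(pp, expert), 1) for n, pp, expert in rows}
```

This reproduces 134.6, 141.8, and 118.9 for 50, 75, and 100 agents. Applying the same formula to the rounded PP (GNN) averages for 50 and 100 agents gives values about 0.1 points above the table entries, presumably because the reported gaps were computed before the averages were rounded.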
Discussion. Experiments on the random-32-32-20 map (Table 1) demonstrate that learning a structured graph representation can improve the solution quality of a fast heuristic planner without sacrificing its efficiency. Across all agent counts (50, 75, 100), both learned models (direct per-edge parameters and the GNN-based predictor) consistently reduce the average delay of PP relative to the original uniform-cost baseline. The GNN model yields the largest gains, highlighting the value of conditioning edge costs on global context and agent interactions.
Importantly, these benefits incur minimal overhead. PP alone takes 0.0183 s versus 0.0263 s with our learned model (+0.0080 s), since learned weights don't alter PP's inner loop, preserving scalability for real-time systems.
5. Conclusion
In this work, we combine black-box differentiable planning with learned edge-cost shaping to enhance an existing MAPF heuristic solver, offering a practical bridge between structured planning and data-driven adaptability.
While our experiments demonstrate gains on a single map topology using EECBS demonstrations, future extensions should evaluate the framework on diverse graph structures to assess generalization and develop expert-free approaches such as reinforcement learning to reduce reliance on costly expert demonstrations.
Acknowledgements
Funded by the European Union.
This research project is part of the AI4REALNET project. AI4REALNET has received funding from European Union's Horizon Europe Research and Innovation programme under the Grant Agreement No 101119527. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
References
Bengio, Y., Lodi, A., and Prouvost, A. Machine learning for combinatorial optimization: a methodological tour d'horizon. European Journal of Operational Research, 290(2):405–421, 2021.
Berthet, Q., Blondel, M., Teboul, O., Cuturi, M., Vert, J.-P., and Bach, F. Learning with differentiable perturbed optimizers. Advances in Neural Information Processing Systems, 33:9508–9519, 2020.
Elmachtoub, A. N. and Grigas, P. Smart "predict, then optimize". Management Science, 68(1):9–26, 2022.
Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30, 2017.
Huang, T., Koenig, S., and Dilkina, B. Learning to resolve conflicts for multi-agent path finding with conflict-based search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11246–11253, 2021.
Huang, T., Li, J., Koenig, S., and Dilkina, B. Anytime multi-agent path finding via machine learning-guided large neighborhood search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 9368–9376, 2022.
Karalias, N. and Loukas, A. Erdos goes neural: an unsupervised learning framework for combinatorial optimization on graphs. Advances in Neural Information Processing Systems, 33:6659–6672, 2020.
Li, J., Chen, Z., Harabor, D., Stuckey, P. J., and Koenig, S. Anytime multi-agent path finding via large neighborhood search. In International Joint Conference on Artificial Intelligence 2021, pp. 4127–4135. Association for the Advancement of Artificial Intelligence (AAAI), 2021a.
Li, J., Ruml, W., and Koenig, S. EECBS: A bounded-suboptimal search for multi-agent path finding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 12353–12362, 2021b.
Pogančić, M. V., Paulus, A., Musil, V., Martius, G., and Rolinek, M. Differentiation of blackbox combinatorial solvers. In International Conference on Learning Representations, 2019.
Rolínek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., and Martius, G. Optimizing rank-based metrics with blackbox differentiation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7620–7630, 2020.
Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T. S., Koenig, S., and Choset, H. PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics and Automation Letters, 4(3):2378–2385, 2019.
Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 219:40–66, 2015.
Silver, D. Cooperative pathfinding. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 1, pp. 117–122, 2005.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
Stern, R., Sturtevant, N., Felner, A., Koenig, S., Ma, H., Walker, T., Li, J., Atzmon, D., Cohen, L., Kumar, T., et al. Multi-agent pathfinding: Definitions, variants, and benchmarks. In Proceedings of the International Symposium on Combinatorial Search, volume 10, pp. 151–158, 2019.
Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. Value iteration networks. Advances in Neural Information Processing Systems, 29, 2016.
Yan, Z. and Wu, C. Neural neighborhood search for multi-agent path finding. In The Twelfth International Conference on Learning Representations, 2024.
Yonetani, R., Taniai, T., Barekatain, M., Nishimura, M., and Kanezaki, A. Path planning using neural A* search. In International Conference on Machine Learning, pp. 12029–12039. PMLR, 2021.
Zhang, S., Li, J., Huang, T., Koenig, S., and Dilkina, B. Learning a priority ordering for prioritized planning in multi-agent path finding. In Proceedings of the International Symposium on Combinatorial Search, volume 15, pp. 208–216, 2022.