Learned Representations Enhance Multi Agent Path Planning

Author: van Hoof, Herke

Publisher: Zenodo

DOI: 10.5281/zenodo.17303674

Source: https://zenodo.org/records/17303674/files/8_Learned_Representations_Enha.pdf

Learned Representations Enhance Multi Agent Path Planning
Marius Captari¹, Herke van Hoof¹
Abstract
Multi-Agent Path Finding (MAPF) involves coordinating multiple agents to find collision-free paths in a shared environment. For large-scale instances, sub-optimal heuristics can be used that are either hand-crafted or learned from data. In this paper, we attempt to combine these approaches by training a neural network to modify problem representations such that Prioritized Planning, a conventional heuristic solver, will produce closer-to-optimal solutions. Thereby, we can leverage the strong performance of existing heuristics with the flexibility of data-driven algorithms. Training the neural network requires propagating learning signals through prioritized planning. This is achieved by calculating gradients of a relaxation of the algorithm using a black-box differentiation approach. Experiments on standard MAPF benchmarks demonstrate that our approach reduces PP's optimality gap without significantly compromising computational efficiency.
1. Introduction
Integrating planning algorithms with learning-based methods offers a promising strategy for addressing complex decision-making challenges. Classical planners, such as constraint solvers, graph search algorithms, and symbolic planners, offer strong theoretical guarantees, interpretability, and robustness derived from their structured, rule-based representations. However, these methods often require detailed problem formulations, potentially limiting their adaptability in dynamic environments and their ability to scale to more realistic scenarios. On the other hand, learning-based methods have demonstrated remarkable flexibility and generalization capabilities by extracting latent structures directly from data (Silver et al., 2016). Yet, they commonly lack the interpretability, compositionality, and robustness offered by classical symbolic or programmatic representations, making their decisions difficult to verify or safely deploy in real-world scenarios.

¹University of Amsterdam. Correspondence to: Marius Captari <m.captari@uva.nl>.
ICML 2025 Workshop on Programmatic Representations for Agent Learning, Vancouver, Canada. Copyright 2025 by the author(s).
Motivated by these complementary strengths, recent work explores hybrid planning–learning systems (Bengio et al., 2021). Rather than replacing symbolic solvers, these methods inject learning into targeted components, such as heuristics, reward shaping, program synthesis, or symbolic cost functions, to blend structured reasoning with data-driven flexibility.
In this work, we focus on Multi-Agent Path Finding (MAPF), a sequential decision-making task that requires coordinating multiple agents on a shared graph to reach goals without collisions. MAPF lies at the core of many real-world systems, ranging from automated warehouse fleets and airport ground vehicle coordination to drone swarms and multi-robot exploration, where efficient, collision-free routing translates directly into higher throughput, lower energy consumption, and improved safety. Classical MAPF methods offer clear guarantees but scale poorly with increasing complexity, whereas heuristic-based planners scale better but yield suboptimal solutions. Previous efforts to enhance MAPF solvers through learning have predominantly modified local planner decisions, such as agent prioritization (Zhang et al., 2022) or conflict resolution (Huang et al., 2021). However, these local modifications do not fully leverage the global structure of the underlying representation, potentially limiting solution quality.
To address this, we propose a global, differentiable representation learning method: we train a neural network to adjust graph edge weights such that a fast, heuristic planner, Prioritized Planning (PP) (Silver, 2005), produces solutions close to optimality. Our learned graph representation remains structured and interpretable, aligning with the broader goal of programmatic representation. We use black-box differentiation (Pogančić et al., 2019) to propagate gradients through the non-differentiable PP algorithm, enabling end-to-end learning of the graph representation without compromising solver efficiency.
Our primary contributions are twofold. First, we introduce a differentiable representation-learning framework that globally reshapes MAPF instances by learning edge-cost adjustments rather than focusing on local planner heuristics or priority decisions. Second, we demonstrate empirically that this approach shrinks the optimality gap of PP while preserving its computational efficiency.
2. Related Work
Integrating planning with machine learning often involves embedding symbolic solvers into neural architectures. For example, Value Iteration Networks (Tamar et al., 2016) insert a differentiable value-iteration module into CNNs to perform implicit planning, and Neural A* Search (Yonetani et al., 2021) learns heuristic functions for a differentiable A* algorithm, improving interpretability and generalization over reactive policies.
Predict-then-Optimize trains models to predict problem parameters (e.g., costs, demands) that are passed unchanged into classical solvers, with learning guided by downstream decision loss rather than raw accuracy (Elmachtoub & Grigas, 2022). In contrast, our method learns to alter the problem representation itself so that a fast, suboptimal planner produces higher-quality solutions.
To enable end-to-end gradient flow through inherently discrete solvers such as shortest-path, constraint, and combinatorial optimizers, researchers employ continuous relaxations, perturbation-based approximations, or black-box differentiation (Berthet et al., 2020; Pogančić et al., 2019), successfully integrating solvers into learning pipelines for ranking (Rolínek et al., 2020) and graph optimization (Karalias & Loukas, 2020) without fundamentally altering the solver itself.
Within Multi-Agent Path Finding (MAPF), most research integrating learning has focused on enhancing local solver components. For optimal solvers, such as Conflict-Based Search (CBS) (Sharon et al., 2015), learning-based methods have improved efficiency by guiding conflict resolution or node selection (Huang et al., 2021). For heuristic solvers like PP, neural models have been successfully used to learn effective agent prioritizations (Zhang et al., 2022). Similarly, Large Neighbourhood Search (LNS) approaches leverage learning to intelligently select subsets of agents for iterative replanning or to embed local sub-problems into learned architectures (Li et al., 2021a; Huang et al., 2022; Yan & Wu, 2024). Decentralized reinforcement learning methods, such as PRIMAL (Sartoretti et al., 2019), further scale to large scenarios but often compromise optimality and completeness guarantees for scalability.
Despite these advancements, existing learning-augmented MAPF approaches predominantly target local decision components, such as heuristics or priority ordering, rather than altering the underlying global representation of the planning problem itself. To our knowledge, no prior work has leveraged black-box differentiable optimization techniques to globally reshape graph-based representations explicitly to guide heuristic MAPF planners towards more optimal solutions.
3. Method
Our work aims to bridge this gap by applying differentiable optimization to learn structured graph representations, thus globally influencing heuristic solvers to enhance the quality of their solutions without sacrificing computational efficiency.
We propose a differentiable learning framework for guiding PP toward near-optimal solutions by modifying the edge weights of the planning graph. The proposed system is illustrated in Figure 1. A neural network predicts instance-specific edge costs, which are used by PP to generate plans. The network is trained to minimize the deviation from (near-)optimal expert solutions produced by EECBS (Li et al., 2021b), combining PP's efficiency with data-driven adaptability. Since the mapping from edge weights to solver output is piecewise constant, standard backpropagation yields zero gradients. To address this, we apply the black-box differentiation technique from Pogančić et al. (2019), which enables end-to-end training by approximating informative gradients via perturbed planner evaluations. The remainder of this section formalizes the MAPF setting and PP, introduces the surrogate gradient, describes the neural cost model, and outlines the training procedure.
3.1. MAPF Problem Definition
A MAPF instance is defined on an undirected graph $G = (V, E)$ with $|V| = N$ vertices (grid cells) and $|E| = M$ edges between neighbouring cells. For every vertex $v \in V$, we also include a self-loop $\{v, v\} \in E$ to explicitly encode wait actions.
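To make the graph construction concrete, here is a minimal Python sketch, assuming a 4-connected grid given as a 0/1 obstacle matrix (1 = blocked). The function name `build_grid_graph` and the `frozenset` encoding of undirected edges (a self-loop is `frozenset({v})`) are illustrative choices, not the authors' code.

```python
# Hypothetical helper: builds G = (V, E) of Sec. 3.1 from a 0/1 obstacle grid.
# Assumption: 4-connectivity; undirected edges are frozensets, so a self-loop
# {v, v} is stored as frozenset({v}).
def build_grid_graph(grid):
    rows, cols = len(grid), len(grid[0])
    V = [(r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 0]
    free = set(V)
    E = []
    for (r, c) in V:
        E.append(frozenset({(r, c)}))          # self-loop encodes a wait action
        for dr, dc in ((0, 1), (1, 0)):        # right/down only: each edge added once
            nb = (r + dr, c + dc)
            if nb in free:
                E.append(frozenset({(r, c), nb}))
    index = {e: k for k, e in enumerate(E)}    # fixed ordering used to index w and y
    return V, E, index
```

For a fully free 2×2 grid this yields $N = 4$ vertices and $M = 8$ edges (four self-loops plus four neighbour edges).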
We consider a set of $n$ agents $A = \{a_1, \dots, a_n\}$, each with a start vertex $s_i \in V$ and a goal vertex $g_i \in V$. Each agent moves over discrete time steps. Let $w \in \mathbb{R}^{M}_{\geq 0}$ denote a non-negative vector of edge costs, indexed according to the edges in $E$.
A joint plan is feasible if it avoids any vertex conflicts (two agents occupying the same vertex at the same timestep) and edge conflicts (two agents traversing the same undirected edge in opposite directions simultaneously).
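The two conflict types can be checked directly on a joint plan. The sketch below assumes each path is a list of vertices indexed by time step and that an agent which has finished stays at its last vertex; `find_conflict` and `is_feasible` are hypothetical names.

```python
# Sketch: detect vertex and edge (swap) conflicts in a joint plan.
# Assumption: paths[i][t] is agent i's vertex at time t; after its path ends,
# an agent waits at its final vertex (its goal).
def at(path, t):
    return path[min(t, len(path) - 1)]


def find_conflict(paths):
    horizon = max(len(p) for p in paths)
    for t in range(horizon):
        for i in range(len(paths)):
            for j in range(i + 1, len(paths)):
                if at(paths[i], t) == at(paths[j], t):
                    return ("vertex", i, j, t)
                # edge conflict: i moves u -> v while j moves v -> u in the same step
                if (t + 1 < horizon
                        and at(paths[i], t) == at(paths[j], t + 1)
                        and at(paths[i], t + 1) == at(paths[j], t)):
                    return ("edge", i, j, t)
    return None


def is_feasible(paths):
    return find_conflict(paths) is None
```

Returning the first conflict (rather than a boolean only) mirrors how conflict-based solvers consume this information, but the feasibility definition itself only needs `is_feasible`.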
We further define the edge-usage vector $y \in \mathbb{N}^{M}$, where

$$y_e = \sum_{i,t} \mathbb{1}\big[(v_i^{t}, v_i^{t+1}) = e\big] \tag{1}$$

counts how many times edge $e$ is traversed (including self-loops for waits), with $v_i^{t}$ the vertex occupied by agent $a_i$ at time step $t$. Given the edge costs $w$, the total plan cost is the sum of each edge's cost multiplied by its usage:

$$c(w, y) = \sum_{e \in E} w_e \, y_e. \tag{2}$$

Figure 1. Differentiable MAPF training framework that learns edge-cost adjustments via black-box gradients from expert plan comparisons.
Let $\mathcal{Y}$ be the set of all feasible edge-usage vectors (i.e., collision-free plans). Then the planner solves the following discrete optimization problem:

$$y^\star(w) = \arg\min_{y \in \mathcal{Y}} c(w, y). \tag{3}$$
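Equations (1) and (2) translate into a few lines of Python. This is a sketch under the same assumptions as before: undirected edges are keyed by `frozenset({u, v})`, a wait step `u -> u` maps to the self-loop `frozenset({u})`, and the list `edges` fixes the indexing of $w$ and $y$. The names `edge_usage` and `plan_cost` are hypothetical.

```python
from collections import Counter


# Sketch of Eq. (1): y_e counts traversals of edge e over all agents and steps.
# Edges absent from `edges` are silently ignored by the final lookup.
def edge_usage(paths, edges):
    counts = Counter()
    for path in paths:
        for t in range(len(path) - 1):
            counts[frozenset({path[t], path[t + 1]})] += 1   # wait -> self-loop
    return [counts[e] for e in edges]                         # y in N^M


# Sketch of Eq. (2): c(w, y) = sum_e w_e * y_e.
def plan_cost(w, y):
    return sum(w_e * y_e for w_e, y_e in zip(w, y))
```

For a single agent that waits once at `(0, 0)` and then moves to `(0, 1)`, with `edges = [frozenset({(0, 0)}), frozenset({(0, 0), (0, 1)}), frozenset({(0, 1)})]`, the usage vector is `[1, 1, 0]`.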
Prioritized planning solves this problem heuristically by assigning a fixed priority order to agents and planning their paths sequentially. Each agent computes its shortest path using A* on a space–time graph, treating reserved vertices and edges from higher-priority agents as dynamic obstacles (Silver, 2005). This defines a deterministic mapping from edge costs $w$ to a feasible joint plan $y(w)$. While PP is fast and scales well with the number of agents, its greedy structure often results in suboptimal global solutions. The goal of this work is to learn cost vectors $w$ such that $y(w)$ more closely approximates a globally optimal plan and thus minimizes the total sum-of-costs.
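A compact, runnable sketch of prioritized planning follows. It is not the authors' implementation; it assumes `adj[v]` lists the neighbours of `v` including `v` itself (the wait self-loop), `w[frozenset({u, v})]` is the learned cost of an edge, every move takes one time step, and it uses `h = 0` (Dijkstra over space–time states) instead of the true-distance heuristic, which keeps the search admissible for any non-negative `w`.

```python
import heapq
import itertools


def space_time_astar(adj, w, start, goal, reserved_v, reserved_e, horizon):
    """Cheapest path under costs w avoiding reservations (h = 0, i.e. Dijkstra)."""
    # The agent may only stop at its goal after the last reservation there.
    last_goal_res = max((t for (v, t) in reserved_v if v == goal), default=-1)
    tie = itertools.count()                      # tie-breaker for equal costs
    frontier = [(0.0, next(tie), start, 0, None)]
    parents, closed = {}, set()
    while frontier:
        g, _, v, t, parent = heapq.heappop(frontier)
        if (v, t) in closed:
            continue
        closed.add((v, t))
        parents[(v, t)] = parent
        if v == goal and t > last_goal_res:
            path, node = [], (v, t)
            while node is not None:              # walk parents back to the start
                path.append(node[0])
                node = parents[node]
            return path[::-1]
        if t == horizon:
            continue
        for u in adj[v]:                         # includes v itself (wait)
            if (u, t + 1) in reserved_v or (u, v, t) in reserved_e:
                continue                         # vertex conflict or head-on swap
            if (u, t + 1) not in closed:
                cost = g + w[frozenset({v, u})]
                heapq.heappush(frontier, (cost, next(tie), u, t + 1, (v, t)))
    return None                                  # no path within the horizon


def prioritized_planning(adj, w, agents, horizon=100):
    """agents: list of (start, goal) pairs, already sorted by priority."""
    reserved_v, reserved_e, plan = set(), set(), []
    for start, goal in agents:
        path = space_time_astar(adj, w, start, goal, reserved_v, reserved_e, horizon)
        if path is None:
            return None                          # PP is incomplete: this order failed
        for t, v in enumerate(path):
            reserved_v.add((v, t))
            if t + 1 < len(path):
                reserved_e.add((v, path[t + 1], t))
        for t in range(len(path), horizon + 1):
            reserved_v.add((path[-1], t))        # finished agent waits at its goal
        plan.append(path)
    return plan
```

On a 4-cycle `0-1-2-3-0` with unit costs, two agents swapping ends (`0 -> 2`, `2 -> 0`) are routed around opposite sides of the cycle; raising the cost of edge `{0, 1}` to 5 makes a lone agent `0 -> 2` take the route through vertex 3 instead, which is exactly the lever the learned costs pull.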
3.2. Black-Box Differentiation through PP
Intuitively, we can think of PP as a mapping $w \mapsto y(w)$, which is piecewise constant, yielding zero gradients almost everywhere. To obtain meaningful learning signals, we apply the continuous perturbation method of Pogančić et al. (2019). Specifically, we define the task loss $L(\hat{y}, y^\star)$ as the mean squared error between the predicted ($\hat{y}$) and optimal ($y^\star$) directed edge-usage vectors:

$$L(\hat{y}, y^\star) = \frac{1}{M} \sum_{e \in E} \big(\hat{y}_e - y^\star_e\big)^2, \tag{4}$$

where each directed edge $e = (u, v)$ is treated distinctly from its reverse edge $(v, u)$. Given a scalar $\lambda > 0$, a perturbed cost vector is constructed as:

$$w' = w + \lambda \frac{\partial L}{\partial \hat{y}}. \tag{5}$$

Evaluating PP with $w'$ yields a perturbed solution $y_\lambda = y(w')$, from which we compute the surrogate gradient:

$$\nabla_w f_\lambda(w) = -\frac{1}{\lambda}\big(\hat{y} - y_\lambda\big). \tag{6}$$

This approximation enables backpropagation through the planner with exactly two calls to PP and without modifying its internals.
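The update rule in Eqs. (4) to (6) can be sketched independently of PP: it only needs a solver that maps costs to a usage vector. In the sketch below, `toy_solver` (the cheaper of two fixed candidate "plans" over three edges) is a hypothetical stand-in for PP, not the planner itself, and `blackbox_grad` is an illustrative name.

```python
# Sketch of blackbox differentiation (Pogancic et al., 2019) as in Eqs. (4)-(6).
# Assumption: `toy_solver` stands in for PP; it returns the cheapest of two plans.
def toy_solver(w):
    candidates = [[1, 1, 0], [0, 0, 1]]          # route A uses edges 0,1; route B edge 2
    return min(candidates, key=lambda y: sum(wi * yi for wi, yi in zip(w, y)))


def mse_grad(y_hat, y_star):
    m = len(y_hat)
    return [2.0 * (a - b) / m for a, b in zip(y_hat, y_star)]      # dL/dy_hat, Eq. (4)


def blackbox_grad(solver, w, y_star, lam):
    y_hat = solver(w)                                              # forward pass
    w_prime = [wi + lam * gi for wi, gi in zip(w, mse_grad(y_hat, y_star))]  # Eq. (5)
    y_lam = solver(w_prime)                                        # perturbed pass
    return y_hat, [-(a - b) / lam for a, b in zip(y_hat, y_lam)]   # Eq. (6)
```

With `w = [1.0, 1.0, 3.0]` the solver picks route A while the expert uses route B; for $\lambda = 1$ the perturbed solve flips to B and the surrogate gradient is `[-1.0, -1.0, 1.0]`, so a descent step raises the costs of A's edges and lowers B's. For $\lambda = 0.1$ the perturbation is too small to flip the solution and the gradient is zero, which illustrates why $\lambda$ acts as a smoothing parameter.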
3.3. Neural Cost Shaping
We now introduce a neural network $N_\theta$, which maps instance features $x$ (the map, static obstacles, and agent start/goal pairs) into a vector of edge costs $w = N_\theta(x)$. Gradients obtained via (6) are propagated through $N_\theta$ to update the parameters $\theta$. Our aim is that this process encourages the network to inflate costs along edges that lead to downstream conflicts and to discount those that promote globally efficient paths.
During planning, the predicted edge weights $w$ are used as cost values in the single-agent A* searches performed by PP. However, to preserve the validity of the reservation table, which encodes blocked edges and vertices over time, we retain the original graph costs to determine traversal durations. We employ a true-distance heuristic in A*, computed as the shortest-path distance to each agent's goal on the obstacle-free graph; this heuristic is admissible (it never overestimates the true cost). During training, we recompute this heuristic under the current learned weights to ensure it remains admissible relative to the modified cost function. This maintains consistency with the underlying environment dynamics while allowing the learned weights to influence the planner's route preferences.
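One way to recompute a true-distance heuristic under the current learned weights is a single Dijkstra run backwards from each agent's goal; on an undirected graph the distance from the goal to $v$ equals the distance from $v$ to the goal. The sketch assumes the same `adj`/`w` encoding as above and skips wait self-loops, which never shorten a path when $w \geq 0$. `true_distance_heuristic` is a hypothetical name, and it must be called again whenever $w$ changes.

```python
import heapq


# Sketch: h(v) = shortest-path cost from v to `goal` under the current costs w.
def true_distance_heuristic(adj, w, goal):
    dist = {goal: 0.0}
    heap = [(0.0, goal)]
    while heap:
        d, v = heapq.heappop(heap)
        if d > dist[v]:
            continue                              # stale queue entry
        for u in adj[v]:
            if u == v:
                continue                          # waits never help a lower bound
            nd = d + w[frozenset({u, v})]
            if nd < dist.get(u, float("inf")):
                dist[u] = nd
                heapq.heappush(heap, (nd, u))
    return dist                                   # unreachable vertices: h = inf
```

Because the result is an exact distance under the same $w$ that A* accumulates, it never overestimates the remaining cost, which is the admissibility property the text relies on.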
3.4. Training
Training is performed on scenarios, i.e., sets of $n$ agent start–goal pairs defined on the same map, as outlined in Algorithm 1. Each mini-batch consists of $B$ independent scenarios. For each scenario, edge costs are predicted, the plan is computed via PP, a perturbed plan is obtained, and the surrogate gradient is applied.
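Algorithm 1 One training epoch with differentiable PP
Input: mini-batch $\{(x_j, y^\star_j)\}_{j=1}^{B}$, smoothing parameter $\lambda$
Output: updated network parameters $\theta$
for $j = 1$ to $B$ do
    $w_j \leftarrow N_\theta(x_j)$   {predict edge costs}
    $\hat{y}_j \leftarrow \mathrm{PP}(w_j)$   {forward pass}
    $w'_j \leftarrow w_j + \lambda \, \partial L / \partial \hat{y}_j$
    $y_{\lambda,j} \leftarrow \mathrm{PP}(w'_j)$   {perturbed pass}
    $\nabla w_j \leftarrow -(\hat{y}_j - y_{\lambda,j}) / \lambda$
end for
Back-propagate $\{\nabla w_j\}$ and update $\theta$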
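Algorithm 1 can be made runnable for the Edge variant (one learnable cost per edge, ignoring $x$), where $\partial w / \partial \theta$ is the identity so backpropagation simply passes the surrogate gradient to $\theta$. This is a sketch under several labelled assumptions: `toy_solver` is a hypothetical stand-in for PP, plain gradient descent on the batch-mean gradient replaces Adam, and costs are clipped at 0 to keep $w$ non-negative.

```python
# Sketch of Algorithm 1 for the "Edge" model. Assumptions: `toy_solver` stands in
# for PP, plain gradient descent replaces Adam, and w is clipped at 0.
def toy_solver(w):
    candidates = [[1, 1, 0], [0, 0, 1]]                 # two possible "plans" over 3 edges
    return min(candidates, key=lambda y: sum(wi * yi for wi, yi in zip(w, y)))


def train_epoch(theta, batch, solver, lam=1.0, lr=0.3):
    m = len(theta)
    grad_sum = [0.0] * m
    for _x_j, y_star in batch:                          # Edge model ignores x_j
        w = list(theta)                                 # w_j <- N_theta(x_j)
        y_hat = solver(w)                               # forward pass
        dL_dy = [2.0 * (a - b) / m for a, b in zip(y_hat, y_star)]
        w_prime = [wi + lam * gi for wi, gi in zip(w, dL_dy)]
        y_lam = solver(w_prime)                         # perturbed pass
        for e in range(m):
            grad_sum[e] += -(y_hat[e] - y_lam[e]) / lam  # surrogate gradient
    # Identity Jacobian dw/dtheta: apply the averaged gradient directly.
    return [max(0.0, t - lr * g / len(batch)) for t, g in zip(theta, grad_sum)]
```

Starting from uniform costs `[1.0, 1.0, 1.0]`, the toy solver prefers route B while the expert uses route A; after two epochs the costs become roughly `[0.4, 0.4, 1.6]`, the solver reproduces the expert plan, and further epochs leave $\theta$ unchanged because the surrogate gradient vanishes.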
The model is trained for $T$ epochs using randomly sampled training scenarios. Evaluation is performed on held-out start/goal configurations drawn from the same map. Generalization to new maps is left for future work.
4. Experiments
We evaluate our method on scenarios from the random-32-32-20 map, which is part of a standard benchmark from the MAPF definitions and variants suite (Stern et al., 2019). We consider instances with $n \in \{50, 75, 100\}$ agents to assess performance across varying levels of congestion and complexity.
Training Setup. We use Explicit Estimation Conflict-Based Search (EECBS) (Li et al., 2021b), a bounded-suboptimal CBS variant with edge constraints and focal search, to generate expert plans and their edge-usage vectors $y^\star$ for loss computation. We employ a suboptimality bound of 1.0 for 50-agent scenarios and 1.05 for more challenging ones, trading off optimality and runtime. On a 36-core machine, we were able to generate solutions for all 500 training scenarios within a total time budget of approximately 1 hour.
We use 500 training and 25 test scenarios (20% of the training set is held out for validation). The edge predictor is trained with the loss defined above, matching PP's directed edge usage to that of EECBS, starting from uniform costs of 1. Agents are ordered by ascending true distance so that shorter-path agents plan first; if PP fails, we fix a single random permutation. The same ordering is used throughout training (for $\hat{y}$) and testing for a fair comparison.
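The ordering rule can be sketched as follows, assuming `dist(start, goal)` is any shortest-path oracle and `plan_fn(order)` returns `None` when PP fails. Both names are hypothetical, as is the fixed `seed`, which only makes the single fallback permutation reproducible.

```python
import random


# Sketch: sort agents by ascending true distance (shorter paths plan first).
def order_agents(agents, dist):
    return sorted(range(len(agents)), key=lambda i: dist(*agents[i]))


# If PP fails with the distance ordering, fall back to ONE fixed random permutation.
def plan_with_fallback(agents, dist, plan_fn, seed=0):
    order = order_agents(agents, dist)
    plan = plan_fn(order)
    if plan is None:
        order = list(range(len(agents)))
        random.Random(seed).shuffle(order)
        plan = plan_fn(order)
    return order, plan
```

On an obstacle-free grid the Manhattan distance is the true distance, so it can serve as `dist` in a quick check; on the benchmark map a BFS or Dijkstra distance would be needed.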
Model Architectures and Optimization. We compare two variants of our differentiable MAPF framework. The Edge model maintains a learnable scalar cost for each graph edge (initialized to the original unit weight of 1.0) and ignores start/goal inputs, directly predicting the full edge-weight vector. The GNN model instead computes edge weights via a two-layer GraphSAGE network (Hamilton et al., 2017) over the grid: node features encode a simple demand signal (starts vs. goals), sinusoidal positional embeddings, and a small agent ID embedding; after message passing, each edge's weight is produced by a lightweight MLP over its two endpoint embeddings. Both models are trained end-to-end using the described loss with Adam (learning rate $5 \times 10^{-4}$) and $\lambda = 1.0$.
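To illustrate the data flow of the GNN variant (untrained, forward pass only), here is a plain-Python sketch of two GraphSAGE-mean layers followed by an edge head. Many details are assumptions not fixed by the text: the feature layout and frequencies, the hidden size, ReLU, a symmetric head over $h_u + h_v$, a single linear layer standing in for the lightweight MLP, a softplus output to keep every cost positive, and the omission of the agent ID embedding.

```python
import math
import random


def node_features(cell, starts, goals, n_freq=2):
    r, c = cell
    feats = [float(starts.count(cell)), float(goals.count(cell))]  # demand signal
    for k in range(n_freq):                                         # sinusoidal position code
        f = 1.0 / (10.0 ** k)
        feats += [math.sin(r * f), math.cos(r * f), math.sin(c * f), math.cos(c * f)]
    return feats


def rand_matrix(rows, cols, rng):
    return [[rng.gauss(0.0, 0.3) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def sage_layer(h, adj, w_self, w_nb):
    """GraphSAGE-mean: h'_v = ReLU(W_self h_v + W_nb * mean of neighbour h_u)."""
    out = {}
    for v, hv in h.items():
        nbs = [u for u in adj[v] if u != v] or [v]                  # skip the wait self-loop
        mean = [sum(h[u][i] for u in nbs) / len(nbs) for i in range(len(hv))]
        out[v] = [max(0.0, a + b) for a, b in zip(matvec(w_self, hv), matvec(w_nb, mean))]
    return out


def softplus(z):
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))              # numerically stable


def predict_edge_costs(adj, edges, starts, goals, hidden=8, seed=0):
    rng = random.Random(seed)                                       # untrained random weights
    h = {v: node_features(v, starts, goals) for v in adj}
    dim = len(next(iter(h.values())))
    for d_in in (dim, hidden):                                      # two message-passing layers
        h = sage_layer(h, adj, rand_matrix(hidden, d_in, rng), rand_matrix(hidden, d_in, rng))
    head = rand_matrix(1, hidden, rng)[0]                           # linear edge head
    costs = []
    for e in edges:
        u, v = (tuple(e) * 2)[:2]                                   # self-loop {v} gives u == v
        z = sum(a * (x + y) for a, x, y in zip(head, h[u], h[v]))   # symmetric in u and v
        costs.append(softplus(z))
    return costs
```

In training, these weights would be updated from the surrogate gradient of Eq. (6) through an autodiff framework; the plain-Python version only shows how node features flow into one cost per edge.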
Evaluation. Model selection uses the validation delay (PP's sum-of-costs minus the collision-free lower bound). The model checkpoint that achieves the lowest validation delay is selected as the final model. The selected model is then evaluated on 25 held-out test scenarios, reporting delay and optimality gap relative to EECBS.
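The delay metric and checkpoint selection can be sketched as below. Two assumptions are made that the text does not spell out: each path ends when its agent finally reaches its goal (so an agent's cost is `len(path) - 1` unit steps), and the collision-free lower bound is the sum of each agent's individual shortest-path length ignoring the other agents. All function names are hypothetical.

```python
# Sketch of the evaluation metric: delay = sum-of-costs - collision-free lower bound.
def sum_of_costs(paths):
    return sum(len(p) - 1 for p in paths)          # unit-time steps per agent


def delay(paths, shortest_lengths):
    return sum_of_costs(paths) - sum(shortest_lengths)


# Pick the checkpoint with the lowest validation delay.
def select_checkpoint(val_delays):
    return min(range(len(val_delays)), key=val_delays.__getitem__)
```

For example, two agents with individual shortest paths of length 2, one of which waits once because of a conflict, give a sum-of-costs of 5 and a delay of 1.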
Table 1. Average delay and optimality gap (relative to EECBS) on 25 unseen test scenarios. Lower is better.

Map | Agents | PP (orig.) delay (% gap) | PP (Edge) delay (% gap) | PP (GNN) delay (% gap) | EECBS delay
random | 50 | 56.20 (134.6%) | 52.96 (121.0%) | 51.36 (114.3%) | 23.96
random | 75 | 163.08 (141.8%) | 151.68 (124.9%) | 149.64 (121.9%) | 67.44
random | 100 | 354.20 (118.9%) | 333.44 (106.1%) | 317.12 (95.9%) | 161.80
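The "% gap" column is consistent with the relative excess delay over EECBS, $\text{gap} = 100 \cdot (d_{\mathrm{PP}} - d_{\mathrm{EECBS}}) / d_{\mathrm{EECBS}}$. This definition is inferred from the table rather than stated in the text; recomputing it from the rounded delays reproduces every reported percentage to within about 0.1 points.

```python
# Recompute the "% gap" column of Table 1 from the reported average delays.
# Assumption: gap = (delay_PP - delay_EECBS) / delay_EECBS * 100 (inferred).
EECBS = {50: 23.96, 75: 67.44, 100: 161.80}
PP = {  # agents -> (orig., Edge, GNN) average delay, copied from Table 1
    50: (56.20, 52.96, 51.36),
    75: (163.08, 151.68, 149.64),
    100: (354.20, 333.44, 317.12),
}


def gap_percent(delay_pp, delay_ref):
    return 100.0 * (delay_pp - delay_ref) / delay_ref


for n, delays in PP.items():
    print(n, [round(gap_percent(d, EECBS[n]), 1) for d in delays])
```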
Discussion. Experiments on the random-32-32-20 map (Table 1) demonstrate that learning a structured graph representation can improve the solution quality of a fast heuristic planner without sacrificing its efficiency. Across all agent counts (50, 75, 100), both learned models (direct per-edge parameters and the GNN-based predictor) consistently reduce the average delay of PP relative to the original uniform-cost baseline. The GNN model yields the largest gains (at 100 agents, the average delay drops from 354.20 to 317.12 and the gap to EECBS from 118.9% to 95.9%), highlighting the value of conditioning edge costs on global context and agent interactions.
Importantly, these benefits incur minimal overhead: PP alone takes 0.0183 s versus 0.0263 s with our learned model (+0.0080 s). Because the learned weights do not alter PP's inner loop, the planner's scalability for real-time systems is preserved.
5. Conclusion
In this work, we combine black-box differentiable planning with learned edge-cost shaping to enhance an existing MAPF heuristic solver, offering a practical bridge between structured planning and data-driven adaptability.
While our experiments demonstrate gains on a single map topology using EECBS demonstrations, future extensions should evaluate the framework on diverse graph structures to assess generalization, and develop expert-free approaches, such as reinforcement learning, to reduce reliance on costly expert demonstrations.
Acknowledgements
Funded by the European Union. This research project is part of the AI4REALNET project. AI4REALNET has received funding from the European Union's Horizon Europe Research and Innovation programme under the Grant Agreement No 101119527. Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union. Neither the European Union nor the granting authority can be held responsible for them.
References
Bengio, Y., Lodi, A., and Prouvost, A. Machine learning for combinatorial optimization: a methodological tour d'horizon. European Journal of Operational Research, 290(2):405–421, 2021.
Berthet, Q., Blondel, M., Teboul, O., Cuturi, M., Vert, J.-P., and Bach, F. Learning with differentiable perturbed optimizers. Advances in Neural Information Processing Systems, 33:9508–9519, 2020.
Elmachtoub, A. N. and Grigas, P. Smart "predict, then optimize". Management Science, 68(1):9–26, 2022.
Hamilton, W., Ying, Z., and Leskovec, J. Inductive representation learning on large graphs. Advances in Neural Information Processing Systems, 30, 2017.
Huang, T., Koenig, S., and Dilkina, B. Learning to resolve conflicts for multi-agent path finding with conflict-based search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 11246–11253, 2021.
Huang, T., Li, J., Koenig, S., and Dilkina, B. Anytime multi-agent path finding via machine learning-guided large neighborhood search. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 36, pp. 9368–9376, 2022.
Karalias, N. and Loukas, A. Erdos goes neural: an unsupervised learning framework for combinatorial optimization on graphs. Advances in Neural Information Processing Systems, 33:6659–6672, 2020.
Li, J., Chen, Z., Harabor, D., Stuckey, P. J., and Koenig, S. Anytime multi-agent path finding via large neighborhood search. In International Joint Conference on Artificial Intelligence 2021, pp. 4127–4135. Association for the Advancement of Artificial Intelligence (AAAI), 2021a.
Li, J., Ruml, W., and Koenig, S. EECBS: A bounded-suboptimal search for multi-agent path finding. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 35, pp. 12353–12362, 2021b.
Pogančić, M. V., Paulus, A., Musil, V., Martius, G., and Rolinek, M. Differentiation of blackbox combinatorial solvers. In International Conference on Learning Representations, 2019.
Rolínek, M., Musil, V., Paulus, A., Vlastelica, M., Michaelis, C., and Martius, G. Optimizing rank-based metrics with blackbox differentiation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 7620–7630, 2020.
Sartoretti, G., Kerr, J., Shi, Y., Wagner, G., Kumar, T. S., Koenig, S., and Choset, H. PRIMAL: Pathfinding via reinforcement and imitation multi-agent learning. IEEE Robotics and Automation Letters, 4(3):2378–2385, 2019.
Sharon, G., Stern, R., Felner, A., and Sturtevant, N. R. Conflict-based search for optimal multi-agent pathfinding. Artificial Intelligence, 219:40–66, 2015.
Silver, D. Cooperative pathfinding. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, volume 1, pp. 117–122, 2005.
Silver, D., Huang, A., Maddison, C. J., Guez, A., Sifre, L., Van Den Driessche, G., Schrittwieser, J., Antonoglou, I., Panneershelvam, V., Lanctot, M., et al. Mastering the game of Go with deep neural networks and tree search. Nature, 529(7587):484–489, 2016.
Stern, R., Sturtevant, N., Felner, A., Koenig, S., Ma, H., Walker, T., Li, J., Atzmon, D., Cohen, L., Kumar, T., et al. Multi-agent pathfinding: Definitions, variants, and benchmarks. In Proceedings of the International Symposium on Combinatorial Search, volume 10, pp. 151–158, 2019.
Tamar, A., Wu, Y., Thomas, G., Levine, S., and Abbeel, P. Value iteration networks. Advances in Neural Information Processing Systems, 29, 2016.
Yan, Z. and Wu, C. Neural neighborhood search for multi-agent path finding. In The Twelfth International Conference on Learning Representations, 2024.
Yonetani, R., Taniai, T., Barekatain, M., Nishimura, M., and Kanezaki, A. Path planning using neural A* search. In International Conference on Machine Learning, pp. 12029–12039. PMLR, 2021.
Zhang, S., Li, J., Huang, T., Koenig, S., and Dilkina, B. Learning a priority ordering for prioritized planning in multi-agent path finding. In Proceedings of the International Symposium on Combinatorial Search, volume 15, pp. 208–216, 2022.