Pe o mance E alua ion 154 (2022) 102286
Con en s lis s a ailable a ScienceDi ec
Pe o mance E alua ion
jou nal homepage: www.else ie .com/loca e/pe a
Analysis o an op imal policy in dynamic bipa i e ma ching
models
A naud Cadas a,b, Josu Doncel c,∗, Ana Bušić a,b
aIn ia, Pa is, F ance
bDépa emen d’in o ma ique de l’ENS, ENS, CNRS, PSL Uni e si y, Pa is, F ance
cUni e si y o he Basque Coun y, UPV/EHU, Leioa, Spain
a icle in o
A icle his o y:
Recei ed 16 June 2021
Recei ed in e ised o m 1 Decembe 2021
Accep ed 15 Feb ua y 2022
A ailable online 22 Feb ua y 2022
Keywo ds:
Dynamic ma ching models
Op imal con ol
Ma ko decision p ocess
abs ac
A dynamic bipa i e ma ching model is gi en by a bipa i e ma ching g aph which
de e mines he possible ma chings be ween he a ious ypes o supply and demand
i ems. Bo h supply and demand i ems a i e o he sys em acco ding o a s ochas ic
p ocess. Ma ched pai s lea e he sys em and he o he s wai in he queues, which
induces a holding cos . We model his p oblem as a Ma ko Decision P ocess and s udy
he discoun ed cos and he a e age cos p oblem. We assume ha he cos unc ion is
linea on he queue sizes. We show ha o he N-shaped ma ching g aph, an op imal
ma ching con ol p io i izes he ma chings in he pendan edges and is o h eshold ype
o he diagonal edge. In addi ion, o he a e age cos p oblem, we compu e he op imal
h eshold alue. We hen show how he ob ained esul s can be used o cha ac e ize he
s uc u e o an op imal ma ching con ol o a quasi-comple e g aph wi h an a bi a y
numbe o nodes. Fo a bi a y bipa i e g aphs, we show ha , when he cos o he
pendan edges is la ge han in he neighbo s, an op imal ma ching policy p io i izes he
i ems in he pendan edges. We also s udy he W-shaped ma ching g aph and, when
he cos o he pendan edges is la ge han he cos o he middle edge, we conjec u e
ha an op imal ma ching policy is also o h eshold ype wi h p io i y o he pendan
edges; howe e , when he cos o he middle edge is la ge , we p esen simula ions ha
show ha i is no op imal o p io i ize i ems in he pendan edges.
©2022 The Au ho (s). Published by Else ie B.V. This is an open access a icle unde he CC
BY-NC-ND license (h p://c ea i ecommons.o g/licenses/by-nc-nd/4.0/).
1. In oduc ion
We s udy ma ching models wi h a bipa i e compa ibili y g aph. In his model, one supply and one demand i em
a i e o he sys em in each ime slo acco ding o a gi en s ochas ic p ocess. Compa ible supply and demand i ems can
be ma ched, in which case hey lea e he sys em, and i ems ha a e no ma ched s ay in he sys em. We assume ha
supply and demand i ems a e di ided in classes. Thus, each class o supply i ems is compa ible wi h a di e en subse
o demand i em classes and, likewise, each class o demand i ems is compa ible wi h a di e en subse o supply i em
classes. In Fig. 1 we ep esen an example o a compa ibili y g aph wi h h ee demand nodes and h ee supply nodes. In
his case, when he sys em is emp y and he e is an a i al o he demand class 2 and he supply class 2, he a i ing
i ems can be ma ched and lea e he sys em. Howe e , in case o an a i al o he demand class 1 and he supply class 3
when he sys em is emp y, bo h i ems s ay in he sys em since hey canno be ma ched.
∗Co esponding au ho .
E-mail add ess: [email p o ec ed] (J. Doncel).
h ps://doi.o g/10.1016/j.pe a.2022.102286
0166-5316/©2022 The Au ho (s). Published by Else ie B.V. This is an open access a icle unde he CC BY-NC-ND license (h p://c ea i ecommons.
o g/licenses/by-nc-nd/4.0/).
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Fig. 1. A ma ching g aph wi h h ee supply classes and h ee demand classes.
Once he compa ibili y g aph and he p obabili y dis ibu ion o a i als o supply and demand classes a e ixed, he
s ochas ic p ocess o he numbe o i ems p esen in he sys em depends clea ly on how compa ible i ems a e ma ched,
i.e., on he ma ching policy. Fo ins ance, he Fi s Come Fi s Ma ched policy is a popula ma ching policy ha consis s
o ma ching he incoming i ems wi h he oldes compa ible i ems, i any. Ano he example is he Ma ching he Longes
policy, which ma ches each incoming i em wi h i s compa ible i em wi h he la ges numbe o i ems (o he la ges
queue). Unma ched supply and demand i ems incu a cos ( o ins ance, when he a ailable kidneys a e no compa ible
wi h he pa ien ha equi es he dona ion). An op imal ma ching policy can be hence de ined as how i ems a e ma ched
so as o minimize he cos o he sys em. In his wo k, we conside bipa i e ma ching g aphs and we aim o cha ac e ize
he op imal ma ching policy o his case.
When he compa ibili y g aph is comple e, since supply and demand i ems a i e in pai s, any newly a i ing pai can
be ma ched, so he e is ne e any queue. This shows ha an op imal ma ching con ol consis s o ma ching all he i ems
o his ma ching g aph. Fo he es o he compa ibili y g aphs, he cha ac e iza ion o an op imal ma ching policy is no
ha easy. Indeed, we model his p oblem as a Ma ko Decision P ocess and use a gumen s o s uc u ed policies in his
a icle. We conside he discoun ed cos p oblem as well as he a e age cos p oblem. We assume ha he ins an aneous
cos is a linea unc ion on he queue sizes (i.e., on he numbe o i ems o each class).
The main con ibu ions o his a icle a e summa ized as ollows:
•We i s conside he N-shaped ma ching g aph, which is o med by wo supply and wo demand classes. Fo his
sys em, we show ha an op imal ma ching policy ma ches all he i ems o he pendan edges and is o h eshold
ype in he diagonal edge. Fu he mo e, o he a e age cos p oblem, we p o ide an analy ical exp ession o he
op imal h eshold.
•We hen ocus on quasi-comple e ma ching g aphs wi h an a bi a y numbe o supply and demand nodes. We
p o ide condi ions on he holding cos s unde which a h eshold ype policy has he same cos as a h eshold
ype policy o he N-shaped g aph. As a esul , using ha an op imal ma ching policy o he N-shaped g aph is
o h eshold ype, we show ha an op imal policy o a quasi-comple e g aph wi h an a bi a y numbe o supply
and demand nodes is also o h eshold ype.
•We s udy op imal ma ching policies o an a bi a y bipa i e g aph in which he holding cos o each pendan edge
is la ge han he holding cos o i s neighbo s. Fo his case, we show ha an op imal ma ching policy consis s o
ma ching all he i ems o he pendan edges.
•Finally, we conside he W-shaped g aph, which is o med by wo supply and h ee demand classes. We di e en ia e
wo cases. Fi s , we conside ha he cos o he pendan demand nodes is la ge han o he middle demand
node. We p esen he p ope ies ha he alue unc ion mus sa is y o p o e ha an op imal ma ching policy is o
h eshold ype wi h p io i y o he pendan edges. Un o una ely, gi en he di icul y o hese p ope ies, we did no
succeed in showing ha hese p ope ies a e p ese ed by he Dynamic P og amming ope a o . Howe e , we show
ha , i he e is a se o p ope ies (con aining hose equi ed o show he op imali y o he h eshold ype policy)
ha a e p ese ed unde he Dynamic P og amming ope a o , an op imal ma ching policy is o h eshold ype wi h
p io i y o he pendan edges o his case. Fu he mo e, we conside he case when he cos o he pendan nodes is
smalle han o he middile demand node. Fo his case, we p esen ou nume ical wo k ha shows ha he ma ching
policy ha p io i izes he pendan edges is no op imal.
A con e ence e sion o his a icle appea ed in [1].
The emainde o he a icle is o ganized as ollows. In Sec ion 2, we pu ou wo k in he con ex o he exis ing
li e a u e. In Sec ion 3, we desc ibe he op imal con ol p oblem we in es iga e in his pape . We cha ac e ize an op imal
ma ching policy o he N-shaped g aph in Sec ion 4. Then, in Sec ion 5we s udy he op imal policy o a quasicomple e
ma ching g aph and in Sec ion 6o a bi a y bipa i e ma ching g aphs. We also conside in Sec ion 7 he W-shaped
ma ching g aph. Finally, we p o ide he main conclusions o ou wo k in Sec ion 8.
2. Rela ed wo k
The s udy o how o op imally ma ch compa ible i ems has been widely s udied. This p oblem was in oduced by wi h
Pe e sen and König and i was analyzed i s conside ing ha he popula ion is ixed. Fo his case, i s known ha he
2
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Hopc o –Ka p algo i hm [2] sol es his p oblem in a bipa i e g aph wi h a ime complexi y o O(m√n) whe e mis he
numbe o edges and nis he numbe o nodes. We e e o [3, Table I] o a his o ical e iew o algo i hms ha compu e
maximum ma chings. Then, his p oblem was ex ended o he dynamic se ing in which one popula ion is s a ic and he
o he a i es acco ding o a s ochas ic p ocess [4–7]. Ou wo k di e s om his la ge li e a u e since we explo e ully
dynamic ma ching models, i.e., all he i ems a i e o he sys em acco ding o a andom p ocess.
In 1984, Kaplan s udied he enan assignmen p ocess o public housing in Bos on [8] in which, when a public housing
uni becomes a ailable, i is assigned o he longes wai ing amily ha lis ed he co esponding housing p ojec . Kaplan
was in e es ed in he ma ching a e, i.e., he ac ion o amilies ha ing he same p e e ences ha a e assigned o a
speci ic housing p ojec . The au ho s in [9] modeled his p oblem as he Fi s Come Fi s Se ed (FCFS) in ini e bipa i e
ma ching model. This p oblem is de ined by a connec ed bipa i e g aph, whe e nodes ep esen he class o i ems and
he edges hei compa ibili ies. Se e al a icles ollowed he a o emen ioned wo k. The au ho s in [10] p o ided necessa y
and su icien condi ions o he e godici y o he Ma ko chain associa ed o his model. They also p o ed he p oduc
o m o i s s a iona y dis ibu ion. Fu he mo e, he au ho s in [11] conside ed a mo e de ailed Ma ko chain and p o ed
i s e e sibili y. The au ho s in [12] showed ha he s a iona y dis ibu ion o he FCFS in ini e bipa i e ma ching model
coincides wi h ha o he ollowing queueing sys ems: he edundan model o [13] and he skill-based pa allel se e
sys em o [14].
In [15] he au ho s conside ed he bipa i e ma ching model wi h o he ma ching policies such as Las Come Fi s
Se ed, Random, Ma ch he Longes and P io i y and es ablished necessa y and su icien condi ions o he s abili y o
hese models. They also showed ha he Ma ch he Longes policy has he maximum s abili y egion and in oduced he
Ex ended Bipa i e Ma ching Model, which ex ends he FCFS in ini e ma ching policy by conside ing any join dis ibu ion
o he classes o a i al pai s.
We would like o ema k ha some au ho s also in es iga ed ma ching models whe e he compa ibili y g aph is no
bipa i e. This al e na i e model was in oduced by [16] and he main pa icula i y is ha i ems a i e o he sys em one
by one. The au ho s in [17] showed ha he s a iona y dis ibu ion o i ems o his model wi h he FCFS ma ching policy
sa is ies a p oduc - o m exp ession.
In his wo k, we aim o ind s udy an op imal ma ching policy wi h holding cos s on he size o he queues. The au ho s
in [18] also conside ed holding cos s on he size o he queues in a non-bipa i e ma ching model and o he ini e ho izon
case. In ou wo k, we conside a bipa i e ma ching model and he discoun ed and a e age p oblems. Indeed, o he bes
o ou knowledge, op imali y esul s o bipa i e ma ching models ha e been ob ained only in asymp o ic egimes. The
au ho s in [19] analyze he hea y- a ic egime in which he di e ence be ween he a i als o one i em class and i s
compa ible i ems ends o ze o o he a e age cos p oblem and de i e a new ma ching policy ha is asymp o ically
op imal wi h bounded eg e . A ela ed op imiza ion p oblem o maximizing ewa ds on edges has been conside ed
in [20]. In ha pape , he au ho s conside compa ibili ies gi en by a hype g aph and de elop a ma ching policy based
on an ex ension o he G eedy P imal–Dual algo i hm and, using luid limi s, hey show asymp o ic op imali y o hei
algo i hm.
One o he main con ibu ions o his wo k is o show he op imali y o h eshold- ype ma ching policies in dynamic
bipa i e ma ching g aphs. Simila policies ha e been also s udied in a di e en con ex in which he goal is o op imally
assign jobs o se e s. Fo ins ance, he au ho s in [21,22] conside ha N-shaped model and show ha he e is a h eshold
policy ha is asymp o ically op imal. The au ho s in [23] ex ended his esul o a pa allel se e sys em wi h an a bi a y
opology.
3. Model desc ip ion
We conside a bipa i e ma ching g aph (D∪S,E) whe e D= {d1,d2,...,dnD}and S= {s1,s2,...,snS}a e,
espec i ely, he se o demand nodes (o queues) and he se o supply nodes. E⊂D×Sis he se o allowed ma ching
pai s. In each ime slo n, a demand i em and a supply i em a i e o he sys em acco ding o he i.i.d. a i al p ocess A(n).
We assume independence be ween demand and supply a i als. The demand i em a i es o he queue diwi h p obabili y
αiand he supply i em a i es o he queue sjwi h p obabili y βj, i.e:
∀(i,j)∈AP(A(n)=e(i,j))=αiβj>0
wi h ∑nD
i=1αi=1, ∑nS
j=1βj=1 and whe e A=D×Sis he se o allowed a i al pai s, e(i,j)=edi+esjand ek∈NnD+nS
is he ec o o all ze os excep in he k h coo dina e whe e i is equal o one, k∈D∪S. We assume ha he αiand βj
a e chosen such ha he a i al dis ibu ion sa is ies he necessa y and su icien condi ions o s abilizabili y o he MDP
model: Ncond gi en in [24], i.e. ∀D⊊D,∀S⊊S:
∑
di∈D
αi<∑
sj∈S(D)
βjand ∑
sj∈S
βj<∑
di∈D(S)
αi(1)
whe e D(j)= {i∈D:(i,j)∈E}is he se o demand classes ha can be ma ched wi h a class jsupply and
S(i)= {j∈S:(i,j)∈E}is he se o supply classes ha can be ma ched wi h a class idemand. The ex ension o
subse s S⊂Sand D⊂Dis D(S)=⋃j∈SD(j) and S(D)=⋃i∈DS(i). The main no a ion o his a icle is p esen ed in
Table 1.
3
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Table 1
Summa y o he main no a ion o his a icle.
D= {d1,d2,...,dnD}Se o demand nodes
S= {s1,s2,...,snS}Se o supply nodes
αiP obabili y ha a demand i em a i es o node di
βiP obabili y ha a supply i em a i es o node si
(i,j) Edge ha connec s diand sj
ediThe di h ec o o he canonical basis o RnD+nS
esjThe nD+sj h ec o o he canonical basis o RnD+nS
e(i,j)The sum o ediand esj(i.e., e(i,j)=edi+esj)
D(j) The se o demand classes ha can be ma ched wi h sj
S(i) The se o supply classes ha can be ma ched wi h di
Q(n)=(qk(n))k∈D∪SVec o o queue leng hs a ime slo n
X(n)=(xk(n))k∈D∪SVec o o queue leng hs a ime slo na e a i als
UxSe o admissible ma chings when he ec o o leng hs a e a i als is x
ckHolding cos o node k∈D∪S
We deno e by qk(n) he queue leng h o node ka ime slo n, whe e k∈D∪S. Le Q(n)=(qk(n))k∈D∪Sbe he ec o
o he queue leng h o all he nodes and Q= {q∈NnD+nS:∑k∈Dqk=∑k∈Sqk}be he se o all he possible queues
leng h. We mus ha e q(n)∈Q o all n. Ma chings a ime na e ca ied ou a e he a i als a ime n. Hence, Q(n)
e ol es o e ime acco ding o he ollowing exp ession:
Q(n+1) =Q(n)+A(n)−u(Q(n),A(n)),(2)
whe e uis a de e minis ic Ma ko ian decision ule which maps he cu en s a e Y(n)=(Q(n),A(n)) o he ec o o he
i ems ha a e ma ched a ime n. Thus, Yis a Ma ko Decision P ocess whe e he con ol is deno ed by u. I is su icien o
conside only de e minis ic Ma ko ian decision ules and no all his o y-dependen andomized decision ules as p o ed
in [25, Theo em 5.5.3] and [25, P oposi ion 6.2.1]. Le us de ine X(n)=Q(n)+A(n) as he ec o o he queue leng h o
all he nodes jus a e he a i als. In o de o ease he no a ions and because he ma chings only depend on he queues
leng h a e he a i als, we use he ollowing no a ion in he emainde o he pape : u(Q(n),A(n)) =u(X(n)). When he
s a e o he sys em is Y(n)=(q,a), x=q+a,u(x) mus belong o he se o admissible ma chings which is de ined as:
Ux=⎧
⎨
⎩
u=∑
(i,j)∈E
u(i,j)e(i,j)∈NnD+nS:(a)∀i∈D,∑
k∈S(i)
u(i,k)≤xdi,(b)∀j∈S,∑
k∈D(j)
u(k,j)≤xsj⎫
⎬
⎭
(3)
whe e u(i,j)is he numbe o ma chings in he edge (i,j). Uxis de ined o all x∈Q. We conside a linea cos unc ion on
he bu e size o he nodes: c(Q(n),A(n)) =c(X(n)) =∑k∈D∪Sckxk(n).
A ma ching policy πis a sequence o de e minis ic Ma ko ian decision ules, i.e. π=(u(X(n)))n≥1. The goal is o ob ain
an op imal ma ching policy o wo op imiza ion p oblems:
•The a e age cos p oblem:
g∗=in
πgπwi h gπ(y)=lim
N→∞
1
N
N−1
∑
n=0
Eπ
y[c(Y(n))]
•The discoun ed cos p oblem:
∗
θ=in
π π
θwi h π
θ(y)=lim
N→∞
N−1
∑
n=0
θ Eπ
y[c(Y(n))]
whe e θ∈ [0,1) is he discoun ac o and y∈Y=Q×Ais he s a ing s a e. Bo h p oblems admi an op imal s a iona y
policy, i.e. he decision ule depends only on he s a e o he sys em and no on he ime [25]. The no a ion Eπ
yindica es
ha he expec a ion is o e he a i al p ocess, gi en ha Y(0) =yand using he ma ching policy π o de e mine he
ma ched i ems u(X(n)) o all n.
As A(n) a e i.i.d., o ease he no a ion om now on, we deno e by Aa andom a iable wi h he same dis ibu ion as
A(1). Fo a gi en unc ion ,Y(n)=(q,a), x=q+a,u∈Ux, we de ine o all 0 ≤θ≤1:
Lθ
u (q,a)=c(q,a)+θE[ (q+a−u,A)] = c(x)+θE[ (x−u,A)]
Lθ (q,a)=c(q,a)+min
u∈Ux
θE[ (q+a−u,A)] = c(x)+min
u∈Ux
θE[ (x−u,A)]
and in pa icula , we de ine Tu=L1
uand T=L1. A solu ion o he discoun ed cos p oblem can be ob ained as a solu ion
o he Bellman ixed poin equa ion =Lθ . In he a e age cos p oblem, he Bellman equa ion is gi en by g∗+ =T .
We say ha a alue unc ion o a decision ule uis s uc u ed i i sa is ies a special p ope y, such as being inc easing,
dec easing o con ex. Th oughou he a icle, by inc easing we mean nondec easing and we will use s ic ly inc easing
4
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Fig. 2. The N-shaped ma ching g aph.
o inc easing. Fo a collec ion o p ope ies σ, we deno e by Vσ he se o s uc u ed alue unc ions ha sa is y hose
p ope ies. Simila ly, we de ine Dσas he se o s uc u ed decision ules induced by hose s uc u ed alue unc ions. A
policy is called s uc u ed when i only uses s uc u ed decision ules and he se o s uc u ed s a iona y ma ching policies
is deno ed by Πσ= {π=(u(X(n)))n≥1:u∈Dσ}. The amewo k o his wo k is ha o p ope y p ese a ion when
we apply he Dynamic P og amming ope a o . Fo he a e age cos p oblem, we use an adap ed e sion o [23,Theo em
6.11.3] and we p oceed o cha ac e ize he s uc u e o an op imal ma ching policy as ollows: i s , we iden i y a se o
s uc u ed alue unc ions Vσand a se o s uc u ed de e minis ic Ma ko ian decision ules Dσsuch ha i he alue
unc ion belongs o Vσan op imal decision ule belongs o Dσ. Then, we show ha he p ope ies o Vσa e p ese ed by
he Dynamic P og amming ope a o , i,e. ha L ∈Vσi ∈Vσ. Finally, we show ha hese p ope ies hold in he limi
as well.
In he case o he a e age cos p oblem, ou p oo s a e based on [23,Theo em 8.11.1], which uses he esul s o he
discoun ed cos p oblem. In ac , he a e age cos p oblem is conside ed as a limi when θ ends o one and, he e o e, i is
enough o show ha he p ope ies s ill hold o his limi . The s a emen o he heo ems we use o he discoun ed cos
p oblem and he a e age cos p oblem a e p o ided in Appendix A. These heo ems p esen some echnical equimen s
due o he unboundness o he cos s. In Appendix A we show ha he echnical equi emen s o he heo ems we use
a e sa is ied and, he e o e, when we s udy an op imal ma ching policy in he discoun ed cos p oblem, we only need o
check ha condi ions (a), (b) and (c) o Theo em 6 a e e i ied, i.e,.
(a) ∈Vσimplies ha Lθ ∈Vσ;
(b) ∈Vσimplies ha he e exis s a decision u′∈Dσsuch ha u′∈a g minuLθ
u ;
(c) Vσis a closed subse o he se o alue unc ions unde poin wise con e gence.
whe eas when we s udy an op imal ma ching policy in he a e age cos p oblem, we only need o check ha condi ions
(a) and (b) o Theo em 7 a e e i ied, i.e.,
(a) o any sequence (θn)n≥0, 0 ≤θn<1, o which lim
n→+∞θn=1,
lim
n→+∞[ ∗
θn− ∗
θn(0)e] ∈ Vσ
Hwi h e(y)=1 o all y∈Y
(b) ∈Vσ
Himplies ha he e exis s a decision u′∈Dσsuch ha u′∈a g minuTu ;
4. N-Shaped g aph
We now ocus on he N-shaped ma ching g aph, which is o med by wo supply nodes and wo demand nodes as well
as a N-shaped se o edges (see Fig. 2). Speci ically, we ha e D= {d1,d2},S= {s1,s2}and E= {(1,1),(1,2),(2,2)}. We
also de ine (2,1) as he imagina y edge be ween d2and s1(imagina y because (2,1) /∈E) ha we in oduce o ease he
no a ions. To ensu e s abili y, we assume ha α > β.
In his sec ion, we show ha an op imal policy o he N-shaped ma ching g aph has a speci ic s uc u e. Fo his
pu pose, we i s p esen he p ope ies o he alue unc ion. Then, we show how hese p ope ies cha ac e ize he
op imal decision ule and how hey a e p ese ed by he Dynamic P og amming ope a o . Finally, we p o e he desi ed
esul s in Theo ems 1 and 2.
4.1. Value unc ion p ope ies
We now p esen he p ope ies o he alue unc ion. We i s de ine he inc easing p ope y as ollows:
De ini ion 1 (Inc easing P ope y).Le (i,j)∈E. We say ha a unc ion is inc easing in (i,j) o ∈I(i,j)i
∀a∈A,∀q∈Q, (q+e(i,j),a)≥ (q,a).
5
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Rema k 1. The inc easing p ope y in (2,1) can be in e p e ed as he p e e ence o ma ch i ems o (1,1) and o (2,2)
han o (1,2). Indeed, (q+e(1,1) +e(2,2) −e(1,2),a)= (q+e(2,1),a)≥ (q,a).
We now de ine he con exi y p ope y as ollows:
De ini ion 2 (Con exi y P ope y).A unc ion is con ex in (1,2) o ∈C(1,2) i ∀a∈A,∀q∈Qsuch ha qd1≥qs1, we
ha e
(q+2e(1,2),a)− (q+e(1,2),a)≥ (q+e(1,2),a)− (q,a).
Likewise, is con ex in (2,1) o ∈C(2,1) i ∀a∈A,∀q∈Qsuch ha qs1≥qd1, we ha e
(q+2e(2,1),a)− (q+e(2,1),a)≥ (q+e(2,1),a)− (q,a).
We also de ine he bounda y p ope y nex :
De ini ion 3 (Bounda y P ope y).A unc ion ∈Bi
∀a∈A, (0,a)− (e(2,1),a)≤ (e(1,2),a)− (0,a).
We ema k ha , he p ope ies I(1,1),I(2,2),I(2,1) and C(1,2) a e used o cha ac e ize he op imal decision ule, whe eas
C(2,1) and Ba e equi ed o show ha C(1,2) is p ese ed by he ope a o Lθ. In he emainde o he sec ion, we conside
he ollowing se o s uc u ed alue unc ions
Vσ=I(1,1) ∩I(2,2) ∩I(2,1) ∩C(1,2) ∩C(2,1) ∩B,(4)
which means ha we conside ha he alue unc ion sa is ies he ollowing p ope ies: i is inc easing wi h (1,1), (2,2)
and (2,1), con ex in (1,2) and (2,1) and sa is ies he bounda y p ope y.
4.2. Op imal decision ule
A decision ule is o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2) i i ma ches all he i ems o (1,1) and
(2,2) and i ma ches he i ems o (1,2) only i he emaining i ems (in d1and s2) exceed a speci ic h eshold ∈N∪∞.
The decision ule o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2) is de ined o mally now.
De ini ion 4 (Th eshold- ype Decision Rule).A decision ule uxis o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2)
when ux=min(xd1,xs1)e(1,1) +min(xd2,xs2)e(2,2) +k (x)e(1,2) whe e
k (x)={0,i xd1−xs1≤ ,
xd1−xs1− ,o he wise.
Le us no e ha i = ∞, he abo e decision ule consis s o ne e ma ching (1,2). Howe e , i <∞, he decision
ule ma ches he i ems o (1,2) un il he emaining i ems in d1and s2a e ma ching all he possible i ems in (1,1) and
(2,2) do no exceed he h eshold . In ac , he s a e o he sys em a e a decision ule o h eshold ype in (1,2) wi h
p io i y o (1,1) and (2,2) is o he o m (0,l,l,0) i xd1≤xs1, o he o m (l,0,0,l) i xd1>xs1and l< and he o m
( ,0,0, ) o he wise.
In he es o his sec ion, we conside ha Dσis he se o decision ules ha a e o h eshold ype in (1,2) wi h
p io i y o (1,1) and (2,2) o any ∈N∪∞.
We now show ha he e exis s an op imal decision ule ha ma ches all he i ems in (1,1) and (2,2).
P oposi ion 1. Le ∈I(1,1) ∩I(2,2) ∩I(2,1) and 0≤θ≤1. Fo any q ∈Qand a ∈A, le x =q+a. Thus, he e exis s
u∗∈Uxsuch ha u∗∈a g minu∈UxLθ
u (q,a), u∗
(1,1) =min(xd1,xs1)and u∗
(2,2) =min(xd2,xs2). In pa icula , his esul holds
o he a e age ope a o : Tu.
P oo . See Appendix B.□
F om his esul , i ollows ha he e exis s an op imal decision ule ha ma ches all possible i ems o (1,1) and (2,2).
We deno e by Kx he se o possible ma ching in (1,2) a e ma ching all he i ems o (1,1) and (2,2).
De ini ion 5. Le 0 ≤θ≤1, x∈Q. Then,
Kx={{0},i xd1≤xs1,
{0,...,min(xd1−xs1,xs2−xd2)},o he wise.
We now p o e ha a decision ule o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2) is op imal.
P oposi ion 2. Le ∈I(1,1) ∩I(2,2) ∩I(2,1) ∩C(1,2). Le 0≤θ≤1. Fo any q ∈Qand o any a ∈A, x =q+a, he e exis s
u∗∈Dσsuch ha u∗∈a g minu∈UxLθ
u (q,a). In pa icula , his esul holds o he a e age ope a o : Tu.
6
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
P oo . The idea o he p oo is o show ha o any admissible ma ching u, he ma ching u∗ ha p io i izes he pendan
edges sa is ies ha Lθ
u∗ (q,a)≤Lθ
u (q,a) o all qand a. De ailed p oo is p o ided in Appendix C.□
4.3. Value unc ion p ope y p ese a ion
In his sec ion, we show ha he p ope ies o he alue unc ion de ined in Sec ion 4.1 a e p ese ed by he Dynamic
P og amming ope a o . In o he wo ds, we show ha i is inc easing wi h (1,1), (2,2) and (2,1), con ex in (1,2) and
(2,1) and sa is ies he bounda y p ope y, so does Lθ .
We i s ocus on he inc easing p ope y in (1,1), (2,2) and (2,1) and we show ha hey a e p ese ed by he Dynamic
P og amming ope a o .
Lemma 1. I a unc ion ∈I(1,1) ∩I(2,2) ∩I(2,1), hen Lθ ∈I(1,1) ∩I(2,2) ∩I(2,1).
P oo . The idea o he p oo is o conside any qand aand de ine ¯
qas qwi h wo addi ional i ems o each edge we
conside . Then, since (q,a)≤ (¯
q,a) by assump ion, i is enough o show ha Lθ (q,a)≤Lθ (¯
q,a). De ailed p oo is
p o ided in Appendix D.1.□
We now aim o show ha he con exi y in (1,2) is p ese ed by he Dynamic P og amming ope a o . I is impo an
o no e ha Lθ ∈C(1,2) when he alue unc ion is no only inc easing in (1,1),(2,2) and (2,1) and con ex in (1,2), bu
also sa is ies he bounda y condi ion.
Lemma 2. I ∈I(1,1) ∩I(2,2) ∩I(2,1) ∩C(1,2) ∩B, hen Lθ ∈C(1,2).
P oo . The idea o he p oo is o conside any qand aand de ine ¯
qas qwi h wo addi ional i ems o each ha
we conside as well as ¯
¯
qas ¯
qwi h wo mo e addi ional i ems in he same edge. Then, since ∈C(1,2), we know ha
(¯
q,a)− (q,a)≤ (¯
¯
q,a)− (¯
q,a) by assump ion, i is enough o show ha Lθ (¯
q,a)−Lθ (q,a)≤Lθ (¯
¯
q,a)−Lθ (¯
q,a).
De ailed p oo is p o ided in Appendix D.2.□
Finally, o show ha he Dynamic P og amming ope a o p ese es he bounda y p ope y, we need o use he
con exi y p ope y in (2,1). The p ese a ion o he bounda y and he con exi y p ope ies a e p o en in he ollowing
lemma.
Lemma 3. I ∈I(1,1) ∩I(2,2) ∩I(2,1) ∩C(1,2) ∩C(2,1) ∩B, hen Lθ ∈C(2,1) ∩B.
P oo . Fo he Bounda y condi ion, he idea is o show ha Lθ (0,a)−Lθ (e(2,1),a)≤Lθ (e(1,2),a)−Lθ (0,a) o any
a∈A, whe eas o he con exi y p ope y we p oceed simila ly han in he p oo o Lemma 2. De ailed p oo is p o ided
in Appendix D.3.□
4.4. S uc u e o an op imal policy
Now, using he esul o Theo em 6, we show ha he e exis s an op imal ma ching policy which is o med o a
sequence o decision ules ha belongs o Dσ(wi h a ixed h eshold).
Theo em 1. The op imal con ol o he discoun ed cos p oblem is o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2).
P oo . We apply Theo em 6 whe e Vσis he se o unc ions de ined in (4) and Dσ he se de ined in De ini ion 4. We
ocus on he s uc u al condi ions o he heo em. F om Lemmas 1–3 o Sec ion 4.3, i ollows (a) since hey show ha
i ∈Vσ, hen Lθ ∈Vσ. The esul o P oposi ion 2 shows (b) because he policy ha belongs o Dσminimizes Lθ
u i
∈Vσ. Finally, since limi s p ese e inequali ies, he poin -wise con e gence o unc ions o Vσbelong o his se , which
shows (c). □
The ollowing heo em shows ha he p e ious esul also holds o he a e age cos p oblem.
Theo em 2. The op imal con ol o he a e age cos p oblem is o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2).
P oo . We wan o apply Theo em 7 using he same alue unc ion se Vσand he same decision ule se Dσas in he
p oo o he p e ious heo em. Assump ions (A1) o (A4) hold because o Lemma 9. Le (θn)n∈Nbe a sequence such ha
0≤θn<1 o all n∈Nand lim
n→+∞θn=1. Le n∈N. We know ha ∗
θn∈Vσ(see he p oo o Theo em 1). The
inequali ies in he de ini ions o he p ope ies used in Vσs ill hold i we add a cons an o , hus ∗
θn− ∗
θn(0)e∈Vσ.
Using Assump ion (A3) and Assump ion (A4), we ha e H≤ ∗
θn− ∗
θn(0)e≤M, so ∗
θn− ∗
θn(0)e∈Vσ
H. This las esul holds
o each n∈Nand since limi s p ese e inequali ies Vσ
His a closed se , lim
n→+∞[ ∗
θn− ∗
θn(0)e] ∈ Vσ
Hwhich shows (a). The
esul o P oposi ion 2 shows (b) because he policy ha belongs o Dσminimizes L1
u =Tu i ∈Vσ
H⊂Vσ.□
7
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Fig. 3. A comple e ma ching g aph minus he edge (3,1).
4.5. Compu ing he op imal h eshold
We conside he ma ching policy o h eshold ype in (1,2) wi h p io i y o (1,1) and (2,2) in he a e age cos case.
In his esul , we p o ide an analy ical exp ession o he h eshold a which i ems o (2,1) a e ma ched.
P oposi ion 3. Le ρ=β(1−α)
α(1−β)∈(0,1), R =cs1+cd2
cd1+cs2
and ΠT(1,2) be he se o ma ching policy o h eshold ype in (1,2) wi h
p io i y o (1,1) and (2,2). The op imal h eshold ∗, which minimizes he a e age cos on ΠT(1,2) , is
∗={⌈k⌉i (⌈k⌉)≤ (⌊k⌋)
⌊k⌋o he wise
whe e k =log ρ−1
(R+1) log ρ
log ρ−1and (x)=(cd1+cs2)x+(cd1+cd2+cs1+cs2)ρx+1
1−ρ−(cd1+cs2)ρ
1−ρ+((cd1+cs1)αβ +(cd2+
cs2)(1 −α)(1 −β)+(cd2+cs1)(1 −α)β+(cd1+cs2)α(1 −β)).
The h eshold ∗is posi i e.
P oo . The idea o he p oo is o look a he Ma ko chain de i ed om he policy u∞
∈ΠT(1,2) . We show ha he
Ma ko chain is posi i e ecu en and we compu e he s a iona y dis ibu ion. Using he s ong law o la ge numbe s
o Ma ko chains, we show ha he a e age cos gu∞
is equal o he expec ed cos o he sys em unde he s a iona y
dis ibu ion. Then, we ind an analy ical o m o he expec ed cos which depends on he h eshold on (1,2), i.e, . Finally,
we minimize he unc ion o e . De ailed p oo is p o ided in Appendix E.□
F om his esul , i is impo an o no e ha he alue o k(and he e o e, he alue o ∗) is inc easing wi h ρand i
ends o 0 when ρ→0 whe eas i ends o in ini y when ρ→1. This means ha , when β≪α, he h eshold is ze o,
i.e., an op imal ma ching policy always ma ches he i ems o he diagonal edge. On he o he hand, when β→α( ecall
ha α > β o ensu e s abili y), we ha e ha he h eshold ends o in ini y. This means ha an op imal ma ching policy
ne e ma ches i ems o he diagonal edge.
5. Op imal policy o quasi-comple e g aphs
We aim o gene alize he esul s o he N-shaped g aph o a quasi-comple e ma ching g aph wi h an a bi a y numbe
o supply and demand nodes. A quasi-comple e ma ching g aph is a ma ching g aph whe e all he supply and demand
nodes a e connec ed, excep o one supply node which is connec ed o all bu one demand node. Le us deno e a quasi-
comple e ma ching g aph as G=(D∪S,E) wi h E=(D×S) {(i∗,j∗)}, whe e (i∗,j∗) (i∗∈Dand j∗∈S) is he missing
edge (see Fig. 3 o an example). We show ha , unde ce ain assump ions on he holding cos s, his ma ching g aph is
ela ed o he N-shaped ma ching g aph. Hence, h oughou his sec ion, when we e e o he N-shaped ma ching g aph
we use he supe sc ip N.
Fi s , we de ine he ans o ma ions o mo e om Y( he Ma ko Decision P ocess de ined on G) o YN( he Ma ko
Decision P ocess de ined on GN).
De ini ion 6. Le Qand Abe he se s o possible s a es and a i als o G. Le QNand ANbe he se s o possible s a es
and a i als o GN. We de ine he p ojec ion om Q o QNand he p ojec ion om A o ANas
pN
Q(q)=⎛
⎝∑
i∈D(j∗)
qdi,qdi∗,qsj∗,∑
j∈S(i∗)
qsj⎞
⎠and
8
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
pN
A(a)=⎧
⎪
⎨
⎪
⎩
e(1,1) i a =e(i,j∗),i∈D(j∗)
e(2,2) i a =e(i∗,j),j∈S(i∗)
e(2,1) i a =e(i∗,j∗)
e(1,2) o he wise
.
We can easily show ha pN
Q(x)=pN
Q(q)+pN
A(a) whe e x=q+a. Le aN∈AN, we also de ine (pN
A)−1(aN)= {a∈A:
pN
A(a)=aN}.
Le Abe he a i al p ocess associa ed o G, we cons uc an a i al p ocess AN o GNin he ollowing way:
∀aN∈AN,P(AN=aN)=∑
a∈(pN
A)−1(aN)
P(A=a)
We assume ha he quasi-comple e g aphs we conside in his sec ion sa is y he ollowing p ope y: he holding cos
is he same o all he demand nodes ha a e compa ible wi h all he supply nodes and he holding cos is he same o all
he supply nodes ha a e compa ible wi h all he demand nodes. In o he wo ds, he e exis c1and c2such ha cdi=c1
o all i∈D(j∗) and csj=c2 o all j∈S(i∗). In he example o Fig. 3, his means ha cd1=cd2as well as cs2=cs3. F om
his assump ion on he holding cos s, we can cons uc he N-shaped ma ching g aph such ha cN
d1=cdi, o all i∈D(j∗),
cN
d2=cdi∗,cN
s1=csj∗and cN
s2=csj o all j∈S(i∗). The e o e, using De ini ion 6, i ollows ha
c(x)=cN(pN
Q(x)) o all x∈Q.(5)
We now de ine he decision ule o h eshold- ype ha we s udy in his sec ion.
De ini ion 7 (Th eshold- ype decision ule).A decision ule uxis said o be o h eshold ype wi h p io i y o i∗and j∗i :
1. i ma ches all he i ems o (i,j∗) o all i∈D(j∗) and all he i ems o (i∗,j) o all j∈S(i∗).
2. i ma ches he i ems o (i,j) (i∈D(j∗) and j∈S(i∗)) only i he emaining i ems (in he sum o di o i∈D(j∗)) a e
abo e a speci ic h eshold, deno ed by (wi h ∈N∪∞).
i.e, uxis such ha ∑i∈D(j∗)(ux)(i,j∗)=min(∑i∈D(j∗)xdi,xsj∗), ∑j∈S(i∗)(ux)(i∗,j)=min(xdi∗,∑j∈S(i∗)xsj) and ∑i∈D(j∗)∑j∈S(i∗)
(ux)(i,j)=k (x) whe e
k (x)={0i ∑i∈D(j∗)xdi−xsj∗≤
∑i∈D(j∗)xdi−xsj∗− o he wise .
We de ine D∗as he se o decision ules ha a e o h eshold ype wi h p io i y o i∗and j∗ o any ∈N∪∞.
We aim o show ha he s a iona y policy based on he abo e decision ules is op imal o a quasi-comple e ma ching
g aph wi h an a bi a y numbe o supply and demand nodes. Fo his pu pose, we i s need o show he ollowing
p ope ies.
Lemma 4. Le 0≤θ < 1, le π∗=(u(X(n)))n≥0(wi h u ∈D∗as de ined in De ini ion 7) be a h eshold- ype policy wi h
p io i y in i∗and j∗and le (πN)∗=(uN(XN(n)))n≥0(wi h uN∈Dσas de ined in De ini ion 4) be a h eshold- ype policy wi h
p io i y in (1,1) and (2,2). Thus, we ha e π∗
θ(y)= (πN)∗
θ(yN) o all y =(q,a)∈Q×A, wi h yN=(pN
Q(q),pN
A(a)). The
esul emains ue o he a e age cos p oblem wi h gπ∗and g(πN)∗.
P oo . The idea o he p oo is o i s show ha o π∗, we ha e ha pN
Q(xn)=xN
nand hen ha he expec ed cos in
bo h ma ching models coincides. De ailed p oo is p o ided in Appendix F.□
Lemma 5. Le 0≤θ < 1, le πbe a s a iona y policy on Y, y =(q,a)∈Q×Aand yN=(pN
Q(q),pN
A(a)). Thus, he e exis s
a policy πNon YNsuch ha π
θ(y)= πN
θ(yN). The esul emains ue o he a e age cos p oblem wi h gπand gπN.
P oo . The idea o he p oo is ha , o any π, we de ine πNsuch ha we ha e ha pN
Q(xn)=xN
nand also ha he
expec ed cos in bo h cases coincides. De ailed p oo is p o ided in Appendix G □
We now p o e ha an op imal ma ching policy o a quasi-comple e g aph wi h an a bi a y numbe o supply and
demand nodes is o med by decision ules which a e as de ined in De ini ion 7.
Theo em 3. The op imal con ol o he discoun ed cos p oblem and he a e age cos p oblem is o h eshold ype wi h
p io i y o i∗and j∗.
P oo . Le 0 ≤θ < 1, le πbe a s a iona y policy on Yand π∗=(u(X(n)))n≥0(wi h u∈D∗as de ined in De ini ion 7) be
a h eshold- ype policy wi h p io i y in i∗and j∗. Le y0=(q0,a0)∈Q×Aand yN
0=(pN
Q(q0),pN
A(a0)). Using Lemma 5,
9
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
P oo . Fi s , we no e ha he cos unc ion is nonnega i e and he e o e Assump ion (A1) holds using C=0. Le πML be
he policy Ma ch he Longes as de ined in [24, De ini ion 2.6]. This policy is s able o any bipa i e ma ching g aph as
p o ed in [24, Theo em 7.1], which means ha he de i ed Ma ko chain is posi i e ecu en and gπML <∞. Mo eo e ,
he se {y∈Y:c(y)<gπML }is nonemp y because gπML >min(i,j)∈Acdi+csjalmos su ely and c(0,a)=cdi+csj( o any
a=e(i,j)∈A). I is also ini e because gπML <∞and cis inc easing in y(because cis linea ). The e o e, we can use [25,
Theo em 8.10.9] and Assump ions (A2) o (A4) hold. □
Appendix B. P oo o P oposi ion 1
Le ∈I(1,1) ∩I(2,2) ∩I(2,1), 0 ≤θ≤1, q∈Q,a∈A,x=q+aand u∈Ux. The maximum numbe o i ems ha can
be ma ched in (1,1) is deno ed by m(1,1) =min(xd1,xs1) and in (2,2) by m(2,2) =min(xd2,xs2).
Le p(1,2) =min(u(1,2),xs1−u(1,1),xd2−u(2,2)) be he numbe o i ems ha a e ma ched in (1,2) unde he con ol u
ha can be ans o med om ma ching i ems in (1,1) and (2,2) a he same ime. We de ine a policy up(1,2) ha emo es
he p(1,2) i ems in (1,2) and ma ches p(1,2) i ems in (1,1) and (2,2), ha is, up(1,2) =u+p(1,2)(e(1,1) +e(2,2) −e(1,2)).
Using (3), we e i y ha his policy is admissible, i.e. up(1,2) ∈Ux:up(1,2) ∈N4is ue because u∈Ux.(a) is ue because
up(1,2)
(2,2) =u(2,2) +p(1,2) ≤xd2.(b) is ue because up(1,2)
(1,1) =u(1,1) +p(1,2) ≤xs1. Then, since ∈I(2,1), i ollows ha
Lθ
up(1,2) (q,a)≤Lθ
u (q,a).
We now de ine u∗as he decision ule ha ma ches all he i ems (1,1) and (2,2) o x−up(1,2) , ha is, o he emaining
i ems when we apply up(1,2) . Hence, we ha e ha u∗=up(1,2) +e(1,1)(m(1,1) −up(1,2)
(1,1) )+e(2,2)(m(2,2) −up(1,2)
(2,2) ). Using (3),
we also e i y ha u∗∈Ux:u∗∈N4is ue because up(1,2) ∈Ux,m(1,1) ≥0 and m(2,2) ≥0. We immedia ely ge ha
u∗
(2,2) =m(2,2) ≤xd2and u∗
(1,1) =m(1,1) ≤xs1. Fo xd1and xs2, we dis inguish he ollowing cases:
1. I p(1,2) =u(1,2). Then we ha e u∗
(1,2) =up(1,2)
(1,2) =0. Thus,
u∗
(1,1) +u∗
(1,2) =m(1,1) ≤xd1
u∗
(2,2) +u∗
(1,2) =m(2,2) ≤xs2
2. I p(1,2) =xs1−u(1,1). Then,
u∗
(1,1) +u∗
(1,2) =m(1,1) +u(1,2) −xs1+u(1,1) ≤u(1,2) +u(1,1) ≤xd1
u∗
(2,2) +u∗
(1,2) =m(2,2) +u(1,2) −xs1+u(1,1) ≤xd2+u(1,2) −xs1+u(1,1) =xs2+u(1,2) −xd1+u(1,1) ≤xs2
3. I p(1,2) =xd2−u(2,2). Then,
u∗
(1,1) +u∗
(1,2) =m(1,1) +u(1,2) −xd2+u(2,2) ≤xs1+u(1,2) −xd2+u(2,2) =xd1+u(1,2) −xs2+u(2,2) ≤xd1
u∗
(2,2) +u∗
(1,2) =m(2,2) +u(1,2) −xd2+u(2,2) ≤u(1,2) +u(2,2) ≤xs2
In all he cases, (a) and (b) a e ue. Hence, since ∈I(1,1) ∩I(2,2), i esul s ha Lθ
u∗ (q,a)≤Lθ
up(1,2) (q,a) and, as a
consequence, Lθ
u∗ (q,a)≤Lθ
u (q,a) o any u∈Ux.
Appendix C. P oo o P oposi ion 2
Le q∈Q,a∈A,x=q+aand u∈Ux. We de ine m(1,1) =min(xs1,xd1) and m(2,2) =min(xs2,xd2). Since
∈I(1,1) ∩I(2,2) ∩I(2,1), i ollows om P oposi ion 1 ha he e exis s u′∈Ux ha ma ches all he i ems o he pendan
edges and Lθ
u′ (q,a)≤Lθ
u (q,a). The e o e, u′is o he ollowing o m: u′=m(1,1)e(1,1) +m(2,2)e(2,2) +ke(1,2) wi h k∈Kx.
We now p o e ha he e exis s ∈N∪∞ such ha
Lθ
u∗ (q,a)≤Lθ
u′ (q,a),∀k∈Kx(10)
whe e u∗is a decision ule o h eshold ype in he diagonal edge wi h p io i y o he pendan edges (see De ini ion 4). I
xs1≥xd1, hen Kx=0 and, as a esul , we ha e ha u∗=u′. We now conside ha xs1<xd1, in which case Kx= {0}. Fo
his case, he s a e o he sys em a e applying a h eshold ype decision ule (u∗o u′) is o he o m (l,0,0,l). The e o e,
o compa e Lθ
u∗ (q,a) wi h Lθ
u′ (q,a), we only need o compa e E[ (j∗e(1,2),A)]wi h E[ (j′e(1,2),A)], whe e j∗,j′∈Kx. Since
is con ex in (1,2), we dis inguish he ollowing cases:
•Fi s , we conside ha ∀j∈N,E[ ((j+1)e(1,2),A)] − E[ (je(1,2),A)] ≤ 0. Fo his case, we de ine u∗as ollows:
u∗=m(1,1)e(1,1) +m(2,2)e(2,2) (no e ha his is he same as choosing = ∞ and he e o e k (x)=0). Since
E[ ((j+1)e(1,2),A)]−E[ (je(1,2),A)] ≤ 0, i ollows ha
Lθ
u∗ (q,a)≤Lθ
u∗+e(1,2) (q,a)≤ ··· ≤ Lθ
u∗+ke(1,2) (q,a)
o all k∈Kx. The e o e, i ollows (10).
16
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
•We now conside ha E[ (e(1,2),A)] − E[ (0,A)] ≥ 0. Fo his case, we de ine u∗as ollows: u∗=m(1,1)e(1,1) +
m(2,2)e(2,2) +(xd1−xs1)e(1,2) (no e ha his is he same as choosing =0 and k (x)=xd1−xs1). Using ha
E[ (e(1,2),A)]−E[ (0,A)] ≥ 0 and also ha is con ex in (1,2), i esul s ha
Lθ
u∗ (q,a)≤Lθ
u∗−e(1,2) (q,a)≤ ··· ≤ Lθ
u∗−ke(1,2) (q,a)
o all k∈Kxand since u∗−(xd1−xs1−k)e(1,2) =u′(wi h xd1−xs1−k∈Kx,∀k∈Kx), i ollows (10).
•Finally, we conside ha ∃j∈N∗,E[ ((j+1)e(1,2),A)]−E[ (je(1,2),A)] ≥ 0. Le j=min{j∈N∗:E[ ((j+1)e(1,2),A)]−
E[ (je(1,2),A)] ≥ 0}. By de ini ion o jand by con exi y o in (1,2), we ha e
E[ ((j−l)e(1,2),A)]−E[ ((j−l−1)e(1,2),A)] ≤ 0∀l∈ [[0;j−1]] (11)
and
E[ ((j+1+l)e(1,2),A)]−E[ ((j+l)e(1,2),A)] ≥ 0∀l∈N(12)
We de ine u∗as he decision ule o De ini ion 4 wi h =j. We i s conside ha xd1−xs1≤j, hen we ha e
k (x)=0 and Lθ
u∗ (q,a)≤Lθ
u′ (q,a) o all k∈Kxby (11) (no e ha 0 ≤k≤xd1−xs1≤j), which shows (10). We
now conside ha xd1−xs1>j, hen k (x)=xd1−xs1−jand Lθ
u∗ (q,a)=c(x)+θE[ (je(1,2),A)]. The e o e, o all
k∈Kx,Lθ
u∗ (q,a)≤Lθ
u′ (q,a) by (11) i k≥jo by (12) i k≤j, which p o es (10).
Appendix D. P oo s o Sec ion 4.3
D.1. P oo o Lemma 1
We i s show ha i ∈I(1,1), hen Lθ ∈I(1,1). Le q∈Qand a∈A,x=q+a. We de ine q=q+e(1,1),x=q+a.
Since is inc easing wi h (1,1), we ha e ha (q,a)≥ (q,a). We aim o show ha Lθ (q,a)≥Lθ (q,a).
Le ux∈a g minu∈UxLθ
u (q,a). Since (x)s1≥1 and (x)d1≥1, om P oposi ion 1, i ollows ha (ux)(1,1) =min(xd1,xs1)≥
1. The e o e, we de ine ux=ux−e(1,1). Thus, i is easy o see ha ux∈Uxbecause ux∈Ux. Mo eo e , we obse e ha
x−ux=x−uxand, since cis a linea unc ion, c(x)>c(x). Hence,
Lθ
ux (q,a)=c(x)+θE[ (x−ux,A)]
=c(x)−c(x)+c(x)+θE[ (x−ux,A)]
=c(x)−c(x)+Lθ (q,a)
<Lθ (q,a).
And, since ux∈Ux, hen by de ini ion Lθ (q,a)≤Lθ
ux (q,a) and, as a esul , he desi ed esul ollows.
To p o e ha i ∈I(2,2), hen Lθ ∈I(2,2), one can use he same a gumen s as abo e wi h q=q+e(2,2). We omi i
o cla i y o he p esen a ion.
The p oo o he p ese a ion o he inc easing p ope y in I(2,1) is also simila bu equi es o handle he case when
no i ems can be ma ched in (1,2). Le q∈Qand a∈A,x=q+a. Le q=q+e(1,1) +e(2,2) −e(1,2),x=q+a. Since
∈I(2,1), we know ha (q,a)≤ (q,a). Besides, c(x)<c(x) holds because cis a linea unc ion o x. We aim o show
ha Lθ (q,a)≤Lθ (q,a).
Le ux∈a g minu∈UxLθ
(q,a). F om P oposi ion 1, we know ha (ux)(2,2) =min(xd2,xs2) and (ux)(1,1) =min(xd1,xs1).
We i s conside ha xd1≥1 and xs2≥1. We de ine ux=ux−e(1,1) −e(2,2) +e(1,2). Thus, we ha e ha x−ux=x−ux
and ux∈Uxbecause ux∈Uxas well as xs1=xs1−1≥(ux)(1,1) −1=min(xd1,xs1)−1=min(xd1,xs1+1) −1≥0,
xd2=xd2−1≥(ux)(2,2) −1=min(xd2,xs2)−1=min(xd2+1,xs2)−1≥0. Thus,
Lθ
ux (q,a)=c(x)+θE[ (x−ux,A)]
=c(x)−c(x)+c(x)+θE[ (x−ux,A)]
=c(x)−c(x)+Lθ (q,a)
<Lθ (q,a).
And since ux∈Ux, hen Lθ (q,a)≤Lθ
ux (q,a) and om he abo e inequali y, he desi ed esul ollows.
17
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
We now conside ha xd1=0 o xs2=0. In ha case, we canno ma ch mo e i ems in s a e x han in s a e x, which
implies ha ux∈Ux. As a esul ,
Lθ
ux (q,a)=c(x)+θE[ (x−ux,A)]
≤c(x)+θE[ (x−ux,A)]since ∈I(2,1)
=c(x)−c(x)+c(x)+θE[ (x−ux,A)]
=c(x)−c(x)+Lθ (q,a)
<Lθ (q,a).
And since ux∈Ux, hen Lθ (q,a)≤Lθ
ux (q,a) and om he abo e inequali y, he desi ed esul ollows.
D.2. P oo o Lemma 2
Le q∈Q,qd1≥qs1,a∈A,x=q+a,q=q+e(1,2),x=q+a,q=q+e(1,2) and x=q+a. Since is con ex in (1,2), we
ha e (q,a)− (q,a)≤ (q,a)− (q,a). We aim o show ha Lθ (q,a)−Lθ (q,a)≤Lθ (q,a)−Lθ (q,a). Fo y∈{x,x,x},
le uy∈a g minuLθ
u (y). F om P oposi ion 2, we can choose uysuch ha uy=min(yd1,ys1)e(1,1) +min(yd2,ys2)e(2,2) +k (y)
e(1,2).
Le us also de ine m=x−ux. Suppose ha qd1≥qs1+1 o a∈A {e(2,1)}, we can dis inguish 3 cases: (a) k (x)>0,
(b) k (x)=0 and k (x)>0 and (c) k (x)=0 and k (x)=0:
(a) I k (x)>0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m,A)− (m,A)]
=c(x)−c(x)+θE[ (m,A)− (m,A)]
=Lθ (q,a)−Lθ (q,a).
(b) I k (x)=0 and k (x)>0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m+e(1,2),A)− (m,A)]
=c(x)−c(x)+Lθ (q,a)−Lθ
ux+e(1,2) (q,a)
=c(x)−c(x)+Lθ (q,a)−Lθ
ux+e(1,2) (q,a)
≤c(x)−c(x)
=c(x)−c(x)+θE[ (m+e(1,2),A)− (m+e(1,2),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he inequali y is gi en because k (x)=k (x)=0 and 1 ∈Kx.
(c) I k (x)=0 and k (x)=0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m+e(1,2),A)− (m,A)]
=c(x)−c(x)+θE[ (m+e(1,2),A)− (m,A)]
≤c(x)−c(x)+θE[ (m+2e(1,2),A)− (m+e(1,2),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he inequali y is ue since ∈C(1,2).
Suppose now ha qd1=qs1and a=e(2,1), we can dis inguish 2 cases: k (x)>0 and k (x)=0:
•I k (x)>0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m−e(2,1),A)− (m,A)]
≤c(x)−c(x)+θE[ (m,A)− (m,A)]
=c(x)−c(x)+θE[ (m,A)− (m,A)]
=c(x)−c(x)+θE[ (m−e(2,1),A)− (m−e(2,1),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he inequali y is gi en because ∈I(2,1).
18
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
•I k (x)=0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m−e(2,1),A)− (m,A)]
=c(x)−c(x)+θE[ (m−e(2,1),A)− (m,A)]
≤c(x)−c(x)+θE[ (m−e(2,1) +e(1,2),A)− (m−e(2,1),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he inequali y is gi en because ∈B.
D.3. P oo o Lemma 3
D.3.1. P ese a ion o B
P oo . Le a∈A. Since ∈B, we ha e (0,a)− (e(2,1),a)≤ (e(1,2),a)− (0,a). We aim o show ha Lθ (0,a)−
Lθ (e(2,1),a)≤Lθ (e(1,2),a)−Lθ (0,a). Fo any x∈Q, we know om P oposi ion 2 ha he e exis s ux∈a g minuLθ
u (x)
such ha ux=min(xd1,xs1)e(1,1) +min(xd2,xs2)e(2,2) +k (x)e(1,2). We show he desi ed esul o each possible alue o a:
•I a=e(1,1) o a=e(2,2). Suppose ha k (e(1,2))=0. Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (0,A)− (e(2,1),A)]
<c(a+e(1,2))−c(a)+θE[ (0,A)− (e(2,1),A)]
≤c(a+e(1,2))−c(a)+θE[ (e(1,2),A)− (0,A)]
=Lθ (e(1,2),a)−Lθ (0,a)
whe e he second inequali y is gi en since ∈B. Suppose now ha k (e(1,2))>0. Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (0,A)− (e(2,1),A)]
≤c(a)−c(a+e(2,1))
<c(a+e(1,2))−c(a)
=c(a+e(1,2))−c(a)+θE[ (0,A)− (0,A)]
=Lθ (e(1,2),a)−Lθ (0,a)
whe e he i s inequali y is gi en because ∈I(2,1).
•I a=e(1,2). Suppose ha k (2e(1,2))=0. Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (e(1,2),A)− (0,A)]
<c(a+e(1,2))−c(a)+θE[ (e(1,2),A)− (0,A)]
≤c(a+e(1,2))−c(a)+θE[ (2e(1,2),A)− (e(1,2),A)]
=Lθ (e(1,2),a)−Lθ (0,a)
whe e he second inequali y ollows since ∈C(1,2). Suppose now ha k (2e(1,2))>0 and k (e(1,2))=0. Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (e(1,2),A)− (0,A)]
=c(a)−c(a+e(2,1))+Lθ (0,a)−Lθ
ua+e(1,2) (0,a)
≤c(a)−c(a+e(2,1))
<c(a+e(1,2))−c(a)
=c(a+e(1,2))−c(a)+θE[ (e(1,2),A)− (e(1,2),A)]
=Lθ (e(1,2),a)−Lθ (0,a),
whe e he i s inequali y ollows since 1 ∈UxFinally, suppose ha k (2e(1,2))>0 and k (e(1,2))>0. Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (0,A)− (0,A)]
<c(a+e(1,2))−c(a)+θE[ (0,A)− (0,A)]
=Lθ (e(1,2),a)−Lθ (0,a).
19
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
•I a=e(2,1). Then,
Lθ (0,a)−Lθ (e(2,1),a)=c(a)−c(a+e(2,1))+θE[ (e(2,1),A)− (2e(2,1),A)]
≤c(a)−c(a+e(2,1))+θE[ (0,A)− (e(2,1),A)]
<c(a+e(1,2))−c(a)+θE[ (0,A)− (e(2,1),A)]
=Lθ (e(1,2),a)−Lθ (0,a)
whe e he i s inequali y ollows since ∈C(2,1).□
D.3.2. P ese a ion o C(2,1)
P oo . Le a∈Aand q∈Qsuch ha qs1≥qd1,x=q+a. We de ine q=q+e(2,1),x=q+a,q=q+e(2,1)
and x=q+a. Since is con ex in (2,1), we ha e (q,a)− (q,a)≤ (q,a)− (q,a). We aim o show ha
Lθ (q,a)−Lθ (q,a)≤Lθ (q,a)−Lθ (q,a). Fo y∈{x,x,x}, le uy∈a g minuLθ
u (y). F om P oposi ion 2, we can choose
uysuch ha uy=min(yd1,ys1)e(1,1) +min(yd2,ys2)e(2,2) +k (y)e(1,2). Le us also de ine m=x−ux. We can dis inguish 2
cases: (a) qs1≥qd1+1 o a∈A {e(1,2)}and (b) qs1=qd1and a=e(1,2):
(a) I qs1≥qd1+1 o a∈A {e(1,2)}. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m+e(2,1),A)− (m,A)]
≤c(x)−c(x)+θE[ (m+2e(2,1),A)− (m+e(2,1),A)]
<c(x)−c(x)+θE[ (m+2e(2,1),A)− (m+e(2,1),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he i s inequali y is gi en because ∈C(2,1).
(b) I qs1=qd1and a=e(1,2). Suppose ha k (x)=0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m−e(1,2),A)− (m,A)]
≤c(x)−c(x)+θE[ (m−e(1,2) +e(2,1),A)− (m−e(1,2),A)]
=Lθ (q,a)−Lθ (q,a)
because cis a linea unc ion and ∈B. Suppose now ha k (x)>0. Then,
Lθ (q,a)−Lθ (q,a)=c(x)−c(x)+θE[ (m+e(2,1),A)− (m,A)]
≤c(x)−c(x)+θE[ (m+2e(2,1),A)− (m+e(2,1),A)]
<c(x)−c(x)+θE[ (m+2e(2,1),A)− (m+e(2,1),A)]
=Lθ (q,a)−Lθ (q,a)
whe e he i s equali y ollows since ∈C(2,1).□
Appendix E. P oo o P oposi ion 3
P oo . Le u∞
∈ΠT(1,2) . Le us look a he Ma ko chain de i ed om his policy. The se o possible s a es (excep o
Y0) is SA= {(si,a):i∈N,a∈A}wi h si=( −i,0,0, −i) i i≤ and si=(0,i− ,i− ,0) o he wise. The si
a e all possible s a es a e ma ching he i ems using u∞
. In o de o see mo e clea ly he beha io o he Ma ko chain,
we g oup oge he some s a es. Le us de ine S= {Si:i∈N}wi h S0= {(s0,e(1,2)),(s0,e(1,1)),(s0,e(2,2)),(s1,e(1,2))}and
Si= {(si−1,e(2,1)),(si,e(1,1)),(si,e(2,2)),(si+1,e(1,2))} o all i∈N∗.Fig. 6 shows ha his Ma ko chain de ined on Sis
clea ly i educible.
The balance equa ions a e he ollowing:
β(1 −α)πSi=α(1 −β)πSi+1i=0,1, . . .
Sol ing hese equa ions unde he cons ain ha ∑∞
i=0πSi=1 gi e:
πSi=ρi(1 −ρ)i=0,1, . . . (13)
wi h ρ=β(1−α)
α(1−β)∈(0,1). So (13) is he s a iona y dis ibu ion o he Ma ko chain on S, which hus is posi i e ecu en .
Using hese esul s, we can easily conclude ha he Ma ko chain on SAis also i educible. Then, since he a i al p ocess
20
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Fig. 6. The g aph associa ed o he Ma ko chain de i ed om u∞
and de ined on he s a e space S.pe(1,1) =αβ,pe(2,2) =(1 −α)(1 −β),
pe(1,2) =α(1 −β) and pe(2,1) =β(1 −α).
does no depend on he s a e and because he ollowing condi ion mus be sa is ied πSi=π(si−1,e(2,1))+π(si,e(1,1))+π(si,e(2,2))+
π(si+1,e(1,2)), we can hink o he ollowing as he s a iona y dis ibu ion:
π(si,a)=ρi(1 −ρ)pa(14)
wi h paas de ined in Fig. 6 (i.e. pe(1,1) =αβ,pe(2,2) =(1 −α)(1 −β), pe(1,2) =α(1 −β) and pe(2,1) =β(1 −α)), o all i∈N
and a∈A. Le us e i y ha (14) is indeed a s a iona y dis ibu ion:
∑
k∈N∑
a∈A
π(sk,a)p((sk,a),(si,a′)) =pa′(π(si−1,e(2,1))+π(si,e(1,1))+π(si,e(2,2))+π(si+1,e(1,2)))
=pa′(ρi−1(1 −ρ)pe(2,1) +ρi(1 −ρ)pe(1,1) +ρi(1 −ρ)pe(2,2) +ρi+1(1 −ρ)pe(1,2) )
=pa′ρi(1 −ρ)(β(1 −α)
ρ+αβ +(1 −α)(1 −β)+ρα(1 −β))
=pa′ρi(1 −ρ)
=π(si,a′)
Hence, ∑i∈N∑a∈Aπ(si,a)=∑i∈Nρi(1 −ρ)∑a∈Apa=∑i∈Nρi(1 −ρ)=1 and, he e o e, (14) is he s a iona y
dis ibu ion o he Ma ko chain de i ed om he policy u∞
and his Ma ko chain is posi i e ecu en . Using he
mono one con e gence heo em and he s ong law o la ge numbe o Ma ko chains, we can compu e he a e age
cos gu∞
:
gu∞
(y)=lim
N→∞
1
N
N−1
∑
n=0
Eu∞
y[c(Y(n))] = Eπ[c(Y)]
whe e Eπmeans he expec a ion o e he s a iona y dis ibu ion πde ined as (14). Finally, we compu e his alue:
Eπ[c(Y)] =
∑
i=1∑
a∈A
c(s −i+a)π(s −i,a)+∑
i∈N∑
a∈A
c(s +i+a)π(s +i,a)
=
∑
i=1
((cd1+cs2)i+cd1+cs1)ρ −i(1 −ρ)αβ +((cd1+cs2)i+cd2+cs1)ρ −i(1 −ρ)(1 −α)β
+((cd1+cs2)i+cd1+cs2)ρ −i(1 −ρ)α(1 −β)+((cd1+cs2)i+cd2+cs2)ρ −i(1 −ρ)(1 −α)(1 −β)
+∑
i∈N
((cd2+cs1)i+cd1+cs1)ρ +i(1 −ρ)αβ +((cd2+cs1)i+cd2+cs1)ρ +i(1 −ρ)(1 −α)β
+((cd2+cs1)i+cd1+cs2)ρ +i(1 −ρ)α(1 −β)+((cd2+cs1)i+cd2+cs2)ρ +i(1 −ρ)(1 −α)(1 −β)
=
∑
i=1
(cd1+cs2)iρ −i(1 −ρ)+(cd1+cs1)ρ −i(1 −ρ)αβ +(cd2+cs1)ρ −i(1 −ρ)(1 −α)β
+(cd1+cs2)ρ −i(1 −ρ)α(1 −β)+(cd2+cs2)ρ −i(1 −ρ)(1 −α)(1 −β)
+∑
i∈N
(cd2+cs1)iρ +i(1 −ρ)+(cd2+cs1)ρ +i(1 −ρ)αβ +(cd2+cs1)ρ +i(1 −ρ)(1 −α)β
+(cd2+cs1)ρ +i(1 −ρ)α(1 −β)+(cd2+cs2)ρ +i(1 −ρ)(1 −α) (15)
Using basic algeb a, one can show ha , o any c, he ollowing p ope ies hold:
∑
i=1
c·i·ρ −i=c( −ρ1−ρ
1−ρ)and ∑
i∈N
c·i·ρ −i=cρ +1
1−ρ.
21
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Mo eo e , o any cand q, we ha e ha
q·c·(1 −ρ)
∑
i=1
ρ −i=q·c·(1 −ρ ) and q·c·(1 −ρ)∑
i∈N
ρ +i=q·c·ρ .
Using hese p ope ies in (15), we ob ain ha
Eπ[c(Y)] = (cd1+cs2)( −ρ1−ρ
1−ρ)+(1 −ρ )((cd1+cs1)αβ +(cd1+cs2)α(1 −β)
+(cd2+cs1)α(1 −β)+(cd2+cs2)(1 −α)(1 −β)) +(cd2+cs1)ρ +1
1−ρ+ρ ((cd1+cs1)αβ
+(cd1+cs2)α(1 −β)+(cd2+cs1)α(1 −β)+(cd2+cs2)(1 −α)(1 −β))
=(cd1+cs2) +(cd1+cd2+cs1+cs2)ρ +1
1−ρ−(cd1+cd2)ρ
1−ρ+((cd1+cs1)αβ
+(cd1+cs2)α(1 −β)+(cd2+cs1)α(1 −β)+(cd2+cs2)(1 −α)(1 −β)).(16)
We aim o ob ain he alue o ha minimizes (16). To his end, we compu e he second de i a i e o (16) wi h espec
o and i esul s in
(cd1+cd2+cs1+cs2)ρ +1
1−ρ(log ρ)2,
which is posi i e o ρ∈(0,1), i.e., (16) is con ex. As a consequence, he minimum o (16) is achie ed when i s de i a i e
wi h espec o is equal o ze o:
cd1+cs2+(cd1+cd2+cs1+cs2)ρ +1
1−ρ(log ρ)=0⇐⇒ 1+(1 +R)ρ +1
1−ρ(log ρ)=0,
whe e R=cs1+cd2
cd1+cs2
. The oo o he p e ious equa ion is
0=1
log ρlog (ρ−1
(R+1) log ρ)−1.
Since 0is no necessa ily in ege , he op imal h eshold ∗is ob ained by compu ing he alue o (16) o he ceil o
0and he loo o 0and choosing he minimum o hese alues. □
We now show ha ∗is always posi i e.
Lemma 10. ∗is always posi i e.
P oo . F om Taylo ’s Theo em, i ollows ha o all ρ∈(0,1)
1−ρ
(1 +R)log(ρ)∈(0,1
R+1).
Since R>0, we ha e ha o all ρ∈(0,1)
log (1−ρ
(1 +R)log(ρ))<0.
Hence,
log (1−ρ
(1+R))
log(ρ)>0.
As a esul , ∗is posi i e since 0is posi i e.
Appendix F. P oo o Lemma 4
Le 0 ≤θ < 1, le π∗=(u(X(n)))n≥0(wi h u∈D∗as de ined in De ini ion 7) be a h eshold- ype policy wi h p io i y
in i∗and j∗and le (πN)∗=(uN(XN(n)))n≥0(wi h uN∈Dσas de ined in De ini ion 4) be a h eshold- ype policy wi h
p io i y in (1,1) and (2,2). Le y0=(q0,a0)∈Q×A, wi h yN
0=(pN
Q(q0),pN
A(a0)).
Fi s , le us no e ha (πN)∗is he ‘‘p ojec ion’’ o π∗on GNin he ollowing sense. Le x∈Q,u(x)∈D∗ he ma ching
o s a e x(u(x)∈Ux) unde he policy π∗and uN(pN
Q(x)) ∈Dσ he ma ching o s a e pN
Q(x) (uN(pN
Q(x)) ∈UN
pN
Q(x))
unde he policy (πN)∗. We can easily see ha uN
(1,1)(pN
Q(x)) =∑i∈D(j∗)u(i,j∗)(x), uN
(2,2)(pN
Q(x)) =∑j∈S(i∗)u(i∗,j)(x) and
uN
(1,2)(pN
Q(x)) =∑i∈D(j∗)∑j∈S(i∗)u(i,j)(x).
22
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
Then, we a e going o p o e by induc ion ha o any aN
1∈AN,...,aN
n∈ANand any a1∈(pN
A)−1(aN
1),...,an∈
(pN
A)−1(aN
n), we ha e pN
Q(xn)=xN
n.
We al eady speci ically chose xN
0 o e i y his p ope y: pN
Q(x0)=pN
Q(q0)+pN
A(a0)=qN
0+aN
0=xN
0. Now, assume ha
pN
Q(xn−1)=xN
n−1. Then,
pN
Q(qn)=⎛
⎝∑
i∈D(j∗)
(qn)di,(qn)di∗,(qn)sj∗,∑
j∈S(i∗)
(qn)sj⎞
⎠
=⎛
⎝∑
i∈D(j∗)
(xn−1−u(xn−1))di,(xn−1−u(xn−1))di∗,
(xn−1−u(xn−1))sj∗,∑
j∈S(i∗)
(xn−1−u(xn−1))sj⎞
⎠
=⎛
⎝∑
i∈D(j∗)
(xn−1)di−∑
j∈S(i)
u(xn−1)(i,j),
(xn−1)di∗−∑
j∈S(i∗)
u(xn−1)(i∗,j),(xn−1)sj∗−∑
i∈D(j∗)
u(xn−1)(i,j∗),
∑
j∈S(i∗)
(xn−1)sj−∑
i∈D(j)
u(xn−1)(i,j)⎞
⎠
=pN
Q(xn−1)−∑
i∈D(j∗)
u(xn−1)(i,j∗)e(1,1) −∑
j∈S(i∗)
u(xn−1)(i∗,j)e(2,2)
−∑
i∈D(j∗)∑
j∈S(i∗)
u(xn−1)(i,j)e(1,2)
=pN
Q(xn−1)−uN(pN
Q(xn−1))(1,1)e(1,1) −uN(pN
Q(xn−1))(2,2)e(2,2)
−uN(pN
Q(xn−1))(1,2)e(1,2)
=pN
Q(xn−1)−uN(pN
Q(xn−1))
=xN
n−1−uN(xN
n−1)
=qN
n
and pN
A(an)=aN
n(because pN
A((pN
A)−1(aN)) =aN o all aN∈AN). Thus, pN
Q(xn)=pN
Q(qn)+pN
A(an)=qN
n+aN
n=xN
n. Finally,
using his p ope y and (5), we ha e
Eπ∗
y0[c(Y(n))] = ∑
a1∈A,...,an∈A
c(xn)
n
∏
k=1
P(A=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
a1∈(pN
A)−1(aN
1)··· ∑
an∈(pN
A)−1(aN
n)
c(xn)
n
∏
k=1
P(A=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
a1∈(pN
A)−1(aN
1)··· ∑
an∈(pN
A)−1(aN
n)
cN(pN
Q(xn))
n
∏
k=1
P(A=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
a1∈(pN
A)−1(aN
1)··· ∑
an∈(pN
A)−1(aN
n)
cN(xN
n)
n
∏
k=1
P(A=ak)
=∑
aN
1∈AN,...,aN
n∈AN
cN(xN
n)
n
∏
k=1∑
ak∈(pN
A)−1(aN
k)
P(A=ak)
23
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
=∑
aN
1∈AN,...,aN
n∈AN
cN(xN
n)
n
∏
k=1
P(AN=aN
k)
=E(πN)∗
yN
0[cN(YN(n))].
This equali y is ue o any n∈N. The e o e, π∗
θ(y0)= (πN)∗
θ(yN
0) and gπ∗(y0)=g(πN)∗(yN
0). This was done o any
y0=(q0,a0)∈Q×Awi h yN
0=(pN
Q(q0),pN
A(a0)), gi ing he inal esul .
Appendix G. P oo o Lemma 5
P oo . Le 0 ≤θ < 1, le πbe a s a iona y policy on Y,y0=(q0,a0)∈Q×Aand yN
0=(pN
Q(q0),pN
A(a0)). We s a by
cons uc ing a his o y dependen policy πN=(uN
n)n≥0on YN ha will make YN‘‘ ollow’’ (in some sense) he p ojec ion
o Yon GN. Fi s , le us in oduce new independen andom a iables ˆ
A(n) ha we sample jus a e he a i als on GN.
ˆ
A(n) is de ined on Awi h he ollowing dis ibu ion: ∀n∈N∗,∀a∈A,∀aN∈AN,
P(ˆ
A(n)=a|AN(n)=aN)={P(A(n)=a)
P(AN(n)=aN)i a ∈(pN
A)−1(aN)
0O he wise .
Then, we de ine uN
n he decision ule o πNa ime nbased on he his o y o he ajec o y, i.e.
(AN(1),ˆ
A(1),...,AN(n),ˆ
A(n)) =(aN
1,ˆ
aN
1,...,aN
n,ˆ
aN
n),
he ini ial s a e y0=(q0,a0) and he s a iona y policy π. Le ˆ
xn∈Qbe he s a e we end up by s a ing in x0=q0+a0
and ollowing he dynamics (2) wi h he sequence o a i als ˆ
aN
1,...,ˆ
aN
nand unde he policy π. Le u∈Uˆ
xnbe he
decision ule applied o he s a e ˆ
xnunde he policy π. We cons uc uN
n∈UpN
Q(ˆ
xn)such ha (uN
n)(1,1) =∑i∈D(j∗)u(i,j∗),
(uN
n)(2,2) =∑j∈S(i∗)u(i∗,j)and (uN
n)(1,2) =∑i∈D(j∗)∑j∈S(i∗)u(i,j).
Now, le us show by induc ion ha , unde πN, we ha e pN
Q(xn)=xN
n o any aN
1∈AN,...,aN
n∈ANand any
ˆ
a1∈(pN
A)−1(aN
1),...,ˆ
an∈(pN
A)−1(aN
n) such ha ˆ
a1=a1,...,ˆ
an=an.
We al eady speci ically chose yN
0 o e i y his p ope y: pN
Q(x0)=pN
Q(q0)+pN
A(a0)=qN
0+aN
0=xN
0. Now, assume ha
pN
Q(xn−1)=xN
n−1. Fi s , le us no e ha xn−1=ˆ
xn−1because ˆ
a1=a1,...,ˆ
an−1=an−1and hey bo h ollow he same
dynamics unde he same policy π. Then,
pN
Q(qn)=⎛
⎝∑
i∈D(j∗)
(qn)di,(qn)di∗,(qn)sj∗,∑
j∈S(i∗)
(qn)sj⎞
⎠
=⎛
⎝∑
i∈D(j∗)
(xn−1−u(xn−1))di,(xn−1−u(xn−1))di∗
,(xn−1−u(xn−1))sj∗,∑
j∈S(i∗)
(xn−1−u(xn−1))sj⎞
⎠
=⎛
⎝∑
i∈D(j∗)
(xn−1)di−∑
j∈S(i)
u(xn−1)(i,j),
(xn−1)di∗−∑
j∈S(i∗)
u(xn−1)(i∗,j),(xn−1)sj∗−∑
i∈D(j∗)
u(xn−1)(i,j∗),
∑
j∈S(i∗)
(xn−1)sj−∑
i∈D(j)
u(xn−1)(i,j)⎞
⎠
=pN
Q(xn−1)−∑
i∈D(j∗)
u(xn−1)(i,j∗)e(1,1) −∑
j∈S(i∗)
u(xn−1)(i∗,j)e(2,2)
−∑
i∈D(j∗)∑
j∈S(i∗)
u(xn−1)(i,j)e(1,2)
24
A. Cadas, J. Doncel and A. Bušić Pe o mance E alua ion 154 (2022) 102286
=pN
Q(xn−1)−∑
i∈D(j∗)
u(ˆ
xn−1)(i,j∗)e(1,1) −∑
j∈S(i∗)
u(ˆ
xn−1)(i∗,j)e(2,2)
−∑
i∈D(j∗)∑
j∈S(i∗)
u(ˆ
xn−1)(i,j)e(1,2)
=pN
Q(xn−1)−(uN
n)(1,1)e(1,1) −(uN
n)(2,2)e(2,2) −(uN
n)(1,2)e(1,2)
=xN
n−1−uN
n
=qN
n
and pN
A(an)=aN
n(because pN
A((pN
A)−1(aN)) =aN o all aN∈AN). Thus, pN
Q(xn)=pN
Q(qn)+pN
A(an)=qN
n+aN
n=xN
n.
Then, using his p ope y and (5), we ha e
Eπ
y0[c(Y(n))] = ∑
a1∈A,...,an∈A
c(xn)
n
∏
k=1
P(A(k)=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
a1∈(pN
A)−1(aN
1)··· ∑
an∈(pN
A)−1(aN
n)
c(xn)
n
∏
k=1
P(A(k)=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
a1∈(pN
A)−1(aN
1)··· ∑
an∈(pN
A)−1(aN
n)
cN(pN
Q(xn))
n
∏
k=1
P(A(k)=ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
ˆ
a1∈(pN
A)−1(aN
1)··· ∑
ˆ
an∈(pN
A)−1(aN
n)
cN(xN
n)
n
∏
k=1
P(A(k)=ˆ
ak)
=∑
aN
1∈AN,...,aN
n∈AN∑
ˆ
a1∈(pN
A)−1(aN
1)··· ∑
ˆ
an∈(pN
A)−1(aN
n)
cN(xN
n)
n
∏
k=1
P(ˆ
A(k)=ˆ
ak|AN(k)=aN
k)P(AN(k)=aN
k)
=∑
aN
1∈AN,...,aN
n∈AN∑
ˆ
a1∈(pN
A)−1(aN
1)··· ∑
ˆ
an∈(pN
A)−1(aN
n)
cN(xN
n)
n
∏
k=1
P(AN(k)=aN
k,ˆ
A(k)=ˆ
ak)
=EπN
yN
0[cN(YN(n))].
This equali y is ue o any n∈N. The e o e, π
θ(y0)= πN
θ(yN
0) and gπ(y0)=gπN(yN
0). □
Appendix H. P oo o P oposi ion 4
Le q∈Q,a∈A,x=q+aand u∈Ux. We de ine x(i,j)=min(xdi,xsj) o all (i,j)∈E o ease he no a ions. We assume
ha he ma ching g aph has mpendan edges, i.e., E∗= {(i1,j1),(i2,j2),...,(im,jm)}. Fo k=1,...,m, we de ine
pk=min(x(ik,jk)−u(ik,jk),∑
(i,j)∈N((ik,jk))
u(i,j)).
We now obse e ha , o all (i,j)∈N((ik,jk)), we de ine 0 ≤pi,j≤u(i,j)such ha pk=∑(i,j)∈N((ik,jk)) pi,j. Hence, we
de ine
u′=u+
m
∑
k=1
⎛
⎝pke(ik,jk)−∑
(i,j)∈N((ik,jk))
pi,je(i,j)⎞
⎠.
We assume ha u′∈Ux. Since ∈Vσ, i ollows ha
Lθ
u′ (q,a)≤Lθ
u (q,a).(17)
We now de ine mk=x(ik,jk)−u′
(ik,jk) o all k=1,...,mand u∗=u′+∑m
k=1mke(ik,jk).We assume ha u∗∈Ux. Using
ha ∈Vσ, we ha e ha Lθ
u∗ (q,a)≤Lθ
u′ (q,a),and, aking in o accoun (17), i ollows ha Lθ
u∗ (q,a)≤Lθ
u (q,a),
which holds o any u∈Uxand he e o e also o u∈a g min Lθ (q,a). As a esul , u∗∈a g min Lθ (q,a). Besides, o any
pendan edge (ik,jk), we ha e ha
u∗
(ik,jk)=u′
(ik,jk)+x(ik,jk)−u′
(ik,jk)=x(ik,jk),
and he desi ed esul ollows i we show ha u′∈Uxand u∗∈Ux.
25