
Reinforcement Learning-Based Control for Robotic Flexible Element Disassembly

Author: Tapia Sal Paz, Benjamín; Sorrosal Yarritu, Gorka; Mancisidor Barinagarrementeria, Aitziber; Calleja Elcoro, Carlos; Cabanes Axpe, Itziar
Publisher: MDPI
Year: 2025
DOI: 10.3390/math13071120
Source: https://addi.ehu.eus/bitstream/10810/75086/1/mathematics-13-01120-v2.pdf
Academic Editor: Yujiong Liu
Received: 11 March 2025
Revised: 24 March 2025
Accepted: 27 March 2025
Published: 28 March 2025
Citation: Tapia Sal Paz, B.; Sorrosal, G.; Mancisidor, A.; Calleja, C.; Cabanes, I. Reinforcement Learning-Based Control for Robotic Flexible Element Disassembly. Mathematics 2025, 13, 1120. https://doi.org/10.3390/math13071120
Copyright: © 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Article
Reinforcement Learning-Based Control for Robotic Flexible Element Disassembly
Benjamín Tapia Sal Paz 1,2,*, Gorka Sorrosal 1, Aitziber Mancisidor 2, Carlos Calleja 1 and Itziar Cabanes 2
1 Ikerlan Technology Research Centre, Basque Research and Technology Alliance (BRTA), 20500 Arrasate, Spain; [email protected] (G.S.); [email protected] (C.C.)
2 Department of Automatic Control and System Engineering, Bilbao School of Engineering, University of the Basque Country (UPV/EHU), 48013 Bilbao, Spain; aitziber[email protected] (A.M.); itziar[email protected] (I.C.)
* Correspondence: [email protected]
Abstract: Disassembly plays a vital role in sustainable manufacturing and recycling processes, facilitating the recovery and reuse of valuable components. However, automating disassembly, especially for flexible elements such as cables and rubber seals, poses significant challenges due to their nonlinear behavior and dynamic properties. Traditional control systems struggle to handle these tasks efficiently, requiring adaptable solutions that can operate in unstructured environments and adapt online. This paper presents a reinforcement learning (RL)-based control strategy for the robotic disassembly of flexible elements. The proposed method focuses on low-level control, in which the precise manipulation of the robot is essential to minimize force and avoid damage during extraction. An adaptive reward function is tailored to account for varying material properties, ensuring robust performance across different operational scenarios. The RL-based approach is evaluated in a simulation using soft actor–critic (SAC), deep deterministic policy gradient (DDPG), and proximal policy optimization (PPO) algorithms, benchmarking their effectiveness in dynamic environments. The experimental results indicate the satisfactory performance of the robot under operational conditions, achieving an adequate success rate and force minimization. Notably, there is at least a 20% reduction in force compared to traditional planning methods. The adaptive reward function further enhances the ability of the robotic system to generalize across a range of flexible element disassembly tasks, making it a promising solution for real-world applications.
Keywords: intelligent control; robotic control; decision-making; reinforcement learning (RL); robotic disassembly
MSC: 68T05
1. Introduction
Disassembly is a critical stage in the lifecycle management of products spanning industries such as electronics, automotive, and household appliances. As global industries increasingly prioritize sustainability, disassembly has emerged as a key enabler of repair, recycling, and repurposing initiatives, aligned with the principles of the circular economy [1]. By recovering valuable materials and components, disassembly reduces waste and supports the reintegration of parts into manufacturing processes. However, automating disassembly remains a formidable challenge due to the inherent complexity, variability, and unpredictability of the tasks involved [2–5]. Key challenges include the following:
• Product complexity: Disassembly often involves products with numerous, intricately connected components. The complexity rises with the number of parts and the intricacy of their connections, requiring sophisticated handling to avoid damaging valuable elements.
• Product variability: Variability across different products, or even between different versions of the same product, necessitates highly adaptable disassembly processes. Traditional automated systems struggle to accommodate this variability without extensive reconfiguration.
• Condition of components: The condition of the components of a product can vary widely. Parts may be damaged, worn out, or contaminated, complicating the disassembly process and requiring adaptable strategies to effectively handle them.
Despite these challenges, manual disassembly remains widely used, as human operators excel at managing diverse and unpredictable scenarios. However, manual processes are inherently labor-intensive, time-consuming, and costly, highlighting the growing need for automated solutions. Robots, with their flexibility and advanced capabilities, offer a compelling alternative to traditional automation methods [4,6]. Yet, conventional robotic control techniques, which rely on predefined models and deterministic approaches, often struggle to accommodate the complex interactions between robotic manipulators and components, particularly in unstructured environments [4,7,8]. This limitation underscores the necessity of adaptive and intelligent control methods capable of handling the dynamic and unstructured nature of disassembly tasks [9–11].
Reinforcement learning has emerged as a powerful tool for robotic control in complex and dynamic environments, particularly in physical interaction tasks. Unlike traditional methods, RL enables robots to learn optimal policies through trial and error, eliminating the need for precise physical models. This capability is especially beneficial in disassembly tasks involving flexible elements such as cables, seals, and rubber components, which exhibit nonlinear and unpredictable behaviors that are difficult (if not impossible) to model accurately. Recent advancements in RL have demonstrated its effectiveness in solving high-dimensional control problems, making it well suited for physical interaction tasks that demand both precision and adaptability [9,12–18]. However, despite these advancements, the application of RL in robotic disassembly (particularly for handling flexible elements) remains underexplored [19–21]. Addressing this gap is crucial, as it presents unique challenges that must be overcome to enable more efficient and autonomous disassembly processes.
This study addressed this gap by proposing an RL-based control strategy for the robotic disassembly of flexible elements. The primary objective was to develop a system capable of adapting to the dynamic and nonlinear behaviors of flexible materials while minimizing the interaction forces to prevent damage. This study focused on low-level control, in which the robot interacts directly with unknown flexible elements, making on-the-fly adjustments to ensure safe and efficient extraction. The key contributions of this study are as follows.
1. RL-based control strategy: The design and implementation of an RL-based control strategy tailored for the disassembly of flexible elements, emphasizing force minimization and adaptability.
2. Adaptive reward function: The introduction of an adaptive reward function that normalizes task complexity based on material properties, ensuring consistent performance across varying elasticities.
3. Algorithm comparison: A comparative analysis of state-of-the-art RL algorithms (SAC, DDPG, and PPO) to evaluate their effectiveness in dynamic disassembly environments. By benchmarking these algorithms, this work provides practical insights into their applicability to real-world disassembly tasks while also identifying key limitations, such as challenges in generalizing to unseen extraction directions.
4. Experimental validation: A comprehensive experimental evaluation in a simulated environment, demonstrating the ability to generalize across different disassembly scenarios and material characteristics.
This paper is organized as follows. Section 2 provides a comprehensive review of related works, focusing on advancements in robotic disassembly and RL-based control strategies. Section 3 presents the problem formulation (Section 3.1) and elaborates on the design of the reward function (Section 3.2). In Section 4, the experimental setup is described in detail, including the implementation specifics (Section 4.1) and experiments conducted (Section 4.2). Section 5 analyzes the experimental results and provides insights and observations. Finally, Section 6 concludes the paper by summarizing the findings and discussing the existing challenges and potential avenues for future research in this domain.
2. Related Work
Robotic disassembly has attracted significant attention as industries seek to automate the recovery and recycling of valuable components from end-of-life products. Traditional approaches primarily rely on predefined sequences and deterministic control methods. For example, ref. [3] explored the use of structured assembly data to guide robotic disassembly, highlighting challenges such as product variability and the need for adaptable systems. However, these methods often fall short in unstructured environments, where product conditions and configurations exhibit high variability [9].
To address these limitations, recent advancements have focused on improving flexibility and adaptability in robotic disassembly. For instance, ref. [4] proposed a hybrid approach that combines rule-based methods with machine learning to enhance system adaptability. Despite this progress, the disassembly of flexible elements (such as cables and soft materials) remains a major challenge due to their complex and unpredictable interactions [19–21].
Traditional robotic systems, primarily designed for manipulating rigid objects, struggle with the complexities introduced by flexible elements. Various approaches have been explored to overcome this issue, ranging from model-based control to data-driven techniques [5]. Model-based methods require precise physical models of flexible elements, which are often difficult to obtain and may lack generalizability across different materials and configurations. In contrast, data-driven approaches, particularly those leveraging artificial intelligence (AI), have shown promise in adapting to the variability of flexible elements by training on large datasets.
Several studies have focused on specific tasks within flexible object manipulation, such as cable routing and soft object grasping. For instance, ref. [22] developed a method for cable routing that combines visual feedback with machine learning, enabling robots to adapt to diverse cable types and routing paths. Despite these advancements, the application of such techniques to disassembly remains limited, particularly in scenarios where flexible elements are entangled with rigid components or require precise manipulation to prevent damage.
Learning-based techniques provide a promising solution to these challenges by offering the flexibility and adaptability required for complex, dynamic applications. In robotics, two primary methodologies (learning from demonstration (LfD) and reinforcement learning) are commonly employed. While LfD is effective when human demonstrations can guide robotic behavior, it is less suitable for disassembly tasks, which are highly variable and unpredictable. The uniqueness of each disassembly scenario makes it impractical to account for all possible cases through demonstrations alone. Consequently, a more autonomous approach is needed, one that enables robots to navigate and respond to task complexities without extensive human intervention.
Several approaches attempt to mitigate these limitations, including generative adversarial imitation learning (GAIL) [23]. However, GAIL is heavily dependent on the quality and diversity of expert demonstrations. When the test environment deviates from training examples, performance deteriorates, requiring additional training or an expanded dataset. Similarly, inverse reinforcement learning (IRL) derives an expert's cost function before optimizing policies through reinforcement learning. While effective in some contexts, IRL is computationally expensive, struggles with generalization, and often requires substantial expert data and dedicated hardware [24].
For complex and dynamic tasks such as disassembly, RL presents a compelling alternative to methods reliant on predefined examples. RL allows robotic agents to learn through accumulated experience, rather than explicit demonstrations, enabling them to adapt to the unpredictable nature of disassembly tasks [21,25]. In RL, the agent continuously refines its control policies based on environmental feedback, optimizing performance over time through trial and error [26]. This paradigm is particularly advantageous in disassembly operations, where precise adjustments in force modulation, compliance, and positioning are essential. By leveraging RL, robotic systems can autonomously manage environmental variability, effectively addressing challenges that traditional control techniques and other learning-based approaches struggle to overcome.
Reinforcement learning has proven to be a powerful tool for robotic control in physical interaction tasks, particularly where conventional methods fall short due to scenario complexity and unpredictability [21,25]. By enabling robots to learn control policies through direct interaction with the environment, RL is especially well suited for tasks in unstructured or dynamic settings. The theoretical foundations of RL, established in [26], have since been extended to a wide range of robotic applications.
In robotic disassembly, RL has been applied in two primary areas: high-level sequence planning and low-level control. High-level planning focuses on optimizing the order of disassembly actions, as demonstrated in [27], where RL was used to determine optimal sequences for electronic device disassembly. This approach prioritizes macro-level decisions, such as minimizing the disassembly time or maximizing material recovery. Conversely, low-level control involves the precise manipulation of individual components. RL has shown effectiveness in fine motor control tasks such as grasping and manipulation [21,28]. For instance, ref. [29] applied RL to the manipulation of soft objects, outperforming traditional control methods in scenarios where object behavior is difficult to model.
While significant progress has been made in both robotic physical interaction tasks and reinforcement learning, their intersection remains underexplored. In particular, the application of RL to low-level control in flexible element disassembly has not been comprehensively investigated. Most existing studies focus on related physical interaction tasks, such as assembly, or emphasize high-level planning, with relatively few addressing the unique challenges posed by low-level control in flexible element disassembly [19–21].
Notably, previous research has primarily focused on the disassembly of rigid objects, where problem formulation is simplified by setting the objective force to zero. However, these studies frequently highlight the inability to handle flexible elements as a key limitation. The need for rapid adaptation to varying elastic properties and the complex interactions involved in flexible element disassembly make RL a particularly promising yet underexplored approach.
This paper aims to bridge these gaps by proposing an RL-based control strategy specifically designed for flexible element disassembly. Unlike prior studies focused on high-level planning or rigid object manipulation, this research emphasizes low-level control, developing a robust and adaptable system capable of handling the complexities inherent in flexible element disassembly.
3. Problem Formulation
The methodology proposed in this work leverages reinforcement learning to develop a robotic control strategy for disassembling flexible elements. This section describes the problem formulation (Section 3.1) and the design of the reward function, including an adaptive reward mechanism for handling varying elasticities (Section 3.2).
3.1. Problem Formulation
The disassembly task is formulated as a Markov decision process (MDP), where the robot interacts with its environment to learn an optimal control policy. The MDP is defined by the tuple $(O, A, P, R, \gamma)$, where the following applies:
• $O$ is the state space, representing the robot's observations of its environment.
• $A$ is the action space, consisting of the robot's possible movements.
• $P$ is the transition probability function, describing the dynamics of the environment.
• $R$ is the reward function, providing feedback to the robot based on its actions.
• $\gamma$ is the discount factor, balancing immediate and future rewards.
The environment in this study is defined by the flexible element to be disassembled and the state of the robot, which includes the position of the end effector and applied forces. The RL agent's primary objective is to extract the flexible element while minimizing applied forces to prevent damage to both the environment and the robotic system. This problem formulation is intentionally designed for simplicity, ensuring efficient learning and practical implementation. However, it is the outcome of extensive evaluations of more complex representations, which ultimately did not yield significant improvements. The rationale behind these design choices and their implications will be further explored in the discussion section.
3.1.1. State Space (O)
The state space, $O$, represents the robot's observations of its environment, which include the following:
• The position of the end effector relative to the grasping point $(ee_x, ee_y, ee_z)$.
• The Cartesian force exerted via the end effector $(F_{ee})$, computed as the Euclidean norm of the force components: $F_{ee} = \sqrt{F_{x_{ee}}^{2} + F_{y_{ee}}^{2} + F_{z_{ee}}^{2}}$.
• The distance, $d$, between the end-effector position $(ee_{position})$ and the grasping point $(G_{position})$: $d = \lVert G_{position} - ee_{position} \rVert$.
These observations are critical for the robot to monitor its progress, adjust its actions, and ensure safe and efficient disassembly. The state space is formally defined as follows:
$$O \rightarrow \begin{bmatrix} ee_x,\ ee_y,\ ee_z \\ F_{ee} = \lVert F_{x_{ee}} + F_{y_{ee}} + F_{z_{ee}} \rVert \\ d = \lVert G_{position} - ee_{position} \rVert \end{bmatrix} \tag{1}$$
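As a minimal sketch of how the observation vector in Equation (1) could be assembled, the following Python function computes the relative end-effector position, the Euclidean force norm, and the distance to the grasping point. The function name `build_observation` and its argument names are illustrative assumptions and are not taken from the authors' implementation.

```python
import math


def build_observation(ee_position, grasp_position, force_xyz):
    """Return (ee_x, ee_y, ee_z, F_ee, d) as described for the state space O.

    All names here are hypothetical; positions and forces are 3-element
    sequences expressed in the same Cartesian frame.
    """
    # End-effector position relative to the grasping point.
    rel = tuple(e - g for e, g in zip(ee_position, grasp_position))
    # Euclidean norm of the Cartesian force components.
    f_ee = math.sqrt(sum(f * f for f in force_xyz))
    # Distance between the grasping point and the end effector.
    d = math.dist(grasp_position, ee_position)
    return (*rel, f_ee, d)


# Example: end effector displaced by (0.3, 0.4, 0.0) with a force of (3, 4, 0).
observation = build_observation((0.3, 0.4, 0.0), (0.0, 0.0, 0.0), (3.0, 4.0, 0.0))
```

Keeping the observation to these five scalars mirrors the deliberately simple formulation described in the text.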

3.1.2. Action Space (A)
The action space $A$ consists of continuous Cartesian movements of the end effector of the robot $(a_x, a_y, a_z)$ while maintaining a fixed orientation. These actions allow for fine-grained control over the movements of the robot, enabling precise adjustments during the disassembly process. The action space is defined as follows:
$$A \rightarrow a_x, a_y, a_z \in \mathbb{R}[0, 0.05] \tag{2}$$
where the range $[0, 0.05]$ ensures that the movements of the robot are incremental and controlled, minimizing the risk of an excessive force application.
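The bounded action space in Equation (2) can be illustrated with a short Python sketch that clips a raw policy output to the stated range before applying it as a Cartesian increment. Interpreting the action as a per-step displacement added to the current end-effector position is an assumption made here for illustration (the text does not state the units), and the names `clip_action` and `apply_action` are hypothetical.

```python
ACTION_LOW = 0.0    # lower bound of Equation (2)
ACTION_HIGH = 0.05  # upper bound of Equation (2)


def clip_action(raw_action):
    """Clamp each component (a_x, a_y, a_z) to [ACTION_LOW, ACTION_HIGH]."""
    return tuple(min(max(a, ACTION_LOW), ACTION_HIGH) for a in raw_action)


def apply_action(ee_position, raw_action):
    """ASSUMPTION: the clipped action is added to the current position."""
    a_x, a_y, a_z = clip_action(raw_action)
    x, y, z = ee_position
    return (x + a_x, y + a_y, z + a_z)


# A raw output outside the range is limited to an incremental step.
next_position = apply_action((0.0, 0.0, 0.0), (0.1, -0.2, 0.03))
```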
3.2. Reward Function Design
The reinforcement learning agent receives feedback from the environment through a reward function, which is fundamental in shaping the learning process of the RL-based controller. The reward function is designed to implicitly encode the task objective by assigning a numerical value to each state–action pair. This value quantifies the immediate benefit or penalty associated with the chosen action by the agent. The goal of the RL agent is to learn a policy that maximizes the cumulative reward over time, thereby optimizing task performance.
In the context of flexible element disassembly, the reward function $R$ is formulated to balance two key factors: task progress and force minimization. The objective is to guide the robot toward efficient disassembly while minimizing physical interaction forces to prevent damage to both the flexible element and the robotic system. To achieve this, the proposed reward function is defined as follows:
$$R = \alpha \times d - \beta \times F_{ee}^{2} \tag{3}$$
where the following applies:
• $d$ represents the progress made in the disassembly task, measured using the distance between the grasping point and the current position of the end-effector.
• $F_{ee}$ denotes the physical interaction forces exerted via the robot, which should be minimized to prevent damage to the flexible elements and ensure safe handling.
• $\alpha$ and $\beta$ are fixed weighting coefficients that govern the trade-off between task progress and force minimization. These coefficients determine the relative importance of each objective in the reward function, ensuring a balanced optimization strategy.
The coefficients $\alpha = 2.5$ and $\beta = 1$ were determined through a systematic analysis to produce a reward surface (illustrated in Figure 1) that aligns with the expected physical behavior of the system. This analysis assumed a flexible element with an elastic property of $k = 50$ [N/m]. The reward function is structured to reflect real-world disassembly scenarios, where successful extraction typically occurs when the end-effector reaches approximately 0.3 m from the grasping point. This formulation ensures that the reward function effectively incentivizes both efficiency and safety in the disassembly process.
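The fixed reward of Equation (3) can be written as a one-line Python function, sketched here with the stated coefficients as defaults. The function name `fixed_reward` is a hypothetical choice, and the input values in the example are arbitrary numbers chosen only to show the trade-off between progress and force.

```python
def fixed_reward(d, f_ee, alpha=2.5, beta=1.0):
    """Equation (3): R = alpha * d - beta * F_ee^2.

    d    -- distance between the grasping point and the end-effector
    f_ee -- norm of the interaction force at the end-effector
    """
    return alpha * d - beta * f_ee ** 2


# Same progress, different force: the larger force yields the lower reward.
gentle = fixed_reward(d=0.3, f_ee=0.5)
forceful = fixed_reward(d=0.3, f_ee=1.0)
```

Because the force term is squared, doubling the force quadruples its penalty, so the reward favors progress made with low interaction forces.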
Figure 1 shows the distribution of the reward function within a plane defined by the grasping point and the preferred extraction direction. In the left subfigure, higher reward values are observed in regions aligned with the extraction trajectory, reinforcing the importance of following an optimal path. The right subfigure, which presents a parallel view along the extraction direction, highlights the existence of a peak reward point along the extraction path. This peak corresponds to the location where successful disassembly occurs, demonstrating how the reward function directs the RL agent toward optimal performance.
[Figure 1 image: two panels of the reward surface; labels: Grasping Point, Extraction Direction, Y axis.]
Figure 1. Reward function value distribution considering a plane that contains the grasping point and the preferred direction of extraction. (Left) shows the reward distribution in a plane that includes the grasping point and the preferred extraction direction. (Right) provides a view parallel to the extraction direction.
Adaptive Reward Function
The performance of a PPO-based RL agent trained using the fixed reward function defined in Equation (3) is illustrated in Figure 2. The agent was evaluated using four flexible elements with distinct elastic properties ($k$ = 10, 50, 100, and 200 [N/m]). While the agent successfully extracts the element with the elastic property used during training ($k$ = 50 [N/m]), its performance deteriorates when tested on elements with different elasticities. Specifically, for the element with $k$ = 10 [N/m], the robot's end-effector remains near the grasping point, failing to complete the extraction. Conversely, for elements with higher elasticities ($k$ = 100 and $k$ = 200 [N/m]), the robot applies excessive force even after extraction, increasing the risk of damaging the element. These results highlight the limitations of a fixed reward function in handling materials with varying elastic properties. Similar conclusions were drawn when evaluating the SAC and DDPG algorithms.
[Figure 2 image: plots with horizontal axis Distance [m].]
Figure 2. Performance of the extraction task of flexible elements with different elastic properties using a PPO RL agent trained with the fixed reward function in Equation (3). The agent was trained on an element with $k$ = 50 [N/m]. The plot demonstrates that the agent performs efficiently only for the $k$ = 50 [N/m] element while struggling to adapt to elements with other elastic properties.
To address these limitations, an adaptive reward function is introduced. This approach dynamically normalizes the force component of the reward function based on the elastic properties of the element being disassembled in each episode. By incorporating the elastic property of the material, the adaptive reward function ensures that the RL agent can adjust its behavior to the specific characteristics of each flexible element, enabling consistent learning and performance across a wide range of materials.
The adaptive reward function is defined as follows:
$$R_{norm} = 2\,\frac{R - R_{min}}{R_{max} - R_{min}} - 1 \tag{4}$$
where the following applies:
• $R$ is the reward computed using Equation (3).
• $R_{min}$ and $R_{max}$ are the minimum and maximum expected reward values for the episode, estimated based on the elastic constant of the flexible element.
This normalization scales the reward function within a fixed range $(-1, 1)$, ensuring that the reward signal remains independent of the material's elasticity. Consequently, the RL agent receives appropriately scaled feedback, regardless of the elastic properties of the element. Additionally, the weighting coefficients $\alpha$ and $\beta$, which control the trade-off between task progress and force minimization, are dynamically adjusted based on the elastic properties of the flexible element. Specifically, $\alpha$ is defined as $\alpha = \frac{k^{2}}{1.5}$, where $k$ represents the elastic constant of the element, while $\beta$ remains fixed at $\beta = 1$. This dynamic adaptation ensures that the reward function scales appropriately with the elastic property of the material, preserving a consistent reward distribution, as illustrated in Figure 1.
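The normalization in Equation (4) and the elasticity-dependent coefficient can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' code. All function names are hypothetical. The helper `normalized_reward_linear_spring` additionally assumes that the measured force equals a linear elastic force $F_{ee} = k \cdot d$, that $R_{min} = 0$ (reached at $d = 0$), and that $R_{max} = k^{2}/9$, which is the maximum of Equation (3) under that force model with $\alpha = k^{2}/1.5$ and $\beta = 1$.

```python
def adaptive_alpha(k):
    """alpha = k^2 / 1.5, as defined for the adaptive reward."""
    return k ** 2 / 1.5


def raw_reward(d, f_ee, alpha, beta=1.0):
    """Equation (3): R = alpha * d - beta * F_ee^2."""
    return alpha * d - beta * f_ee ** 2


def normalize_reward(r, r_min, r_max):
    """Equation (4): rescale R so that r_min maps to -1 and r_max to 1."""
    return 2.0 * (r - r_min) / (r_max - r_min) - 1.0


def normalized_reward_linear_spring(d, k):
    """Normalized reward under the ASSUMED force model F_ee = k * d.

    ASSUMPTIONS (not from the paper): R_min = 0 and R_max = k^2 / 9.
    """
    r = raw_reward(d, k * d, adaptive_alpha(k))
    return normalize_reward(r, 0.0, k ** 2 / 9.0)


# The same distance gives the same normalized reward for every elasticity.
values = {k: normalized_reward_linear_spring(0.2, k) for k in (10.0, 50.0, 100.0, 200.0)}
```

Under the linear-spring assumption, setting the derivative of $R(d) = \frac{k^{2}}{1.5} d - k^{2} d^{2}$ to zero places the peak at $d = 1/3$ m for every $k$, which is in line with the approximately 0.3 m extraction distance mentioned in the text.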
By incorporating an adaptive reward function, the system maintains robust performance even when faced with significant variations in material properties, which is a common challenge in real-world disassembly tasks. This enhancement enables the RL agent to generalize more effectively across materials with different elasticities, addressing the shortcomings of the fixed reward function. As a result, task performance improves while reducing the risk of damage to both the flexible elements and the robotic system, making the proposed approach more suitable for practical applications.
4. Methodology
This section outlines the experimental setup, procedure, and evaluation metrics used to assess the performance of the proposed RL-based control strategy for the disassembly of flexible elements. The experiments were conducted in a simulated environment designed to replicate real-world conditions and challenges.
A key aspect of the methodology, consistent with the problem formulation, is the emphasis on maintaining simplicity in both environment design and the information required for task execution. This decision was motivated by the primary objective of this study: to evaluate whether the proposed approach can effectively handle task uncertainties and adapt to real-world conditions. By reducing environmental and state-space complexity, the study aims to demonstrate that robust and adaptive control strategies can be developed even with limited prior knowledge or highly simplified models. This approach enhances the practical applicability of the solution while also providing insights into the ability of the system to generalize across diverse and unpredictable scenarios.
4.1. Experimental Setup
For the implementation of the proposed control, this work selects as a use case the disassembly of sealing elements in refrigerators (Figure 3). This is a representative task where the disassembly of these elements requires dynamic actions and the adaptation of the system according to the current state of the flexible element.
[Figure 3 image: panels (a), (b), and (c); labels: Gripper, Preferred Extraction Direction, Uncertain Extraction Direction, Flexible Element, Grasping Point, Interaction Forces.]
Figure 3. Disassembly task use case: Extraction of the sealing element of a fridge door. (a) Real-world application; (b) Experimental setup used for simulation, where the gripper replicates the attached points in the real application. (c) Simulation environment used to replicate the real-world interaction forces.
4.1.1. Simulated Environment
This work uses a simulated environment to validate the proposed methodology in the disassembly task. To that end, the whole system is implemented in the ROS 2 (Robot Operating System) framework and simulated in Gazebo with the collaborative robot KUKA LBR iiwa14 (KUKA, Augsburg, Germany), on a computer with an Intel i7 processor (Intel, Atlanta, GA, USA) and an Nvidia RTX 4080 graphics card (GIGABYTE, Singapore). The KUKA LBR iiwa14 robot was chosen for its advanced kinematic and dynamic capabilities, which are essential for executing the precise and adaptive movements required in flexible element disassembly tasks. Additionally, the KUKA LBR iiwa14 is a collaborative robot designed to operate safely alongside humans. This feature not only enhances its adaptability but also facilitates future integration into human–robot workspaces, making it a versatile choice for applications requiring close collaboration.
The simulation replicated the real-world disassembly scenario, as shown in Figure 3, where the setup focused on replicating the physical and dynamic conditions of the flexible element extraction. The main aspects considered in the simulations are as follows:
• Kinematics and dynamics: The simulation includes the kinematic and dynamic models of the KUKA LBR iiwa14 robot, ensuring realistic interaction with the flexible elements.
• Use case workspace: The workspace mimics the real-world setup, including the constraints and preferred extraction direction of the flexible element.
• Interaction forces: The forces exerted during extraction are simulated using two main components: the reaction force of the gripper ($F_{Gripper}$) and the flexible element's elastic force ($F_{elastic}$). These forces are modeled to replicate the physical interactions between the robot and the flexible element during disassembly. However, it is important to note that the main sim-to-real gaps are expected in this aspect, as real-world conditions may introduce additional complexities, such as unmodeled friction, material imperfections, or dynamic perturbations, which are not fully captured in the simulation.
Here, the elastic force is modeled as follows:
$$F_{elastic} = K_{elastic} \times d \tag{5}$$
where $K_{elastic}$ is the elastic constant of the flexible element, and $d$ is the distance of the end-effector from the grasping point ($d = 0 \rightarrow F_{elastic} = 0$).
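A minimal Python sketch of the linear elastic model in Equation (5) is given here, evaluated for the four elastic constants used in the fixed-reward evaluation at a distance of 0.3 m. The function name `elastic_force` and the choice of 0.3 m as the evaluation distance are assumptions made for illustration; the gripper reaction force is not included.

```python
def elastic_force(k_elastic, d):
    """Equation (5): F_elastic = K_elastic * d, with d >= 0 in meters."""
    if d < 0:
        raise ValueError("distance d must be non-negative")
    return k_elastic * d


# Elastic force at d = 0.3 m for k = 10, 50, 100, and 200 N/m.
forces = {k: elastic_force(k, 0.3) for k in (10.0, 50.0, 100.0, 200.0)}
```

The force at a given distance scales linearly with the elastic constant, so the squared-force penalty in Equation (3) varies strongly between elements, which motivates normalizing the reward per element.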
The gripper reaction force ($F_{Gripper}$) is modeled as follows:
Table 2. Evaluation of learned strategies under the combination of different environment configurations: operation range (O), structured configuration (S), and unexplored configuration (U).

| Algorithm | Training Force (k) | Training Direction | Test Force (k) | Test Direction | Mean Reward | Success Rate |
|---|---|---|---|---|---|---|
| SAC | S | S | S | S | 0.85 | 1.00 |
| SAC | O | O | S | S | 0.60 | 1.00 |
| SAC | O | O | O | O | 0.61 | 1.00 |
| SAC | O | O | U | O | 0.48 | 0.00 |
| SAC | O | O | O | U | −0.08 | 0.00 |
| SAC | O | O | U | U | −0.05 | 0.00 |
| DDPG | S | S | S | S | 0.75 | 1.00 |
| DDPG | O | O | S | S | 0.44 | 1.00 |
| DDPG | O | O | O | O | 0.44 | 1.00 |
| DDPG | O | O | U | O | 0.48 | 0.57 |
| DDPG | O | O | O | U | −0.25 | 0.00 |
| DDPG | O | O | U | U | −0.02 | 0.00 |
| PPO | S | S | S | S | 0.80 | 1.00 |
| PPO | O | O | S | S | 0.62 | 1.00 |
| PPO | O | O | O | O | 0.62 | 1.00 |
| PPO | O | O | U | O | 0.62 | 1.00 |
| PPO | O | O | O | U | −0.46 | 0.00 |
| PPO | O | O | U | U | −0.47 | 0.00 |
5.2.2. Adaptability and Generalization
An in-depth analysis was conducted using various combinations of environmental configurations (Table 2). The study aimed to evaluate how effectively these agents could manage different conditions, both within their training range and beyond their training scenarios.
The results in Table 2 and Figure 9 show that all agents were able to successfully learn the task when trained and tested within their operational range ($O$). As illustrated in Figure 9, the agents displayed the correct reward patterns during episodes, consistently achieving a success rate of 1.00 across all structured environment configurations. This indicates that the agents, particularly when dealing with familiar conditions, were able to execute the task with complete accuracy and reliability.
Figure 9. Reward curves of PPO, SAC, and DDPG algorithms trained in the operation range ($O$) and tested in structured ($S$) and operational ($O$) configurations. The curves show the reward progression during testing across a variety of elastic properties and extraction directions within the training range. The three algorithms show similar behaviors in the $O$ range. The shaded regions represent the variance across multiple (100) test runs.
However, when tested in previously unexplored environments ($U$), there was a noticeable drop in both success rate and reward values, as shown in Figure 10. This performance drop was particularly evident when the agents encountered new extraction directions that were absent from the training data. In these scenarios, all three agents consistently failed to complete the task, as reflected in the negative rewards (approaching −1, the lowest possible value due to normalization) and a zero success rate. On the other hand, when faced with unseen elastic properties of the flexible elements, the agents were able to perform the task, but with a reduced success rate and lower rewards compared to familiar scenarios. This suggests that the agents were better equipped to handle variations in material characteristics than drastic changes in task dynamics, such as extraction direction.
Figure 10. Reward curves of PPO, SAC, and DDPG algorithms trained in the operational range ($O$) and tested in unexplored environment configurations ($U$). The curves illustrate the agents' performance when tested on elastic properties and extraction directions outside the training range. All algorithms show similar performances facing an unseen $k$ but significant performance degradation in scenarios with unfamiliar extraction directions. The shaded regions represent the variance across multiple (100) test runs.
5.3. Discussion
The experimental results confirm the effectiveness of the proposed RL-based control strategy for flexible element disassembly. All three algorithms (PPO, SAC, and DDPG) maintained high success rates while minimizing the exerted force, an essential factor in preserving the integrity of flexible components. This aspect is shown in the force signature of flexible element extraction in Figure 8, where a 20% force reduction is achieved against forcible classical methodologies. These findings highlight RL-based control as a promising solution for real-world robotic disassembly.
The comprehensive evaluation presented in Table 2 provides several key insights:
• As expected, the agent performs optimally when trained and tested in structured conditions, achieving the highest success rates.
• In operational conditions, the agent also demonstrates strong performance, achieving a perfect success rate. This is a crucial finding, as it validates the proposed approach and supports its potential transfer to real-world experiments.
• A significant observation is that the only cases where the agent fails to complete the task involve unknown extraction directions. However, when faced with unknown elastic properties, the agent successfully adapts, demonstrating its ability to generalize across different material conditions.
• The adaptive reward function played a crucial role in this success, particularly in handling varying elastic properties of flexible elements. The results indicate that dynamically normalizing the force component of the reward function based on material elasticity allows the RL agent to generalize effectively across diverse scenarios. By scaling the reward function according to the elasticity (k) of the element, the system ensures consistent and meaningful feedback, regardless of material properties.
• These results were achieved despite using simplified force models and state representations, aligning with the objective of validating the use of simplifications without compromising task success. Furthermore, since the agent successfully overcame these simplifications through the adaptive reward function, this finding suggests that the system may also be capable of bridging the sim-to-real gap and handling real-world uncertainties such as noise and unmodeled environmental factors.
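The idea of scaling the force term of the reward by the elasticity k can be illustrated with a small sketch. The exact reward used in the paper is not reproduced in this section, so the form below is an assumption: the measured force is divided by the force that a linear spring of stiffness k would produce at a reference displacement, and the result is clipped to [-1, 1]. The names `normalized_force`, `adaptive_reward`, `ref_displacement`, and `force_weight` are hypothetical.

```python
def normalized_force(force, k, ref_displacement=0.05):
    """Scale the measured force by k * ref_displacement, i.e. the force a
    linear spring of stiffness k exerts at the reference displacement
    (assumed normalization, not the paper's exact formula)."""
    if k <= 0 or ref_displacement <= 0:
        raise ValueError("k and ref_displacement must be positive")
    return abs(force) / (k * ref_displacement)


def adaptive_reward(progress, force, k, force_weight=0.5):
    """Task progress minus an elasticity-scaled force penalty, clipped to [-1, 1]."""
    raw = progress - force_weight * normalized_force(force, k)
    return max(-1.0, min(1.0, raw))


if __name__ == "__main__":
    # A soft and a stiff element stretched by the same 0.05 m (F = k * x).
    soft = adaptive_reward(progress=0.8, force=200.0 * 0.05, k=200.0)
    stiff = adaptive_reward(progress=0.8, force=800.0 * 0.05, k=800.0)
    print(soft, stiff)  # both elements receive the same reward
```

Under this assumed form, two elements that are deformed by the same amount receive the same penalty even though the stiffer one produces a larger raw force, which is one way the feedback can stay comparable across materials.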
Future research will focus on real-world testing, specifically on the disassembly of refrigerator door seals, to validate the robustness of the approach beyond simulation. Among the evaluated reinforcement learning algorithms, PPO emerged as the most practical for real-world implementation, offering a balance between training efficiency, stability, and computational demands. PPO exhibited faster and more stable convergence compared to SAC and DDPG, making it well suited for real-time disassembly tasks. While SAC and DDPG also performed well, their higher computational requirements may limit scalability in practical applications.
Despite promising results, certain limitations were identified. The performance of the RL agent deteriorated in extreme cases where the expected extraction direction varied significantly. However, it is important to emphasize that these limitations were observed only in highly extreme scenarios. Within the operational range, which encompasses a broad spectrum of realistic configurations, the agent performed effectively.
Addressing these challenges will be essential to developing more robust and generalizable RL agents capable of handling a wide range of disassembly tasks. Future investigations will focus on enhancing the adaptability to highly dynamic environments, ultimately paving the way for RL-driven automation across diverse industrial applications.
6. Conclusions
This paper has presented a reinforcement learning-based control strategy for the robotic disassembly of flexible elements, addressing key challenges inherent to dynamic and unstructured environments. The approach centered on low-level control, enabling the robotic system to learn adaptive strategies for extracting flexible components, such as cables and rubber seals, through low-force trajectories. By utilizing RL algorithms, including SAC, PPO, and DDPG, the study demonstrated significant advancements in adaptability, efficiency, and overall task performance compared to traditional control methods.
The experimental results showcased the effectiveness of the RL-based control approach in achieving high success rates, consistently minimizing force exertion, particularly in scenarios where predefined thresholds are not possible, and maintaining efficient optimized trajectories across a range of task conditions. A key contributor to this success was the adaptive reward function, which allowed the RL agents to maintain reliable performance when dealing with elements of varying elastic properties.
However, the study also identified some limitations, particularly in the ability of the RL agents to generalize beyond their trained task configurations, especially in highly complex or unforeseen situations. These limitations point toward future research opportunities, including exploring hybrid control strategies that combine RL with model-based techniques and expanding the RL framework to multi-agent systems, which could further enhance adaptability and efficiency in complex disassembly scenarios.
Overall, this research advances intelligent robotic disassembly by demonstrating the potential of RL-based control to handle the complexities of flexible element disassembly in unpredictable and dynamic environments, which is an existing gap in current robotic disassembly studies [19–21]. The findings underscore the feasibility of automating flexible element disassembly using RL-based controllers, contributing to more sustainable and efficient manufacturing and recycling practices. Future work will aim to further refine these control strategies and expand their applicability across a broader range of disassembly tasks and real-world scenarios.
Author Contributions: Conceptualization, B.T.S.P., G.S. and A.M.; methodology, B.T.S.P., G.S. and A.M.; software, B.T.S.P.; validation, B.T.S.P., G.S. and A.M.; formal analysis, B.T.S.P., G.S. and A.M.; investigation, B.T.S.P.; resources, B.T.S.P., G.S., A.M., C.C. and I.C.; data curation, B.T.S.P.; writing - original draft preparation, B.T.S.P.; writing - review and editing, B.T.S.P., G.S., A.M., C.C. and I.C.; visualization, B.T.S.P.; supervision, G.S., C.C. and A.M.; project administration, G.S., C.C. and I.C.; funding acquisition, G.S., C.C. and I.C. All authors have read and agreed to the published version of the manuscript.
Funding: This project was supported by the European Union’s Horizon 2020 research and innovation program under Marie Sklodowska-Curie grant agreement No. 955681 and by members of the Virtual Sensorization Research Group from the University of Basque Country (Basque Government Ref. IT1726-22).
Data Availability Statement: The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest: The authors declare no conflicts of interest.
Abbreviations
The following abbreviations are used in this manuscript:
RL Reinforcement Learning
SAC Soft Actor–Critic
DDPG Deep Deterministic Policy Gradient
PPO Proximal Policy Optimization
AI Artificial Intelligence
LfD Learning from Demonstration
IRL Inverse Reinforcement Learning
MDP Markov Decision Process
ROS2 Robot Operating System
k Modulus of Elasticity
References
1. Li, J.; Barwood, M.; Rahimifard, S. Robotic disassembly for increased recovery of strategically important materials from electrical vehicles. Robot. Comput.-Integr. Manuf. 2018, 50, 203–212.
2. Foo, G.; Kara, S.; Pagnucco, M. Challenges of robotic disassembly in practice. Procedia CIRP 2022, 105, 513–518.
3. Vongbunyong, S.; Kara, S.; Pagnucco, M. Application of cognitive robotics in disassembly of products. CIRP Ann.-Manuf. Technol. 2013, 62, 31–34.
4. Poschmann, H.; Brüggemann, H.; Goldmann, D. Disassembly 4.0: A Review on Using Robotics in Disassembly Tasks as a Way of Automation. Chem. Ing.-Tech. 2020, 92, 341–359.
5. Li, F.; Jiang, Q.; Zhang, S.; Wei, M.; Song, R. Robot skill acquisition in assembly process using deep reinforcement learning. Neurocomputing 2019, 345, 92–102.
6. Hjorth, S.; Chrysostomou, D. Human–robot collaboration in industrial environments: A literature review on non-destructive disassembly. Robot. Comput.-Integr. Manuf. 2022, 73, 102208.
7. Wan, A.; Xu, J.; Chen, H.; Zhang, S.; Chen, K. Optimal Path Planning and Control of Assembly Robots for Hard-Measuring Easy-Deformation Assemblies. IEEE/ASME Trans. Mechatron. 2017, 22, 1600–1609.
8. Schneider, D.; Schomer, E.; Wolpert, N. A motion planning algorithm for the invalid initial state disassembly problem. In Proceedings of the MMAR: 2015 20th International Conference on Methods and Models in Automation and Robotics, Miedzyzdroje, Poland, 24–27 August 2015; Institute of Electrical and Electronics Engineers: Miedzyzdroje, Poland, 2015; p. 839.
9. Elguea-Aguinaco, Í.; Serrano-Muñoz, A.; Chrysostomou, D.; Inziarte-Hidalgo, I.; Bøgh, S.; Arana-Arexolaleiba, N. A review on reinforcement learning for contact-rich robotic manipulation tasks. Robot. Comput.-Integr. Manuf. 2023, 81, 102517.
10. Duan, J.; Gan, Y.; Chen, M.; Dai, X. Adaptive variable impedance control for dynamic contact force tracking in uncertain environment. Robot. Auton. Syst. 2018, 102, 54–65.
11. Wang, W.; Guo, Q.; Yang, Z.; Jiang, Y.; Xu, J. A state-of-the-art review on robotic milling of complex parts with high efficiency and precision. Robot. Comput.-Integr. Manuf. 2023, 79, 102436.
12. Martín-Martín, R.; Lee, M.A.; Gardner, R.; Savarese, S.; Bohg, J.; Garg, A. Variable Impedance Control in End-Effector Space: An Action Space for Reinforcement Learning in Contact-Rich Tasks. In Proceedings of the 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Macau, China, 3–8 November 2019.
13. Schoettler, G.; Nair, A.; Luo, J.; Bahl, S.; Ojea, J.A.; Solowjow, E.; Levine, S. Deep Reinforcement Learning for Industrial Insertion Tasks with Visual Inputs and Natural Rewards. arXiv 2019, arXiv:1906.05841.
14. Zhang, H.; Wang, W.; Zhang, S.; Zhang, Y.; Zhou, J.; Wang, Z.; Huang, B.; Huang, R. A novel method based on deep reinforcement learning for machining process route planning. Robot. Comput.-Integr. Manuf. 2024, 86, 102688.
15. Englert, P.; Toussaint, M. Learning manipulation skills from a single demonstration. Int. J. Robot. Res. 2018, 37, 137–154.
16. Levine, S.; Wagener, N.; Abbeel, P. Learning Contact-Rich Manipulation Skills with Guided Policy Search. arXiv 2015, arXiv:1501.05611.
17. Chebotar, Y.; Kalakrishnan, M.; Yahya, A.; Li, A.; Schaal, S.; Levine, S. Path Integral Guided Policy Search. In Proceedings of the 2017 IEEE International Conference on Robotics and Automation (ICRA), Singapore, 29 May–3 June 2017.
18. Huang, Y.; Liu, D.; Liu, Z.; Wang, K.; Wang, Q.; Tan, J. A novel robotic grasping method for moving objects based on multi-agent deep reinforcement learning. Robot. Comput.-Integr. Manuf. 2024, 86, 102644.
19. Qu, M.; Wang, Y.; Pham, D.T. Robotic Disassembly Task Training and Skill Transfer Using Reinforcement Learning. IEEE Trans. Ind. Inform. 2023, 19, 10934–10943.
20. Qu, M.; Pham, D.T.; Altumi, F.; Gbadebo, A.; Hartono, N.; Jiang, K.; Kerin, M.; Lan, F.; Micheli, M.; Xu, S.; et al. Robotic Disassembly Platform for Disassembly of a Plug-In Hybrid Electric Vehicle Battery: A Case Study. Automation 2024, 5, 50–67.
21. Serrano-Muñoz, A.; Arana-Arexolaleiba, N.; Chrysostomou, D.; Bøgh, S. Learning and generalising object extraction skill for contact-rich disassembly tasks: An introductory study. Int. J. Adv. Manuf. Technol. 2023, 124, 3171–3183.
22. Zhang, X.; Sun, L.; Kuang, Z.; Tomizuka, M. Learning Variable Impedance Control via Inverse Reinforcement Learning for Force-Related Tasks. IEEE Robot. Autom. Lett. 2021, 6, 2225–2232.
23. Ho, J.; Ermon, S. Generative Adversarial Imitation Learning. In Advances in Neural Information Processing Systems; Springer: Berlin/Heidelberg, Germany, 2016.
24. Zhao, T.Z.; Kumar, V.; Levine, S.; Finn, C. Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware. arXiv 2023, arXiv:2304.13705.
25. Beltran-Hernandez, C.C.; Petit, D.; Ramirez-Alpizar, I.G.; Nishi, T.; Kikuchi, S.; Matsubara, T.; Harada, K. Learning Force Control for Contact-Rich Manipulation Tasks with Rigid Position-Controlled Robots. IEEE Robot. Autom. Lett. 2020, 5, 5709–5716.
26. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; The MIT Press: Cambridge, MA, USA, 2018; p. 526.
27. Chen, H.; Liu, Y. Robotic assembly automation using robust compliant control. Robot. Comput.-Integr. Manuf. 2013, 29, 293–300.
28. Kristensen, C.B.; Sørensen, F.A.; Nielsen, H.B.; Andersen, M.S.; Bendtsen, S.P.; Bøgh, S. Towards a Robot Simulation Framework for E-Waste Disassembly Using Reinforcement Learning; Elsevier: Amsterdam, The Netherlands, 2019; Volume 38, pp. 225–232.
29. Kroemer, O.; Niekum, S.; Konidaris, G. A Review of Robot Learning for Manipulation: Challenges, Representations, and Algorithms. J. Mach. Learn. Res. 2021, 22, 1–82.
30. Tapia Sal Paz, B.; Sorrosal, G.; Mancisidor, A. Hybrid Robotic Control for Flexible Element Disassembly. In Proceedings of the European Robotics Forum 2024, Rimini, Italy, 13–15 March 2024; Secchi, C., Marconi, L., Eds.; Springer: Cham, Switzerland, 2024; pp. 180–185.
31. Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
32. Duan, Y.; Chen, X.; Houthooft, R.; Schulman, J.; Abbeel, P. Benchmarking Deep Reinforcement Learning for Continuous Control. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 20–22 June 2016.
33. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A Next-generation Hyperparameter Optimization Framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Anchorage, AK, USA, 4–8 August 2019.