Machine Learning Algorithm for Backmapping Macromolecules
Project Title: Polymer Informatics Tools for Sustainable 3D Printing (PITS3D)
Fellow: Dr. Petra Bačová
Host Institution: Universidad de Cádiz, Spain
Duration: 20 Months (January 2024–August 2025)
1. Executive Summary
This report provides a brief summary of a deep learning-based methodology for reintroducing atomic
detail into coarse-grained (CG) configurations of macromolecules. The approach employs a U-Net
convolutional neural network (CNN) model and has been successfully applied to poly(lactic acid)
(PLA) systems [1], a key focus of the PITS3D project. More information about the methodology,
the simulations performed, and the tools developed can be found in ref. [1]. The CNN model is
available on GitHub [2]. This part of the project was done in collaboration with The Cyprus
Institute, more specifically with Eleftherios Christofi and Vagelis Harmandaris.
2. Machine Learning Algorithm Overview
2.1. Model Architecture
The architecture of the neural network was inspired by the work of Li et al. [3], who introduced
a general approach for backmapping CG macromolecules by utilizing a “Pix2Pix” general-purpose
conditional Generative Adversarial Network (cGAN) to perform an image-to-image translation. In
this method the atomistic configurations were encoded from the XYZ vector components into
Red-Green-Blue (RGB) values, treating backmapping as a super-resolution problem that maps
low-resolution CG images to high-resolution atomistic ones. The method was employed, as an
illustrative example, in CG models of cis-1,4-polyisoprene homopolymer melts. For simplicity, in
our work [4,1] we utilize only the generator of the “Pix2Pix” cGAN, which is a U-Net based model.
Also, instead of employing RGB values, the CNN was trained directly on atomic descriptors (namely,
probability distribution functions of atomic bonds) that are capable of creating representative
atomistic configurations in Cartesian coordinates of high molecular weight multi-component
polymeric systems. The bonds among the connected pairs of atoms served as the target output of the
artificial neural network, while the coordinates and the type of the CG particles served as input.
After a number of experiments, where we examined the neural network’s behaviour for different
depths and activation functions, we concluded that the neural network shown in Figure 1 is
reliable for this task. We utilize a U-Net CNN based model, which consists of an encoder and a
decoder network, with skip connections among them. For the encoder we stack five down-sample
blocks, each consisting of a convolution layer with stride 2, a leaky ReLU activation function,
and a batch-normalization layer. We note that we start with 64 filters and end up with 512. Then
we pass the output of the encoder to the decoder network, where we have five up-sample blocks,
each consisting of a transposed convolution layer with stride 2 and a ReLU activation function.
For the first up-sample block we have a dropout layer with a rate of 0.5. We note that for the
last layer of the network we have a transposed convolution layer with stride 1 and a tanh
activation function, because we rescale the target output values in the interval [-1,1].
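The block structure described above can be checked with a small shape-tracing sketch. It is not the published model: the intermediate filter counts (the report only states a start of 64 and an end of 512), the 'same' padding, the single spatial dimension, and the names ENCODER_FILTERS, DECODER_FILTERS, and trace_unet_shapes are all assumptions made for illustration.

```python
import math

# Assumed filter progression; the report only gives the first (64) and last (512) values.
ENCODER_FILTERS = [64, 128, 256, 512, 512]
# Assumed mirror image of the encoder for the five up-sample blocks.
DECODER_FILTERS = [512, 256, 128, 64, 64]


def trace_unet_shapes(length, in_channels, out_channels):
    """Return (name, length, channels) after every block of the sketched U-Net.

    Only one spatial dimension is traced, assuming 'same' padding throughout.
    """
    if length % 2 ** len(ENCODER_FILTERS) != 0:
        raise ValueError("length must be divisible by 2**5 for the skips to line up")

    shapes = [("input", length, in_channels)]
    skips = []

    # Encoder: stride-2 convolution + leaky ReLU + batch normalization per block.
    for i, filters in enumerate(ENCODER_FILTERS, start=1):
        length = math.ceil(length / 2)
        shapes.append((f"down{i}", length, filters))
        skips.append((length, filters))

    skips.pop()  # the bottleneck output feeds the decoder directly, not a skip

    # Decoder: stride-2 transposed convolution + ReLU per block
    # (dropout with rate 0.5 would sit in the first block only).
    for i, filters in enumerate(DECODER_FILTERS, start=1):
        length *= 2
        channels = filters
        if skips:
            skip_length, skip_channels = skips.pop()
            assert skip_length == length  # skip connection must match spatially
            channels += skip_channels     # concatenation along the channel axis
        shapes.append((f"up{i}", length, channels))

    # Final transposed convolution with stride 1 and tanh keeps the length.
    shapes.append(("output", length, out_channels))
    return shapes


if __name__ == "__main__":
    for name, n, c in trace_unet_shapes(128, in_channels=4, out_channels=3):
        print(f"{name:>7}: length={n:4d} channels={c}")
```

Under these assumptions the spatial size is halved five times in the encoder and restored by the five up-sample blocks, so the input length has to be a multiple of 32.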
Furthermore, for the training process we utilize mini-batch gradient descent with batches of size
64 and the Adam optimization algorithm with an initial learning rate of 0.001, which was decreased
down to 0.000001 by a factor of 8 once learning stagnates. The CNN was implemented in the open
source TensorFlow 2 platform [5]. The computational time needed to train the model on a single
NVIDIA Tesla V100-SXM2 GPU for 1000 epochs was around 24 hours.
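The learning-rate reduction described above can be sketched in plain Python as follows. The stagnation criterion is not given in this report, so the patience parameter and the PlateauSchedule class are hypothetical. In TensorFlow 2 a comparable behaviour is offered by the tf.keras.callbacks.ReduceLROnPlateau callback through its factor and min_lr arguments, although whether that callback was used here is not stated.

```python
class PlateauSchedule:
    """Divide the learning rate by 8 whenever the monitored loss stops improving."""

    def __init__(self, initial_lr=1e-3, factor=1.0 / 8.0, min_lr=1e-6, patience=10):
        self.lr = initial_lr
        self.factor = factor
        self.min_lr = min_lr
        self.patience = patience      # hypothetical: epochs without improvement tolerated
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, monitored_loss):
        """Update the schedule with the latest loss and return the learning rate."""
        if monitored_loss < self.best:
            self.best = monitored_loss
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
            if self.stale_epochs >= self.patience:
                # Reduce by the given factor, but never below the floor of 1e-6.
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.stale_epochs = 0
        return self.lr


if __name__ == "__main__":
    schedule = PlateauSchedule(patience=1)
    for epoch in range(6):
        print(epoch, schedule.step(1.0))  # a constant loss triggers repeated reductions
```

Starting from 0.001, three reductions by a factor of 8 give about 0.000002, and the fourth is clipped at the floor of 0.000001.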
Figure 1: Schematic representation of the CNN used for the implementation of the method. Figure 5
from “Physics-Informed Deep Learning Approach for Reintroducing Atomic Detail in Coarse-Grained
Configurations of Multiple Poly(lactic acid) Stereoisomers” by Eleftherios Christofi, Petra
Bačová, and Vagelis A. Harmandaris. Licensed under CC BY 4.0. DOI: 10.1021/acs.jcim.3c01870
2.2. Training Process and Loss Function
To improve the accuracy of the networks we further employ prior knowledge both in the data
pre-processing and in the generator objective. For this, the loss function of the neural network
is augmented with several penalizing terms, based on prior knowledge of the physical properties of
the molecular system under study, e.g., bond lengths, bond angles, and dihedral angles. We define
the loss function as a linear combination of the loss function terms that penalize bond vectors
(L_b), bond lengths (L_bl), bond angles (L_ba), and dihedral angles (L_da), via:
\[
\tilde{L} = \lambda_{b} L_{b} + \lambda_{bl} L_{bl} + \lambda_{ba} L_{ba} + \lambda_{da} L_{da} \tag{1}
\]
We treat the weights of the loss terms (i.e. values of λ = (λ_mae, λ_bl, λ_ba, λ_da, λ 0)) as hyperparameters.
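A minimal sketch of how such a weighted loss could be assembled is given below. The exact form of each term is not stated in this report; here the bond-vector and bond-length terms are taken to be mean absolute errors, which is an assumption, and the function names and the example weights are hypothetical. Angle terms would enter the same weighted sum.

```python
import math


def vector_length(v):
    """Euclidean norm of a 3-component bond vector."""
    return math.sqrt(sum(c * c for c in v))


def mean_abs(values):
    """Mean absolute value of a non-empty list of numbers."""
    return sum(abs(v) for v in values) / len(values)


def loss_bond_vectors(pred, target):
    """Assumed L_b: mean absolute error over all bond-vector components."""
    return mean_abs([p[k] - t[k] for p, t in zip(pred, target) for k in range(3)])


def loss_bond_lengths(pred, target):
    """Assumed L_bl: mean absolute error of the bond lengths."""
    return mean_abs([vector_length(p) - vector_length(t) for p, t in zip(pred, target)])


def total_loss(terms, weights):
    """Linear combination of loss terms; a missing weight is treated as zero."""
    return sum(weights.get(name, 0.0) * value for name, value in terms.items())


if __name__ == "__main__":
    target = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]   # toy target bond vectors
    pred = [(1.1, 0.0, 0.0), (0.0, 0.8, 0.0)]     # toy predicted bond vectors
    terms = {
        "b": loss_bond_vectors(pred, target),
        "bl": loss_bond_lengths(pred, target),
    }
    weights = {"b": 1.0, "bl": 0.5}               # hypothetical lambda values
    print(terms, total_loss(terms, weights))
```

Setting a weight to zero switches the corresponding penalty off, which makes it straightforward to compare different combinations of terms.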
To find the optimum hyperparameter configuration of those λ values we perform a quick grid search.
To investigate structural similarity between the target and predicted atomistic configurations, we
compute probability distribution functions of bond lengths, bond angles, and dihedral angles, as
well as non-bonded pair radial distribution functions. For this, we define a metric parameter Ψ as
the total L1 difference of the probability densities between the target and prediction for all the
distributions of bond lengths, P_bl, bond angles, P_ba, dihedral angles, P_da, and the radial
distribution function, g(r), through:
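\[
\Psi_{bl} = \sum_{i=1}^{n_{bl}} \left| P^{\mathrm{prediction}}_{bl,i}(x) - P^{\mathrm{target}}_{bl,i}(x) \right| \tag{2}
\]
\[
\Psi_{ba} = \sum_{i=1}^{n_{ba}} \left| P^{\mathrm{prediction}}_{ba,i}(x) - P^{\mathrm{target}}_{ba,i}(x) \right| \tag{3}
\]
\[
\Psi_{da} = \sum_{i=1}^{n_{da}} \left| P^{\mathrm{prediction}}_{da,i}(x) - P^{\mathrm{target}}_{da,i}(x) \right| \tag{4}
\]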
In the above equations, n_bl, n_ba, and n_da denote the number of bond lengths, bond angles, and
dihedral angles, respectively. Moreover, for the radial distribution function,
\[
\Psi_{g} = \left| g^{\mathrm{prediction}}(r) - g^{\mathrm{target}}(r) \right|, \tag{5}
\]
with r being the magnitude of the distance between non-bonded particles. Therefore, atoms directly
participating in bond stretching, bond angle bending, and dihedral angle definitions are excluded
from the radial distribution function. The sum of these quantities is denoted as Ψ_total. Lower
Ψ_total values indicate higher prediction quality.
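As an illustration of the Ψ metric, the sketch below estimates each distribution with a histogram and sums the bin-wise absolute differences between prediction and target. The binning, the normalization to a probability density, and the function names are assumptions, since the discretization is not specified in this report.

```python
def density_histogram(samples, lo, hi, n_bins):
    """Histogram of `samples` on [lo, hi), normalized to a probability density."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            index = min(int((x - lo) / width), n_bins - 1)  # guard against round-off
            counts[index] += 1
    total = sum(counts)
    if total == 0:
        raise ValueError("no samples fall inside the histogram range")
    return [c / (total * width) for c in counts]


def psi(pred_sets, target_sets, lo, hi, n_bins):
    """Sum over distributions i of the bin-wise |P_prediction - P_target|.

    Each element of `pred_sets` and `target_sets` holds the samples of one
    distribution, for example all lengths of one bond type.
    """
    total = 0.0
    for pred, target in zip(pred_sets, target_sets):
        p_pred = density_histogram(pred, lo, hi, n_bins)
        p_target = density_histogram(target, lo, hi, n_bins)
        total += sum(abs(a - b) for a, b in zip(p_pred, p_target))
    return total


if __name__ == "__main__":
    target_lengths = [[1.52, 1.53, 1.54, 1.53]]   # toy bond-length samples
    pred_lengths = [[1.51, 1.53, 1.55, 1.53]]
    print(psi(pred_lengths, target_lengths, lo=1.4, hi=1.7, n_bins=30))
```

The same function can be applied to bond-angle and dihedral-angle samples, and a Ψ_total under these assumptions would be the sum of the individual values.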
3. Application to Poly(lactic acid) (PLA)
As an illustrative example of chiral biodegradable polymers, the algorithm was applied to
amorphous PLA polymers, which may contain two types of stereoisomers: poly(L-lactic acid) (PLLA)
and poly(D-lactic acid) (PDLA). The framework was tested on various model systems, from
homopolymer stereoisomers of PLA to copolymers with randomly placed chiral centers (PDLLA). The
objective was twofold: to develop an effective and versatile algorithm for backmapping chiral
molecules at the all-atom scale and to design a tool capable of producing atomistic configurations
of PLA with multiple molecular weights and compositions. The methodology uses an all-atom (AA)
description.
3.1. Backmapping Procedure and Validation
The procedure involves three main stages: preprocessing, training, and postprocessing. During
preprocessing, the system’s CG coordinates and particle types serve as input, while probability
distribution functions of atomistic bond vectors are collected as the target output. The training
phase involves the U-Net CNN architecture described above. A crucial aspect is the loss function,
which is augmented with physical penalizing terms for properties like bond vectors and bond
lengths; preliminary tests showed that penalizing only these two terms (bond vectors and bond
lengths) yielded the best results. Finally, postprocessing ensures the physical accuracy and
desired stereochemistry of the generated atomistic configurations.
The postprocessing described below includes a short sequence of atomistic simulations and
customized codes to examine and correct the stereochemistry of the derived all-atom
configurations. The codes are available open-access [6].
1. Energy Minimization: After obtaining the backmapped structure, energy minimization is
performed via a steepest descent algorithm. This is standard practice due to the ill-posed nature
of the reverse problem.
2. Short Simulation: A very short simulation (0.1 ps) is run to slowly introduce excluded volume
according to the selected force field.
3. Stereochemistry Checking and Correction: The trained model might misplace atoms, especially
the hydrogen on the chiral atom, leading to incorrect stereochemistry. A custom-developed C code
checks if the stereochemistry of each monomer corresponds to the desired sequence. If not, the
positions of the hydrogen atom and the methyl group are switched by a reflection matrix. This
process mimics challenges in PLA synthesis where full stereochemistry control is difficult.
4. Additional Energy Minimization and Short Run: To address potential overlaps created by
stereochemistry correction, an additional energy minimization step followed by another short run
is performed.
5. Analysis Data Collection: Finally, data for analysis are collected from short MD simulations
of a few nanoseconds (approximately 10 ns).
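The stereochemistry check and correction in step 3 can be illustrated with a short geometric sketch. The actual tool is a C code [6] whose internals are not described here, so everything below is an assumption: the handedness of a chiral center is read from the sign of a triple product, and the correction mirrors the hydrogen and the methyl group through the plane spanned by the two backbone bonds of the chiral carbon, which is one possible choice of reflection matrix. All function and variable names are hypothetical, and which sign corresponds to the L or D form depends on the atom ordering.

```python
import math


def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def add(a, b):
    return tuple(x + y for x, y in zip(a, b))


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def chirality_sign(center, n1, n2, n3):
    """Sign (+1 or -1) of the signed volume spanned by three substituents."""
    volume = dot(sub(n1, center), cross(sub(n2, center), sub(n3, center)))
    return 1 if volume > 0 else -1


def reflect_through_plane(point, origin, unit_normal):
    """Householder reflection of `point` through the plane containing `origin`."""
    d = sub(point, origin)
    k = 2.0 * dot(d, unit_normal)
    return add(origin, tuple(di - k * ni for di, ni in zip(d, unit_normal)))


def invert_center(c_alpha, backbone_a, backbone_b, movable_atoms):
    """Mirror `movable_atoms` through the plane of the two backbone bonds."""
    normal = cross(sub(backbone_a, c_alpha), sub(backbone_b, c_alpha))
    norm = math.sqrt(dot(normal, normal))
    unit_normal = tuple(x / norm for x in normal)
    return [reflect_through_plane(p, c_alpha, unit_normal) for p in movable_atoms]


if __name__ == "__main__":
    # Idealized tetrahedral center; coordinates are illustrative, not PLA geometry.
    c_alpha = (0.0, 0.0, 0.0)
    oxygen, carbonyl = (1.0, 1.0, 1.0), (1.0, -1.0, -1.0)
    hydrogen, methyl = (-1.0, 1.0, -1.0), (-1.0, -1.0, 1.0)
    before = chirality_sign(c_alpha, oxygen, carbonyl, methyl)
    hydrogen, methyl = invert_center(c_alpha, oxygen, carbonyl, [hydrogen, methyl])
    after = chirality_sign(c_alpha, oxygen, carbonyl, methyl)
    print(before, after)
```

Because a reflection preserves distances to the chiral carbon, bond lengths are unchanged, while the new positions can overlap with neighbouring atoms, which is consistent with the additional minimization in step 4.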
The percentage of “wrong monomers” in the predicted configuration is generally low, allowing for
effective correction through energy minimization.
The backmapping model’s accuracy was rigorously validated by comparing its predicted atomistic
structures against reference data derived from extensive atomistic molecular dynamics simulations.
This validation focused on assessing both intramolecular and intermolecular structural deviations,
primarily utilizing probability distribution functions of bond lengths, bond angles, and notably,
dihedral angles. The results consistently showed that initial predictions had only minor
deviations, which were largely eliminated after short molecular dynamics equilibration runs (point
5 above). Furthermore, the model effectively captured the atomic packing and local arrangements,
demonstrated by strong agreement in intramolecular and intermolecular radial distribution
functions and accurate reproduction of system densities. Overall, the validation process confirmed
the model’s efficiency and efficacy in generating physically accurate atomistic configurations.
This can significantly reduce computational demands for equilibration.
References
[1] E. Christofi, P. Bačová, and V. A. Harmandaris. Physics-Informed Deep Learning Approach
for Reintroducing Atomic Detail in Coarse-Grained Configurations of Multiple Poly(lactic acid)
Stereoisomers. Journal of Chemical Information and Modeling, 64(6):1853–1867, 2024.
[2] E. Christofi. PLA Backmapping. https://github.com/SimEA-ERA/PLA-BackMap-CG.
[3] W. Li, C. Burkhart, P. Polińska, V. A. Harmandaris, and M. Doxastakis. Backmapping
Coarse-Grained Macromolecules: An Efficient and Versatile Machine Learning Approach. The Journal
of Chemical Physics, 153(4):041101, 2020.
[4] E. Christofi, A. Chazirakis, C. Chrysostomou, M. A. Nicolaou, W. Li, M. Doxastakis, and V. A.
Harmandaris. Deep Convolutional Neural Networks for Generating Atomistic Configurations of
Multi-Component Macromolecules from Coarse-Grained Models. The Journal of Chemical Physics,
157(18), 2022.
[5] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, et al. TensorFlow: Large-Scale Machine
Learning on Heterogeneous Systems, 2015.
[6] P. Bačová. PLA Analysis Tools. https://github.com/pbacova/PLA_analysis_tools.git.