
Machine Learning for RNA Design: LEARNA

Author: Runge, Frederic; Hutter, Frank
Publisher: Zenodo
DOI: 10.5281/zenodo.17284252
Source: https://zenodo.org/records/17284252/files/Machine_Learning_for_RNA_Design-LEARNA.pdf
This is a preprint of the following chapter: Runge F and Hutter F, Machine Learning for RNA Design: LEARNA, published in RNA Design: Methods and Protocols, edited by Churkin A and Barash D, 2024, Humana Press, reproduced with permission of Springer Science+Business Media, LLC, part of Springer Nature. The final authenticated version is available online at: http://dx.doi.org/10.1007/978-1-0716-4079-1_5.
Machine Learning for RNA Design: LEARNA
Frederic Runge
University of Freiburg
Department of Computer Science
79110 Freiburg, Germany
[email protected]
Frank Hutter
University of Freiburg
Department of Computer Science
79110 Freiburg, Germany
[email protected]
Abstract
Machine learning algorithms, and in particular deep learning approaches, have recently garnered attention in the field of molecular biology due to remarkable results. In this chapter, we describe machine learning approaches specifically developed for the design of RNAs with a focus on the learna_tools Python package, a collection of automated deep reinforcement learning algorithms for secondary structure-based RNA design. We explain the basic concepts of reinforcement learning and its extension, automated reinforcement learning, and outline how these concepts can be successfully applied to learn the design of RNAs. The chapter is structured to guide through the usage of the different programs with explicit examples, highlighting particular use cases of the individual tools.
Key words:
● Machine Learning
● Automated Machine Learning
● Deep Learning
● Reinforcement Learning
● Automated Reinforcement Learning
● RNA Design
● Partial RNA Design
1 Introduction
1.1 Machine Learning Methods for RNA Design
Machine learning (ML) is a field of artificial intelligence (AI) focused on building systems that can learn from and make decisions based on data. Unlike traditional programming, where rules are explicitly defined, ML algorithms identify patterns in data and make predictions or decisions without being explicitly programmed for each task. Machine learning algorithms have recently garnered significant attention in both academic and industrial circles, particularly within the realm of molecular biology(1)(2)(3). Among these, deep learning (DL)(4) methodologies stand out as especially promising due to their ability to leverage the vast quantities of experimental data generated by wet laboratories. In contrast to more traditional ML approaches that typically require substantial data preprocessing to derive meaningful input features, deep learning can be applied directly to raw data to learn a useful representation by processing it in various layers of a deep neural network (DNN). The ultimate objective of these models is to generalize and apply their learned insights to new, unseen tasks. Theoretically, even basic neural networks (NNs) are universal function approximators(5)(6), which propelled deep learning to the forefront of numerous fields with remarkable successes(7)(8)(9).
In light of these advancements, there has been a notable emergence of machine learning algorithms aimed at addressing the complex challenges of RNA structure and function prediction problems(10)(11)(12)(13)(14). RNA design, however, considers the inverse problem: Given a set of desired properties or functions, find one or more RNA nucleotide sequences with the desired features. Different machine learning approaches were proposed to tackle the problem from various perspectives.
Some researchers focus on learning to generate RNAs from nucleotide sequence data directly to, e.g., aid in specific application areas such as aptamer discovery through Systematic Evolution of Ligands by Exponential Enrichment (SELEX)(15). For this application, methods typically try to reduce the scale of initial libraries for experimental screening approaches, which is vital for cost and efficiency in experimental workflows. These varied approaches include using deep learning with recurrent neural networks (RNNs)(16), variational autoencoders (VAE)(17), and other innovative techniques(18)(19). The basic learning paradigm of such algorithms is to learn a model from RNA sequence data to generate similar (but different) sequences to the training data set at test time.
Another critical aspect of RNA design revolves around its secondary structures. The function of a molecule is intrinsically linked to its structure, and thus, algorithms that can generate RNA sequences folding into desired structures are highly sought after. In this regard, (20) recently proposed a generative approach consisting of a deep learning based scoring network and a second model based on a generative adversarial network (GAN)(21) for the generative design of toehold switches(22). This approach bears great potential, since it incorporates knowledge of the RNA sequence and its secondary structure into the design process.
In general, research on secondary structure-based RNA design can be roughly divided into three main approaches: one incorporates human insights into the design process(23)(24) gained from the online game Eterna(25), the second one leverages secondary structure information to modify an initial sequence until it folds into the desired target structure(26), while the third employs a generative design approach based on structure information(20)(27)(28).
In this chapter, we delve into the latter category of algorithms, focusing particularly on the LEARNA framework and its related algorithms available in the learna_tools Python package. The package gathers a set of generative approaches to the RNA design problem that employ automated reinforcement learning. We will provide an in-depth exploration of these algorithms, offering detailed insights into their backgrounds, installation processes, input formats, and practical applications. The chapter is structured to guide the reader through the necessary background on reinforcement learning, the basic principles of the LEARNA framework, and its various implementations. We will also provide insights into a new RNA design paradigm that considers both sequence- and structure-based design from provided motifs, and an algorithm to solve it.
For reasons of simplicity, we often use LEARNA as an umbrella term for the algorithms of the learna_tools package in this chapter since the algorithms share a general framework. However, we clearly mark cases where the algorithms differ and explain individual use cases and properties where necessary.
1.2 Reinforcement Learning
In order to understand the basic ideas behind the LEARNA framework, we would like to give a brief introduction to the concept of reinforcement learning (RL)(29). Reinforcement learning is a branch of machine learning where an agent learns optimal behavior through trial-and-error interactions with its environment. It involves three key elements: states, actions, and rewards. The agent observes a state provided by the environment (the situation or condition it is in), takes an action, and then receives a reward based on the outcome of that action. The goal is to maximize the cumulative rewards over time.
Deep Reinforcement Learning (DRL) is an advanced form of RL that integrates deep learning, particularly useful for environments with complex or high-dimensional states. In DRL, deep neural networks are used to interpret these states and assist the agent in decision-making. This integration allows the agent to process and learn from complex, intricate data inputs, such as images or sequential patterns, which is a common requirement in many sophisticated applications.
An important aspect of DRL is the optimization of the policy, the strategy that the agent follows to decide its actions. The LEARNA family of algorithms uses Proximal Policy Optimization (PPO)(30). PPO optimizes the policy network in a manner that prevents drastic changes in the policy between successive learning iterations. This controlled approach to optimization ensures a stable and consistent learning process, making PPO an attractive choice for tasks where balance and reliability in learning are crucial.
Through the combination of deep learning and sophisticated policy optimization techniques, DRL enables agents to learn efficiently in complex environments.
According to the common convention in the field, we will use the term reinforcement learning (RL) in the remainder of this chapter as a synonym for deep reinforcement learning, since all of the described algorithms employ deep neural networks to approximate the policy of the agent.
1.3 Automated Reinforcement Learning
During the development of an algorithm, developers often have to decide about certain parameters that have to be set in advance. These so-called hyperparameters can have substantial impact on the performance of an algorithm, particularly in the field of reinforcement learning (RL)(31). Automated reinforcement learning (AutoRL)(32) enhances RL by integrating Automated Machine Learning (AutoML)(33) principles. It aims to automate critical aspects of RL, such as selecting algorithms, optimizing hyperparameters, and designing network architectures, seeking to mitigate intensive manual tuning and enhance efficiency and accessibility. AutoRL is particularly beneficial in complex scenarios where manual optimization is impractical or resource-intensive, or in situations where there is no prior knowledge about good parameter settings.

1.4 RNA Design as a Reinforcement Learning Problem
As outlined in Section 1.2, reinforcement learning (RL) describes a periodic interaction between an agent and its environment. The crucial steps to define RNA design as an RL problem are (1) to provide informative states that guide the agent’s decisions, (2) to define actions that represent the design of an RNA sequence, and (3) to define a reward signal that helps the agent to learn a policy that yields the desired outcomes. The choice of the architecture of the policy network, the method of policy optimization, the training process, as well as of the hyperparameters that have to be set in advance are further challenges for developers of an RL system for RNA design.
Two RL approaches have been proposed to tackle the RNA design problem. (26) use a convolutional neural network architecture(4) for the policy network. The agent is trained to modify a sequence, given to the network as an N × 4 tensor, where N is the number of bases, representing the current sequence in one-hot encoding. As a starting point, (26) use a randomly generated input sequence with the same length as a provided secondary structure. The states are defined as the entire input sequence; actions correspond to modifying the type of a single nucleotide at a certain position, or in some cases two nucleotides. After each episode the modified sequence is folded using ViennaRNA’s RNAFold(34), and the agent receives a fixed positive reward if the resulting sequence is predicted to fold into the target structure. All other actions receive a reward of zero.
In contrast, the LEARNA algorithms tackle the problem with a generative approach. Given a target secondary structure, LEARNA predicts a nucleotide for each position of the target structure without receiving intermediate rewards. Once all sites have been assigned nucleotides, the designed candidate sequence is folded using RNAFold and compared to the desired structure.
The actions correspond to placing a nucleotide for a given position in the target structure. Consequently, LEARNA can choose from four actions that correspond to ‘A’, ‘C’, ‘G’, and ‘U’. However, the algorithm leverages further information provided with the target structure: If a certain position ought to be paired, the algorithm places two nucleotides in a single step using the complementary nucleotide for the paired position, following Watson-Crick base interactions (‘A-U’, ‘U-A’, ‘G-C’, ‘C-G’).
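The placement rule for paired sites can be illustrated with a small Python sketch. It is not the learna_tools implementation: the helper names `pair_table` and `rollout` are hypothetical, the policy is an arbitrary callable standing in for the trained network, and the target structure is assumed to be balanced dot-bracket notation.

```python
# Watson-Crick complements used when a paired site is filled.
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def pair_table(structure):
    """Map every paired position to its partner (assumes balanced brackets)."""
    stack, partner = [], {}
    for i, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(i)
        elif symbol == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    return partner


def rollout(structure, choose_nucleotide):
    """Assign a nucleotide to every site; a paired site also fixes its partner."""
    partner = pair_table(structure)
    sequence = [None] * len(structure)
    for i in range(len(structure)):
        if sequence[i] is not None:
            continue  # already placed as the partner of an earlier site
        nucleotide = choose_nucleotide(i)  # one of 'A', 'C', 'G', 'U'
        sequence[i] = nucleotide
        if i in partner:
            # Paired site: place the complementary nucleotide in the same step.
            sequence[partner[i]] = COMPLEMENT[nucleotide]
    return "".join(sequence)
```

Because closing brackets are always filled together with their opening partner, the number of decisions the policy makes equals the number of unpaired sites plus the number of base pairs.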
The states are defined as local representations of the input structure. More precisely, for each position in the target structure, LEARNA uses an n-gram centered around the current position of the target structure. In essence, this approach can be interpreted as a window sliding over the target structure to provide a local view of each position. To be able to define the n-gram at any position (for example, at position 1 the upstream part of the sequence would be undefined), the target structure is first padded with a padding symbol. The size of the n-gram is a hyperparameter that has to be defined in advance. Figure 1 gives an example of an action rollout of LEARNA including state representations for a window size of three. However, we note that the actual size of the window for the algorithms of the learna_tools package is between 21 and 65.
[Figure 1] Illustration of an action rollout of the LEARNA algorithms. The agent sequentially builds a candidate sequence by choosing actions to place nucleotides. At paired sites, as indicated by a pair of brackets, two nucleotides are placed simultaneously (step 1 and step 2); while at unpaired sites a single nucleotide is placed (step 3-5). The actions are chosen based on the states provided by the environment. The states represent a local view on the target structure (here using a window size of three). To be able to build the states at each position, the target structure is padded at each end with a padding symbol ‘#’.
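The padded sliding window described above can be sketched in a few lines of Python. The function name `local_states` and its defaults are assumptions for illustration; the package itself encodes states for its policy network, which is not shown here.

```python
def local_states(structure, window=3, pad="#"):
    """Return one n-gram per position, centered on that position.

    The structure is padded on both ends so that the window is defined
    at the first and last positions as well.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window size must be a positive odd number")
    half = window // 2
    padded = pad * half + structure + pad * half
    return [padded[i:i + window] for i in range(len(structure))]
```

For the structure `(.)` and a window size of three, the padded string is `#(.)#`, and the three states are `#(.`, `(.)`, and `.)#`.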
Once all sites have been assigned nucleotides, the reward is computed based on the Hamming distance(35) between the target structure and the structure derived from folding the predicted candidate sequence with RNAFold. The distance is normalized by the length of the input structure, which provides a final distance value between zero (perfectly matching structures) and one (completely different structures). To provide a more informative reward signal for optimization, LEARNA uses a hyperparameter $\alpha$ that shapes the reward. The final reward function for structure-based RNA design with LEARNA then looks as follows:
$$R = \left(1 - \frac{D}{L}\right)^{\alpha},$$
where $D$ is the Hamming distance, $L$ is the length of the target structure, and $\alpha$ is the parameter for shaping the reward.
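As a worked illustration of the reward, the following Python sketch evaluates it for two dot-bracket strings of equal length. The folding step is deliberately left out: the folded structure is passed in as a string, and the value of `alpha` used here is an arbitrary assumption, not the tuned hyperparameter of the package.

```python
def hamming_distance(a, b):
    """Number of positions at which two equally long strings differ."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    return sum(x != y for x, y in zip(a, b))


def reward(target, folded, alpha=1.0):
    """Shaped reward (1 - D / L) ** alpha for a folded candidate."""
    distance = hamming_distance(target, folded)
    return (1.0 - distance / len(target)) ** alpha
```

For the target `((..))` and a candidate that folds into `......`, four of six positions differ, so the normalized distance is 2/3 and, with `alpha=2`, the reward is (1/3)^2 = 1/9.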
The LEARNA framework encompasses three strategies for the design of RNAs: LEARNA, Meta-LEARNA, and Meta-LEARNA-Adapt. We will detail these strategies in Section 3. The described formulation of the states, actions, and the reward function, however, applies to all three versions. For a more formal description of the LEARNA approach, we refer the interested reader to (27). However, the learna_tools package also provides libLEARNA(28), an extension of the original LEARNA framework that considers a more general RNA design paradigm, partial RNA design(36). We briefly outline this RNA design formulation in the following section.
1.5 The Partial RNA Design Paradigm
The design of RNA molecules often involves integrating distinct RNA fragments, known as motifs, each carrying a specific function(37)(38)(39). The new paradigm of partial RNA design(36) follows this concept by allowing flexible and precise RNA design from provided sequence and structure motifs.
Traditionally, RNA design has been constrained by the need to adhere to a single, predetermined secondary structure. However, this approach does not always align with the multifaceted nature of RNA design endeavors. Consider, for instance, designing an RNA molecule for ligand-dependent gene expression regulation. This task might involve combining a riboswitch(40) with a Shine-Dalgarno sequence(41), necessitating different types of motifs: a motif with specific sequence and structural requirements for binding the ligand (the aptamer), a variable-length motif (a spacer region), a motif focusing solely on secondary structure features (the expression platform), and a sequence-specific motif without structural constraints (the Shine-Dalgarno sequence).
Existing RNA design methods fall short in addressing such multifaceted design tasks, leading to the development of partial RNA design and the libLEARNA algorithm.
In this approach, we formulate motifs that constrain specific regions of the sequence and secondary structure of an RNA, while leaving the other regions unconstrained. These unconstrained parts can then be defined as variable length regions which can be filled with arbitrary motifs by an algorithm(36). The design can be viewed as a search in a partially restricted design space that allows for exploration by a machine to provide diverse solutions. libLEARNA leverages these constraints to navigate the RNA design space, ensuring the inclusion of desired motifs while exploring various combinations within the set parameters.
Partial RNA design represents a significant shift from conventional RNA design methods. It offers greater flexibility, enabling the design of complex RNA molecules without the exhaustive need to specify and refine a target structure. However, designing effective search spaces for libLEARNA does require thoughtful consideration to achieve the intended outcomes, which we will explore in detail in Section 3 (see also note 1 and note 5).
By default, all algorithms design a single candidate sequence for a given target structure. To change this behavior, one can use the --num_solutions option. For example, to design five candidate sequences for a structure of a Hammerhead ribozyme, you can use:
learna --target_structure ".(((((....((((((.......((((......))))...(((.....))).))))))...)))))." --num_solutions 5
The resulting output reports all solutions in a Markdown table format by default. The results can be saved to a specific directory via the --results_dir <path to results directory> option. The algorithms store results as a pandas dataframe in pickle format by default. However, the programs allow changing the output format with the --output_format option. Currently, the following formats are supported: “pickle”, “csv” (a comma separated file), or “fasta”. The “fasta” format saves the output to a fasta-like file format that looks as follows:
>Id 1
<designed sequence 1>
<corresponding structure 1 in dot-bracket format>
>Id 2
<designed sequence 2>
<corresponding structure 2 in dot-bracket format>
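A file in this three-line layout can be read back with a few lines of Python. The sketch below assumes exactly the layout shown above (a header line starting with ‘>’, one sequence line, one structure line); the function name `parse_fasta_like` and the dictionary keys are hypothetical and not part of learna_tools.

```python
def parse_fasta_like(text):
    """Parse header/sequence/structure triples into a list of dictionaries."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) % 3 != 0:
        raise ValueError("expected header, sequence, and structure per record")
    records = []
    for i in range(0, len(lines), 3):
        header, sequence, structure = lines[i:i + 3]
        if not header.startswith(">"):
            raise ValueError(f"record header must start with '>': {header!r}")
        if len(sequence) != len(structure):
            raise ValueError(f"length mismatch in record {header!r}")
        records.append(
            {"id": header[1:].strip(), "sequence": sequence, "structure": structure}
        )
    return records
```

Checking that sequence and structure have the same length catches truncated records early, before the designs are passed on to other tools.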
All algorithms further allow setting a runtime limit in seconds with the --timeout <time in seconds> option. When the limit is reached, the algorithms return all solutions found so far if there are any. When using multiple targets as input, the timeout counts for each individual target. By default, the algorithms run with a runtime limit of 600 seconds.
Sometimes, it might be helpful to also return suboptimal solutions. This is particularly helpful if the timeout is set very optimistically. You can return suboptimal solutions with the --hamming_tolerance <natural number> option. The option sets a threshold to include candidates with a Hamming distance to the target structure that is smaller than or equal to the provided threshold value after folding. Alternatively, you can use the --show_all_designs flag to return all the designed candidates of a run. Depending on the target structure and the provided runtime limit, however, this flag might result in relatively large amounts of designed candidates.
3.1.4 Advanced Program Options
The algorithms of the learna_tools package allow precise control over all individual parameters used during the optimization process via AutoRL. As an example, one can change the number of fully-connected layers of the network by setting the --num_fc_layers <natural number> option. However, the default options typically provide good results, and especially the pretrained models (Meta-LEARNA, Meta-LEARNA-Adapt, and libLEARNA) do not always allow setting arbitrary combinations of all parameters due to conflicts with the saved model parameters, which are loaded by default to facilitate the command line interface. Therefore, we recommend to either keep the parameters unchanged, or to run a completely new optimization from scratch (see Note 2).
LEARNA, Meta-LEARNA-Adapt (and libLEARNA) all adapt to a given target at test time by updating the weights of the policy network for the task at hand. Consequently, the algorithms might get stuck in a local minimum during optimization. To avoid this, the three algorithms provide the --restart_timeout <time in seconds> option to restart the algorithms with the initial weights of the policy network after a certain time. We discuss the behavior of the restart option in more detail in note 4.
We further would like to note that the agent is shared across all tasks when designing RNAs for multiple targets in a single run. This means that the policy is optimized across all the provided targets and not optimized on each individual target only. To avoid this behavior, you can use the --no_shared_agent flag to ensure a reset of the weights of the agent before processing the next target.
An explicit example command for the design of 100 candidates for the “Frog Foot” target of the Eterna100 benchmark(44) using Meta-LEARNA-Adapt with a restart of the algorithm every five seconds (as used in note 4), and a runtime limit of two minutes, would look as follows:
meta-learna-adapt --input_file examples/if_frog_foot_example.input --restart_timeout 5 --num_solutions 100 --timeout 120
3.1.5 Description of the Output
All programs of the learna_tools package print general information about the current job to the console after starting. These include, e.g., the number of desired solutions or the target Ids. The final solutions are printed in tabular format by default. The table contains the following fields:
● An unnamed field to index the solutions.
● Id: The Id of the target.
● time: The time required to find the solution.
● hamming_distance: The Hamming distance between the structure of the folded candidate sequence and the target structure.
● rel_hamming_distance: The Hamming distance normalized by the length.
● sequence: The sequence of the candidate solution.
● structure: The predicted structure of the candidate sequence.
In case the input consists of multiple targets, the programs print one table for each individual target. If no solutions or fewer solutions than requested were found, the program prints a Warning message to inform the user and reports all solutions found within the given timeout. As described before, the output format can be changed via the --output_format option.
The learna_tools package further provides tools for visualization of the outputs. These include logo plots of the designed sequences to plot the distribution of nucleotides at each position using the logomaker Python package(45), as well as secondary structure plots using VARNA. The plotting can be triggered by setting the --plot_logo and --plot_structure flags. By default, all plots are saved into the ‘plots’ directory relative to the current working directory. However, one can specify a path for saving the plots using the --plotting_dir <path to directory> option. For showing the plots besides saving, one can use the --show_plots option. We show examples of logo plots in Figure 6 in Section 4 where we discuss the diversity of the generated sequences in note 4. Using the “Frog Foot” example from the Eterna100 benchmark again, the command for LEARNA to reproduce the logo plots of Figure 6 for 100 solutions without restarts of the algorithm looks as follows:
learna --target_structure "..........((((....))))((((....))))((((...))))" --target_id "Frog Foot" --timeout 60 --num_solutions 100 --plot_logo --show_plots
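A logo plot is a rendering of per-position nucleotide frequencies across the designed sequences. The following Python sketch computes that underlying matrix with the standard library only; it does not call logomaker or learna_tools, and the function name `position_frequencies` is an assumption for illustration.

```python
from collections import Counter


def position_frequencies(sequences, alphabet="ACGU"):
    """Relative frequency of each nucleotide at each position.

    All sequences must have the same length, as is the case when
    designing for a single fixed-length target structure.
    """
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(sequence) != length for sequence in sequences):
        raise ValueError("all sequences must have the same length")
    frequencies = []
    for column in zip(*sequences):
        counts = Counter(column)
        frequencies.append({n: counts[n] / len(sequences) for n in alphabet})
    return frequencies
```

Positions where a single nucleotide has a frequency close to one indicate low diversity among the designs, which is the property the logo plots are meant to make visible.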
3.2 RNA Design in Partially Restricted Search Spaces – libLEARNA
3.2.1 Description and Scope of the Program
libLEARNA is the most recent extension of the LEARNA framework. It uses the Meta-LEARNA-Adapt approach but allows very flexible RNA design applications. libLEARNA can design RNAs of variable lengths from provided sequence and structure motifs. The algorithm ensures that provided sequence and structure constraints are satisfied at the given positions, but it can also explore a search space automatically to find multiple reasonable solutions. We demonstrate the capabilities of libLEARNA with explicit examples in the following sections. Since libLEARNA offers a new RNA design interface, using libLEARNA for the first time might appear new to users that are familiar with more traditional RNA design approaches. However, we will guide through specific use cases to help users become more familiar with the approach and its interface.
3.2.2 Input Formats
The sequence and structure inputs of libLEARNA can only be provided in a file with specific tags. The sequence constraints can be defined after a ‘#seq’ tag using IUPAC notation. These constraints can be provided as motifs. Positions in the input sequence that are marked with whitespace or ‘X’ (‘Xtend here’) are considered unconstrained positions where the algorithm is allowed to extend the sequence as long as a hard maximum length limit of the entire sequence is not violated. Similarly, the structure constraints (following the ‘#str’ tag) can be provided as motifs. The algorithm allows placing ‘N’ in the structure constraints to mark a position that is not structurally constrained. Positional constraints of the structure are expected in dot-bracket notation (34), using the symbols ‘(’, ‘)’, ‘.’, ‘N’ only. libLEARNA considers any motif to consist of a sequence part and a structure part of the same length. A desired GC-content can be added to the file via the ‘#GC’ tag. By default, libLEARNA designs candidates with a tolerance of 0.01 for a given GC-content. This can be adjusted via the --gc_tolerance option. An example input file to design RNAs of variable lengths from two provided sequence and structure motifs could then, e.g., look as follows:
>libLEARNA Example1
#seq AYUUNN CUMUU
#str NN..(( ))...
#GC 0.5
In this example file, the sequence constraints as well as the structure constraints are given as two motifs that formulate a partially restricted RNA design space. The algorithm will ensure that each candidate sequence starts with ‘A’, followed by ‘C’ or ‘U’ (to satisfy the ‘Y’ constraint given in IUPAC notation), ‘U’, ‘U’, and two positions with arbitrary nucleotides (‘A’, ‘C’, ‘G’, ‘U’) to satisfy the ‘NN’ constraint. The resulting structures after folding the candidate sequence will start with two arbitrary structure symbols followed by ‘..((’. After this motif, there might be a region that contains any combination of nucleotides and dot-bracket symbols, as the whitespace between the motifs indicates a position of potential extension. Finally, the designed candidates will end with ‘C’, ‘U’, then ‘A’ or ‘C’ (satisfying the ‘M’ constraint), ‘U’, ‘U’, while the corresponding structure in dot-bracket notation will end with ‘))...’. The ‘#GC’ tag ensures a GC content of 0.5 of the designed candidates with the provided tolerance. To be more explicit, an example call of libLEARNA from the command line (if the constraints of the example above were saved into a file named example.input) to design RNAs of a maximum length of 50 nucleotides with the provided constraints would look as follows:
liblearna --input_file example.input --max_length 50
An example candidate then could look as follows:
AUUUAGUUAAAAAGGGACUCUU
....((((.......))))...
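Whether a candidate honors the two motifs of the example can be checked with a short Python sketch. It is only an illustration of the constraint semantics described above, not libLEARNA code: the helper names are hypothetical, the table lists the standard IUPAC codes for RNA, and only the first and last motif are checked because the region between them is free to vary in length.

```python
# Standard IUPAC nucleotide codes (RNA alphabet).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "AG", "Y": "CU", "S": "CG", "W": "AU", "K": "GU", "M": "AC",
    "B": "CGU", "D": "AGU", "H": "ACU", "V": "ACG", "N": "ACGU",
}


def sequence_matches(fragment, motif):
    """True if every nucleotide is allowed by the IUPAC code at its position."""
    return len(fragment) == len(motif) and all(
        nucleotide in IUPAC[code] for nucleotide, code in zip(fragment, motif)
    )


def structure_matches(fragment, motif):
    """True if the fragment equals the motif wherever the motif is not 'N'."""
    return len(fragment) == len(motif) and all(
        code == "N" or symbol == code for symbol, code in zip(fragment, motif)
    )


def ends_match(candidate, first_motif, last_motif, matches):
    """Check the leading and trailing motif of a variable-length design."""
    return matches(candidate[:len(first_motif)], first_motif) and matches(
        candidate[-len(last_motif):], last_motif
    )
```

Applied to the example candidate, `ends_match("AUUUAGUUAAAAAGGGACUCUU", "AYUUNN", "CUMUU", sequence_matches)` and `ends_match("....((((.......))))...", "NN..((", "))...", structure_matches)` both hold.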
3.2.3 Description of the Program Options
The basic program options of libLEARNA are the same as for the other algorithms of the learna_tools package. You can run
liblearna -h
to get a full list of available options. We note that libLEARNA does not support the --hamming_tolerance <natural number> option provided for the other algorithms of the learna_tools package. However, as we describe in detail in the following, libLEARNA allows, by design, the definition of RNA design spaces that include more than one target structure.
Since libLEARNA can design RNAs of different lengths, the algorithm has two additional options, --min_length and --max_length, for setting the length borders of a design space. Furthermore, libLEARNA allows specifying a desired GC-content with a certain tolerance for the design with the --desired_gc and --gc_tolerance options (see also Section 3.2.2 and Section 3.2.7).
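The GC-content criterion amounts to a simple ratio and a tolerance check, sketched below in Python. The function names are hypothetical; the default tolerance of 0.01 is the value stated in Section 3.2.2, and how the package itself incorporates the criterion into its reward is not shown.

```python
def gc_content(sequence):
    """Fraction of G and C nucleotides in a sequence."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    return sum(nucleotide in "GC" for nucleotide in sequence) / len(sequence)


def within_gc_tolerance(sequence, desired_gc, tolerance=0.01):
    """True if the GC-content deviates from the target by at most the tolerance."""
    return abs(gc_content(sequence) - desired_gc) <= tolerance
```

Note that for short sequences only a few GC values are reachable at all (multiples of 1/length), so a tight tolerance can rule out every candidate of a given length.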
3.2.4 Description of the Output
The output of libLEARNA is in tabular format like the output of the other algorithms. However, the fields of the table differ slightly:
● An unnamed field to index the solutions.
● Id: The Id of the target.
● time: The time required to find the solution.
● reward: The reward received for the designed candidate solution.
● sequence: The sequence of the candidate solution.
● structure: The predicted secondary structure of the candidate solution in dot-bracket format.
● GC-content: The G and C nucleotide ratio of the designed candidate sequence.
● length: The length of the designed solution.
libLEARNA allows saving the output to a file, using the same options as described for the other algorithms of the learna_tools package, including changing the output format with the --output_format option, or saving structure and logo plots.
3.2.5 Inverse RNA Folding with libLEARNA
The libLEARNA approach allows designing RNAs from provided motifs. Consequently, libLEARNA can be applied to inverse RNA folding when given a single structure motif, a target secondary structure in dot-bracket format. For the “Frog Foot” example mentioned before, the input file for libLEARNA would look as follows:
>Frog Foot
#seq NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
#str ..........((((....))))((((....))))((((...))))
The input examples for the “Frog Foot” task are provided with the source code in the examples directory. A call of libLEARNA then could, e.g., look as follows:
liblearna --input_file examples/if_frog_foot_example_liblearna.input --timeout 30 --num_solutions 10
The task of inverse RNA folding allows comparing the different approaches of the learna_tools package to deepen the understanding and intuition for the different algorithms. We use the 100 tasks of version 2 of the Eterna100 benchmark(48) for our comparison. Figure 3 shows the results of the evaluation using five independent runs with each program with the commonly used timeout of 24 hours.
We observe that all algorithms with a pretrained policy outperform LEARNA on this challenging benchmark, with libLEARNA showing the best performance, followed by Meta-LEARNA and Meta-LEARNA-Adapt. This is an interesting result; in contrast to the other algorithms, libLEARNA is not specifically trained for the task of inverse RNA folding. We discuss the training regimen of the algorithms in note 6, highlighting potential benefits of the training procedure of libLEARNA compared to the other algorithms. However, for this experiment we run libLEARNA with the exact same setup as Meta-LEARNA-Adapt(27) to get comparable results. In particular, we use the --restart_timeout option to reset all weights of the algorithms every 1800 seconds. As discussed in note 4, this option can have substantial impact on the resulting candidate sequences, essentially preventing the algorithms from getting stuck in local minima of the design space. For Meta-LEARNA, this option is not available since the algorithm is not updating weights at all, i.e., it does not adapt to the given task at hand. We, therefore, run libLEARNA twice, with and without restarting, and observe that the performance clearly decreases without the restarting procedure. In fact, the performance seems to plateau right at the time where the first restart would have been performed. Since the common evaluation protocol of the Eterna100 benchmark uses a long runtime of 24 hours on each task, it is not very surprising that preventing local minima via the --restart_timeout option can have a strong impact on the performance when running an algorithm that adapts to a given task. We, therefore, recommend using this option when the expected runtime is high, or using Meta-LEARNA if adaptation to the task at hand is not required.
[Figure 3] Results of the comparison of LEARNA, Meta-LEARNA, Meta-LEARNA-Adapt and libLEARNA on the tasks of the Eterna100 benchmark version 2. The plot on the left side shows the accumulated number of solved tasks across five evaluation runs for each algorithm. The plot on the right shows the average number of solved tasks with the standard deviation around the mean. libLEARNA was run twice, once with restarting the algorithm every 1800 seconds, once without the restarting procedure.
3.2.6 Constrained Inverse RNA Folding with libLEARNA
In contrast to LEARNA, Meta-LEARNA, and Meta-LEARNA-Adapt, libLEARNA can include sequence constraints using IUPAC notation. Furthermore, libLEARNA allows placing unconstrained sites in the structure using the letter ‘N’. An example input file for RNA design with sequence constraints and unconstrained positions in the structure input could, e.g., look as follows:
>partially constrained Frog Foot example
#seq NNNAGNNNNNNNNNNNNNNNNNNNNYNNNNNNNNMKNNNNNNNNN
#str ..........((((....))))NN((....))NNNN((...))))
The provided structure constraints of the example reveal another unique feature of libLEARNA which makes it different compared to other RNA design algorithms: There is no need to define specific pairs in advance, i.e., libLEARNA can handle any structure information using the symbols ‘)’, ‘.’, ‘(’, ‘N’, including unbalanced brackets. This allows for more flexibility during the definition of RNA design tasks (see note 5). The partially constrained example is provided with the source code in the example directory. You can run libLEARNA on the example with:
liblearna --input_file examples/cif_frog_foot_example_liblearna.input --timeout 30 --num_solutions 10
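The unbalanced brackets of the example can be made explicit with a small Python sketch that counts the symbols of a structure constraint. The function names are hypothetical, and the feasibility check is our own reasoning rather than documented libLEARNA behavior: for a fixed-length constraint without extension sites, any folded structure is balanced, so the ‘N’ positions must be able to supply the missing brackets. This is a necessary condition only, not a guarantee that a solution exists.

```python
def constraint_summary(constraint):
    """Count opening brackets, closing brackets, and free ('N') positions."""
    unexpected = set(constraint) - set("().N")
    if unexpected:
        raise ValueError(f"unexpected symbols: {sorted(unexpected)}")
    return {
        "open": constraint.count("("),
        "close": constraint.count(")"),
        "free": constraint.count("N"),
    }


def may_be_completed(constraint):
    """Necessary condition: free sites can cover the bracket imbalance."""
    summary = constraint_summary(constraint)
    return abs(summary["open"] - summary["close"]) <= summary["free"]
```

For the partially constrained “Frog Foot” structure above, the constraint contains eight opening and ten closing brackets plus six ‘N’ positions, so at least two of the ‘N’ positions have to become opening brackets in any matching structure.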
3.2.7 Automated Design of Riboswitches
While libLEARNA allows the design of RNAs from single motifs, as for inverse RNA folding, it further offers the opportunity to define more complex design spaces. To demonstrate such a more advanced use case of libLEARNA, we set up a partially restricted search space for the design of riboswitch-like constructs that we recently used to design massive amounts of theophylline riboswitch candidates(28), following a previously published protocol(37). We will guide through the definition of the design space step-by-step, using the protocol and the constructs proposed in (37) as templates. We then show how to define desired GC-contents as part of a task and compare the results with the original design procedure of (37).
#seq X
#str X
However, while this search space includes all the riboswitch constructs when considering a length
limit of 66 to 91 nucleotides, the space is likely defined too generally to generate reasonable
candidates. Similarly, for search spaces that only allow for very little exploration, libLEARNA
might not always be the best choice, and sometimes the definition of multiple tasks that are
processed sequentially could be beneficial. Another critical aspect of the design space definition
relates to the design of RNAs with a desired GC-content. Depending on the provided sequence
constraints and the length restrictions of the design space, the requested GC-content might not be
reachable at all. Without a provided timeout, libLEARNA would run indefinitely, trying to find
solutions for the given task.
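Whether a requested GC-content is reachable can be estimated before starting a run. The sketch below computes the lowest and highest GC fraction that a fixed-length IUPAC sequence constraint admits. It is a hypothetical helper, not part of libLEARNA; the function names and the `tolerance` parameter are assumptions, and design spaces with extension regions (whitespace or ‘X’) are not covered.

```python
# Illustrative only: a feasibility check under the stated assumptions.
GC_FORCED = set("GCS")    # IUPAC codes that only allow G or C
GC_EXCLUDED = set("AUW")  # IUPAC codes that allow neither G nor C


def gc_bounds(constraint):
    """Return (min, max) reachable GC fraction for a fixed-length constraint."""
    if not constraint:
        raise ValueError("constraint must not be empty")
    length = len(constraint)
    forced = sum(code in GC_FORCED for code in constraint)
    excluded = sum(code in GC_EXCLUDED for code in constraint)
    # All remaining codes (N, R, Y, K, M, B, D, H, V) can go either way.
    return forced / length, (length - excluded) / length


def gc_reachable(constraint, target, tolerance=0.01):
    """True if `target` lies within the reachable range (plus a tolerance)."""
    low, high = gc_bounds(constraint)
    return low - tolerance <= target <= high + tolerance


if __name__ == "__main__":
    print(gc_bounds("GGGCCCNN"))          # (0.75, 1.0)
    print(gc_reachable("GGGCCCNN", 0.5))  # False
```

If the target lies outside the reported range, no candidate can satisfy the request, and setting a timeout is the only way the run will terminate.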
2. Meta-Optimization Process. For some users, it might also be interesting to run the
automated reinforcement learning pipeline using BOHB. We provide the respective scripts in
the GitHub repository at https://github.com/automl/learna_tools. You can follow the
instructions provided in the repository. However, we note that the optimization is a
resource-intensive process, and we recommend using a compute cluster.
3. Plotting Secondary Structures. If you experience any problems during plotting of the
secondary structures with VARNA, please consider checking the following details. The
learna_tools interface to VARNA expects an installation via conda. The call to VARNA
might differ if it was installed without conda, e.g. using the binaries provided with the .jar file
at the official download section under
https://varna.lisn.upsaclay.fr/index.php?lang=en&page=downloads&css=varna. If this is the
case, we recommend either changing the call to VARNA in the source code (script
learna_tools/visualization.py) or installing VARNA via conda. A second problem that we
sometimes experience when using VARNA is that, depending on the structure and the desired
resolution of the plots, VARNA might run out of memory. If you experience memory issues,
consider changing the resolution for plotting with VARNA, or increase the memory available
to Java on your system. You can adjust the resolution using the --resolution <floating point
number> option available for all learna_tools algorithms. The default resolution is set to 8.0.
4. Sequence Diversity. We recall that all algorithms of the learna_tools package either
optimized a policy across thousands of different RNA design tasks (Meta-LEARNA) or
continue optimizing their design policies at test time to adapt to a given design task at hand
(LEARNA, Meta-LEARNA-Adapt and libLEARNA). As a result, it is very likely that the
algorithms converge to a good solution over time, resulting in similar sequences. For
LEARNA, Meta-LEARNA-Adapt and libLEARNA, this clearly has the advantage that they
can provide solutions even for tasks that are further away from the training data distribution;
however, the diversity of sequences decreases over time. To mitigate this problem, all
algorithms except Meta-LEARNA have a --restart-timeout <natural number> parameter that
controls a reset of the algorithms. The restart parameter controls the time after which the
weights of the algorithms are reset to the initial values, meaning the algorithms start from
scratch again on the provided task. Figure 6 shows logo plots to visualize the distribution of
nucleotides at a given position. We show results for the “Frog Foot” example from the
Eterna100 benchmark mentioned before. In each row, the figure shows a run of either
LEARNA, Meta-LEARNA, or Meta-LEARNA-Adapt with different numbers of solutions
requested. At the bottom, we generate 100 and 1000 solutions with LEARNA and Meta-LEARNA-Adapt
using the restart timeout option to reset the algorithms every 5 seconds
(option not available for Meta-LEARNA). We observe that, with an increasing number of
requested solutions, all algorithms converge to a similar major nucleotide pattern, mainly
predicting ‘C’ and ‘G’ nucleotides for paired positions and ‘A’ for unpaired positions of the
task. However, the restarting procedure rescues this behavior for both LEARNA and
Meta-LEARNA-Adapt, and the resulting logos show a high diversity, like the case when the
algorithms did not yet adapt strongly to the target (when requesting 10 solutions only).
This diversity, however, comes at the cost of longer runtimes.
[Figure 6] Logo plots for LEARNA, Meta-LEARNA, and Meta-LEARNA-Adapt for different
numbers of designed candidates for the Frog Foot example of the Eterna100 benchmark. Each column
shows the logo plots of one of the algorithms for different numbers of requested solutions. The two
bottom rows show the logo plots for LEARNA and Meta-LEARNA-Adapt when restarting the
algorithms every 5 seconds for the design of 100 and 1000 (1k) candidates. While all algorithms seem
to converge to a similar major design pattern, the restart option seems to help the algorithms to
provide more diverse solutions.
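The diversity that a logo plot visualizes can also be quantified numerically. As a minimal sketch (not part of learna_tools; the function names are hypothetical), the following Python code computes per-position nucleotide frequencies and the Shannon entropy of each position for a set of equally long designed sequences. An entropy of 0 bits means all candidates share the same nucleotide at that position, while 2 bits corresponds to a uniform distribution over A, C, G and U.

```python
# Illustrative only: a simple diversity measure for designed candidates.
import math
from collections import Counter


def position_frequencies(sequences):
    """Return one {nucleotide: frequency} dict per alignment column."""
    if not sequences:
        raise ValueError("at least one sequence is required")
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    total = len(sequences)
    return [
        {base: count / total for base, count in Counter(column).items()}
        for column in zip(*sequences)
    ]


def position_entropy(frequencies):
    """Shannon entropy in bits for every column."""
    return [
        sum(-p * math.log2(p) for p in column.values())
        for column in frequencies
    ]


if __name__ == "__main__":
    designs = ["GA", "GC", "GG", "GU"]  # toy data, not real LEARNA output
    print(position_entropy(position_frequencies(designs)))
```

Comparing the mean entropy of runs with and without the restart option would give a single number to accompany the visual impression from the logos.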
5. Flexible Structure Definitions. libLEARNA implements the new RNA design paradigm,
partial RNA design, which allows the user to provide arbitrary RNA sequence and structure
motifs to define a design space. Typically, RNA design algorithms require an entire
secondary structure as input. All pairing positions have to be defined in advance and the input
is only valid if the numbers of opening and closing brackets are equal when using the
dot-bracket notation. It is, therefore, noteworthy that there is no such restriction on the
structure input for the motifs nor the entire structure in libLEARNA. In contrast, the designer
can freely define the requirements of a given design task. For example, the following inputs
are considered valid definitions of the sequence and structure dimensions of search spaces for
libLEARNA:
>unbalanced example 1
#seq NNNNNNNNNN
#str NNNNNNN)))
>unbalanced example 2
#seq NNN NNN
#str NN( ))N
>unbalanced example 3
#seq XN
#str X)
Example 1 considers a design space of 10 nucleotides length that contains structures that end
with three closing brackets (‘)))’). Example 2 defines a space from two motifs, while position
3 defines the opening position of a base pair. The provided candidates of libLEARNA will
fold into structures that end with a ‘))N’ pattern, where ‘N’ can either be a dot or a closing
bracket (since the structure cannot end with an opening bracket). Finally, example 3 is a very
unrestricted search space where the only restriction is that the resulting candidates fold into
structures that have a closing bracket at the last position.
As a result of these flexible task definitions, libLEARNA cannot take similar advantage of
known base pairs as LEARNA, Meta-LEARNA, or Meta-LEARNA-Adapt. While these
algorithms automatically place complementary pairing partners at positions that ought to be
paired in the input structure (following a Watson-Crick base pair scheme), for libLEARNA
the exact pairing positions might be unknown. However, libLEARNA leverages a similar
approach in case the pairing partner is clearly distinguishable, i.e., there is no unrestricted
structure position between the current opening bracket and the next closing bracket. One
advantage of libLEARNA, therefore, is that it can generate sequences with non-Watson-Crick
base-pairing patterns.
However, libLEARNA optionally allows indicating matching pairs within the task
description using a unique identifier, a natural number that directly follows the respective
paired position to index the pair. As an example, the following task description is valid for
libLEARNA:
>explicit pairing example
#seq NNN NNN
#str (1(N N)1N
The structure part defines a search space where position 1 of the first motif and position 2 of
the second motif should be paired. During the design, these explicit inputs are treated as soft
constraints: The likelihood of the two positions to pair is increased by placing complementary
Watson-Crick pairs at the explicit positions; however, the pairing is not enforced during the
design process. It is also valid to provide multiple indices in a single design space using
unique identifiers, or to leave some of the pairs without indices. We note that libLEARNA
currently only supports the design of nested RNA structures. Non-nested indexing will not
result in an error but might not lead to the desired outcome.
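The indexing scheme can be illustrated with a small parser. The sketch below is an assumption about how such a description could be read and is not taken from the libLEARNA source: it separates the structure symbols from the optional numeric identifiers and reports which pattern positions (0-based, counting the whitespace slot as one position) are linked by each identifier. The function name `parse_indexed_structure` is hypothetical.

```python
# Illustrative only: one possible reading of indexed structure patterns.
def parse_indexed_structure(text):
    """Split a pattern such as '(1(N N)1N' into symbols and indexed pairs.

    Returns (symbols, pairs), where `pairs` maps each identifier to the
    (opening, closing) positions within the symbol string.
    """
    if text and text[0].isdigit():
        raise ValueError("an identifier must follow a bracket")
    symbols, labels = [], []
    i = 0
    while i < len(text):
        symbols.append(text[i])
        i += 1
        start = i
        while i < len(text) and text[i].isdigit():
            i += 1  # consume the identifier that follows the symbol
        labels.append(int(text[start:i]) if i > start else None)

    pairs, opened = {}, {}
    for position, (symbol, label) in enumerate(zip(symbols, labels)):
        if label is None:
            continue
        if symbol == "(":
            opened[label] = position
        elif symbol == ")":
            if label not in opened:
                raise ValueError(f"identifier {label} closes before it opens")
            pairs[label] = (opened.pop(label), position)
        else:
            raise ValueError("identifiers are only valid after brackets")
    return "".join(symbols), pairs


if __name__ == "__main__":
    print(parse_indexed_structure("(1(N N)1N"))
```

For the example above, the parser links the first symbol of the first motif with the second symbol of the second motif, matching the description in the text.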
Sometimes, the extension with arbitrary structure symbols might not be desired. For example,
designing candidates with a variable length of 30 to 50 nucleotides that contain a single
hairpin of five to ten nucleotides with a GNRA tetra-loop appears as a valid design goal.
However, when indicating the regions of extension with whitespaces (or X), the resulting
candidates might contain more than one hairpin. libLEARNA, therefore, allows defining
regions of extension that consist of dots (‘.’) only. These regions are defined with the letter ‘O’.
An input file for the hairpin design approach described above could then, e.g., look as follows:
>Single hairpin design example
#seq XNNNNNNNNNNGNRANNNNNNNNNNX
#str ONNNNN(((((....)))))NNNNNO
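The difference between the extension symbols can be made explicit by translating a structure pattern into a regular expression. The following sketch is an interpretation of the symbols described in this note, not libLEARNA code: ‘N’ is read as exactly one arbitrary structure symbol, whitespace or ‘X’ as a region of arbitrary symbols, and ‘O’ as a region of dots only. The unbounded region length and the function names are assumptions, and the check does not verify that the candidate structure itself is balanced.

```python
# Illustrative only: matching folded structures against a pattern.
import re

_SINGLE = {"(": r"\(", ")": r"\)", ".": r"\.", "N": r"[().]"}
_REGION = {" ": r"[().]*", "X": r"[().]*", "O": r"\.*"}


def structure_pattern_to_regex(pattern):
    """Compile a structure pattern into a regular expression."""
    parts = []
    for symbol in pattern:
        if symbol in _SINGLE:
            parts.append(_SINGLE[symbol])
        elif symbol in _REGION:
            parts.append(_REGION[symbol])
        else:
            raise ValueError(f"unsupported structure symbol: {symbol!r}")
    return re.compile("".join(parts))


def matches_pattern(structure, pattern):
    """True if the dot-bracket `structure` is an instance of `pattern`."""
    return structure_pattern_to_regex(pattern).fullmatch(structure) is not None


if __name__ == "__main__":
    hairpin = "(((((....)))))"
    with_o = "ONNNNN(((((....)))))NNNNNO"
    with_x = "XNNNNN(((((....)))))NNNNNX"
    two_hairpins = "(...)" + "....." + hairpin + "....."
    print(matches_pattern(two_hairpins, with_o))  # False
    print(matches_pattern(two_hairpins, with_x))  # True
```

The example shows why ‘O’ is useful: a candidate with an additional hairpin in the flanking region is accepted by the ‘X’ pattern but rejected by the ‘O’ pattern.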
6. Unique Features of libLEARNA. Besides the definition of tasks, libLEARNA shows some
differences compared to the other algorithms of the learna_tools package. It uses a completely
new training regimen, inspired by a masked prediction task that was, e.g., used to train the
language model BERT (49). While Meta-LEARNA and Meta-LEARNA-Adapt are purely
trained on tasks of the inverse RNA folding problem, given only a structure in dot-bracket
notation, the training tasks for libLEARNA are more diverse and might contain sequence
constraints. Generally, libLEARNA employs the same data pipeline as the other two
algorithms. However, in an additional preprocessing step, roughly two thirds of the training
data points are masked such that sequence and structure constraints alternate, roughly 20
percent are masked at random, and the remaining samples are used as inverse RNA folding
tasks. Table 1 shows examples for each of these classes of training samples. The general task
during training of libLEARNA can be interpreted as filling in the missing parts of the sequence
and the structure input.
[Table 1] Examples of different tasks of the training data of libLEARNA.
Task Description          Task Space   Example        Percentage of Data
Inverse RNA Folding       Sequence     NNNNNNNNNNNN   11.5
                          Structure    ...(((...)))
Alternating Constraints   Sequence     NNAUGNNNCCNN   66.7
                          Structure    ..NNN(..NN))
Random Masking            Sequence     NNANNGANNNNA   21.8
                          Structure    N..NN(..NNN)
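The three classes of training samples in Table 1 can be sketched as simple masking functions applied to a known sequence and structure pair. The code below is a hypothetical illustration and not the preprocessing used to train libLEARNA: the segment length limit `max_segment`, the masking probability `mask_fraction`, and all function names are assumptions, and only the class proportions are taken from Table 1.

```python
# Illustrative only: one way to generate the task classes of Table 1.
import random

# Class proportions from Table 1 (percent of the training data).
TASK_WEIGHTS = {"inverse_folding": 11.5, "alternating": 66.7, "random": 21.8}


def inverse_folding_task(sequence, structure):
    """Hide the whole sequence and keep the complete structure."""
    return "N" * len(sequence), structure


def alternating_mask(sequence, structure, rng, max_segment=4):
    """Alternate segments that reveal either the sequence or the structure."""
    seq_out, str_out = [], []
    show_sequence = rng.random() < 0.5
    start = 0
    while start < len(sequence):
        end = min(len(sequence), start + rng.randint(1, max_segment))
        hidden = "N" * (end - start)
        seq_out.append(sequence[start:end] if show_sequence else hidden)
        str_out.append(hidden if show_sequence else structure[start:end])
        show_sequence = not show_sequence
        start = end
    return "".join(seq_out), "".join(str_out)


def random_mask(sequence, structure, rng, mask_fraction=0.5):
    """Mask every position of both inputs independently at random."""
    def mask(text):
        return "".join(
            "N" if rng.random() < mask_fraction else symbol for symbol in text
        )
    return mask(sequence), mask(structure)


def make_training_task(sequence, structure, seed=None):
    """Draw a task class with the Table 1 proportions and apply its mask."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    rng = random.Random(seed)
    kind = rng.choices(list(TASK_WEIGHTS), weights=list(TASK_WEIGHTS.values()))[0]
    if kind == "inverse_folding":
        masked = inverse_folding_task(sequence, structure)
    elif kind == "alternating":
        masked = alternating_mask(sequence, structure, rng)
    else:
        masked = random_mask(sequence, structure, rng)
    return kind, masked[0], masked[1]


if __name__ == "__main__":
    print(make_training_task("GGGAAAUCCC", "(((....)))", seed=0))
```

In every class, the positions marked with ‘N’ are the parts the agent has to fill in, which matches the interpretation of training as completing the missing parts of the sequence and structure input.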
The inclusion of sequence constraints also affects the state composition of libLEARNA. In
contrast to LEARNA, Meta-LEARNA, and Meta-LEARNA-Adapt, which only use the
structure input for providing states to the agent, libLEARNA uses the structure and the
sequence inputs to define the states. The state space of libLEARNA, thus, is much larger but
also allows for more flexible task definitions. The sliding window approach, however,
remains the same as for the other methods.
As a result of the changes, the policy learned by libLEARNA seems to be less biased towards
specific nucleotide compositions. There are several reasons that might explain this
observation. Firstly, the larger state space of libLEARNA makes it harder to learn specific
patterns for a given state. Also, the same state is less likely to appear more than once during
training due to the random masking procedure and the additional sequence space. Secondly,
the masking leads to more ambiguous data and a broader solution space, making it less
attractive to optimize for a single solution for a given task. Finally, the masking procedure
often enforces the prediction of single nucleotides rather than Watson-Crick base pairs at a
given paired position. The algorithm thus must be able to find solutions with predictions of
single nucleotides only. This leads to general differences in the learning procedure, reduces
biases and might lead to an increased base pairing repertoire, e.g., libLEARNA can predict
solutions with Wobble-pairs (‘G-U’) which the other algorithms cannot.
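To inspect the base pairing repertoire of a designed candidate, the pairs implied by its folded structure can be classified. The following sketch is an illustration, not part of learna_tools; it assumes a complete, nested dot-bracket structure, for example the predicted fold of a candidate, and counts Watson-Crick, wobble (G-U) and other pairs. The function names are hypothetical.

```python
# Illustrative only: classify the base pairs of a folded candidate.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def base_pairs(structure):
    """Return the sorted (i, j) pairs of a balanced dot-bracket string."""
    stack, pairs = [], []
    for position, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(position)
        elif symbol == ")":
            if not stack:
                raise ValueError("closing bracket without opening bracket")
            pairs.append((stack.pop(), position))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol: {symbol!r}")
    if stack:
        raise ValueError("opening bracket without closing bracket")
    return sorted(pairs)


def classify_pairs(sequence, structure):
    """Count Watson-Crick, wobble and other pairs of a folded sequence."""
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    counts = {"watson_crick": 0, "wobble": 0, "other": 0}
    for i, j in base_pairs(structure):
        pair = (sequence[i], sequence[j])
        if pair in WATSON_CRICK:
            counts["watson_crick"] += 1
        elif pair in WOBBLE:
            counts["wobble"] += 1
        else:
            counts["other"] += 1
    return counts


if __name__ == "__main__":
    # Toy example, not real libLEARNA output.
    print(classify_pairs("GGGAAAAUCU", "(((....)))"))
```

Applied to the outputs of the different algorithms, such a count would show directly whether a method ever proposes wobble pairs.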
7. RNA Design with Desired GC-Contents. When designing RNAs with desired GC-contents
using libLEARNA, we would like to note that libLEARNA was not trained to generate
sequences with certain GC-contents. The algorithm optimizes the design for the tasks
contained in a given design space to maximize its reward. The GC-content is a term in the
reward function of libLEARNA that is set on top of the structure-based reward score that it
was trained on. However, libLEARNA is quite sensitive to this change in the reward function
and adapts quickly to the design of sequences with a specific GC-content. Nevertheless,
depending on the search space, the adaptation might require more time than running
libLEARNA without GC-content control.
4.1 Concluding Remarks
The learna_tools Python package provides advanced machine learning algorithms for the design of
RNAs. In particular, the most recent algorithm, libLEARNA, breaks new ground with its ability to
design RNAs with desired properties from provided motifs. We will continue the development of
libLEARNA to exploit its full potential in the future. Beyond this, machine learning methods, and in
particular deep learning approaches, for biological applications have recently come into the focus of
the deep learning community. We expect this development to continue and anticipate that several deep
learning methods for biological applications will be developed in the future, including methods related
to the field of RNA structural biology. The first steps in this direction can already be seen today. For
example, recent work uses deep learning to design RNAs including 3D structure information(50)(51),
an interesting approach that will likely get off the ground with an ever-growing amount of 3D RNA
structure data being available in the future. Further, the development of methods that deal with
structural biology problems across different molecule classes(52), including proteins, RNAs, or small
organic compounds, could have the potential to revolutionize structure prediction and design in
general. In this context, generative methods based on transformers(53) and diffusion models(54)
could play a key role. We believe that deep learning has the potential to have a lasting impact on the
field of biology and are therefore looking to the future with excitement.
5 Acknowledgements
This research was funded by the Deutsche Forschungsgemeinschaft (DFG, German Research
Foundation) under grant number 417962828. The authors further acknowledge support by the state of
Baden-Württemberg through bwHPC and the German Research Foundation (DFG) through grant no
INST 39/963-1 FUGG.
6 References
[1] Jumper J et al (2021) Highly accurate protein structure prediction with AlphaFold. Nature
596(7873):583-589.
[2] Lin Z et al (2023) Evolutionary-scale prediction of atomic-level protein structure with a language
model. Science 379(6637):1123-1130.
[3] Watson JL et al (2023) De novo design of protein structure and function with RFdiffusion. Nature
620(7976):1089-1100.
[4] LeCun Y, Bengio Y, Hinton GE (2015) Deep learning. Nature 521(7553):436-444.
[5] Hornik K, Stinchcombe M, White H (1989) Multilayer feedforward networks are universal
approximators. Neural networks 2.5:359-366.
[6] Cybenko G (1989) Approximation by superpositions of a sigmoidal function. Mathematics of
control, signals and systems 2.4:303-314.
[7] Silver D et al (2016) Mastering the game of Go with deep neural networks and tree search. Nature
529(7587):484-489.
[8] Vinyals O et al (2019) Grandmaster level in StarCraft II using multi-agent reinforcement learning.
Nature 575(7782):350-354.
[9] Krizhevsky A, Sutskever I, Hinton GE (2012) Imagenet classification with deep convolutional
neural networks. Advances in neural information processing systems 25.
[10] Singh J et al (2019) RNA secondary structure prediction using an ensemble of two-dimensional
deep neural networks and transfer learning. Nature communications 10.1:5407.
[11] Sato K, Akiyama M, Sakakibara Y (2021) RNA secondary structure prediction using deep
learning with thermodynamic integration. Nature communications 12.1:941.
[12] Franke JKH, Runge F, Hutter F (2022) Probabilistic Transformer: Modelling Ambiguities and
Distributions for RNA Folding and Molecule Design. Advances in Neural Information Processing
Systems 35:26856-26873.
[13] Townshend RJL et al (2021) Geometric deep learning of RNA structure. Science
373(6558):1047-1051.
[14] Valeri JA et al (2020) Sequence-to-function deep learning frameworks for engineered
riboregulators. Nature communications 11.1:5058.
[15] Tuerk C, Gold L (1990) Systematic evolution of ligands by exponential enrichment: RNA
ligands to bacteriophage T4 DNA polymerase. Science 249(4968):505-510.
[16] Im J, Park B, Han K (2019) A generative model for constructing nucleic acid sequences binding
to a protein. BMC genomics 20.13:1-13.
[17] Iwano N et al (2022) Generative aptamer discovery using RaptGen. Nature Computational
Science 2.6:378-386.
[18] Di Gioacchino A et al (2022) Generative and interpretable machine learning for aptamer design
and analysis of in vitro sequence selection. PLoS computational biology 18.9:e1010561.