Comparative study of nanopore phenylalanine clamp variants reveals unique peptide biosensing and classification properties
Jennifer M. Colby¹ and Bryan A. Krantz²*
¹Present address: Premier Biotech Labs, 723 Kasota Ave SE, Minneapolis, MN 55414, U.S.A.
²Department of Microbial Pathogenesis, School of Dentistry, University of Maryland, Baltimore, 650 W. Baltimore Street, Baltimore, MD 21201, U.S.A.
* Corresponding author
Bryan A. Krantz
Department of Microbial Pathogenesis
School of Dentistry, University of Maryland, Baltimore
650 W. Baltimore Street
Baltimore, MD 21201, U.S.A.
(410) 706-1656
[email protected]
Running title: Nanopore clamp variants as biosensors
Preprint server: deposited at bioRxiv under doi: https://doi.org/10.1101/2025.08.22.671566 with a CC-BY 4.0 International license.
Classification: Major: Biological Sciences, Minor: Biophysics and Computational Biology
Keywords: peptide biosensor, machine learning, nanopore, anthrax toxin, protective antigen, electrophysiology, translocation
Abstract
Rationally engineering biological nanopores is critical to advancing single-molecule biosensing. Here, we investigate the phenylalanine clamp active site (ϕ clamp) of the anthrax toxin protective antigen (PA) nanopore, a key site of molecular interaction, to test if engineering this site can improve peptide classification. We performed a comparative analysis of wild-type PA and two ϕ-clamp mutants (F427A, F427Y). We report the paradoxical finding that the F427A mutant, known to be a defective large protein translocase, is a superior peptide biosensor. Using a machine learning framework with engineered biophysical features, the F427A pore classifies a diverse peptide set with 93% accuracy. Our analysis suggests this enhanced performance arises because the F427A mutation, while weakening specific interactions, produces more consistent, lower-variance kinetic 'fingerprints' that are more easily distinguished by computational models. These findings establish a principle for biosensor design and enable a strategy where engineered pores with complementary specificities are deployed in multiplexed arrays for robust diagnostics.
Significance Statement
Developing nanopore biosensors for accurate peptide detection is a major goal in biotechnology. Here, we engineered the 'ϕ-clamp' active site of the anthrax toxin nanopore to improve its performance as a sensor. We report a paradoxical discovery: a mutant pore (F427A) known to be a defective protein translocase is a superior peptide biosensor, classifying peptides using machine learning with 93% accuracy. This counterintuitive result arises because the mutation produces weaker but more consistent electrical signals, creating a more reliable fingerprint for computational analysis. This work reveals a powerful new principle for rational biosensor design, namely that deliberately attenuating a natural biological function can enhance an engineered one, providing a clear strategy for creating next-generation diagnostic tools.
Introduction
The precise identification and quantification of peptides and proteins are fundamental to a wide range of applications, from medical diagnostics to environmental monitoring. Peptide biomarkers hold immense promise for early detection of diseases, like heart disease, cancer, and infectious diseases (1-3). However, effectively analyzing complex and often dilute peptide mixtures demands analytical platforms with extraordinary sensitivity and specificity. While many conventional techniques struggle with these requirements, single-molecule nanopore biosensing offers a powerful, label-free alternative (4). This technology operates by detecting discrete changes in ionic current as individual molecules pass through a nanometer-scale pore, providing a real-time, high-resolution readout of molecular properties. Beyond simple detection, this platform is a leading candidate for the ambitious goal of direct, high-throughput peptide and protein sequencing, an area of research that still lacks a widely applicable solution (5-7).
At the heart of nanopore sensing are biological protein channels, such as anthrax toxin PA, which provide a highly controlled and tunable environment for molecular interactions (8-13). These pores, when embedded in a membrane, can capture the dynamic, picoamp-scale current fluctuations as translocating molecules interact with the nanopore's internal features. The resulting ionic current signatures are rich in information, but their complexity presents a significant challenge, namely in extracting adequate analytical detail to enable reliable discrimination of diverse sets of peptide analytes. Critically, a single unmodified biosensor, regardless of its performance, may not possess the necessary discriminatory power to accurately classify the wide structural and chemical diversity found in real-world peptide mixtures. This suggests a need for a new paradigm in nanopore biosensor design.
The anthrax toxin PA channel has emerged as a premier biological nanopore model system for polypeptide analysis (14). As a natural protein translocase, PA is exceptionally robust and facilitates the processive, voltage- and/or proton gradient-driven transport of peptides (15, 16). It achieves this without the need for DNA tethers or other labels (17-20). PA uses multiple internal loops and clefts, called 'peptide-clamp' sites, such as the phenylalanine clamp (ϕ clamp), to interact with the translocating molecule (9, 12, 13, 18, 20, 21). These interactions generate complex, multi-state ionic current signatures with distinct kinetic and conductance characteristics (20), which contain the information necessary for both peptide classification (22) and, in principle, sequencing. The inherent structural modularity of the PA nanopore makes it an ideal platform for protein engineering to fine-tune its biosensing properties.
Analyzing the voluminous and often noisy data from these multi-state systems necessitates advanced computational methods. Machine learning (ML) and deep learning are particularly well-suited for this task, as they can identify subtle, high-dimensional patterns in translocation kinetics that are inaccessible to traditional analysis (22-28). However, the full potential of these computational tools has yet to be realized, especially when dealing with the challenge of analyzing complex peptide mixtures.
In this work, we hypothesize that a single unmodified sensor, optimized for certain types of peptides, may not perform as successfully across a wide array of different analytes. To address this, we leverage the PA nanopore as a model system, employing protein engineering of the ϕ-clamp site to generate a series of novel biosensors. We then use advanced ML methods to characterize the unique classification strengths of each variant. Our results demonstrate that an ensemble or multiplexed array of these engineered biosensors, with their complementary detection characteristics, can achieve a well-balanced performance, surmounting the performance of the wild-type (WT) pore. This ensemble approach represents a significant step forward for protein nanopore-based analysis in biotechnology, paving the way for more accurate peptide and protein sequencing capabilities.
Results
Nanopore variant study design. Our prior study leveraged the well-characterized anthrax toxin PA nanopore as a single-molecule peptide biosensor (22). The PA nanopore's architecture includes a key narrow constriction site, known as the ϕ clamp, which is formed by a radially symmetric arrangement of phenylalanine residues (F427) (13). This site is positioned to make close contact with translocating proteins, effectively 'clamping' them and thereby promoting productive protein unfolding and translocation. Moreover, the interaction leads to full and partial blockade of the channel to ion flow in a highly dynamic and peptide-specific manner (20, 22). Our central hypothesis was that tuning the identity of the residue at this F427 position would alter the nanopore's selectivity and, consequently, its capabilities as a peptide classifier. To test this, we engineered two specific variants: (i) F427A (replacing Phe with the much smaller Ala) and (ii) F427Y (replacing the site with the slightly bulkier, yet more hydrophilic Tyr). We then compared their classification performance to the WT PA nanopore.
To evaluate our hypothesis, we used a previously characterized set of seven 10-residue guest-host peptides of the sequence KKKKKXXSXX (where X is the guest residue) (17, 20, 22). The guest residues included non-aromatic side chains (Ala, Leu, Thr) and aromatic side chains (Phe, Trp, Tyr). A final variant, guest-host TrpDL, featured an alternating pattern of ᴅ- and ʟ-stereochemistry to probe how peptide backbone dynamics influence the translocation signal. Prior work has demonstrated that these peptides can be efficiently classified by the WT PA nanopore using ML with ~0.90 accuracy (22), and we used these results as a benchmark for the engineered F427A and F427Y nanopores.
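The guest-host notation can be made concrete with a short Python sketch that expands the KKKKKXXSXX template for each guest residue. The helper name and dictionary below are illustrative assumptions, not part of the published code. TrpDL shares the one-letter sequence of Trp, because one-letter codes do not capture its alternating ᴅ/ʟ stereochemistry.

```python
# Illustrative helper (hypothetical, not from the published code):
# expand the guest-host template for each guest residue.
TEMPLATE = "KKKKKXXSXX"
GUEST_RESIDUES = {"Ala": "A", "Leu": "L", "Phe": "F", "Thr": "T", "Trp": "W", "Tyr": "Y"}


def guest_host_sequence(guest_one_letter: str) -> str:
    """Replace every X in the template with the one-letter guest residue."""
    return TEMPLATE.replace("X", guest_one_letter)


sequences = {name: guest_host_sequence(code) for name, code in GUEST_RESIDUES.items()}
# TrpDL has the same one-letter sequence as Trp; only the stereochemistry differs.
sequences["TrpDL"] = sequences["Trp"]
```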
Data collection and preprocessing. In previous work, we collected WT PA nanopore translocation event streams of the guest-host peptide series. These experiments were performed under a constant driving force of 70 mV (cis positive), a condition known to strongly favor productive translocation events and produce robust current signals (22). We similarly collected single-channel peptide translocation recordings for the F427A and F427Y nanopore variants under the same 70 mV driving force, with data streams extending up to 30 minutes per peptide. The raw data from these experiments consisted of complex ionic current signals (Fig. 1). To prepare these data for ML, we developed a preprocessing pipeline. First, using a K-Means clustering algorithm, the raw event streams were labeled according to their corresponding conductance states. While rare intermediate states were occasionally observed, our analysis identified four robust and well-populated states that consistently described most of the current signal. These states are enumerated 0 through 3, corresponding to fully blocked (state 0), partially blocked (states 1 and 2), and fully open (state 3). This state labeling procedure was benchmarked against expert electrophysiology software, CLAMPFIT, and deemed to be highly consistent yet computationally much more efficient. These state-labeled records were then carefully segmented into individual translocation events. We settled on using only events greater than or equal to 15 ms in length in all our subsequent analyses to capture the most data while excluding lower-information, short-duration spiky events. Finally, a set of biophysical features, such as dwell time, current blockade level, and signal variance, were computed to describe each individual event (see Materials and Methods in SI Appendix for a complete list of features). These comprehensive feature sets, representing the unique biophysical signature of each peptide translocation event, were then used as the training data for our ML classification models.
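The state-labeling step can be illustrated with a minimal one-dimensional K-Means sketch in plain Python. This is not the authors' implementation: the function names, the quantile-based initialization, and the iteration cap are assumptions made for the example. The only convention taken from the text is that the four states are ordered by current, with state 0 the fully blocked (lowest current) level and state 3 the fully open level.

```python
from typing import List, Sequence, Tuple


def _nearest(value: float, centroids: Sequence[float]) -> int:
    """Index of the centroid closest to a single current sample."""
    return min(range(len(centroids)), key=lambda j: abs(value - centroids[j]))


def kmeans_1d(values: Sequence[float], k: int = 4, n_iter: int = 100) -> Tuple[List[int], List[float]]:
    """Lloyd's algorithm on scalar currents; returns (state labels, sorted centroids)."""
    ordered = sorted(values)
    n = len(ordered)
    # Assumed deterministic initialization: evenly spaced quantiles of the data.
    centroids = [ordered[int((i + 0.5) * n / k)] for i in range(k)]
    for _ in range(n_iter):
        labels = [_nearest(v, centroids) for v in values]
        updated = []
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            # Keep the previous centroid if a cluster received no samples.
            updated.append(sum(members) / len(members) if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    labels = [_nearest(v, centroids) for v in values]
    # Renumber clusters by increasing current: state 0 = fully blocked, state k-1 = open.
    order = sorted(range(k), key=lambda j: centroids[j])
    rank = {old: new for new, old in enumerate(order)}
    return [rank[lab] for lab in labels], [centroids[j] for j in order]
```

Sorting the centroids before numbering is what ties the arbitrary cluster indices to the physical state convention; without it, the state numbers would differ from run to run.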
Unsupervised clustering analysis. To assess the intrinsic separability of the peptide features extracted from each nanopore variant, we performed an unsupervised clustering analysis using Uniform Manifold Approximation and Projection (UMAP) (Fig. 2). UMAP is a dimensionality reduction technique that is well-suited for visualizing high-dimensional data, revealing the underlying structure and grouping of feature sets. To quantitatively evaluate the quality of the clustering, we calculated the Adjusted Rand Index (ARI) and Normalized Mutual Information (NMI) for each nanopore, comparing the UMAP-derived clusters against the ground truth peptide labels. In the analysis, using a 15 ms minimum event duration filter, WT PA nanopore had an ARI of 0.0648 and NMI of 0.0864; PA F427A had an ARI of 0.0536 and NMI of 0.0775; and PA F427Y had an ARI of 0.0947 and NMI of 0.1424. A higher ARI and NMI value indicates that the feature set for a given nanopore more effectively clusters the peptides into groups that align with their true identities. Visual UMAP plots of the F427Y variant corroborated these metrics, showing less scatter and cleaner clusters compared to the F427A variant (Fig. 2). Interestingly, these results suggest that the features from the PA F427Y nanopore, which will be shown to have lower performance in supervised classification tests, demonstrate superior inherent clustering capabilities. This highlights a critical distinction between unsupervised and supervised learning tasks. While the features of the F427Y nanopore may create tight, well-defined clusters, these clusters may be positioned too closely to one another in the feature space, making it difficult for a supervised classifier to establish a clear decision boundary. Conversely, a nanopore with less tightly clustered data (like WT or F427A) may still have greater cluster separation, which is a more favorable condition for a supervised model to learn and generalize from. This observation underscores the fact that the ultimate metric of success in this study is not the inherent separability of the data, but the performance of the classification models built upon it. The following section will therefore focus on the supervised learning performance, where the true discriminative power of each nanopore variant is directly measured.
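For readers unfamiliar with the ARI, the following plain-Python sketch computes it from two label lists using the standard pair-counting formula. It is an illustration only. How cluster assignments were derived from the UMAP embedding is not stated in this section, so the function simply takes cluster labels as input. In practice, `sklearn.metrics.adjusted_rand_score` and `sklearn.metrics.normalized_mutual_info_score` provide both metrics.

```python
from collections import Counter
from math import comb
from typing import Hashable, Sequence


def adjusted_rand_index(labels_true: Sequence[Hashable], labels_pred: Sequence[Hashable]) -> float:
    """Adjusted Rand Index between ground-truth labels and cluster assignments."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two samples: the labelings agree trivially
    # Pair counts within contingency cells, true classes, and predicted clusters.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / total_pairs  # agreement expected by chance
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. one cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)
```

An ARI near 0 indicates agreement no better than chance and 1 indicates identical partitions, so the values reported above (0.05 to 0.09) correspond to weak unsupervised separation for all three pores.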
Initial single-stage classification performance of nanopore variants. To establish a performance baseline, we first trained individual single-stage eXtreme Gradient Boosting (XGBoost) classifiers for each nanopore variant: WT, F427A, and F427Y (Table S1). This preliminary analysis revealed distinct performance profiles for each nanopore, suggesting a potential for synergistic gains through an ensemble approach. The WT nanopore achieved a mean macro-averaged F1-score of 0.8652 (±0.0087) (N=5), while the F427A variant demonstrated superior overall performance with a mean F1-score of 0.8869 (±0.0089). In contrast, the F427Y nanopore performed the lowest, with a mean F1-score of 0.7829 (±0.0088).
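The macro-averaged F1-score reported here is the unweighted mean of the per-class F1-scores. A minimal plain-Python sketch of that definition follows. The function names are illustrative assumptions, and the published pipeline may well rely on a library routine instead.

```python
from typing import Dict, Hashable, Sequence


def per_class_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> Dict[Hashable, float]:
    """F1 = 2*TP / (2*TP + FP + FN), computed separately for each class."""
    scores = {}
    for cls in sorted(set(y_true) | set(y_pred)):
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores[cls] = 2 * tp / denom if denom else 0.0
    return scores


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of the per-class F1-scores."""
    scores = per_class_f1(y_true, y_pred)
    return sum(scores.values()) / len(scores)
```

Because every class contributes equally regardless of its event count, a pore that confuses two peptides (such as Ala and Thr) is penalized in the macro average even if the remaining classes are classified almost perfectly.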
A detailed, class-by-class analysis of the individual F1-scores revealed the specific strengths of each nanopore (Fig. 3). The F427A nanopore demonstrated a marked increase in predictive power for guest-host peptides containing small non-aromatic side chains. Specifically, its F1-scores for guest-host Ala (0.95) and guest-host Thr (0.94) were substantially higher than those achieved by the WT nanopore (0.80 and 0.74, respectively) (Table S1). Conversely, the WT nanopore showed a slight advantage in classifying some aromatic guest-host peptides, such as guest-host TrpDL, with an F1-score of 0.96 compared to 0.91 for F427A. The WT nanopore also performed well on guest-host Phe, Trp, and Tyr. Confusion matrices similarly show the high degree of guest-host Ala and Thr confusion for the WT nanopore relative to the F427A version (Fig. S1). These complementary strengths (F427A excelling with small non-aromatic side chains and WT performing somewhat better on aromatic ones) motivated the design of a more sophisticated, multistage ensemble classifier to leverage the unique information from each nanopore variant.
Multistage ensemble classifier architecture. To effectively maximize the distinct single-molecule information from each nanopore variant, we designed a multistage ensemble classifier based on a probabilistic blending approach (Fig. 4A). In the first stage, individual XGBoost classifiers were trained for each nanopore variant (WT, F427A, or F427Y). These models were configured to output a probabilistic vector (softmax scores) indicating the confidence of an event being from an aromatic peptide class. In the second stage, a separate XGBoost model was trained on the augmented features, which now included the Stage 1 probabilities, to predict peptide-specific probabilities for both aromatic and non-aromatic peptides. Finally, a meta-classifier took as input the combined feature set from the raw data and the probabilistic outputs from both stages 1 and 2 to produce the final peptide predictions. For instance, in a two-nanopore combination (e.g., WT/F427A), the meta-classifier's input vector was a blend of all softmax scores from both nanopores from both stages. This architecture allowed the meta-classifier to optimally blend the confidence scores from the different nanopore classifiers, effectively identifying and amplifying the unique and complementary information provided by each variant to improve overall classification accuracy.
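The data flow of this blending scheme can be sketched as follows. This is a structural illustration under stated assumptions, not the authors' implementation. Each trained model is represented by a plain callable that returns a list of probabilities, the names `build_meta_input`, `stage1_models`, and `stage2_models` are hypothetical, and it is assumed that every event is scored by the Stage 1 and Stage 2 models of each nanopore in the combination.

```python
from typing import Callable, Dict, List, Sequence

# A "model" here is any callable mapping a feature vector to a list of probabilities
# (for example, a thin wrapper around a trained classifier's probability output).
ProbModel = Callable[[Sequence[float]], List[float]]


def build_meta_input(
    features: Sequence[float],
    stage1_models: Dict[str, ProbModel],
    stage2_models: Dict[str, ProbModel],
) -> List[float]:
    """Concatenate raw features with Stage 1 and Stage 2 probabilities from each nanopore."""
    meta_input = list(features)
    for pore in sorted(stage1_models):
        # Stage 1: aromatic versus non-aromatic class probabilities.
        p_aromatic = stage1_models[pore](features)
        # Stage 2: peptide-specific probabilities from features augmented with Stage 1 output.
        augmented = list(features) + list(p_aromatic)
        p_peptide = stage2_models[pore](augmented)
        meta_input.extend(p_aromatic)
        meta_input.extend(p_peptide)
    return meta_input


# Hypothetical stand-ins for trained classifiers (fixed outputs, for shape checking only).
stage1 = {"WT": lambda f: [0.7, 0.3], "F427A": lambda f: [0.2, 0.8]}
stage2 = {"WT": lambda f: [1 / 7] * 7, "F427A": lambda f: [1 / 7] * 7}
meta_vector = build_meta_input([0.5, 1.2, 3.0], stage1, stage2)
```

The resulting vector would be passed to the meta-classifier, which is free to weight each nanopore's confidence scores differently for each peptide class.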
Table 1. Multistage ensemble probabilistic blending classifier performance¹.

                                                       F1-score per guest-host peptide class³
Nanopore(s)     Overall accuracy²  Macro-averaged F1²  Ala   Leu   Phe   Thr   Trp   TrpDL  Tyr
WT              0.8874 (±0.0094)   0.8870 (±0.0094)    0.81  0.95  0.93  0.77  0.95  0.98   0.91
F427A           0.9323 (±0.0041)   0.9322 (±0.0042)    0.96  0.96  0.89  0.97  0.96  0.94   0.89
F427Y           0.8365 (±0.0042)   0.8367 (±0.0047)    0.97  0.92  0.67  0.94  0.82  0.86   0.72
WT/F427A        0.9112 (±0.0019)   0.9112 (±0.0018)    0.90  0.96  0.90  0.88  0.95  0.93   0.88
WT/F427Y        0.8632 (±0.0060)   0.8632 (±0.0062)    0.86  0.93  0.83  0.83  0.89  0.92   0.84
F427A/F427Y     0.8909 (±0.0061)   0.8909 (±0.0061)    0.97  0.94  0.81  0.96  0.90  0.91   0.82
WT/F427A/F427Y  0.8848 (±0.0033)   0.8847 (±0.0032)    0.89  0.94  0.85  0.88  0.91  0.91   0.86

¹Performance based on test set evaluation metrics of multistage ensemble probabilistic classification model.
²Overall accuracy and macro-averaged F1-score values are means and std. dev. (N=5).
³Best set of per-class F1-scores out of N=5 replicates.
Figures
Fig. 1. Nanopore system and translocation event streams of F427X variant nanopores. (A) Sagittal section of the anthrax toxin PA7 nanopore cryo-EM structure (PDB: 3J9C) (8) rendered in Chimera (29) as a molecular surface. Peptide clamps and loop active sites are colored and labeled: α clamp (magenta), 397-loop (cyan), ϕ clamp (green), and charge clamp (red). The overall scales of the upper vestibule and elongated lower β barrel are indicated. The narrowest point, at the ϕ clamp, has a luminal diameter of 6 Å. Direction of translocation from cis to trans is indicated by an arrow. Membrane bilayer position is indicated with a solid gray rectangle. Representative recordings of guest-host peptide translocations carried out at 70 mV (cis positive) in symmetric succinate buffer, 100 mM KCl, pH 5.6 for (B) F427A and (C) F427Y nanopores. To the left of each trace is the standard three-letter name of the guest residue. These nanopore-peptide systems populate multiple discrete partially or fully blocked intermediates (approximate locations indicated by dashed lines). From bottom to top of each record: fully blocked (state 0), partially blocked intermediates (state 1 and state 2), and fully open baseline (state 3). The scalebar at the upper right of each panel denotes 4 pA by 100 ms for guest-host Ala, Leu, Phe, Thr, and Tyr peptides. For guest-host Trp and TrpDL peptides, it is 4 pA by 500 ms to show their characteristically longer events. Note that F427A nanopores conduct ~30% more than F427Y channels (13).
Fig. 2. Unsupervised clustering analysis of feature set describing translocation events from mutant nanopores. UMAP clustering analysis of event-level features. Data points, representing individual translocation events, are colored by guest-host peptide identity: Ala (black), Leu (red), Phe (green), Thr (blue), Trp (yellow), TrpDL (magenta), and Tyr (cyan). (A) UMAP embedding of F427A nanopore events with ARI of 0.0536 and NMI of 0.0775. (B) UMAP embedding of F427Y nanopore events with ARI of 0.0947 and NMI of 0.1424. Events in either panel were filtered at a minimum event duration of 15 ms, and UMAP settings were n_neighbors = 15 and min_dist = 0.5.
Fig. 3. Per-peptide class F1-scores for individual nanopore ϕ-clamp variants. Per-class F1-scores of single-stage XGBoost classification for WT (black), F427A (red), and F427Y (blue) nanopores. Plotted values represent the best performing evaluation out of 5 replicate trainings.
Fig. 4. Multistage ensemble probabilistic blending classifier. (A) Block diagram of multistage ensemble probabilistic blending classifier. The model processes raw translocation features (F) through a two-stage classification and blending pipeline. Stage 1 classifiers, one for each nanopore variant, predict the aromatic class probability (P_A) for a given translocation event. These probabilities are then used to augment the original features (F) for Stage 2 classification. The Stage 2 classifiers predict peptide-specific probabilities for both aromatic (P_A,peptide) and non-aromatic (P_NA,peptide) peptides. Finally, a Meta-Classifier blends the probabilities from both Stage 1 (dashed line connections) and Stage 2 to produce the final peptide prediction. Normalized confusion matrices from most successful trainings of multistage ensemble probabilistic blending model for (B) a single F427A nanopore and (C) a WT/F427A ensemble. Rows represent true peptide labels, and columns represent predicted labels. Values indicate the proportion of events from a given true class that were predicted as each class.
Supporting Information for
Comparative study of nanopore phenylalanine clamp variants reveals unique peptide biosensing and classification properties
This PDF file includes:
Supporting text
Figure S1
Table S1
SI References
Supporting Information Text
Materials and Methods
Nanopore and peptides. Monomeric 83-kDa PA (PA83) preprotein mutants, F427A and F427Y, and their homoheptameric prepore oligomers (PA7) were produced as described (1, 2). PA83 mutants were overexpressed in Escherichia coli BL21(DE3), using a pET22b plasmid, which directs expression to the periplasm. Cell cultures were grown at 37 °C in a custom 5 L fermentor using ECPM1 growth media (3), which was supplemented with carbenicillin (50 mg/L). Once reaching an OD600 of 3-10, the cultures were then induced with 1 mM isopropyl β-D-thiogalactopyranoside for ~3 h at 30 °C. PA83 was released from the periplasm by resuspending pelleted cells on ice using a wire whisk with 1 L of hypertonic sucrose buffer (20% sucrose, 20 mM Tris-Cl, 0.5 mM EDTA, pH 8) followed by osmotic shock of centrifuged/pelleted cells using a wire whisk in 1 L of hypotonic solution (5 mM MgCl2). Released PA83 monomer, isolated after centrifugation to remove cellular debris, was purified by Q-Sepharose anion-exchange chromatography in 20 mM Tris-Cl, pH 8.0 by binding and then eluting with a linear salt gradient using 20 mM Tris-Cl, pH 8.0 with 1 M NaCl.
To make PA7 prepore oligomers of either mutation, purified PA83 at a concentration of 1 mg/ml was treated with trypsin (1:1000 wt/wt trypsin:PA) for 30 min at room temperature to form nicked PA. Trypsin was subsequently inhibited with soybean trypsin inhibitor at 1:100 dilution (wt/wt soybean trypsin inhibitor:PA). Nicked PA was applied to Q-Sepharose to then isolate the PA7; oligomer was bound to the column in 20 mM Tris-Cl, pH 8.0 and eluted by a linear salt gradient using 20 mM Tris-Cl, 1 M NaCl, pH 8.0. PA7 was concentrated and frozen in small aliquots to maintain reproducible nanopore insertion activity in planar bilayer experiments.
Ten-residue guest-host peptides of the general sequence, KKKKKXXSXX, where X = A, L, F, T, W, and Y, were synthesized with standard ʟ amino acids (4, 5) (Elim Biopharmaceuticals). One stereochemical variant of X = W (called TrpDL) was produced, where instead of synthesizing the peptide with uniform ʟ amino acids, an alternating pattern of ᴅ and ʟ amino acids was used (5).
Single-channel electrophysiology. Planar lipid bilayer currents were recorded using an Axopatch 200B amplifier interfaced by a Digidata 1440A acquisition system (Molecular Devices) (2, 5, 6). Membranes were formed by painting across a 50-μm aperture of a 1-mL white Delrin cup with 3% (wt/vol) 1,2-diphytanoyl-sn-glycero-3-phosphocholine (Avanti Polar Lipids) in n-decane. The cis (side to which the PA7 is added) and trans chambers were bathed in symmetric single-channel buffer (SCB: 100 mM KCl, 1 mM EDTA, 10 mM succinic acid, pH 5.60). Recordings were acquired at 500-600 Hz using PCLAMP10. The applied voltage is defined as Δψ = ψ_cis - ψ_trans (where ψ_trans is 0 mV).
Single-channel recordings of the guest-host peptide translocations via the PA nanopore were carried out as described (5) with some slight differences. A single PA channel was inserted into a painted bilayer at a Δψ of 20-30 mV by adding ~2 pM of PA7 (freshly diluted from a 2-μM stock) to the cis side of the membrane. The prepore oligomer converts to the nanopore state by inserting into the membrane in an oriented manner. Once a single channel inserted, the cis chamber was perfused with fresh SCB to remove excess uninserted PA7. Then the desired peptide analyte was added to the cis chamber at 5 to 20 nM. Translocation data were acquired by stepping the applied Δψ to a higher positive value and collecting recordings of the translocation event stream for up to thirty minutes.
Minor processing as well as conductance state labeling of the raw single-channel event stream recordings was subsequently performed. Rare transient out-of-range current spikes, insertion of second channels, and inactivated channels were removed by a 'force values' routine in CLAMPFIT. Translocation recordings acquired at 500 or 600 Hz were downsampled to 400 Hz by decimation in Python using the scipy.signal library. 400 Hz was chosen to be consistent with prior WT PA datasets (7), maintaining a consistent time step for ML peptide classification models. Four discrete conductance states were then detected in these recordings using K-Means clustering, where baseline current drift was corrected by applying a moving window average offset (4000 time point window). While rare additional states were noted, the data were labeled for the four dominant states for either the F427A or F427Y mutants. By convention, the fully blocked peptide-bound state was state 0, the intermediate closest to the fully blocked state was state 1, the intermediate closest to the open state was state 2, and the open state was state 3. Ultimately, the state labeling produced a three-column CSV file of the stream with columns 'Time', 'Current', and 'State'. More than 90% of time points were labeled consistently when comparing our K-Means algorithm and the more traditional CLAMPFIT. All labeled CSV stream files for the seven peptides were entered into a local annotated peptide database to aid in in situ loading/preprocessing for each tested ML model.
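The drift-correction step is described only briefly, so the sketch below shows one possible reading of a "moving window average offset": the local moving average is compared with the overall mean of the record, and the difference is subtracted as a slowly varying offset. The function names, the centered window, and the truncation of the window at the record edges are assumptions made for illustration. Only the 4000-point default window is taken from the text.

```python
from itertools import accumulate
from typing import List, Sequence


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window is truncated at the ends of the record."""
    n = len(values)
    prefix = [0.0] + list(accumulate(values))  # prefix sums give O(1) window sums
    half = window // 2
    averaged = []
    for i in range(n):
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        averaged.append((prefix[hi] - prefix[lo]) / (hi - lo))
    return averaged


def correct_drift(current: Sequence[float], window: int = 4000) -> List[float]:
    """Subtract the slow offset (local moving average minus the overall mean)."""
    trend = moving_average(current, window)
    overall_mean = sum(current) / len(current)
    return [c - (t - overall_mean) for c, t in zip(current, trend)]
```

At 400 Hz, a 4000-point window spans 10 s, which is long relative to most translocation events, so fast blockade transitions are largely preserved while slow baseline wander is removed.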
Hardware and software used for ML. Anaconda was used to create a Python 3.10.16 environment, where XGBoost (3.0.0) (8) and other standard modules were installed. The hardware used in preprocessing and the ML peptide classifications was a 2025 MacBook Pro with M4 Apple Silicon and 24 GB of RAM. All source code is available at GitHub (https://github.com/bakrantz/Pept-Class).
Preprocessing, event segmentation, and feature extraction. Raw state-labeled event streams were segmented into translocation events, enabling feature extraction, as described (7). The minimum event duration, which served as an effective filter for excluding very short-duration events, was set to 15 ms to optimally capture the most information-rich data. This filtering parameter was previously systematically varied in the range of 5 to 20 ms (7). Each segmented event was defined as initiating when the current changed from the fully open state (state 3, corresponding to baseline current) to any peptide-bound state (state 0, 1, or 2) and terminating when the current returned to the open state. From these segmented events, both raw current sequences and corresponding state sequences were extracted. A comprehensive set of event-level features was then computed from these two different sequences using a custom segmentation core. This core maintains a generalizable framework to process peptide translocation events from systems exhibiting diverse mechanisms and an arbitrary number of states. These features were initially grouped into scalar, vector, and matrix data structures, with values in vector and matrix features being state or transition enumerated. Scalar features included: Shannon entropy of state sequence, event duration, number of transitions, time to the first transition, total number of states visited during the event, skewness of the current sequence, and kurtosis of the current sequence. Vector features included: observed conductance state Boolean, observed conductance levels, probability of residing in each state, and longest dwell time in each state. Matrix features included: average dwell time for specific state-to-state transitions, variance of dwell time for transitions, and ratio of probabilities between states. For downstream ML classification, all matrix features were flattened into one-dimensional arrays and appended with the vector and scalar features to form a single feature vector for each translocation event. Descriptive key names for the flattened features were generated to maintain traceability in subsequent applications. All processed event sequences, their flattened features, and associated feature key names were saved as a Python pickle object for efficient storage and cached retrieval. A local peptide events database was employed to track these preprocessed cached pickle files for efficiency.
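The event definition and a few of the scalar features can be sketched in plain Python as follows. This is an illustrative reconstruction, not the custom segmentation core referenced above. The function names are hypothetical, the entropy is assumed to be in bits, and records that begin or end mid-event are assumed to contribute no event for that partial segment. The 400 Hz sample rate, the open state (state 3), and the 15 ms minimum duration follow the text.

```python
from collections import Counter
from math import log2
from typing import Dict, List, Sequence, Tuple

OPEN_STATE = 3          # fully open baseline
SAMPLE_RATE_HZ = 400    # sample rate after downsampling
MIN_DURATION_MS = 15.0  # minimum event duration filter


def segment_events(
    states: Sequence[int],
    open_state: int = OPEN_STATE,
    sample_rate_hz: float = SAMPLE_RATE_HZ,
    min_duration_ms: float = MIN_DURATION_MS,
) -> List[Tuple[int, int]]:
    """Return (start, stop) index pairs, stop exclusive, for events passing the filter."""
    events = []
    start = None
    previous = None
    for i, state in enumerate(states):
        if start is None:
            # An event begins on a transition from the open state to any bound state.
            if previous == open_state and state != open_state:
                start = i
        elif state == open_state:
            # The event ends when the current returns to the open state.
            if (i - start) * 1000.0 / sample_rate_hz >= min_duration_ms:
                events.append((start, i))
            start = None
        previous = state
    return events


def scalar_features(event_states: Sequence[int], sample_rate_hz: float = SAMPLE_RATE_HZ) -> Dict[str, float]:
    """A subset of the scalar features computed from one event's state sequence."""
    n = len(event_states)
    counts = Counter(event_states)
    entropy = -sum((c / n) * log2(c / n) for c in counts.values())
    transitions = sum(1 for a, b in zip(event_states, event_states[1:]) if a != b)
    return {
        "duration_ms": n * 1000.0 / sample_rate_hz,
        "n_transitions": transitions,
        "n_states_visited": len(counts),
        "shannon_entropy_bits": entropy,
    }
```

At 400 Hz each sample spans 2.5 ms, so the 15 ms filter corresponds to a minimum of six consecutive samples in peptide-bound states.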
Initial single nanopore ML peptide classification. ML-based classification of peptide translocation events from single individual nanopore variants was performed using the gradient boosting framework, XGBoost (8), which was implemented as described (7). For this, peptide event data, previously extracted and characterized into