Perturbation-Theory Machine Learning (PTML) Multilabel Model of the ChEMBL Dataset of Preclinical Assays for Antisarcoma Compounds
Alejandro Cabrera-Andrade,*,& Andrés López-Cortés,& Cristian R. Munteanu, Alejandro Pazos, Yunierkis Pérez-Castillo, Eduardo Tejera, Sonia Arrasate, and Humberto González-Díaz*
Cite This: ACS Omega 2020, 5, 27211−27220
ABSTRACT: Sarcomas are a group of malignant neoplasms of connective tissue with a different etiology than carcinomas. The efforts to discover new drugs with antisarcoma activity have generated large datasets of multiple preclinical assays with different experimental conditions. For instance, the ChEMBL database contains outcomes of 37,919 different antisarcoma assays with 34,955 different chemical compounds. Furthermore, the experimental conditions reported in this dataset include 157 types of biological activity parameters, 36 drug targets, 43 cell lines, and 17 assay organisms. Considering this information, we propose combining perturbation theory (PT) principles with machine learning (ML) to develop a PTML model to predict antisarcoma compounds. PTML models use one function of reference that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). In this paper, we used linear discriminant analysis and neural networks to train and compare PT and non-PT models. All the explored models have an accuracy of 89.19−95.25% in training and 89.22−95.46% in validation sets. PTML-based strategies have similar accuracy but generate simpler models. Therefore, they may become a versatile tool for predicting antisarcoma compounds.
■ INTRODUCTION
Sarcomas are a group of malignant neoplasms of connective tissue. Although their prevalence is much lower than that of carcinomas, the number of cases is increasing according to the World Health Organization [1]. At the molecular level, their behavior differs from carcinomas, presenting a more varied and complex etiology. This high etiological complexity possibly stems from their mesenchymal origin, which makes it difficult to propose new therapeutic targets for the respective treatment [2−6]. Representative anticancer compounds tend to have high cytotoxicity and low cellular specificity [7]. This leads to a decreased efficiency of the treatment and a low remission rate of the disease. However, the description of new molecular markers and the constant performance of preclinical drug assays have generated large amounts of data [8−12]. These data, if adequately rationalized, may lead in turn to the design of more selective drugs that take into account specific drivers based on pathogenic signaling pathways. For instance, the Chemical Database of the European Molecular Biology Laboratory (ChEMBL) [13,14] contains experimental outcomes of >37,900 different preclinical assays of antisarcoma drug candidates. These assays cover a large and structurally heterogeneous series of >34,900 different chemical compounds. Furthermore, the preclinical assays have been carried out under very different experimental conditions. These experimental conditions include up to 155 different types of biological activity parameters, 36 protein targets, 43 cell lines, and 17 assay organisms. Overall, this forms a large and complex dataset susceptible to analysis so as to extract useful knowledge for drug discovery.
In this context, we can use computational techniques to explore this experimental dataset, given the evident difficulty of analyzing it manually. Specifically, cheminformatics methodologies have succeeded in the discovery of new drug candidates that proved effective in the wet lab [15,16]. However, many models developed thus far are applied only to carcinomas and/or are focused on homologous series of compounds with one target or a single cell line [17−26]. In recent years, several studies have focused on applying these methodologies to the study of new types of antisarcoma drugs, mainly on cell lines [27−30].
However, almost all the models reported have a narrow domain of application because they focus on only one set of conditions, for instance, one specific property, target protein, or cell line. Thus, models where multiple assay conditions are considered at the same time are attractive. Perturbation theory (PT) ideas combined with machine learning (ML) methods (PT + ML = PTML models) are particularly useful for fitting complex datasets with big data features in drug discovery, proteomics, nanotechnology, etc. [31−41].
PTML models begin with one function of reference that measures the probability of a drug being active under certain conditions (protein, cell line, organism, etc.). Next, PTML models use PT operators (PTOs) to account for the perturbations (deviations) of the input variables of this drug with respect to a population of drugs assayed under the same conditions. ML algorithms are used to establish the relationship between the inputs and the output variable. In cancer research, Speck-Planche et al. and other researchers have developed PTML-like models for different types of cancers (with an emphasis on carcinomas) such as bladder, prostate, brain, and breast cancers [42−50]. In addition, Bediaga et al. developed a PTML algorithm for predicting anticancer compounds using data of multiple types of carcinomas at the same time [51]. Speck-Planche et al. also recently developed the first PTML-like model for the prediction of antisarcoma compounds using a spectral moment approach [52].
Received: July 13, 2020
Accepted: October 6, 2020
Published: October 15, 2020
Article
http://pubs.acs.org/journal/acsod
© 2020 American Chemical Society
https://dx.doi.org/10.1021/acsomega.0c03356
ACS Omega 2020, 5, 27211−27220
This is an open access article published under an ACS AuthorChoice License, which permits copying and redistribution of the article or any adaptations for non-commercial purposes.
In any case, there are no reports of other PTML-like models of antisarcoma compounds. In this study, we carried out a comprehensive compilation, curation, and preprocessing of the ChEMBL dataset of preclinical assays of antisarcoma compounds. After that, we developed the first PTML model able to fit this complex dataset with >37,900 assays and >34,900 compounds. To the best of our knowledge, the study outperforms all previous efforts in terms of simplicity of the model and number of cases, compounds, and cell lines considered.
■ RESULTS AND DISCUSSION
PTML Antisarcoma Compound Model. The statistical parameters of the PTML model showed a high specificity (Sp) and sensitivity (Sn) for the training series (95.63 and 79.64%, respectively). In addition, similar values were obtained for Sp (95.79%) and Sn (81.62%) in the validation sets. Furthermore, the p-level obtained from the chi-square (χ² = 16848.08) was <0.05, indicating that the model is able to perform a statistically significant separation of both classes. It is also interesting to observe the high overall accuracy (Ac) obtained in both sets: over 94% (Table 1). These results suggest that the generated model performs a statistically significant classification of antisarcoma compounds; hence, it can be considered a useful classification model with application in medicinal chemistry. The full list of biological activities (c0) in the ChEMBL dataset of antisarcoma preclinical experimental assays is shown in Table S1.
The resulting PTML−linear discriminant analysis (LDA) model showed the following formula
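$$
f(v_{ij})_{calc} = -11.8545 + 34.8028 \cdot f(v_{ij})_{ref} + 0.37 \cdot D_1 - 0.0128 \cdot D_2 - 0.3616 \cdot [D_1 - \langle D_1(c_j) \rangle] + 0.0191 \cdot [D_2 - \langle D_2(c_j) \rangle] \qquad (1)
$$
$$
n = 34955, \quad \chi^2 = 16848.08, \quad p < 0.001
$$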
The PTML-LDA model was initiated by using as an input the values of the function of reference f(vij)ref for each compound and by adding the effect of perturbations within the system. These perturbation effects refer to the PTOs ΔDk(cj). In eq 1, "i" and "j" are the assay and condition, respectively. Additional coefficients and terms are described in Table 2.
The parameters ALOGP and PSA are widely used in medicinal chemistry because they are related to the lipophilicity of drugs and, consequently, to their capacity to pass through biological membranes or interact with protein hydrophobic pockets [53−56]. The PTML algorithm has been previously applied to the study of multiple preclinical assays of anticancer drugs. As shown in Table 3, most applications have been directed toward the most prevalent carcinomas among the global population. For instance, Speck-Planche et al. reported PTML-like models for bladder [44], colorectal [46], breast [47], and prostate [49] cancers and for multiple carcinoma subtypes [51]. In addition, PTML-like models have been tested in antibrain tumor agents [45]. Interestingly, Bediaga et al. demonstrated the application of a PTML model on several types of carcinomas simultaneously and obtained similar Sn and Sp values as we did (>90%) [51]. All these PTML-like models are able to account for changes in target proteins, cellular lines, organisms, etc. However, they are specific models for carcinomas, not for sarcomas.

Table 1. PTML Model Results

| series | statistical parameter (a) | predicted statistics (%) | observed set | predicted set: f(vij)pred = 0 | predicted set: f(vij)pred = 1 |
|---|---|---|---|---|---|
| training | Sp | 95.63 | f(vij)obs = 0 | 25,647 | 1172 |
| | Sn | 79.64 | f(vij)obs = 1 | 330 | 1291 |
| | Ac | 94.72 | total | 25,977 | 2463 |
| validation | Sp | 95.79 | f(vij)obs = 0 | 8559 | 376 |
| | Sn | 81.62 | f(vij)obs = 1 | 100 | 444 |
| | Ac | 94.98 | total | 8659 | 820 |

(a) Sn, sensitivity (%); Sp, specificity (%); Ac, accuracy (%).

Table 2. Variables Used to Fit the PTML Model

| condition (a) (cj) | condition name | symbol | operator formula | operator information |
|---|---|---|---|---|
| c0 | activity type | f(vij)obs | =IF(AND(vij > cutoff(c0), d(c0) = 1), 1, IF(AND(vij < cutoff(c0), d(c0) = −1), 1, 0)) | observed classification of the outcome vij in the assay with conditions cj |
| c0 | activity type | f(vij)ref | n(f(vij)obs = 1)/nj | function of reference; it is the observed value of probability p(f(vij) = 1)exp for the activity vij of type c0 |
| cj = [c1, c2, c3] | all conditions (cj) | ΔD1(cj) | ALOGPi − ⟨ALOGP(cj)⟩ | deviation of the molecular descriptors of hydrophobicity/lipophilicity D1(ALOGP) and polar surface area D2(PSA) from each expected value (⟨D1(cj)⟩) or (⟨D2(cj)⟩) for the conditions cj (c1 = protein target; c2 = cell line; c3 = assay organism) |
| cj = [c1, c2, c3] | all conditions (cj) | ΔD2(cj) | PSAi − ⟨PSA(cj)⟩ | same as for ΔD1(cj) |

(a) MMA operators with a subset of multiple conditions included in eq 1.
It is worth noting that, to the best of our knowledge, Speck-Planche et al. [52] seem to be the only researchers to have reported a previous PTML-like model for sarcomas thus far. In their study, the prediction model in external validation resulted in Ac (90.78%) and Sp (90.65%) values that were lower than those obtained with our model (Ac = 94.98% and Sp = 95.79%). However, our PTML algorithm showed a lower sensitivity in external validation data (81.62%) than the model obtained by Speck-Planche et al. (91.74%). Although our model had a much lower number of variables and used a stricter cut-off definition of the activity class (i.e., IC50 = 0.1 μM instead of 1 μM), these aspects alone cannot explain the sensitivity reduction.
The generated PTML-LDA model (eq 1) has important characteristics that allow it to be used within research focused on drug discovery. One of the main advantages of our model is the considerable reduction of input variables for the construction of the algorithm through the inclusion of PTOs. This reduction allowed us to work on datasets with a large amount of information, to define cut-off values, and to calculate the probability of belonging to a class, whether this was a prediction of active compounds (1) or inactive compounds (0). In this way, the Sn or Sp values of the model can be adjusted according to the delimited cut-offs. An ideal prediction model has a reasonable trade-off between Sn and Sp. This means that a high sensitivity is achieved by accepting a relatively low Sp and, conversely, a high Sp is reached by compromising Sn. Sp is synonymous with the true-negative rate, which is related to the false-positive rate [30], so a high specificity in a prediction model for drug discovery implies that it is unlikely to get a positive result for a drug that does not have the desired biological activity. Thus, a positive outcome in a specific model is quite informative in a drug discovery scenario.

Table 3. Comparison of Other PTML Models of Anticancer Compounds

| cancer type (a) | PT (b) | ML (c) | NV (d) | cases (e) | Sn (%) (f) | Sp (%) (f) | ref |
|---|---|---|---|---|---|---|---|
| sarcoma | | | | | | | |
| MSS | MMA | LDA | 3 | 37,919 | ∼80 | >90 | this work |
| MSS | MA | LDA | >10 | 3017 | >90 | >90 | 52 |
| carcinoma | | | | | | | |
| bladder | MA | LDA | >10 | 664 | >90 | >90 | 44 |
| bladder | | ANN (RBF) | 10 | 664 | >95 | >95 | 44 |
| brain | MA | LDA | >10 | 1236 | ∼90 | >90 | 45 |
| breast | MA | LDA | >10 | 2272 | >85 | >90 | 47 |
| colorectal | MA | LDA | >10 | 1651 | >90 | >90 | 46 |
| colorectal | MA | ANN (RBF) | >10 | 1651 | >90 | >90 | 46 |
| prostate | MA | LDA | >10 | 1668 | >85 | >90 | 49 |
| MCS | MMA | LDA | >10 | 116,934 | >70 | ∼90 | 51 |
| MCS | MMA | LDA | 3 | 116,934 | >70 | >90 | 51 |
| MCS | MMA | ANN | 4 | 116,934 | >80 | >80 | 51 |

(a) MSS, multiple sarcoma subtypes; MCS, multiple carcinoma subtypes. (b) PT operators used in PTML models: MMA, multicondition moving average; MA, moving average. (c) ML method used for the PTML models: LDA, linear discriminant analysis; ANN, artificial neural networks; RBF, radial basis function; LNN, linear neural networks; E-ANN (RBF), ensemble of artificial neural networks based on the RBF architecture. (d) NV, number of input variables. (e) Number of preclinical assays. (f) Approximate values for training series.

Table 4. Different Scores Calculated for the Selected Biological Activities (c0)

| activity parameter of vij(c0) (unit) | nj(c0) (a) | ⟨vij(c0)⟩ (b) | dj(c0) (c) | cutoff(c0) | n(f(vij)obs = 1) (d) | p(f(vij)obs = 1/c0) (e) |
|---|---|---|---|---|---|---|
| potency (nM) | 31,581 | 19669.199 | −1 | 100 | 149 | 0.005 |
| IC50 (nM) | 1808 | 228362.82 | −1 | 100 | 177 | 0.098 |
| inhibition (%) | 690 | 39.186507 | 1 | 50 | 225 | 0.326 |
| CC50 (nM) | 450 | 134445.04 | −1 | 100 | 4 | 0.009 |
| activity (%) | 404 | 52.416163 | 1 | 50 | 208 | 0.515 |
| EC50 (nM) | 379 | 63578.521 | −1 | 100 | 44 | 0.116 |
| TGI (%) | 202 | 43.915842 | 1 | 50 | 102 | 0.505 |
| T/C | 173 | 26.556832 | 1 | 50 | 28 | 0.162 |
| IC50 (μg mL−1) | 167 | 64.429402 | −1 | 60 | 118 | 0.707 |
| T/C (%) | 144 | 156.92153 | 1 | 50 | 123 | 0.854 |
| GI50 (nM) | 113 | 66515.131 | −1 | 100 | 13 | 0.115 |
| EC50 (μg mL−1) | 90 | 60.733562 | −1 | 60 | 57 | 0.633 |

(a) nj(c0), total compounds with experimental values. (b) ⟨vij(c0)⟩, average calculated for each c0 biological activity. (c) dj(c0), desirability value (1, −1) assigned to each c0. (d) n(f(vij)obs = 1), total number of biologically active compounds observed within each c0 according to the experimental values vij(c0) reported for the parameters j. (e) p(f(vij)obs = 1/c0), probability of a desired biological activity within the conditions c0.
On the other hand, a main attribute is the possible combination of several experimental conditions for the prediction of new compounds. In this sense, Speck-Planche et al. [52] used around 3000 interactions derived from 14 cell lines and only considered IC50 assays for their model. In contrast, we modeled 37,919 interaction cases comprising 36 protein targets, 43 cell lines, and 17 assay organisms. We also included several different assay types (Table 4). Our modeling task is more complex not only because of the increase in chemical diversity but also because of the wider heterogeneity of the interactions (i.e., target types and organisms). The two models cannot be compared in this scenario, and our reduced ability to detect the true-positive cases (Sn) could be a consequence of this data complexity and also of the modeling strategy.
PTML Cut-Off Scanning Study. As mentioned above, the cut-off implemented in the model is a rigorous value that, at the experimental level, is important if one desires to increase effectiveness in the process of discovering antisarcoma drugs. A restricted value promotes high certainty in the prediction of active compounds for achieving a desired biological action under multiple test conditions [57−59]. Furthermore, a strict cut-off can decrease the rate of predicted false positives; therefore, if the assay is to be implemented, then it needs a higher sensitivity or higher specificity. This value can be modeled depending on the experimental conditions one wishes to apply. This cut-off value also influences the accuracy within our model. As observed in Figure 1, when using the average ⟨vij(c0)⟩ calculated for each c0, the Ac is not a desirable score. These low statistical values are mainly influenced by the low Sn in the prediction. By increasing the rigor, the model improves its prediction values for the active compounds (1). In view of these results, our prediction algorithm not only takes into account several experimental conditions but also restricts the prediction of compounds to those that have true biological activity.
PTML vs ML Model Comparison. Most multitasking or multilabel ML methods are useful for predicting multiple categorical outputs for the same set of input continuous variables [60,61]. However, our problem was a little different: we had to develop an ML model with only two possible outputs, f(vij)pred = 1 or 0, for the same set of input variables. That meant that our model was not multitasking for a single case with a set of input variables containing multiple continuous variables plus multiple categorical input variables. However, we had multiple combinations of input categorical variables or levels for the same set of input continuous variables. Hence, our model was multilabel in the input categorical variables for the same set of input continuous variables. To illustrate this fact, we developed here a comparison of our PTML-LDA model vs classic ML using multiple labeling categorical variables. As seen in Figure 2A, our PTML-LDA model and a classic ML-LDA model show similar Sp, Sn, and Ac values. Similarly, when developing neural networks (NN), the results of PTML-NN (Figure 2B) and ML-NN (Figure 2C) are quite similar. One of the advantages of our PTML model is the inclusion of PTOs, which greatly reduces the number of variables needed to generate the algorithm. Thus, although the statistics of all the models generated are quite similar, the PTML methodology allows for the reduction of variables from 164 variables in classic ML methods to only 5 in the PTML model. All the PTML and non-PTML model results are described in Table S2.
Figure 1. Variation of the specificity, sensitivity, and accuracy values according to the cut-offs implemented. The variation of these scores based on the biological activities c0 is included in the x-axis. Biological activities c0 expressed in % (e.g., inhibition, activity, tumor growth inhibition, etc.) and those expressed in nM (e.g., potency, IC50, CI50, etc.) are described. The final model is obtained by applying cut-off values of 50 for c0 expressed in % and 100 for c0 expressed in nM.
PTML vs ML Model with Other Descriptors. Previous studies have considered a wide variety and quantity of molecular descriptors in PTML models. For example, for sarcoma modeling, Speck-Planche et al. [52] used 423 descriptors followed by a feature selection strategy. Similarly, 289 descriptors were used in a PTML model on breast cancer [47]. We used this approach as a strategy to compare the performance of the PTML model vs classic ML techniques including new molecular descriptors (Figure 2A). In this ML study, we included 12 BCUT molecular descriptors (Dk, with k > 2) as an input, which were not used in the previous model, and 162 categorical (dummy) variables (Ck). These Ck have been used to label the multiple conditions of the assays cj (organisms, proteins, cell lines, etc.). One must remember that D1 = ALOGP and D2 = PSA. The new molecular descriptors were D3, D4, ..., D14. The expansion of the variables together with the ML strategies yielded good results but did not outperform what was obtained for the PTML-LDA antisarcoma model (as seen in Figure 2A and Table S2), and the number of variables increased to 174 input variables in total. This suggests that by adding different molecular descriptors and probably feature selection strategies, acceptable models for drug discovery can be built. However, our PTML-LDA model based on D1 and D2 is a simple yet effective model.
Figure 2. PTML vs ML models. Comparison of sensitivity, specificity, and accuracy of all the generated models. (A) Prediction values of PTML-LDA and ML-LDA models using different types of input variables: f(vij)pred is the function of reference; D1(cj) and D2(cj) are the ALOGP and PSA descriptors, respectively; ΔD1(cj) and ΔD2(cj) are the deviations of the molecular descriptors of ALOGP and PSA, respectively; D3, ..., D15(cj) are the 12 BCUT molecular descriptors calculated from ChemAxon. Unlike the PTML model, the ML model is calculated with conditions c1, c2, and c3 as a separated set of categorical variables. (B) Prediction values between the neural network-PTML (NN-PTML) and (C) NN-ML models. The NN obtained were multilayer perceptron (MLP), linear neural network (LNN), and radial basis function network (RBF).
Multiple-Condition Averages in the PTML Antisarcoma Model. In total, we found 83 possible combinations of multiple conditions for all the included sarcoma assays. As shown in Table 5, the nj(cj) with the highest number of entries corresponded to tests on human cell lines and on cell lines in Mus musculus. The multicondition moving averages (MMAs) used here, ⟨D1(cj)⟩ and ⟨D2(cj)⟩, vary significantly along all combinations. However, the anticancer compounds observed for the human osteosarcoma cell lines U2OS, HOS, SAOS-2, MG-63, and 143B and for the fibrosarcoma cell line HT-1080 were in a range of ⟨D1(cj)⟩ of 1.2−3.7. A similar range was observed in compounds tested in M. musculus (⟨D1(cj)⟩ = 1−3). Interestingly, when comparing these values with the variation of ⟨D2(cj)⟩, tests on virus lines, such as Moloney murine sarcoma virus and Woolly monkey sarcoma virus, had higher means (between 140 and 205). Since the ALOGP coefficient is a measure widely used in drug discovery to assess the degree of absorption, distribution in the body, penetration across biological membranes, metabolism, and excretion, this range identified in our results is an important space for the prediction of antisarcoma drugs [62,63]. Likewise, the range of PSA evidenced in viral line assays may be a better space for this coefficient if it is desired to predict new compounds in these experimental conditions. This may be interesting when defining the validation of a certain antisarcoma compound. Thus, if a compound is significantly predicted in an experimental animal or in human cell lines, then it will be possible to propose validations at the preclinical level or in clinical trials, respectively.
Table 5. Multiple-Condition Averages of All Sarcoma Assays

| c1 = protein (gene) (a) | c2 = cell line (a) | c3 = assay organism (a, b) | nj(cj) | ⟨D1(cj)⟩ | ⟨D2(cj)⟩ |
|---|---|---|---|---|---|
| O75874 (IDH1) | MD | H. sapiens | 31,581 | 3.778 | 70.597 |
| MD | MD | M. musculus | 1440 | 2.67 | 103.712 |
| MD | U2OS | H. sapiens | 746 | 4.421 | 78.325 |
| MD | HOS | H. sapiens | 637 | 3.603 | 89.517 |
| MD | MD | H. sapiens | 375 | 3.846 | 69.876 |
| MD | SAOS-2 | H. sapiens | 358 | 4.882 | 81.659 |
| MD | Sarcoma-180 | M. musculus | 271 | 1.108 | 83.68 |
| MD | MG-63 | H. sapiens | 241 | 2.965 | 111.864 |
| MD | M5076 | M. musculus | 197 | 3.033 | 114.886 |
| MD | HT-1080 | H. sapiens | 170 | 2.826 | 97.731 |
| MD | 143B | H. sapiens | 131 | 1.283 | 141.735 |
| MD | MD | Pseudomonas aeruginosa | 130 | 0.277 | 142.432 |
| MD | MD | MD | 126 | 1.898 | 93.448 |
| MD | rhabdomyosarcoma cell | H. sapiens | 116 | 4.036 | 77.177 |
| MD | CCRF S−180 | M. musculus | 109 | 0.978 | 140.984 |
| P13053 (Vdr) | MD | Rattus norvegicus | 64 | 5.844 | 60.476 |
| MD | MES-SA | H. sapiens | 64 | 2.956 | 89.631 |
| MD | MD | RSV | 61 | 1.277 | 127.944 |
| MD | 6C3HED | M. musculus | 60 | 3.09 | 97.831 |
| MD | C3H/3T3 | MMSV | 50 | 0.327 | 139.359 |
| P35354 (PTGS2) | MD | H. sapiens | 49 | 3.515 | 69.152 |
| MD | A204 | H. sapiens | 44 | 1.189 | 106.655 |
| P03359 (pol) | MD | WMSV | 44 | 6.786 | 204.629 |
| MD | MD | Gallus gallus | 43 | 0.516 | 106.529 |
| P37231 (PPARG) | MD | H. sapiens | 40 | 5.33 | 83.835 |
| MD | MD | MMSV | 39 | 0.213 | 166.782 |
| Q07869 (PPARA) | MD | H. sapiens | 37 | 5.364 | 81.891 |
| Q13443 (ADAM9) | MD | H. sapiens | 35 | 2.914 | 91.186 |
| MD | MD | R. norvegicus | 34 | 5.245 | 64.58 |
| MD | fibroblast | MMSV | 33 | −1.224 | 150.956 |
| MD | MD | enterovirus | 33 | 6.348 | 38.332 |
| MD | MD | human herpesvirus 1 | 31 | 6.27 | 57.306 |
| MD | 791T cell line | H. sapiens | 28 | −1.179 | 139.194 |
| MD | C3H/3T3 | M. musculus | 28 | 1.745 | 115.047 |
| P08253 (MMP2) | MD | H. sapiens | 28 | 3.31 | 112.85 |
| MD | MD | human enterovirus 71 | 28 | 1.967 | 124.221 |
| P04637 (TP53), Q00987 (MDM2) | SJSA-1 | H. sapiens | 27 | 5.213 | 49.453 |
| P06401 (PGR) | MD | H. sapiens | 26 | 4.494 | 32.958 |
| MD | HL-60 | H. sapiens | 25 | 3.81 | 33.754 |

(a) Assay condition (cj); MD, missing data. (b) RSV, Rous sarcoma virus; MLV, murine leukemia virus; MMSV, Moloney murine sarcoma virus; WMSV, Woolly monkey sarcoma virus.

How to Use the PTML Model in Practice. The model is capable of scoring the activity of a single compound under different assay conditions. To predict a new compound, first, we have to substitute the expected value of the function of reference f(vij)ref = p(f(vij) = 1)exp in the model. As aforementioned, this is the probability of the compound being active for a given biological activity parameter (c0) (see Table 2). Next, we need to substitute into the equation the values of the molecular descriptors D1 = ALOGP and D2 = PSA of the compound (chemical structure), calculated with the same algorithm used in the ChEMBL dataset. Last, we have to substitute into the equation the average values (expected values) of the molecular descriptors, ⟨D1(cj)⟩ and ⟨D2(cj)⟩, for the specific subset of conditions of the assay cj we want to predict. In Table 5, we show some selected values of these averages with >25 assays reported. It can be noted that the most populated assays in Homo sapiens in the dataset were those in vitro assays that targeted the protein O75874 (IDH1) and those that targeted the cell line U2OS. Upon inspecting Table 5, we can see that ⟨Dk(cj)⟩ values change for different subsets of conditions cj. Consequently, when we substitute the different ⟨Dk(cj)⟩ values into the model for the same compound, we can calculate different scores f(vij)calc of biological activity of the same compound under multiple assay conditions. The full list of the values of ⟨Dk(cj)⟩ appears in Table S3.
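The substitution steps described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the coefficients are copied from eq 1 and the condition averages from three rows of Table 5, whereas the function names, the dictionary layout, and the example compound (ALOGP = 3.0, PSA = 90.0) are hypothetical and not part of the original work. The value f_ref = 0.098 is the reference probability reported for IC50 (nM) in Table 4. The sketch returns only the raw score f(vij)calc; converting the score into a class probability requires the fitted LDA model, which is not reproduced here.

```python
# Coefficients copied from eq 1; all names below are illustrative assumptions.
A0 = -11.8545      # intercept
A_REF = 34.8028    # coefficient of the function of reference f(vij)ref
A_D1 = 0.37        # coefficient of D1 (ALOGP)
A_D2 = -0.0128     # coefficient of D2 (PSA)
A_DD1 = -0.3616    # coefficient of the PTO [D1 - <D1(cj)>]
A_DD2 = 0.0191     # coefficient of the PTO [D2 - <D2(cj)>]

# (c1 protein, c2 cell line, c3 organism) -> (<D1(cj)>, <D2(cj)>), from Table 5
MMA = {
    ("MD", "U2OS", "H. sapiens"): (4.421, 78.325),
    ("MD", "HOS", "H. sapiens"): (3.603, 89.517),
    ("MD", "Sarcoma-180", "M. musculus"): (1.108, 83.68),
}


def ptml_score(f_ref, alogp, psa, mean_alogp, mean_psa):
    """Evaluate eq 1 for one compound under one set of assay conditions."""
    return (A0
            + A_REF * f_ref
            + A_D1 * alogp
            + A_D2 * psa
            + A_DD1 * (alogp - mean_alogp)
            + A_DD2 * (psa - mean_psa))


def score_under_conditions(f_ref, alogp, psa, averages=MMA):
    """Score the same compound under every set of conditions in `averages`."""
    return {cj: ptml_score(f_ref, alogp, psa, m1, m2)
            for cj, (m1, m2) in averages.items()}


# Hypothetical compound scored for an IC50 (nM) end point.
scores = score_under_conditions(f_ref=0.098, alogp=3.0, psa=90.0)
for conditions, score in scores.items():
    print(conditions, round(score, 3))
```

Because only the averages change between conditions, the same compound receives a different score for each row of Table 5, which is the multi-condition behavior the paragraph describes.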
■ CONCLUSIONS
In this research work, we generated a PTML-LDA model constructed with antisarcoma assays obtained from ChEMBL and a heterogeneous set of different cell lines, organisms, and targets. As far as we know, this constitutes the first time that this kind of model was tested for sarcoma comprising 34,955 chemical compounds and 37,919 assays. The PTML-LDA model was compared with classic ML approaches like the neural network and also with non-PT consideration. The rate of true positives and true negatives is similar when comparing PTML-LDA to other prediction models. PTML-LDA reduces the amount of input variables (ALOGP and PSA) needed, thus increasing the simplicity and interpretability of the model.
■ METHODS
ChEMBL Data Curation and Preprocessing. In total, we downloaded >370,000 outcomes of preclinical assays of antisarcoma drug candidates from the ChEMBL database. The keywords (fields) used for the search were as follows: Sarcoma (Assay) and also keywords for the more relevant osteosarcoma cell lines MG-63, U2OS, HOS, SAOS-2, and 143B. After that, we carried out a data fusion of the datasets obtained into one single raw dataset. The working dataset was curated by eliminating all duplicated entries. We also eliminated all cases with missing values of biological activity (vij) and/or molecular descriptors. The molecular descriptors used were the same as those precalculated by the ChEMBL database, where D1 = logP and D2 = PSA [13,14]. The final dataset obtained after curation contained 37,919 cases comprising 36 protein targets, 43 cell lines, and 17 assay organisms (Table S1). For comparison and exploration with other models, we additionally computed 12 BCUT molecular descriptors [64] with ChemAxon (http://www.chemaxon.com). The classical unweighted Burden descriptors as well as those weighted by charge and hydrogen bond properties were calculated. The lowest and the three highest eigenvalues were used for descriptor calculation.
To train the model, we split this dataset into two data subsets: training and validation series. We performed a random, stratified, and representative selection of training/validation cases. To accomplish this task, we sorted the cases by nj (from highest to lowest) as well as by assay conditions: biological activity, protein accession, cell line, and assay organism (alphabetically from A to Z). After this, we assigned every fourth case (1 out of 4) to the validation subset (25% of cases) and the remaining cases to the training subset (75% of cases). The result of each experimental assay is the value obtained from the quantification of each biological activity and is named vij ("i" and "j" represent the assay and conditions, respectively). Each biological activity depends on the conditions cj (c0, c1, c2, ..., cn) used in each assay. Thus, the conditions taken into account in the data preprocessing were c0 = biological activity, c1 = protein accession, c2 = cell line, and c3 = assay organism. From vij, each experimental assay was discretized based on the desirability d(c0). This variable was defined as 1 when the desired biological activity depended on an increased value of vij and −1 when the desired biological activity depended on a lower value of vij. Thus, the discretized value f(vij)obs was calculated as follows: f(vij)obs = 1 when vij > cut-off and d(c0) = 1; f(vij)obs = 1 when vij < cut-off and d(c0) = −1; otherwise, f(vij)obs = 0. The value f(vij)obs = 1 refers to a strong effect of the compound over the target. Since d(c0) has a direct relationship with f(vij)obs, we applied a rational cut-off for each c0, which will be discussed later. Briefly, the cut-off for properties related to drug concentrations and described in nM (potency, IC50, CC50, EC50, GI50, etc.) was set at 100. For properties described in % (inhibition, activity, TGI, among others), the cut-off was set at 50. Last, to calculate the probability of these expected values, we evaluated the ratio between the total number of observed active cases n(f(vij) = 1)obs within the level of biological activity desired for the condition cj and the total number of compounds nj that were described in that same condition. In this sense, we have that p(f(vij)obs = 1)exp = n(f(vij) = 1)obs/nj.
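The preprocessing rules in this paragraph (systematic 3:1 split, desirability-based discretization, and the reference probability) can be illustrated with the short Python sketch below. It is a minimal sketch under assumptions: the function names are hypothetical, the cases are assumed to be already sorted as described, and which position within each group of four goes to validation is an arbitrary choice made here because the text does not specify it.

```python
def discretize(value, cutoff, desirability):
    """Return f(vij)obs: 1 if the outcome lies on the desired side of the cut-off."""
    if desirability == 1 and value > cutoff:
        return 1
    if desirability == -1 and value < cutoff:
        return 1
    return 0


def reference_probability(values, cutoff, desirability):
    """Return n(f(vij)obs = 1) / nj for all outcomes sharing one activity type c0."""
    if not values:
        raise ValueError("at least one outcome is required")
    labels = [discretize(v, cutoff, desirability) for v in values]
    return sum(labels) / len(labels)


def split_every_fourth(sorted_cases):
    """Send every fourth case to validation (25%) and the rest to training (75%).

    Assumption: the 4th, 8th, ... cases go to validation; the source only
    states that 1 out of 4 sorted cases is selected.
    """
    validation = sorted_cases[3::4]
    training = [case for i, case in enumerate(sorted_cases) if i % 4 != 3]
    return training, validation


# Toy IC50 values in nM: lower is better (d = -1), cut-off 100 nM.
ic50_nm = [10.0, 200.0, 50.0, 500.0]
print(reference_probability(ic50_nm, cutoff=100, desirability=-1))

# Toy inhibition values in %: higher is better (d = 1), cut-off 50%.
print([discretize(v, 50, 1) for v in (60.0, 40.0)])
```

As a consistency check against Table 4, the IC50 (nM) row reports 177 active outcomes out of 1808, and 177/1808 rounds to the tabulated 0.098.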
PTML Linear Model. The multicondition moving averages (MMAs) are PTOs similar to Box−Jenkins moving average operators. However, MMAs are PTOs accounting for perturbations (changes) in multiple conditions cj at the same time, while MA quantifies changes in only one condition. By using linear discriminant analysis (LDA) [65], we obtained a PTML-LDA equation as follows
$$
f(v_{ij})_{calc} = a_0 + a_1 \cdot f(v_{ij})_{ref} + \sum_{k=1}^{k_{max}} a_k \cdot D_k + \sum_{k=1,\, j=0}^{k_{max},\, j_{max}} a_{kj} \cdot \Delta D_k(c_j)
$$
The model generates an output score f(vij)calc that refers to a score function of a biological activity vij under the assay conditions cj. The LDA algorithm includes the Mahalanobis distance metric [65], which makes it possible to infer predictive values through a probability calculation p(f(vij) = 1)pred. For the variable selection, we detected specific perturbations within the conditions cj that will be adjusted to anticancer properties through a forward-stepwise strategy [65]. Such conditions as c1 = protein accession, c2 = cell line, and c3 = assay organism were significant, so we took them into consideration in our model. Through p(f(vij) = 1)pred, we predicted the activity of each compound by applying the function f(vij)pred = 1 when p(f(vij) = 1)pred > 0.5 and f(vij)pred = 0 otherwise.
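To make the perturbation operators concrete, the sketch below computes the multicondition moving averages ⟨Dk(cj)⟩ by grouping cases on the condition triple (c1, c2, c3) and then derives the deviations ΔDk(cj) = Dk − ⟨Dk(cj)⟩ for each case. It is a hedged illustration only: the dictionary keys ("c1", "c2", "c3", "ALOGP", "PSA"), the output key prefix "d", and the toy values are assumptions introduced here, not the data layout used in the original study.

```python
from collections import defaultdict


def multicondition_averages(cases, descriptor):
    """Return {(c1, c2, c3): mean of `descriptor`} over all cases sharing conditions."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for case in cases:
        key = (case["c1"], case["c2"], case["c3"])
        sums[key] += case[descriptor]
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def add_perturbation_operators(cases, descriptors=("ALOGP", "PSA")):
    """Return copies of the cases with deviation terms Dk - <Dk(cj)> added.

    The deviation for descriptor "ALOGP" is stored under "dALOGP", and so on.
    """
    enriched = [dict(case) for case in cases]
    for name in descriptors:
        means = multicondition_averages(cases, name)
        for case in enriched:
            key = (case["c1"], case["c2"], case["c3"])
            case["d" + name] = case[name] - means[key]
    return enriched


# Toy cases (values are made up for illustration).
toy = [
    {"c1": "MD", "c2": "U2OS", "c3": "H. sapiens", "ALOGP": 2.0, "PSA": 80.0},
    {"c1": "MD", "c2": "U2OS", "c3": "H. sapiens", "ALOGP": 4.0, "PSA": 100.0},
    {"c1": "MD", "c2": "HOS", "c3": "H. sapiens", "ALOGP": 3.0, "PSA": 90.0},
]
for row in add_perturbation_operators(toy):
    print(row["c2"], row["dALOGP"], row["dPSA"])
```

A plain moving average (MA) would group on a single condition, for example the cell line alone; grouping on the full triple is what makes the operator multicondition.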
For comparison, we also used a strategy that is not based on perturbation theory. In this sense, besides the molecular descriptors, we added conditions c1, c2, and c3 as a separate set of categorical variables. A total of 237 variables were needed to represent all conditions. Filtering using the variance of each variable led to a total of 162 variables, including ALOGP and PSA.
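The non-PT encoding can be pictured as one dummy (0/1) column per level of c1, c2, and c3, followed by a variance filter. The sketch below is an assumption-laden illustration: the source does not state the variance threshold or the software routine used for filtering, so the default threshold of 0 (dropping only constant columns), the function names, and the toy cases are all hypothetical.

```python
def one_hot(cases, fields=("c1", "c2", "c3")):
    """Build one dummy column per (field, level) pair found in the cases."""
    levels = sorted({(field, case[field]) for case in cases for field in fields})
    columns = [f"{field}={level}" for field, level in levels]
    rows = [[1 if case[field] == level else 0 for field, level in levels]
            for case in cases]
    return columns, rows


def variance_filter(columns, rows, min_variance=0.0):
    """Keep only the columns whose population variance exceeds `min_variance`."""
    n = len(rows)
    kept = []
    for j in range(len(columns)):
        column = [row[j] for row in rows]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        if variance > min_variance:
            kept.append(j)
    return [columns[j] for j in kept], [[row[j] for j in kept] for row in rows]


toy_cases = [
    {"c1": "MD", "c2": "U2OS", "c3": "H. sapiens"},
    {"c1": "MD", "c2": "HOS", "c3": "H. sapiens"},
]
names, matrix = one_hot(toy_cases)
kept_names, kept_matrix = variance_filter(names, matrix)
print(names)
print(kept_names)
```

In this toy example the columns for c1 and c3 are constant and are removed, which mirrors in miniature how filtering reduced the 237 condition variables in the study.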
The evaluation of the discriminant model was calculated from Wilks' lambda (Λ) as follows
$$
\Lambda = \left[ \frac{1}{1 + \lambda} \right]
$$
where Λ is chi-square distributed for df = (k − 1), k is equal to the number of parameters estimated, and
$$
\lambda = \left[ \frac{\sum (Z_j - Z)^2}{\sum (Z_{ij} - Z_i)^2} \right].
$$
For ML, besides LDA, we also used neural networks (NN) with different architectures. STATISTICA software was used in both cases. The final networks obtained were multilayer perceptron (MLP), linear neural network (LNN), and radial basis function network (RBF). All these ML strategies were applied with perturbation and nonperturbation theory. The predicted 1 or 0 values were used to determine the specificity or true-negative rate (Sp), sensitivity or true-positive rate (Sn), and accuracy (Ac) when compared to the observed values. Thus, when f(vij)pred = f(vij)obs, the cases were determined to be correct [65].
The metrics to evaluate the performance of all the prediction
models were Ac, Sn, and Sp using the following formulae

$$\mathrm{Ac} = \frac{\text{number of correctly classified compounds}}{\text{total number of compounds}}$$

$$\mathrm{Sn} = \frac{\text{number of correctly classified active compounds}}{\text{total number of active compounds}}$$

$$\mathrm{Sp} = \frac{\text{number of correctly classified inactive compounds}}{\text{total number of inactive compounds}}$$
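As an illustration of these definitions, the following minimal sketch applies the 0.5 probability cutoff described earlier and then computes Ac, Sn, and Sp. The probabilities and observed labels are invented toy values, and the function names are hypothetical; the sketch assumes both active and inactive cases are present, so neither denominator is zero.

```python
def classify(probabilities, cutoff=0.5):
    """Assign class 1 when the predicted probability exceeds the cutoff, else 0."""
    return [1 if p > cutoff else 0 for p in probabilities]


def metrics(observed, predicted):
    """Return accuracy (Ac), sensitivity (Sn), and specificity (Sp)."""
    pairs = list(zip(observed, predicted))
    true_pos = sum(1 for o, p in pairs if o == 1 and p == 1)
    true_neg = sum(1 for o, p in pairs if o == 0 and p == 0)
    n_active = sum(1 for o in observed if o == 1)
    n_inactive = len(observed) - n_active
    return {
        "Ac": (true_pos + true_neg) / len(observed),
        "Sn": true_pos / n_active,      # correctly classified actives
        "Sp": true_neg / n_inactive,    # correctly classified inactives
    }


# Toy values for illustration only; they are not taken from the data set.
probs = [0.9, 0.7, 0.4, 0.2, 0.6, 0.1]
observed = [1, 1, 1, 0, 0, 0]

predicted = classify(probs)
result = metrics(observed, predicted)
print(predicted)
print(result)
```

Here two of the three actives and two of the three inactives are classified correctly, so all three metrics equal 2/3.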
■ASSOCIATED CONTENT
Supporting Information
The Supporting Information is available free of charge at
https://pubs.acs.org/doi/10.1021/acsomega.0c03356.
ChEMBL data set of antisarcoma preclinical experimental
assays for the PTML model; results of the analyzed
models for sarcoma biological activities; all the
multiple-condition averages for all sarcoma assays (XLSX)
■AUTHOR INFORMATION
Corresponding Authors
Alejandro Cabrera-Andrade − Grupo de Bio-Quimioinformática and Carrera de Enfermería, Facultad de Ciencias de la Salud, Universidad de Las Américas, Quito 170125, Ecuador; RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain; orcid.org/0000-0001-9702-6618; Email: raul.cabrera@udla.edu.ec
Humbert González-Díaz − Department of Organic Chemistry II and Basque Center for Biophysics, University of Basque Country UPV/EHU, Leioa 48940, Biscay, Spain; Ikerbasque, Basque Foundation for Science, Bilbao 48011, Biscay, Spain; orcid.org/0000-0002-9392-2797; Email: [email protected]
Authors
Andrés López-Cortés − RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain; Centro de Investigación Genética y Genómica, Facultad de Ciencias de la Salud Eugenio Espejo, Universidad UTE, Quito 170129, Ecuador
Cristian R. Munteanu − RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña 15006, Spain; Centro de Investigación en Tecnologías de la Información y las Comunicaciones (CITIC), Campus de Elviña s/n, A Coruña 15071, Spain; orcid.org/0000-0002-5628-2268
Alejandro Pazos − RNASA-IMEDIR, Computer Sciences Faculty, University of A Coruña, A Coruña 15071, Spain; Biomedical Research Institute of A Coruña (INIBIC), University Hospital Complex of A Coruña (CHUAC), A Coruña 15006, Spain
Yunierkis Pérez-Castillo − Grupo de Bio-Quimioinformática and Escuela de Ciencias Físicas y Matemáticas, Universidad de Las Américas, Quito 170125, Ecuador
Eduardo Tejera − Grupo de Bio-Quimioinformática and Facultad de Ingeniería y Ciencias Aplicadas, Universidad de Las Américas, Quito 170125, Ecuador
Sonia Arrasate − Department of Organic Chemistry II and Basque Center for Biophysics, University of Basque Country UPV/EHU, Leioa 48940, Biscay, Spain
Complete contact information is available at:
https://pubs.acs.org/10.1021/acsomega.0c03356
Author Contributions
&A.C.-A. and A.L.-C. contributed equally to the study.
Notes
The authors declare no competing financial interest.
■ACKNOWLEDGMENTS
The authors acknowledge research grants from Ministry of
Economy and Competitiveness, MINECO, Spain (FEDER
CTQ2016-74881-P), and Basque government (IT1045-16).
The authors also acknowledge the support of Ikerbasque,
Basque Foundation for Science. This work was supported by
Universidad de Las Américas and the Collaborative Project in
Genomic Data Integration (CICLOGEN) PI17/01826 funded
by the Carlos III Health Institute from the Spanish National
Plan for Scientific and Technical Research and Innovation
2013−2016 and the European Regional Development Funds
(FEDER)−“A way to build Europe”. This project was also
supported by the General Directorate of Culture, Education
and University Management of Xunta de Galicia ED431D
2017/16 and “Drug Discovery Galician Network” ref.
ED431G/01 and the “Galician Network for Colorectal Cancer
Research” (ref. ED431D 2017/23) and finally by the Spanish
Ministry of Economy and Competitiveness for its support
through the funding of the unique installation BIOCAI
(UNLC08-1E-002, UNLC13-13-3503) and the European
Regional Development Funds (FEDER) by the European
Union. Additional support was offered by the Consolidation
and Structuring of Competitive Research Units−Competitive
Reference Groups (ED431C 2018/49), funded by the Ministry
of Education, University and Vocational Training of the Xunta
de Galicia endowed with EU FEDER funds.
■REFERENCES
(1) Bray, F.; Ferlay, J.; Soerjomataram, I.; Siegel, R. L.; Torre, L. A.; Jemal, A. Global cancer statistics 2018: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA Cancer J. Clin. 2018, 68, 394−424.
(2) Hui, J. Y. C. Epidemiology and Etiology of Sarcomas. Surg. Clin. North Am. 2016, 96, 901−914.
(3) Sidaway, P. Sarcoma: Genetic determinants of sarcoma risk revealed. Nat. Rev. Clin. Oncol. 2016, 13, 590.
(4) Thomas, D. M.; Ballinger, M. L. Etiologic, environmental and inherited risk factors in sarcomas. J. Surg. Oncol. 2015, 111, 490−495.
(5) HaDuong, J. H.; Martin, A. A.; Skapek, S. X.; Mascarenhas, L. Sarcomas. Pediatr. Clin. North Am. 2015, 62, 179−200.
(6) Yang, J.; Ren, Z.; Du, X.; Hao, M.; Zhou, W. The role of mesenchymal stem/progenitor cells in sarcoma: update and dispute. Stem Cell Investig. 2014, 1, 18.
(7) Double, J.; Barrass, N.; Barnard, N. D.; Navaratnam, V. Toxicity testing in the development of anticancer drugs. Lancet Oncol. 2002, 3, 438−442.
(8) Yap, T. A.; Sandhu, S. K.; Workman, P.; de Bono, J. S. Envisioning the future of early anticancer drug development. Nat. Rev. Cancer 2010, 10, 514−523.
(9) Williams, R. J.; Walker, I.; Takle, A. K. Collaborative approaches to anticancer drug discovery and development: a Cancer Research UK perspective. Drug Discovery Today 2012, 17, 185−187.
(10) Heinemann, F.; Huber, T.; Meisel, C.; Bundschus, M.; Leser, U. Reflection of successful anticancer drug development processes in the literature. Drug Discovery Today 2016, 21, 1740−1744.
(11) Sun, J.; Wei, Q.; Zhou, Y.; Wang, J.; Liu, Q.; Xu, H. A systematic analysis of FDA-approved anticancer drugs. BMC Syst. Biol. 2017, 11, 87.
(12) Carvalho-Silva, D.; Pierleoni, A.; Pignatelli, M.; Ong, C.; Fumis, L.; Karamanis, N.; Carmona, M.; Faulconbridge, A.; Hercules, A.; McAuley, E.; Miranda, A.; Peat, G.; Spitzer, M.; Barrett, J.; Hulcoop, D. G.; Papa, E.; Koscielny, G.; Dunham, I. Open Targets Platform: new developments and updates two years on. Nucleic Acids Res. 2019, 47, D1056−D1065.
(13) Mendez, D.; Gaulton, A.; Bento, A. P.; Chambers, J.; De Veij, M.; Felix, E.; Magariños, M. P.; Mosquera, J. F.; Mutowo, P.; Nowotka, M.; Gordillo-Marañón, M.; Hunter, F.; Junco, L.; Mugumbate, G.; Rodriguez-Lopez, M.; Atkinson, F.; Bosc, N.; Radoux, C. J.; Segura-Cabrera, A.; Hersey, A.; Leach, A. R. ChEMBL: towards direct deposition of bioassay data. Nucleic Acids Res. 2019, 47, D930−D940.
(14) Gaulton, A.; Hersey, A.; Nowotka, M.; Bento, A. P.; Chambers, J.; Mendez, D.; Mutowo, P.; Atkinson, F.; Bellis, L. J.; Cibrián-Uhalte, E.; Davies, M.; Dedman, N.; Karlsson, A.; Magariños, M. P.; Overington, J. P.; Papadatos, G.; Smit, I.; Leach, A. R. The ChEMBL database in 2017. Nucleic Acids Res. 2017, 45, D945−D954.
(15) Lo, Y.-C.; Rensi, S. E.; Torng, W.; Altman, R. B. Machine learning in chemoinformatics and drug discovery. Drug Discovery Today 2018, 23, 1538−1546.
(16) Ali, M.; Aittokallio, T. Machine learning and feature selection for drug response prediction in precision oncology applications. Biophys. Rev. 2019, 11, 31−39.
(17) Wang, J.; Yun, D.; Yao, J.; Fu, W.; Huang, F.; Chen, L.; Wei, T.; Yu, C.; Xu, H.; Zhou, X.; Huang, Y.; Wu, J.; Qiu, P.; Li, W. Design, synthesis and QSAR study of novel isatin analogues inspired Michael acceptor as potential anticancer compounds. Eur. J. Med. Chem. 2018, 144, 493−503.
(18) Pogorzelska, A.; Sławiński, J.; Żołnowska, B.; Szafrański, K.; Kawiak, A.; Chojnacki, J.; Ulenberg, S.; Zielińska, J.; Bączek, T. Novel 2-(2-alkylthiobenzenesulfonyl)-3-(phenylprop-2-ynylideneamino)guanidine derivatives as potent anticancer agents - Synthesis, molecular structure, QSAR studies and metabolic stability. Eur. J. Med. Chem. 2017, 138, 357−370.
(19) Sławiński, J.; Szafrański, K.; Pogorzelska, A.; Żołnowska, B.; Kawiak, A.; Macur, K.; Belka, M.; Bączek, T. Novel 2-benzylthio-5-(1,3,4-oxadiazol-2-yl)benzenesulfonamides with anticancer activity: Synthesis, QSAR study, and metabolic stability. Eur. J. Med. Chem. 2017, 132, 236−248.
(20) Singh, H.; Kumar, R.; Singh, S.; Chaudhary, K.; Gautam, A.; Raghava, G. P. S. Prediction of anticancer molecules using hybrid model developed on molecules screened against NCI-60 cancer cell lines. BMC Cancer 2016, 16, 77.
(21) Toropov, A. A.; Toropova, A. P.; Benfenati, E.; Gini, G.; Leszczynska, D.; Leszczynski, J. SMILES-based QSAR approaches for carcinogenicity and anticancer activity: comparison of correlation weights for identical SMILES attributes. Anti-Cancer Agents Med. Chem. 2011, 11, 974−982.
(22) González-Díaz, H.; Bonet, I.; Terán, C.; De Clercq, E.; Bello, R.; García, M. M.; Santana, L.; Uriarte, E. ANN-QSAR model for selection of anticancer leads from structurally heterogeneous series of compounds. Eur. J. Med. Chem. 2007, 42, 580−585.
(23) González-Díaz, H.; Viña, D.; Santana, L.; de Clercq, E.; Uriarte, E. Stochastic entropy QSAR for the in silico discovery of anticancer compounds: prediction, synthesis, and in vitro assay of new purine carbanucleosides. Bioorg. Med. Chem. 2006, 14, 1095−1107.
(24) Gonzales-Díaz, H.; Gia, O.; Uriarte, E.; Hernadez, I.; Ramos, R.; Chaviano, M.; Seijo, S.; Castillo, J. A.; Morales, L.; Santana, L.; Akpaloo, D.; Molina, E.; Cruz, M.; Torres, L. A.; Cabrera, M. A. Markovian chemicals "in silico" design (MARCH-INSIDE), a promising approach for computer-aided molecular design I: discovery of anticancer compounds. J. Mol. Model. 2003, 9, 395−407.
(25) Jung, M.; Kim, H.; Kim, M. Chemical genomics strategy for the discovery of new anticancer agents. Curr. Med. Chem. 2003, 10, 757−762.
(26) Shi, L. M.; Fan, Y.; Myers, T. G.; O'Connor, P. M.; Paull, K. D.; Friend, S. H.; Weinstein, J. N. Mining the NCI anticancer drug discovery databases: genetic function approximation for the QSAR study of anticancer ellipticine analogues. J. Chem. Inf. Comput. Sci. 1998, 38, 189−199.
(27) Barretina, J.; Caponigro, G.; Stransky, N.; Venkatesan, K.; Margolin, A. A.; Kim, S.; Wilson, C. J.; Lehar, J.; Kryukov, G. V.; Sonkin, D.; Reddy, A.; Liu, M.; Murray, L.; Berger, M. F.; Monahan, J. E.; Morais, P.; Meltzer, J.; Korejwa, A.; Jane-Valbuena, J.; Mapa, F. A.; Thibault, J.; Bric-Furlong, E.; Raman, P.; Shipway, A.; Engels, I. H.; Cheng, J.; Yu, G. K.; Yu, J.; Aspesi, P.; de Silva, M.; Jagtap, K.; Jones, M. D.; Wang, L.; Hatton, C.; Palescandolo, E.; Gupta, S.; Mahan, S.; Sougnez, C.; Onofrio, R. C.; Liefeld, T.; MacConaill, L.; Winckler, W.; Reich, M.; Li, N.; Mesirov, J. P.; Gabriel, S. B.; Getz, G.; Ardlie, K.; Chan, V.; Myer, V. E.; Weber, B. L.; Porter, J.; Warmuth, M.; Finan, P.; Harris, J. L.; Meyerson, M.; Golub, T. R.; Morrissey, M. P.; Sellers, W. R.; Schlegel, R.; Garraway, L. A. The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature 2012, 483, 603−607.
(28) Toropov, A. A.; Toropova, A. P.; Benfenati, E.; Gini, G.; Leszczynska, D.; Leszczynski, J. CORAL: classification model for predictions of anti-sarcoma activity. Curr. Top. Med. Chem. 2012, 12, 2741−2744.
(29) Vos, H. I.; Coenen, M. J. H.; Guchelaar, H.-J.; Maroeska, D.; te Loo, D. M. The role of pharmacogenetics in the treatment of osteosarcoma. Drug Discovery Today 2016, 21, 1775−1786.
(30) Vamathevan, J.; Clark, D.; Czodrowski, P.; Dunham, I.; Ferran, E.; Lee, G.; Li, B.; Madabhushi, A.; Shah, P.; Spitzer, M.; Zhao, S. Applications of machine learning in drug discovery and development. Nat. Rev. Drug Discovery 2019, 18, 463−477.
(31) Blázquez-Barbadillo, C.; Aranzamendi, E.; Coya, E.; Lete, E.; Sotomayor, N.; González-Díaz, H. Perturbation theory model of reactivity and enantioselectivity of palladium-catalyzed Heck-Heck cascade reactions. RSC Adv. 2016, 6, 38602−38610.
(32) M Casanola-Martin, G.; Le-Thi-Thu, H.; Perez-Gimenez, F.; Marrero-Ponce, Y.; Merino-Sanjuan, M.; Abad, C.; González-Díaz, H. Multi-output Model with Box-Jenkins Operators of Quadratic Indices for Prediction of Malaria and Cancer Inhibitors Targeting Ubiquitin-Proteasome Pathway (UPP) Proteins. Curr. Protein Pept. Sci. 2016, 17, 220−227.