Lessons learned from the legacy of bioinformatic resources at SIB Swiss Institute of Bioinformatics

Author: Zahn, Monique; Moretti, Sébastien; Bucher, Philipp; Duvaud, Severine
Publisher: Zenodo
DOI: 10.5281/zenodo.17698526
Source: https://zenodo.org/records/17698526/files/2025_Zahn_ResourceLegacy.pdf
Monique Zahn-Zabal1, Sebastien Moretti2, Philipp Bucher2 and Severine Duvaud1*
1Biodata Resources Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
2Vital-IT Group, SIB Swiss Institute of Bioinformatics, Lausanne, Switzerland.
*Corresponding author.
Abstract
The life cycle of a bioinformatics resource has been described as consisting of the
proof-of-concept phase, the emerging phase, the mature phase, and the legacy phase. At a time
when the FAIR principles (Findable, Accessible, Interoperable, Reusable) are being applied to
bioinformatic resources, questions around reusability and archiving of resource components,
especially ensuring they are interoperable, are unavoidable. Since 2000, the Swiss State
Secretariat for Education, Research and Innovation (SERI) has mandated and funded the SIB
Swiss Institute of Bioinformatics to identify, support, and develop key open bioinformatics
resources. SIB's portfolio comprises both emerging and mature resources. When a resource
ends (due to insufficient funding, or the retirement or move of the Principal Investigator
outside of Switzerland), SIB ensures that the resource remains available if still relevant to
the scientific community. Here, we present use cases that provide insight into current and
past practices, as well as lessons learned.
Introduction
Bioinformatics is a field in which biological data is managed, analysed, and visualized using
computational tools and techniques. In its infancy, bioinformatics dealt with DNA
sequences, protein structures, and gene expression data; however, present-day bioinformatics
involves omics data, such as genomics, proteomics, transcriptomics, and metabolomics, and
the use of machine learning or artificial intelligence to deal with the large volume of biological
data. In this context, bioinformatics resources include databases, knowledgebases, data sets,
software tools, workflows or pipelines, and machine learning models. Bioinformatic resources thus
play a crucial role in serving the scientific and medical community. Given the considerable
time, expertise, and financial investment required for their development and maintenance,
extending their lifespan has become a priority. Archiving a resource allows the scientific
community to continue to benefit from it, rather than duplicating efforts, thereby optimizing
funding and extending the knowhow found in these resources.
The life cycle of a bioinformatics resource consists of four phases (Gabella et al., 2022). The first
phase is the proof-of-concept phase, in which a research project develops a resource for
research purposes. Further development on the resource takes place in the emerging phase.
These first two phases form the research stage of a resource. With time, the resource matures
and, if successful, may be considered as essential infrastructure. If the decision is taken to
archive the resource, it enters the legacy state. The international life science community has
been using the freely available databases provided by SIB since its foundation in 1998. These, as
well as other resources, are listed in Expasy, the Swiss bioinformatics resource portal (Duvaud
et al., 2021). The resources found in Expasy consisted of emerging and mature resources until
recently and include five Global Core Biodata Resources (GCBR) considered valuable
infrastructure.
While guidelines exist for the development (Schultheiss, 2011; Helmy et al., 2016) and
management of bioinformatics resources (Gabella et al., 2022), little attention has been given
to the transition of an emerging or mature state resource to a legacy state, termed
"sunsetting". Factors such as loss of funding, key personnel, users, or technological changes
can lead to such a transition, and the resulting disappearance of the resource is not an
uncommon occurrence. A study by Attwood et al. (2015) showed that the 18-year survival rate of
326 bioinformatics databases was only around 25%, and there is no evidence to suggest an
improvement in the last decade (Imker et al., 2023). The sudden disappearance of a
bioinformatics database could have significant impact, leading to the loss of valuable research
data, especially if the database held curated information essential to various research projects.
Bioinformatics tools are similarly vital to analysing biological data. The disappearance of a
widely used tool can disrupt ongoing research projects that rely on it for data analysis, leading
to decreased productivity and efficiency. Researchers may spend valuable time searching for
alternative tools or adapting their workflows to compensate for the missing tool.
In the absence of guidelines for sunsetting resources and conscious of the impact this may
have on the scientific community, we explored the legacy of bioinformatic resources developed
when the SIB was in its infancy. Faced with resources developed by SIB groups whose heads are
now retiring, we had to set up a process to minimize the impact this has on the long-term use of
the resource. In this paper, we compare the life cycle of three resources developed at SIB (the
CleanEx database, the EPD web resource, and the neXtProt knowledgebase) and contrast their
legacy.
The CleanEx Database
The CleanEx (Praz et al., 2004) database of heterogeneous gene expression data was developed
by Viviane Praz while doing her thesis in the group of Philipp Bucher at SIB. Further development
took place during Viviane's post-doc (Praz and Bucher, 2009) but work on the resource stopped
when she left the group, a frequent scenario. At the time, it was rare for a resource to be
archived so, although the CleanEx flat files and tools were publicly available online, this is no
longer the case.
In the case of CleanEx, the resource went from the emerging phase to the legacy state because
of the loss of key personnel. The legacy of the project consists of three publications (including
Viviane's thesis), of which only two are currently accessible. The metadata of the resource is in
two registries, Database Commons (CNCB-NGDC Members and Partners, 2022) and bio.tools
(Ison et al., 2019). However, the entries do not show the legacy state of the resource. In terms of
FAIRness, the CleanEx data and software tools themselves are no longer FAIR; however, the
metadata of CleanEx is.
What was the impact on the scientific community? Concomitant with the development of
CleanEx, the community had access to the GEO (Edgar et al., 2002) and ArrayExpress
(Parkinson et al., 2007) repositories. Today it has access to a number of mature stage resources,
of which two are GCBR: Bgee, a database users can use to retrieve and compare gene expression
patterns in multiple animal species (Bastian et al., 2025), and the Gene Expression Database
(GXD), a resource of mouse developmental gene expression information (Baldarelli et al., 2021).
The EPD Web Resource
The Eukaryotic Promoter Database (EPD) was developed by Philipp Bucher and his group at SIB.
Originally a manually curated compilation of promoter sequences focusing primarily on
transcription start sites (TSS), EPD later gained a new section called EPDnew (Dreos et al., 2013).
At the time of writing, this web resource, which catalogues experimentally determined RNA
polymerase II promoters in eukaryotes, is available online at https://epd.expasy.org/epd/. A total
of nine publications describing the resource are listed in PubMed, the first in 1997 and the most
recent in 2020.
Philipp's retirement, the loss of funding to further develop EPD, and the unsuccessful search for
a successor to continue to develop the resource or another resource to integrate the data
meant that this mature resource needed to be prepared for the legacy state. As the resource is
still online and widely used, and no equivalent resource exists, the decision was taken in 2022 to
archive all valuable parts. Our choice of which elements of the resource to archive was guided
by what could be of use to the scientific community in the future: the data, which may be
integrated in another resource; the code, as the method can be applied to other species; and
the website, in case a provider wishes to revive the resource. In this way, the
scientific community will be able to build on Philipp's legacy.
Philipp inventoried the data and code (applications) which needed to be made open and FAIR to
enable future use. The data publicly available for download on the resource was deposited in
Zenodo (Bucher and Moretti, 2025) as no domain-specific repository was found. Individual
compressed files for each of the fifteen model organism genomes, reformatted following the
EPD standard, are now available under a Creative Commons Zero 1.0 Universal license
permitting reuse. The copyright has been waived, and it is now in the world-wide public
domain. The private EPD code was cleaned, revised, documented, and containerized using
Docker before being moved to GitHub and made public at https://github.com/sib-swiss/EPD
under a GNU General Public License 3.0 to allow reuse. The repository has the code to run the
EPD website; there are web and management scripts, as well as the EPD database dumps.
Finally, an end-of-service or "tombstone" page with a description of the resource and links to
the archived data, code and website was prepared such that, when EPD is no longer available
online, it can be put online for users of the resource to know which elements are available and
where.
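As an illustration of what such an end-of-service page needs to carry, the following minimal Python sketch renders a plain-text notice from a description and a set of links. It is not the page prepared for EPD: the function name `render_tombstone`, the wording of the notice, and the dictionary keys are assumptions made for this example. The two locations are those given in the text and reference list (the Zenodo deposit and the GitHub repository); the location of the archived website is not stated here and is therefore omitted.

```python
def render_tombstone(name, description, links):
    """Return a plain-text end-of-service notice.

    `links` maps an archived element (e.g. "data", "code") to its location.
    The layout and wording are hypothetical, chosen only for illustration.
    """
    lines = [
        f"{name} is no longer available online.",
        "",
        description,
        "",
        "Archived elements:",
    ]
    # One line per archived element, so users can see what exists and where.
    for element, location in links.items():
        lines.append(f"- {element}: {location}")
    return "\n".join(lines)


page = render_tombstone(
    name="EPD",
    description="Eukaryotic Promoter Database: experimentally determined "
                "RNA polymerase II promoters in eukaryotes.",
    links={
        "data": "https://doi.org/10.5281/zenodo.15706069",
        "code": "https://github.com/sib-swiss/EPD",
    },
)
print(page)
```

Keeping the notice as a simple mapping from element to location makes it easy to reuse the same information when updating registry entries.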
The neXtProt Knowledgebase
Established in 2010, the neXtProt knowledge platform for human proteins was built on the
knowledge on human proteins found in UniProt (Lane et al., 2012). It employed innovative
semantic technologies to integrate data from genomics, transcriptomics, and proteomics, and
proposed advanced tools to help the exploration and querying of this data. Between 2011 and
2023, neXtProt was the primary reference resource for the Human Proteome Project (HPP).
Seven publications describing the resource are listed in PubMed. The loss of funding and the
absence of resources willing to integrate part or the entirety of the data led to transitioning this
mature resource to the legacy state.
Before neXtProt went offline, the data for all the releases were available for download. The data
from three annotation projects were in individual portals, as were the data of the functional
proteome project. All the data were exported into the interoperable TSV format, and README
files describing the content were prepared. Again, in the absence of a suitable domain-specific
repository, the different data were deposited in Zenodo, under the Creative Commons
Attribution 4.0 International license, which allows reuse. The code for the Feature-Viewer
(Paladin et al., 2020), a visualization tool for positional annotations on a sequence, which was
on GitHub, was already available in Zenodo. Additional code, albeit not all, for the neXtProt
projects and resources was also publicly available on GitHub at https://github.com/calipho-sib.
Registries having neXtProt entries were contacted, and the entries updated to show its legacy
state. The resource URL was redirected to the end-of-service page on Expasy at
https://www.expasy.org/archives/nextprot. In this first archive page on Expasy, links to the data
sets, the Feature-Viewer code, as well as the public neXtProt project code are provided.
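The export step described above, tabular data written as TSV and accompanied by a README describing the content, can be sketched as follows. This is a minimal illustration in Python, not the code used for neXtProt: the column names, the record values, and the helper names `export_tsv` and `build_readme` are all assumptions made for the example.

```python
import csv
import io


def export_tsv(records, columns):
    """Serialize a list of dicts as tab-separated text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def build_readme(title, license_name, column_docs):
    """Build a plain-text README documenting the license and each column."""
    lines = [title, "", f"License: {license_name}", "", "Columns:"]
    lines += [f"- {name}: {text}" for name, text in column_docs.items()]
    return "\n".join(lines) + "\n"


# Hypothetical columns and records, for illustration only.
COLUMN_DOCS = {
    "accession": "identifier of the entry",
    "gene_name": "gene symbol associated with the entry",
    "annotation": "free-text annotation",
}
records = [
    {"accession": "ENTRY_0001", "gene_name": "GENE1", "annotation": "example A"},
    {"accession": "ENTRY_0002", "gene_name": "GENE2", "annotation": "example B"},
]

tsv_text = export_tsv(records, list(COLUMN_DOCS))
readme_text = build_readme(
    "Example annotation export",
    "Creative Commons Attribution 4.0 International",
    COLUMN_DOCS,
)
print(tsv_text)
print(readme_text)
```

Deriving the TSV header and the README column list from the same mapping keeps the documentation in step with the data file.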
The Legacy of CleanEx, EPD and neXtProt
The transition to legacy state of the three SIB resources presented was due to the loss of key
personnel and/or funding. While the CleanEx data and code are no longer accessible, the
metadata and publications attest to this early effort. This meagre legacy had no impact on the
scientific community. For EPD and neXtProt, no successor was found to take over the running of
the resource, and no other resource was found to integrate the data. The legacy of these two
resources is FAIR (meta)data and code (at least in part for neXtProt), as well as the publications
describing the resources. The value and impact of archiving these bioinformatic resources will
be seen in the coming years. The accompanying documentation and interoperability of the
formats used in the archives will be critical to whether future bioinformaticians can re-use them
or not.
Lessons Learned
Bioinformatic data resources have survived despite precarious funding and repeated calls to
fund them as critical scientific infrastructures, rather than as research projects (Poveda et al.,
2025). UniProt, the world’s leading high-quality, comprehensive and freely accessible resource
of protein sequence and functional information (UniProt Consortium, 2025), has faced two
funding crises, several changes in the team leader and the transition from being a resource
provided by a single institute to that provided by a consortium, yet it continues to be a vital
resource for the life science community. However, given that the probability that a resource will
not survive is high, mature resource providers need to take suitable measures.
While resource providers have the responsibility to leave their legacy in order, their institutions
also play a role in ensuring they do so. What lessons have we learned in our process of
transitioning resources to the legacy state?
1. Use file formats suitable for long-term archiving from the beginning. Interoperability
is key to sharing and reusing.
2. Start sunsetting preparations early. Funding or personnel may no longer be available
to archive your resource if you wait until the last minute.
3. Try to find a successor. As the resource provider, you are the best person to find a
suitable colleague to adopt your resource or a resource which will include your data.
4. Keep only what is useful. Archives have a size limit, and long-term storage comes at a
price, so sort through and decide what elements may be useful to the community.
5. Provide the source code. Consider the reason for not making your source code publicly
available at a time when this has become the norm and ensures reproducibility.
6. Document data, code, and processes extensively. This is usually a pain point, but it
also makes it easier to onboard new team members and for others to reuse and extend
your work.
7. Ensure licenses are as permissive as possible. Your legacy must come with a license
defining its reuse, whether for its original purpose (conventional reuse) or to fulfil a
different function (creative reuse or repurposing).
8. Provide metadata to ensure findability. If your resource cannot be found, it is as
though it never existed.
9. Make your legacy publicly available for the long-term. Choose the repository wisely
so that the scientific community will have access.
10. Update the information in registries. Let your future users know where to find your
legacy by providing an end-of-service page and updating the information on your
resource in registries.
Acknowledgments
We thank Christine Durinx for the valuable suggestions and Christophe Dessimoz for his
perspective on resource management, as well as Amos Bairoch for reviewing the manuscript.
Funding
This work received funding from the Swiss State Secretariat for Education, Research and
Innovation (SERI).
Conflict of interest statement. None declared.
References
Gabella C, Duvaud S, Durinx C. Managing the life cycle of a portfolio of open data resources at
the SIB Swiss Institute of Bioinformatics. Brief Bioinform. 2022 Jan 17;23(1):bbab478. doi:
10.1093/bib/bbab478. PMID: 34850820; PMCID: PMC8769900.
Duvaud S, Gabella C, Lisacek F, Stockinger H, Ioannidis V, Durinx C. Expasy, the Swiss
Bioinformatics Resource Portal, as designed by its users. Nucleic Acids Res. 2021 Jul
2;49(W1):W216-W227. doi: 10.1093/nar/gkab225. PMID: 33849055; PMCID: PMC8265094.
Schultheiss SJ. Ten simple rules for providing a scientific Web resource. PLoS Comput Biol.
2011 May;7(5):e1001126. doi: 10.1371/journal.pcbi.1001126. Epub 2011 May 26. PMID:
21637800; PMCID: PMC3102757.
Helmy M, Crits-Christoph A, Bader GD. Ten Simple Rules for Developing Public Biological
Databases. PLoS Comput Biol. 2016 Nov 10;12(11):e1005128. doi:
10.1371/journal.pcbi.1005128. PMID: 27832061; PMCID: PMC5104318.
Attwood TK, Agit B, Ellis BM. Longevity of Biological Databases. EMBnet J. 2015;21:e803. doi:
10.14806/ej.21.0.803.
Imker HJ, Schackart KE 3rd, Istrate AM, Cook CE. A machine learning-enabled open biodata
resource inventory from the scientific literature. PLoS One. 2023 Nov 28;18(11):e0294812. doi:
10.1371/journal.pone.0294812. PMID: 38015968; PMCID: PMC10684096.
Praz V, Jagannathan V, Bucher P. CleanEx: a database of heterogeneous gene expression data
based on a consistent gene nomenclature. Nucleic Acids Res. 2004 Jan 1;32(Database
issue):D542-7. doi: 10.1093/nar/gkh107. PMID: 14681477; PMCID: PMC308841.
Praz V, Bucher P. CleanEx: new data extraction and merging tools based on MeSH term
annotation. Nucleic Acids Res. 2009 Jan;37(Database issue):D880-4. doi: 10.1093/nar/gkn878.
PMID: 19073704; PMCID: PMC2686468.
CNCB-NGDC Members and Partners. Database Resources of the National Genomics Data
Center, China National Center for Bioinformation in 2022. Nucleic Acids Res. 2022 Jan
7;50(D1):D27-D38. doi: 10.1093/nar/gkab951. PMID: 34718731; PMCID: PMC8728233.
Ison J, Ienasescu H, Chmura P, Rydza E, Ménager H, Kalaš M, Schwämmle V, Grüning B, Beard
N, Lopez R, Duvaud S, Stockinger H, Persson B, Vařeková RS, Raček T, Vondrášek J, Peterson H,
Salumets A, Jonassen I, Hooft R, Nyrönen T, Valencia A, Capella S, Gelpí J, Zambelli F, Savakis B,
Leskošek B, Rapacki K, Blanchet C, Jimenez R, Oliveira A, Vriend G, Collin O, van Helden J,
Løngreen P, Brunak S. The bio.tools registry of software tools and data resources for the life
sciences. Genome Biol. 2019 Aug 12;20(1):164. doi: 10.1186/s13059-019-1772-6. PMID:
31405382; PMCID: PMC6691543.
Edgar R, Domrachev M, Lash AE. Gene Expression Omnibus: NCBI gene expression and
hybridization array data repository. Nucleic Acids Res. 2002 Jan 1;30(1):207-10. doi:
10.1093/nar/30.1.207. PMID: 11752295; PMCID: PMC99122.
Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, Holloway
E, Kolesnykov N, Lilja P, Lukk M, Mani R, Rayner T, Sharma A, William E, Sarkans U, Brazma A.
ArrayExpress--a public database of microarray experiments and gene expression profiles.
Nucleic Acids Res. 2007 Jan;35(Database issue):D747-50. doi: 10.1093/nar/gkl995. Epub 2006
Nov 28. PMID: 17132828; PMCID: PMC1716725.
Bastian FB, Cammarata AB, Carsanaro S, Detering H, Huang WT, Joye S, Niknejad A, Nyamari M,
Mendes de Farias T, Moretti S, Tzivanopoulou M, Wollbrett J, Robinson-Rechavi M. Bgee in 2024:
focus on curated single-cell RNA-seq datasets, and query tools. Nucleic Acids Res. 2025 Jan
6;53(D1):D878-D885. doi: 10.1093/nar/gkae1118. PMID: 39656924; PMCID: PMC11701651.
Baldarelli RM, Smith CM, Finger JH, Hayamizu TF, McCright IJ, Xu J, Shaw DR, Beal JS, Blodgett
O, Campbell J, Corbani LE, Frost PJ, Giannatto SC, Miers DB, Kadin JA, Richardson JE, Ringwald
M. The mouse Gene Expression Database (GXD): 2021 update. Nucleic Acids Res. 2021 Jan
8;49(D1):D924-D931. doi: 10.1093/nar/gkaa914. PMID: 33104772; PMCID: PMC7778941.
Dreos R, Ambrosini G, Cavin Périer R, Bucher P. EPD and EPDnew, high-quality promoter
resources in the next-generation sequencing era. Nucleic Acids Res. 2013 Jan;41(Database
issue):D157-64. doi: 10.1093/nar/gks1233. Epub 2012 Nov 27. PMID: 23193273; PMCID:
PMC3531148.
Bucher P, Moretti S. EPDnew genomes [Data set]. In Nucleic Acids Research (Vol. 41, Number
D1, pp. D157–164). Zenodo. 2025 doi: 10.5281/zenodo.15706069.
Lane L, Argoud-Puy G, Britan A, Cusin I, Duek PD, Evalet O, Gateau A, Gaudet P, Gleizes A,
Masselot A, Zwahlen C, Bairoch A. neXtProt: a knowledge platform for human proteins. Nucleic
Acids Res. 2012 Jan;40(Database issue):D76-83. doi: 10.1093/nar/gkr1179. Epub 2011 Dec 1.
PMID: 22139911; PMCID: PMC3245017.
Paladin L, Schaeffer M, Gaudet P, Zahn-Zabal M, Michel PA, Piovesan D, Tosatto SCE, Bairoch A.
The Feature-Viewer: a visualization tool for positional annotations on a sequence.
Bioinformatics. 2020 May 1;36(10):3244-3245. doi: 10.1093/bioinformatics/btaa055. PMID:
31985787.
Poveda L, Farrell G, Tosatto S, Zahn M, Ruch P, Gobeill J, Waterhouse R, Dessimoz C. The
missing link in FAIR data policy: data resources. Zenodo. 2025 doi: 10.5281/zenodo.15724104.
UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res.
2025 Jan 6;53(D1):D609-D617. doi: 10.1093/nar/gkae1010. PMID: 39552041; PMCID:
PMC11701636.