2Pipe: It starts with a question. Matching you with the correct pipeline for MAG reconstruction (Pipeline description)

Author: Yepes García, Jeferyd; Falquet, Laurent
Publisher: Zenodo
DOI: 10.5281/zenodo.17335110
Source: https://zenodo.org/records/17335110/files/Additional_file_1.pdf
2Pipe: It starts with a question. Matching you with the correct
pipeline for MAG reconstruction
Jeferyd Yepes-García (a,b), Laurent Falquet (a,b,*)
(a) Department of Biology, University of Fribourg, Fribourg, Canton of Fribourg, 1700, Switzerland
(b) Swiss Institute of Bioinformatics, Lausanne, Vaud, 1015, Switzerland
(*) Correspondence should be addressed to L.F. (laurent.falquet@unifr.ch)
__________________________________________________________________________________
Descriptive pipeline overview
Below we present a descriptive overview of the main workflow of each pipeline or platform, where
important technical considerations such as the type of input (short reads, long reads or both), key tools
employed at each step, advantages, limitations and/or special features they depict are documented.
1. Short-read centered pipelines
1.1 Anvi’o [1]
Anvi’o is a comprehensive modular platform for the analysis and visualization of microbial
omics including, but not restricted to, metagenomics, metatranscriptomics and metapangenomics.
Anvi’o is developed to be highly customizable through exchangeable programs (tools) that perform
specific tasks, empowering the user with a wide range of tools to explore. Accordingly, the developers
of the platform propose a metagenomics workflow that begins with short-read quality cleaning,
proceeds to read assembly to be used for read recruitment (mapping), and ends with contig annotation
(functions, Hidden Markov Models, and taxonomy). Optionally, the user can achieve read taxonomic
profiling with KrakenUniq [2], and more recently binning tools have been made available such as
MetaBAT2 [3], CONCOCT [4], MaxBin2 [5] and BinSanity [6], as well as DASTool [7] as a refinement
alternative. Nonetheless, the user must run the analysis manually, which requires some experience
regarding software installation, execution and debugging. Moreover, although Anvi’o is in principle a
command line tool, it incorporates a user-friendly graphical interface for data inspection and
visualization that is commonly used for contig visualization.
1.2 BugBuster [8]
BugBuster is an automatic, modular, and reproducible Nextflow [9] (DSL2) workflow with
specialized modules for taxonomic profiling and resistome characterization. Its workflow
encompasses the following steps: initial reads processing for quality filtering and host contamination
removal (Bowtie2 [10]); taxonomic profiling at the read level using tools like Kraken2 [11]/Bracken [12]
or Sourmash [13]; and antibiotic resistance gene (ARG) prediction from reads using KARGA [14] and
KARGVA [15]. The assembly is carried out with MEGAHIT [16], followed by taxonomic and functional
annotation of contigs using BLAST [17], BlobTools [18], DeepARG [19], and MetaCerberus [20].
Afterwards, the contigs are binned with tools such as MetaBAT2 [3], SemiBin2 [21] and COMEBin [22],
and refined with a MetaWRAP [23]-native module; the quality is assessed with CheckM2 [24], and the
MAGs are taxonomically affiliated with GTDB-Tk2 [25]. BugBuster is fully containerized (Docker),
aiming at ensuring ease of installation, high reproducibility, and deployment across various
computational environments. Moreover, BugBuster stands out given its inclusion of specific tools to
characterize and quantify genes associated with antibiotic resistance.
1.3 DATMA [26]
DATMA (Distributed AuTomatic Metagenomic Assembly and annotation framework) is a
pipeline focused on speed and automation, leveraging distributed computing for efficiency. As a
starting point, DATMA applies a quality filter with RAPPIFILT (customized tool developed for this
pipeline), Trimmomatic [27] and FastQC [28], and if the input sequences are paired-end, it merges them
using FLASH2 [29] and ForceMerge. Following this procedure, this pipeline identifies and removes 16S
rDNA sequences based on RFAM [30] (RNA sequence families), NCBI [31], Ribosomal Database Project
(RDP) [32] and SILVA [33], and then clusters the remaining sequences with CLAME [34]. The clusters
(or bins, as defined in the traditional workflow) are then assembled in batches by metaSPAdes [35],
Velvet [36], and MEGAHIT [16] for a subsequent taxonomic annotation relying on BLAST [37] and
Kaiju [38], as well as ORF prediction with Prodigal [39] and GeneMark [40]. To conclude the analysis, a
detailed HTML report is generated with interactive Krona [41] plots for taxonomic visualization; this
report integrates the 16S rDNA annotation (RDP Classified) along with the annotated bins. As inferred
from the described workflow, DATMA performs an inverted approach to generate bins by first grouping
the reads using CLAME and attempting to assemble only these groups individually afterwards. Further,
this pipeline is wrapped by COMP Superscalar, which facilitates the development and execution of
parallel applications for distributed infrastructures such as clusters, cloud services and containerized
platforms.
1.4 EasyMetagenome [42]
EasyMetagenome integrates a classical workflow starting with short reads to provide a
de-replicated (dRep [43]) set of bins and pangenome analysis that relies on an Anvi’o module. The
assembly is performed with MEGAHIT [16], a MetaWRAP [23] module is in charge of the binning task,
CheckM2 [24] controls the quality of the bins, and GTDB-Tk2 [25] finalizes the execution by
taxonomically annotating them. Notably, this pipeline performs functional annotation
(GhostKOALA [44], eggNOG [45], dbCAN3 [46]) and taxonomy assignment on the contigs after a
pre-filtering step that generates a non-redundant gene set. EasyMetagenome uses Conda environments
to assure reproducibility and accepts multi-sample data as input, although it is not orchestrated by any
workflow manager. As special remarks, it carries out a taxonomic profiling (MetaPhlAn [47],
HUMAnN3 [48], Kraken2 [11]) of the post-filtered (KneadData) reads, and the functional annotation of
the gene set is expanded to identify virulence factors (VFDB [49]) and antibiotic resistance genes
(CARD [50]).
1.5 EURYALE (MEDUSA) [51,52]
EURYALE is a Nextflow-based reimplementation of the MEDUSA pipeline. It provides a
modular and containerized workflow using Nextflow DSL2, with software execution through Docker,
Conda or Singularity, which ensures portability, reproducibility, and scalability. The workflow of this
pipeline starts with read quality control with FastQC [28], trimming and merging using fastp [53], and
optional host decontamination with Bowtie2 [10]; MultiQC [54] provides a full report containing
visualizations regarding sequence preprocessing. Optionally, clean sequences can be assembled
using MEGAHIT [16] with a posterior taxonomic classification carried out by Kaiju [38] or Kraken2 [11],
while functional annotation relies on a DIAMOND [55]-based alignment to reference databases (NCBI nr
by default). It is worth mentioning the flexibility EURYALE offers given its customizable database
selection for both taxonomic and functional annotation.
1.6 JAMS [56]
JAMS (Just a Microbiology System) is an integrated framework originally designed to perform
the analysis on the NIH’s Biowulf system. JAMS is divided into two main modules: JAMSα, which
performs single sample analyses, and JAMSβ, which focuses on cross-sample comparisons. JAMSα
(the pipeline) integrates tools such as Bowtie2 [10] for host removal, MEGAHIT [16] or SPAdes [57] for
read assembly, Kraken2 [11] for taxonomic classification, and Prokka [58] and InterProScan [59] for gene
and protein domain prediction, respectively; JAMSβ uses R-based packages for visualization and
statistical analysis. This workflow is executed within Conda environments, and its main advantage
lies in the ease of establishing comparisons across samples. However, this pipeline does not support
binning tools or genome-quality assessment, and currently, it exhibits restricted deployment flexibility
due to optimization for the NIH’s Biowulf system, although JAMS is open source and can be installed
on any UNIX-based machine.
1.7 MAGNETO [60]
MAGNETO is an automated, modularized and scalable pipeline wrapped with Snakemake [61]
and executed with Conda. It is focused on allowing the user to select different assembly and/or
binning strategies, involving several steps from read pre-processing to MAG annotation and gene
catalog generation. The Pre-processing module leverages fastp [53], Bowtie2 [10] and FastQ Screen [62],
whilst the Assembly mode uses Simka [63] and hierarchical agglomerative clustering to cluster the
samples if the user pre-defines a co-assembly strategy; the reads are assembled using MEGAHIT [16].
Furthermore, contig abundances are computed by alignment against the raw reads, and the contigs
are binned by MetaBAT2 [3] afterwards. Quality estimation and dereplication are carried out with
CheckM [64] 1.0 and dRep [43], respectively. To end the workflow, a gene catalog is produced for both
the contigs and the MAGs by running Prodigal [39], Linclust [65] and CD-HIT [66], and the MAGs are
annotated with GTDB-Tk2 [25] and eggNOG-mapper [67]. As a special feature, MAGNETO can provide
a read-based taxonomy abundance with the mOTU [68] profiler. MAGNETO exhibits all the advantages
that Snakemake wrapping and Conda execution provide, such as multi-sample handling, scalability
across different computing infrastructures and checkpoint control for workflow restarting.
1.8 MAGO [69]
MAGO is an end-to-end pipeline designed to run in a single execution from a container
image (Singularity or Docker); a third option is available as a Virtual Machine (VM). This configuration
allows MAGO to offer a streamlined implementation of the entire metagenomics pipeline, including
error checking and computational resource distribution. The tool workflow follows the traditional
design with read quality control (fastp [53], FastQC [28]), followed by the assembly step with
MEGAHIT [16], metaSPAdes [35] and/or IDBA-UD [70]. MAGO performs binning through multiple
algorithms (MetaBAT [71], MaxBin2 [5], CONCOCT [4] and BinSanity with multiple configurations).
MAG completeness and contamination are estimated with CheckM [64]. To conclude the execution,
MAGO annotates the MAGs with Prokka [58], and performs taxonomic classification and phylogenetic
placement using GTDB-Tk [72]. Moreover, to expand its capabilities, the developers included the
possibility of generating phylogenetic trees through ezTree [73], analyzing the pangenome with Roary
and measuring ANI with FastANI [74] as an approximation to de-replicate the MAG set.
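The ANI-based de-replication mentioned above can be illustrated with a short sketch. It is not MAGO's implementation: the greedy strategy, the 95% threshold and the names `dereplicate` and `ani` are assumptions made for illustration, and in practice the pairwise ANI values would come from a tool such as FastANI.

```python
def dereplicate(genomes, ani, threshold=95.0):
    """Greedy ANI-based de-replication (illustrative sketch, not MAGO's code).

    genomes   -- MAG names ordered from best to worst quality (assumed pre-sorted)
    ani       -- dict mapping (genome_a, genome_b) to pairwise ANI in percent
    threshold -- ANI at or above which two MAGs count as redundant (assumed value)
    """
    representatives = []
    for genome in genomes:
        redundant = False
        for rep in representatives:
            # ANI may be stored in either orientation; missing pairs count as 0.
            value = max(ani.get((genome, rep), 0.0), ani.get((rep, genome), 0.0))
            if value >= threshold:
                redundant = True
                break
        if not redundant:
            representatives.append(genome)
    return representatives
```

Because the input list is assumed to be sorted by quality, the first MAG of each redundant group is the one that is kept as representative.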
1.9 metaGEM [75]
metaGEM represents a traditional end-to-end pipeline designed to reconstruct MAGs from
metagenomics raw reads; however, its main feature is an integrated module that provides
genome-scale metabolic models (GEMs). The workflow starts with read quality cleaning using
fastp [53], followed by assembly with MEGAHIT [16] and contig coverage estimation with BWA [76]. The
bins are then obtained via three different tools (MetaBAT2 [3], MaxBin2 [5] and CONCOCT [4]) and
subsequently refined by the metaWRAP [23] refinement module. The resulting bins or MAGs are used
as input for CarveMe [77] (genome-scale metabolic models), SMETANA [78] is called for metabolic
interaction predictions, and MEMOTE [79] is in charge of generating quality reports. The resulting GEMs
can then be used for various downstream analyses, such as predicting metabolic interactions within
the community, simulating growth under different conditions, and identifying key metabolic pathways.
The pipeline ends with MAG characterization through Prokka [58] and Roary [80] (functional annotation
and pangenome analysis), GRiD [81] (growth rate estimation), GTDB-Tk2 [25] (taxonomic annotation)
and BWA [76] (genome abundance). As additional features, metaGEM identifies eukaryotic MAGs via
EukRep [82] and evaluates contamination with EukCC [83]. Also, this pipeline produces taxonomic
abundance profiles from the filtered reads using mOTUs2 [84]. Naturally, this pipeline exhibits the
benefits Snakemake [61] orchestration provides, as mentioned previously.
1.10 MetaGenePipe [85]
MetaGenePipe is a pipeline developed with Workflow Definition Language (WDL),
self-executed within a Singularity container, whose primary goal is performing a contig-based
functional and taxonomic analysis from short read sequences. It is composed of 4 subworkflows: the
operation starts with the quality control subworkflow, the second one assembles the reads with
MEGAHIT [16], and the third one maps the short reads back against the assembly. Meanwhile, the
last subworkflow is in charge of gene prediction and functional annotation based on two main
strategies: alignment with the Swiss-Prot database and Hidden Markov Model search in the KOfam
database [86]. Although MetaGenePipe does not include binning software to provide MAGs as main
output, its versatility, which allows the analysis to be adapted to eukaryotic and viral data with minimal
modifications, and its workflow manager, uncommon among the pipelines considered in this review,
make MetaGenePipe an interesting alternative for users with advanced computational infrastructures.
Additionally, MetaGenePipe is designed to handle a co-assembly strategy in case the user requires
this feature.
1.12 Metagenome-Atlas [87]
Metagenome-Atlas is an end-to-end, Snakemake [61]-based and Conda-executed pipeline
supporting Illumina short reads and providing a modular workflow. It is divided into four modules,
namely Quality Control, Assembly, Genomic Binning and Annotation. The initial module removes host,
common contaminants and PCR duplicates, and if necessary, trims low-quality sequences according
to user pre-specified parameters. The Assembly module corrects sequence errors based on k-mer
coverage, merges paired-end sequences, and assembles them using MEGAHIT [16] and/or
metaSPAdes [35] along with a contig-length filtering. The following module uses MetaBAT2 [3],
MaxBin2 [5], and optionally VAMB [88] and SemiBin2 [21] to bin the contigs; CheckM2 [24], BUSCO [89]
and GUNC [90] are run to measure the bin quality, as well as DASTool [7] and dRep [43] for bin
refinement and MAG dereplication, respectively. For the last module, Metagenome-Atlas taxonomically
and functionally annotates the MAGs using GTDB-Tk2 [25] and DRAM [91], respectively, and it finally
produces a gene catalog through mapping the predicted coding sequences using eggNOG-mapper [67].
The main advantages of Metagenome-Atlas include the possibility of running individual modules and
its active supporting community and developers. Moreover, the Snakemake wrapper allows for
flexibility, multi-sample handling, and adaptability to medium or large projects running on local servers
or High-Performance Cluster (HPC) environments.
1.13 Metaphor [92]
Metaphor is a classic metagenomics pipeline aiming at MAG reconstruction and annotation,
wrapped by Snakemake [61] and leveraging Conda as package manager. The pipeline is triggered by the
user with a .csv file pointing to the sequence directories and a .yaml file with the pipeline configuration.
Quality control is then carried out with FastQC [28] and fastp [53], followed by assembly with
MEGAHIT [16], contig evaluation with MetaQUAST [93] and mapping against the input sequences using
Minimap2 [94] and Samtools; the contigs are binned (VAMB [88], MetaBAT2 [3], CONCOCT [4]) and
refined (DASTool [7]). Metaphor execution ends with bin annotation through Prodigal, Diamond, and the
NCBI COG database. Complementary to Snakemake orchestration capabilities, Metaphor provides a
series of plots depicting runtime and memory with the goal of identifying computational bottlenecks
during the analyses.
1.14 MetaWRAP [23]
MetaWRAP is a popular and customizable pipeline built primarily as a command-line
framework with a focus on flexibility and user control. MetaWRAP consists of individual modules that
can be run independently or combined into custom workflows. Its core functionalities encompass
read QC and cleaning (FastQC [28], Trim Galore and BMTagger), assembly (MEGAHIT [16],
metaSPAdes [35], BWA [76] and MetaQUAST [93]), and a binning suite that incorporates MetaBAT2 [3],
MaxBin2 [5], and CONCOCT [4]. MetaWRAP also includes a native refinement module that produces
hybrid bin sets and explores the different variants of each bin (original and hybridized bin sets) to
determine the “best bin” according to the user pre-specified quality values based on completeness and
contamination (CheckM [64] 1.0). This module is frequently executed in independent metagenomics
analyses, and even some pipelines described in this review incorporate it within their workflows. If
decided by the user, MetaWRAP offers the possibility of re-assembling the bins guided by their previous
versions, improving the overall bin quality. For MAG taxonomic and functional analysis, MetaWRAP
relies on Prokka [58] and Taxator-tk [95] (combined with NCBI [31] databases), and it provides
visualization modules for summarizing results. Analogous to MAGNETO [60], MetaWRAP can produce
read-based taxonomic profiles in parallel. Although MetaWRAP does not integrate full pipeline
automation, its high modularity and straightforward design have promoted a wide supporting
community. Nonetheless, at the moment of writing this report, MetaWRAP is not maintained by the
developers, with the subsequent lack of tool updates.
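The “best bin” selection performed by the refinement module can be sketched as follows. This is a simplified illustration under stated assumptions, not MetaWRAP's actual code: the function name `pick_best_variant`, the dictionary layout, the default thresholds and the tie-breaking rule (highest completeness first, then lowest contamination) are all hypothetical.

```python
def pick_best_variant(variants, min_completeness=70.0, max_contamination=10.0):
    """Choose one variant of a bin from original and hybridized bin sets.

    variants -- list of dicts with keys "name", "completeness", "contamination"
                (percent values, e.g. taken from a CheckM report)
    Returns the selected variant, or None if no variant passes the thresholds.
    The thresholds and the ranking rule are illustrative assumptions.
    """
    passing = [
        v for v in variants
        if v["completeness"] >= min_completeness
        and v["contamination"] <= max_contamination
    ]
    if not passing:
        return None
    # Rank by completeness (higher is better), then by contamination (lower is better).
    return max(passing, key=lambda v: (v["completeness"], -v["contamination"]))
```

A bin for which no variant satisfies the user-defined thresholds is simply discarded in this sketch, which mirrors the idea that only bins meeting the requested quality are kept.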
However, given the popularity of MetaWRAP, a Snakemake [61] wrapper known as SnakeWRAP [96]
was developed to automate the metagenomics analysis. SnakeWRAP can therefore carry out the
MetaWRAP end-to-end read processing to generate MAGs in a single run, retaining the flexibility
of MetaWRAP while reducing the burden of manual execution and dependency handling. Additionally,
SnakeWRAP’s integrated environment management via Conda and support for HPC environments
enable seamless execution of multiple MetaWRAP modules and samples in parallel, being particularly
useful for multi-sample execution.
1.15 MOSHPIT [97]
According to its documentation, MOSHPIT (MOdular SHotgun metagenome Pipelines with
Integrated provenance Tracking) is a toolkit of plugins for whole metagenome assembly, annotation,
and analysis built on the microbiome multi-omics data science framework QIIME 2 [98]. MOSHPIT
enables flexible, modular, fully reproducible workflows for read-based or assembly-based analysis of
metagenome data. The core components of MOSHPIT include q2-assembly, which provides
functionalities for genome assembly and quality control, and q2-annotate, which supports contig
binning, taxonomic classification, and functional annotation. Additional plugins, such as q2-viromics
and q2-amrfinderplus, extend capabilities to viral sequence detection and antimicrobial resistance
gene annotation, respectively. In technical terms, MOSHPIT must be run locally or on an HPC
environment, with the possibility of executing the processes in parallel through the explicit declaration
of partitions, a native QIIME 2 functionality. Further, the entire QIIME 2 ecosystem relies on Conda, and
hence this is a sine qua non requisite to perform MAG reconstruction with MOSHPIT.
1.16 nIMP3 [99]
nIMP3 is a Nextflow-based reimplementation of the IMP (Integrated Meta-omic Pipeline)
workflow that assembles metagenomics (MG) and metatranscriptomics (MT) datasets together. nIMP3
handles preprocessed and contaminant-free MT and MG reads (FastQC [28], SortMeRNA [100],
BBTools [101]), and jointly assembles them in a hybrid and iterative process using MEGAHIT [16].
Additionally, nIMP3 performs taxonomic profiling with mOTUs [102] and Kraken2 [11], as well as
functional profiling with gffquant [103]. Unlike the original IMP pipeline, nIMP3 does not include a
binning module, and thus it cannot recover MAGs. Nonetheless, nIMP3 offers a lighter, reproducible,
and integrative pipeline for multi-omics metagenome/metatranscriptome processing.
1.17 SnakeMAGs [104]
SnakeMAGs is a simple yet useful pipeline that, as its name indicates, is controlled by a
Snakemake [61] wrapper with Conda as software administrator. It integrates basic modules starting with
quality control with Illumina-utils [105] and Trimmomatic [27], and if required, host removal with
Bowtie2 [10]. Afterwards, the reads are assembled through MEGAHIT [16], the contigs are binned by
MetaBAT2 [3], a quality assessment is carried out with CheckM [64] 1.1 and GUNC [90], MAG
abundances are obtained using CoverM [106], and finally the taxonomic classification is performed
using GTDB-Tk2 [25]. Similar to the previous pipelines governed by Snakemake, SnakeMAGs eases
automation, reproducibility, scalability and workflow management.
1.18 SPIRE [107]
The SPIRE project employs a Nextflow-based pipeline that has been used to process and
annotate more than 100,000 metagenomes belonging to more than 700 studies. The workflow
incorporates tools such as NGLess [108] for read trimming and decontamination, MEGAHIT [16] for
assembly, Prodigal [39] for gene prediction and barrnap [109] for ribosomal RNA detection. Moreover,
contig binning is carried out with MetaBAT2 [3] with a complementary genome quality assessment
using CheckM2 [24] and GUNC [90], and the workflow ends with taxonomic classification
(GTDB-Tk2 [25]) and functional annotation (eggNOG-mapper [67], abricate [110], RGI [50] and
Macrel [111]). Among the advantages SPIRE offers, the possibility to perform antimicrobial resistance
gene prediction and the annotation of virulence factors stand out, as well as its scalability,
reproducibility across high-performance and cloud environments, and standardized processing,
enabling consistent comparisons across global datasets. Nonetheless, at the moment of writing this
report, this pipeline is intended to be executed on online platforms like CloWM [112], as it lacks defined
environments or container images, and the input data should already be hosted at sequencing
archives such as ENA, DDBJ or SRA.
1.19 Sunbeam [113]
Sunbeam is a modular pipeline orchestrated by Snakemake [61] with Conda as dependency
manager; this configuration makes Sunbeam analysis reliable, reproducible and scalable. The main
feature of Sunbeam is its modularized and extensible design that allows users to build off the core
functionality. The execution backbone of Sunbeam is represented by an initial quality control that
comprises adapter trimming, host read removal and low-complexity filtering (Trimmomatic [27],
FastQC [28], BWA [76] and Komplexity), followed by the assembly of reads into contigs with
MEGAHIT [16] along with their corresponding annotation with Prodigal [39], BLAST [37] and
Diamond [55] (with nucleotide or protein databases). As complementary procedures, Sunbeam maps
the reads to reference genomes (user pre-specified) and delivers a taxonomic assignment of the clean
reads using Kraken [114] 1.0. As previously stated, its modularization and ready-to-use templates to
create new modules have enabled the development of additional extensions for assigning
metagenomic reads to a full bacterial phylogeny and single genome assembly, among others.
2. Long-read focused pipelines
2.1 EasyNanoMeta [115]
EasyNanoMeta is a specialized pipeline designed to process ONT long reads either solely or
in combination with short reads (hybrid assembly). This pipeline relies on a dual approach that uses
both assembly-based and assembly-free strategies. Particularly, EasyNanoMeta incorporates four
assemblers (metaFlye [116], OPERA-MS [117], metaSPAdes [35], MetaPlatanus [118]), five binners
(SemiBin2 [21], MetaBAT2 [3], MaxBin2 [5], CONCOCT [4], VAMB [88]) and a polishing tool
(NextPolish [119]) to assure the best possible outcome. Additionally, once the bins are obtained, it
performs the common tasks such as functional annotation with Prokka [58], quality control with
CheckM2 [24], phylogeny inference with PhyloPhlan [120] and taxonomic classification with
GTDB-Tk2 [25]. For the assembly-free methodology, EasyNanoMeta provides a full report containing
composition, diversity and correlation among the identified species with Kraken2 [11] and
Centrifuge [121]. Regarding operational characteristics, this pipeline can be run automatically on a
Singularity/Apptainer image that streamlines the setup process and minimizes dependency issues;
alternatively, experienced users can execute individual modules through shell scripts that rely on
Conda environments.
2.2 Hi-Fi-MAG-Pipeline [122]
Hi-Fi-MAG is a simple yet time-saving pipeline developed and maintained by Pacific
Biosciences, specially designed to build MAGs from Hi-Fi reads (long PacBio reads). It encompasses
different binning tools (MetaBAT2 [3] and SemiBin2 [21]) along with DASTool [7] as refinement software;
CheckM2 [24] serves as a quality control tool, where contigs above 500 kb are kept as single bins if they
show a completeness above 93%, otherwise they are sent back to the binning module. This approach
enhances the recovery of high-quality and single-contig MAGs, outperforming traditional binning
methods. After MAG de-replication, taxonomic annotation is achieved with GTDB-Tk2 [25], and a
complete graphical report is compiled automatically. One important caveat about this workflow is
its lack of an assembly step, and hence the user must prepare the assembly of the PacBio
sequences beforehand using tools such as hifiasm [123] in its meta version, metaFlye [116] or
OPERA-MS [117], among others. Hi-Fi-MAG-Pipeline requires Conda as software manager, and it is
orchestrated by Snakemake [61].
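The contig triage rule described above (longer than 500 kb and more than 93% complete) can be written down as a small sketch. The two thresholds come from the text; the `Contig` class, the function name `split_contigs` and the strict comparisons are assumptions made for illustration and do not reproduce the pipeline's own code.

```python
from dataclasses import dataclass

MIN_LENGTH_BP = 500_000    # "contigs above 500 kb"
MIN_COMPLETENESS = 93.0    # "completeness above 93%"


@dataclass
class Contig:
    name: str
    length_bp: int
    completeness: float  # percent, e.g. taken from a CheckM2 report


def split_contigs(contigs):
    """Separate contigs kept as single-contig bins from those sent to binning."""
    single_contig_bins, for_binning = [], []
    for contig in contigs:
        if contig.length_bp > MIN_LENGTH_BP and contig.completeness > MIN_COMPLETENESS:
            single_contig_bins.append(contig)
        else:
            # Long but incomplete contigs and all short contigs go to the binners.
            for_binning.append(contig)
    return single_contig_bins, for_binning
```

Only the second list would be handed to the binning tools, so a complete circular chromosome assembled as one contig is never merged with unrelated sequences.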
2.3 Mapler [124]
Mapler is a pipeline specifically designed to handle PacBio HiFi long reads. The Mapler
workflow is orchestrated by Snakemake along with Conda for package management, enabling scalable
execution on local or cluster systems. Regarding the specific tools encompassed by Mapler,
state-of-the-art assemblers such as metaMDBG [125], hifiasm-meta [123], metaFlye [116] and
OPERA-MS [117] are available, with MetaBAT2 [3] as the binning tool. Later in the workflow, each bin is
classified taxonomically via GTDB-Tk2 [25] or Kraken2 [11], and genome quality is evaluated using
CheckM2 [24] standards. Mapler aligns reads back to contigs with Minimap2 [94] to compute novel
metrics including the aligned read percentage and aligned base percentage, stratified across quality
categories. It is important to mention that Mapler accepts assemblies and bins as input to skip part of
the process, and it includes a parallel analysis, where assembled versus unassembled reads are
contrasted by evaluating k-mer distributions (KAT [126]), read quality (FastQC [28]), and taxonomic
composition (Kraken2 + Krona [41]). As a result, by combining classic bin-based metrics with
read-to-contig alignment statistics, Mapler assists in estimating how much of the sequence diversity
remains uncaptured.
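To make the two alignment metrics concrete, the sketch below computes an aligned read percentage and an aligned base percentage per quality category. It is an illustration only and does not reproduce Mapler's code: the input layout (one tuple per read), the category labels and the function name `alignment_metrics` are assumptions.

```python
from collections import defaultdict


def alignment_metrics(reads):
    """Compute aligned read and aligned base percentages per quality category.

    reads -- iterable of (read_length, aligned_bases, category) tuples, where
             category is the quality class of the bin or contig the read maps
             to, or None for unaligned reads (hypothetical input layout).
    """
    total_reads = 0
    total_bases = 0
    aligned_reads = defaultdict(int)
    aligned_bases = defaultdict(int)
    for length, n_aligned, category in reads:
        total_reads += 1
        total_bases += length
        if category is not None and n_aligned > 0:
            aligned_reads[category] += 1
            aligned_bases[category] += n_aligned
    if total_reads == 0 or total_bases == 0:
        return {}
    # Percentages are relative to the whole read set, not to each category.
    return {
        category: {
            "aligned_read_pct": 100.0 * aligned_reads[category] / total_reads,
            "aligned_base_pct": 100.0 * aligned_bases[category] / total_bases,
        }
        for category in aligned_reads
    }
```

The share of reads and bases that falls in no category is the part of the sample that the assembly and the bins fail to capture.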
2.4 NanoPhase [127]
NanoPhase is a pipeline that enables building high-quality MAGs from ONT long reads,
optionally enhanced with short read-based MAG polishing. The backbone of the pipeline is
represented by an assembly with metaFlye [116] followed by contig binning with MetaBAT2 [3] and
MaxBin2 [5], and bin refinement with a MetaWRAP [23] module. To estimate abundance and coverage,
the contigs are mapped against the reads, and several polishing rounds with Racon [128] and medaka
complete the workflow to generate high-accuracy final bins; if the user decides to include short reads
in the analysis, these are used for polishing with Pilon [129]. In addition, MetaQuast [93] and
CheckM [64] 1.0 are in charge of MAG quality control, IDEEL [130] evaluates the fraction of predicted
full-length proteins in each MAG, full-length proteins are detected via alignment with UniProtKB [131],
and Prokka [58] serves as functional annotation software. Remarkably, NanoPhase allows prophage
and active prophage identification within the reconstructed MAGs with VIBRANT [132] and
PropagAtE [133]. Regarding technical specifications, this pipeline requires Conda as package manager
and it offers parallelized execution with GNU Parallel to speed up the analysis.
3. Dual pipelines
3.1 GEN-ERA [134]
The GEN-ERA suite is a collection of Nextflow [9] pipelines aiming at supporting MAG
reconstruction and annotation with as many methodologies as possible, starting from either short or
long reads. Specifically, this toolbox comprises more than 10 workflows designed for tasks ranging
from assembly and binning, quality assessment and decontamination, orthologous inference and
maximum likelihood phylogenomic analyses, SSU rRNA phylogeny (constrained by ribosomal
phylogenomic), Average Nucleotide Identity (ANI) clustering, and taxonomic identification to metabolic
modelling. Moreover, GEN-ERA incorporates specific tools designed to handle eukaryotic assembly
annotation such as BRAKER2 [135] and AMAW [136]. Thus, GEN-ERA suits almost all requirements
any user might demand, given the variety of goals that can be achieved within a single software suite.
From a technical point of view, the operational features of GEN-ERA, Nextflow-managed and
Singularity-executed, ensure portability and reproducibility across environments.
3.2 Metagenomics-Toolkit [137]
Metagenomics-Toolkit is a workflow designed to increase scalability of task execution,
enabling optimal resource allocation from its machine learning-optimized assembly step. This
optimized assembly tailors the peak RAM value requested by a metagenome assembler to match
actual requirements, thereby minimizing the dependency on dedicated high-memory hardware.
Metagenomics-Toolkit is wrapped by Nextflow [9] and powered with Docker containerization
technology, and it can take either short or Oxford Nanopore (ONT) long reads as input. As a result, this
pipeline is highly scalable and adaptable across computational infrastructures with a backbone
workflow that relies on the traditional MAG-aimed steps such as quality control, assembly, binning,
and annotation, plus an aggregation module that captures the output from each sample to “polish”
the final MAGs. Regarding special features, Metagenomics-Toolkit offers plasmid identification based
on various tools, the recovery of unassembled microbial community members, and the discovery of
microbial interdependencies through a combination of dereplication, co-occurrence, and
genome-scale metabolic modeling.
3.3 metaWGS [138]
metaWGS is one of the most recently released pipelines, whose main differentiating feature is
the possibility to assemble either short reads or long sequences (PacBio). This Nextflow [9] pipeline is
built for Singularity, with the consequent benefits this kind of setup brings, as discussed previously. It
incorporates a wide variety of tools as it must ensure a proper workflow for both types of sequencing
technologies in a traditional end-to-end framework divided into 8 steps. The first step aims at cleaning
and performing quality control with proper tools according to the input, while the second step allows
the assembly of the sequences using either metaSPAdes [35]/MEGAHIT [16] for short sequences or
hifiasm [123]/metaFlye [116] for PacBio reads. Next, this pipeline filters the contigs and performs
structural annotation during steps 3 and 4, respectively; step 5 is designed to estimate contig
abundance by mapping the reads against the contigs. Afterwards, a complete subworkflow of
functional annotation is run with eggNOG-mapper [67] at its core (step 6), and contig taxonomic
affiliation is achieved through home-made scripts (step 7), to conclude with step 8, where the contigs
are binned with MaxBin2 [5], MetaBAT2 [3] and CONCOCT [4]. Furthermore, metaWGS utilizes
Binette [139], a state-of-the-art binning refinement tool designed to construct high-quality MAGs from
the output of multiple binning tools. As a special remark, metaWGS performs read taxonomic profiling
via Kaiju, as well as contig annotation that includes an in-house algorithm and mapping against the
reads.
3.4 MG-TK [140]
MG-TK (Metagenomic Toolkit) performs read assembly (SPAdes [57], MEGAHIT [16], Flye [141],
metaMDBG [125]) and binning (MetaBAT2 [3], SemiBin2 [21], MetaDecoder [142]), gene prediction, and
clustering into nonredundant gene catalogs, followed by abundance estimation and functional
annotation. It is structured around three main phases: processing raw sequences, building a gene
catalog, and reconstructing species from MAGs with downstream phylogenetic analyses. It produces
a wide range of outputs, including assemblies, MAGs, gene predictions, SNP calls and mapping
outputs. A special feature of MG-TK is its ability to generate detailed abundance matrices for
both taxonomic and functional features, with hierarchical summaries available at multiple levels. The
taxonomic profiles are reported using GTDB [143] lineages, while functional annotations are provided
for major databases such as KEGG [144], SEED [145], CAZy [146], eggNOG [45], and TCDB [147]. MG-TK
also estimates completeness of functional modules, such as KEGG pathways, and links genes to
multiple annotations for deeper exploration by the user. Beyond gene catalogs, MG-TK integrates
MAG/MGS (Metagenomics Species) information, associating MAGs with their metagenomic species
and providing detailed gene content, including representative MAGs for each species. Additionally,
MG-TK can provide assembly-independent profiles via a wide variety of tools including
riboFinder [148], MetaPhlAn [47] and mOTUs [102].
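The notion of functional module completeness mentioned above can be illustrated with a deliberately simplified sketch. It does not reproduce MG-TK's method: real KEGG module definitions are more complex, and here a module is assumed to be a list of steps, each satisfied by any one of a set of alternative orthologs. The function name and the ortholog identifiers are hypothetical placeholders.

```python
def module_completeness(steps, detected):
    """Fraction of module steps covered by the detected orthologs.

    steps    -- list of sets; each set holds alternative orthologs for one step
                (simplified module model, an assumption of this sketch)
    detected -- set of orthologs annotated in the genome or gene catalog
    """
    if not steps:
        return 0.0
    # A step counts as satisfied when at least one alternative is present.
    satisfied = sum(1 for alternatives in steps if alternatives & detected)
    return satisfied / len(steps)


# Placeholder identifiers, not real KEGG orthologs.
example_module = [{"KO_a", "KO_b"}, {"KO_c"}, {"KO_d"}]
example_detected = {"KO_b", "KO_d", "KO_x"}
example_score = module_completeness(example_module, example_detected)
```

In this example two of the three steps are covered, so `example_score` equals 2/3; a threshold on this value could then be used to call a module present or absent.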
3.5 VEBA149
VEBA (Viral Eukaryotic Bacterial Archaeal) is a Conda-executed pipeline that enables
the recovery and classification of genomes from all domains of life, including prokaryotes (bacteria
and archaea), microeukaryotes, and viruses. It starts with a common short-read preprocessing and
assembly step, from which the process bifurcates into prokaryotic and viral binning; unbinned contigs
from the viral module are reincorporated into the prokaryotic contig set. Residual contigs from the
prokaryotic module are then considered for eukaryotic MAG generation, after which annotation and
classification cover the genomes obtained in each module. Several databases are
considered at this step, such as UniRef50/90 [150], MIBiG [151], VFDB [49], CAZy [146], KofamKOALA [86],
Pfam [152], NCBIfam-AMR [153] and AntiFam [154]. In addition, a joint phylogeny is obtained based on MAG gene
models and lineage marker detection. An interesting approach in VEBA is the module
coverage.py, which collects all the unbinned contigs from the viral, eukaryotic and prokaryotic steps to
pursue a pseudo-coassembly, where the reference FASTA (built from the contigs) and the
sorted BAM files are used iteratively as a final pass through the prokaryotic and eukaryotic binning modules. This
pseudo-coassembly approach is optional and easily enabled during workflow execution; the
pipeline documentation discusses at length when this type of assembly should be used.
Notably, VEBA automates the detection of candidate phyla radiation (CPR) bacteria and integrates a
consensus microeukaryotic database to optimize gene modeling and taxonomic classification.
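The hand-off of contigs between modules can be pictured with a short Python sketch. It is a deliberately simplified illustration and not VEBA code: the function `route_contigs` and the contig identifiers are hypothetical, and each binning step is reduced to the set of contig IDs it managed to bin.

```python
def route_contigs(assembled, viral_binned, prokaryotic_binned):
    """Return the contig sets offered to the prokaryotic and eukaryotic steps.

    All arguments are sets of contig IDs (a simplifying assumption).
    """
    # Contigs left unbinned by the viral module rejoin the prokaryotic set.
    prokaryotic_input = assembled - viral_binned
    # Contigs left over after prokaryotic binning go to eukaryotic binning.
    eukaryotic_input = prokaryotic_input - prokaryotic_binned
    return prokaryotic_input, eukaryotic_input


# Made-up contig IDs for illustration only.
assembled = {"c1", "c2", "c3", "c4", "c5"}
prok_in, euk_in = route_contigs(
    assembled,
    viral_binned={"c1"},
    prokaryotic_binned={"c2", "c3"},
)
```

In this toy run, `c1` is claimed by the viral step, `c2` and `c3` by the prokaryotic step, and only the residual `c4` and `c5` remain as candidates for eukaryotic MAG generation.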
4. Hybrid pipelines
4.1 Aviary [155]
Aviary is a modular, Snakemake-based [61] pipeline, with Conda as package manager, designed
for single or hybrid metagenomic assembly and MAG recovery, supporting both short- and long-read
input sequences. The workflow is distributed across 8 modules and follows a traditional sequence, starting with
quality and diversity assessment of the reads, followed by an assembly chosen according to the
type of input: MEGAHIT [16] or metaSPAdes [35] for short reads only, or metaFlye [116] for long reads
only. For hybrid assembly the process is divided into four stages: polishing with Racon [128] and
Pilon [129], metrics-based filtering, assembly and discarding of low-quality bins, and re-assembly with
Unicycler [156]. The pipeline proceeds with an assembly evaluation in terms of fragmentation,
misassembly detection and diversity quantification, and a complementary module then performs
read mapping to the assembly and calculates abundance statistics. Next,
the contigs are binned using up to 6 tools (MetaBAT2 [3], Rosella [157], MetaBAT1 [71], VAMB [88], MaxBin2 [5]
and CONCOCT [4]) and refined afterwards in a loop of 5 iterations that includes CheckM2 [24], Rosella Refine and
DASTool [7]. The pipeline ends with MAG recovery assessment via CoverM [106], CheckM2 and SingleM,
followed by MAG annotation through GTDB-Tk2 [25], Prodigal [39] and eggNOG [45]. Variant calling, ANI
analysis and genotype recovery with Lorikeet [158] are interesting features offered by Aviary as a
complement to the traditional genomic feature detection. Aviary's design presents a series of
advantages that include the possibility of running individual modules, multi-sample handling and scalability
across different computational infrastructures.
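Two decisions described above, choosing the assembly route from the input type and capping bin refinement at five rounds, can be sketched in Python as follows. This is an illustration only and not Aviary code: the function names, the `refine_once` callback and the early stop when bins no longer change are assumptions introduced for the example.

```python
MAX_REFINE_ROUNDS = 5  # the text describes a refinement loop of 5 iterations


def choose_assembly_route(has_short_reads, has_long_reads):
    """Map the available read types to the assembly route named in the text."""
    if has_short_reads and has_long_reads:
        return "hybrid"
    if has_long_reads:
        return "metaFlye"
    if has_short_reads:
        return "MEGAHIT or metaSPAdes"
    raise ValueError("at least one read type is required")


def refine_bins(bins, refine_once, max_rounds=MAX_REFINE_ROUNDS):
    """Apply `refine_once` up to `max_rounds` times.

    Stopping early once the bins stop changing is an assumption of this sketch.
    """
    for _ in range(max_rounds):
        refined = refine_once(bins)
        if refined == bins:
            break
        bins = refined
    return bins


# Toy bins with made-up completeness values.
toy_bins = [
    {"name": "bin1", "completeness": 92.0},
    {"name": "bin2", "completeness": 35.0},
]


def drop_incomplete(bins):
    # Hypothetical refinement step: keep bins that are at least 50% complete.
    return [b for b in bins if b["completeness"] >= 50.0]


kept = refine_bins(toy_bins, drop_incomplete)
```

Passing the refinement step as a callback keeps the loop independent of any particular tool; in the real pipeline that step involves CheckM2, Rosella Refine and DASTool rather than a single threshold.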
4.2 MUFFIN [159]
MUFFIN is a reproducible pipeline built with Nextflow [9] and designed for hybrid assembly by
integrating short-read (Illumina) and long-read (nanopore) sequencing data. MUFFIN begins its
workflow with quality control of the reads (fastp [53] and Filtlong) and progresses through hybrid assembly
(metaSPAdes [35] or metaFlye [116] with polishing) and differential binning (CONCOCT [4], MetaBAT2 [3], and
MaxBin2 [5]). After bin refining with the MetaWRAP [23] refinement module, a hybrid reassembly is pursued
with Unicycler [156]. The pipeline ends with bin classification through CheckM 1.1 [64] and sourmash [13]
(combined with GTDB [143]), and with bin annotation with eggNOG [45] and a KEGG [144] parser, providing
high-quality, annotated MAGs and insights into the metabolic potential of the microbial community.
Optionally, the user can provide metatranscriptomics data to perform a de novo transcript assembly
(Trinity [160]), quantification (Salmon [161]) and annotation (eggNOG). Additionally, given its modular
design, the workflow can also start with user-provided bins, differential reads or only RNA-seq data.
MUFFIN can be executed with either Conda or Docker, and its native Nextflow features make it
possible to restart the pipeline after a failure, run on different computing infrastructures, and
handle multiple samples, among others.
4.3 nf-core/mag [162]
nf-core/mag is a Nextflow [9] pipeline developed following the nf-core guidelines, which ensure
robustness and reproducibility. It supports both short-read and long-read sequences, as well as hybrid
datasets, and it leverages a modular design, containerization (Docker, Singularity, among others) and
95. Dröge, J., Gregor, I. & McHardy, A. C. Taxator-tk: precise taxonomic assignment of metagenomes by fast approximation of evolutionary neighborhoods. Bioinformatics 31, 817–824 (2015).
96. Krapohl, J. & Pickett, B. E. SnakeWRAP: a Snakemake workflow to facilitate automated processing of metagenomic data through the metaWRAP pipeline. F1000Research 11, (2022).
97. Ziemski, M. et al. MOSHPIT: accessible, reproducible metagenome data science on the QIIME 2 framework. Preprint at https://doi.org/10.1101/2025.01.27.635007 (2025).
98. Bolyen, E. et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat. Biotechnol. 37, 852–857 (2019).
99. Narayanasamy, S. et al. IMP: a pipeline for reproducible reference-independent integrated metagenomic and metatranscriptomic analyses. Genome Biol. 17, 260 (2016).
100. Kopylova, E., Noé, L. & Touzet, H. SortMeRNA: fast and accurate filtering of ribosomal RNAs in metatranscriptomic data. Bioinformatics 28, 3211–3217 (2012).
101. Bushnell, B. BBMap: A Fast, Accurate, Splice-Aware Aligner. LBL Publications, (2014).
102. Sunagawa, S. et al. Metagenomic species profiling using universal phylogenetic marker genes. Nat. Methods 10, 1196–1199 (2013).
103. Schudoma, C. Source code for: gff_quantifier. https://github.com/cschu/gff_quantifier (2023).
104. Tadrent, N. et al. SnakeMAGs: a simple, efficient, flexible and scalable workflow to reconstruct prokaryotic genomes from metagenomes. F1000Research 11, 1522 (2023).
105. Eren, A. M., Vineis, J. H., Morrison, H. G. & Sogin, M. L. A Filtering Method to Generate High Quality Short Reads Using Illumina Paired-End Technology. PLOS ONE 8, e66643 (2013).
106. Aroney, S. T. N. et al. CoverM: read alignment statistics for metagenomics. Bioinformatics 41, btaf147 (2025).
107. Schmidt, T. S. B. et al. SPIRE: a Searchable, Planetary-scale mIcrobiome REsource. Nucleic Acids Res. 52, D777–D783 (2024).
108. Coelho, L. P. et al. NG-meta-profiler: fast processing of metagenomes using NGLess, a domain-specific language. Microbiome 7, 84 (2019).
109. Seemann, T. Source code for: Barrnap-Bacterial ribosomal RNA predictor. https://github.com/tseemann/shovill (2018).
110. Seemann, T. Source code for: ABRicate-Mass screening of contigs for antimicrobial and virulence genes. https://github.com/tseemann/abricate (2020).
111. Santos-Júnior, C. D., Pan, S., Zhao, X.-M. & Coelho, L. P. Macrel: antimicrobial peptide screening in genomes and metagenomes. PeerJ 8, e10555 (2020).
112. Göbel, D., Stoye, J., Sczyrba, A., & Beckstette, M. The Cloud-based Workflow Manager (CloWM) - An integrated platform for highly scalable workflow execution. German Conference on Bioinformatics 2024 (GCB), Bielefeld. Zenodo. https://doi.org/10.5281/zenodo.14039069 (2024).
113. Clarke, E. L. et al. Sunbeam: An extensible pipeline for analyzing metagenomic sequencing experiments. Microbiome 7, 1–13 (2019).
114. Wood, D. E. & Salzberg, S. L. Kraken: ultrafast metagenomic sequence classification using exact alignments. Genome Biol. 15, R46 (2014).
115. Peng, K. et al. Benchmarking of analysis tools and pipeline development for nanopore long-read metagenomics. Sci. Bull. 70, 1591–1595 (2025).
116. Kolmogorov, M. et al. metaFlye: scalable long-read metagenome assembly using repeat graphs. Nat. Methods 17, 1103–1110 (2020).
117. Bertrand, D. et al. Hybrid metagenomic assembly enables high-resolution analysis of resistance determinants and mobile elements in human microbiomes. Nat. Biotechnol. 37, 937–944 (2019).
118. Kajitani, R. et al. MetaPlatanus: a metagenome assembler that combines long-range sequence links and species-specific features. Nucleic Acids Res. 49, e130 (2021).
119. Hu, J., Fan, J., Sun, Z. & Liu, S. NextPolish: a fast and efficient genome polishing tool for long-read assembly. Bioinformatics 36, 2253–2255 (2020).
120. Asnicar, F. et al. Precise phylogenetic analysis of microbial isolates and genomes from metagenomes using PhyloPhlAn 3.0. Nat. Commun. 11, 1–10 (2020).
121. Kim, D., Song, L., Breitwieser, F. P. & Salzberg, S. L. Centrifuge: rapid and sensitive classification of metagenomic sequences. Genome Res. 26, 1721–1729 (2016).
122. Portik, D. M. et al. Highly accurate metagenome-assembled genomes from human gut microbiota using long-read assembly, binning, and consolidation methods. Preprint at https://doi.org/10.1101/2024.05.10.593587 (2024).
123. Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
124. Maurice, N., Lemaitre, C., Vicedomini, R. & Frioux, C. Mapler: a pipeline for assessing assembly quality in taxonomically rich metagenomes sequenced with HiFi reads. Bioinformatics 41, btaf334 (2025).
125. Benoit, G. et al. High-quality metagenome assembly from long accurate reads with metaMDBG. Nat. Biotechnol. 42, 1378–1383 (2024).
126. Mapleson, D., Garcia Accinelli, G., Kettleborough, G., Wright, J. & Clavijo, B. J. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics 33, 574–576 (2017).
127. Liu, L., Yang, Y., Deng, Y. & Zhang, T. Nanopore long-read-only metagenomics enables complete and high-quality genome reconstruction from mock and complex metagenomes. Microbiome 10, 209 (2022).
128. Vaser, R., Sovic, I., Nagarajan, N. & Sikic, M. Fast and accurate de novo genome assembly from long uncorrected reads. Genome Res. 27, 737–746 (2017).
129. Walker, B. J. et al. Pilon: An Integrated Tool for Comprehensive Microbial Variant Detection and Genome Assembly Improvement. PLOS ONE 9, e112963 (2014).
130. Stewart, R. D. et al. Compendium of 4,941 rumen metagenome-assembled genomes for rumen microbiome biology and enzyme discovery. Nat. Biotechnol. 37, 953–961 (2019).
131. The UniProt Consortium. UniProt: the Universal Protein Knowledgebase in 2025. Nucleic Acids Res. 53, D609–D617 (2025).
132. Kieft, K., Zhou, Z. & Anantharaman, K. VIBRANT: automated recovery, annotation and curation of microbial viruses, and evaluation of viral community function from genomic sequences. Microbiome 8, 90 (2020).
133. Kieft, K. & Anantharaman, K. Deciphering Active Prophages from Metagenomes. mSystems 7, e00084-22 (2022).
134. Cornet, L. et al. The GEN-ERA toolbox: unified and reproducible workflows for research in microbial genomics. GigaScience 12, 1–10 (2022).
135. Brůna, T., Hoff, K. J., Lomsadze, A., Stanke, M. & Borodovsky, M. BRAKER2: automatic eukaryotic genome annotation with GeneMark-EP+ and AUGUSTUS supported by a protein database. NAR Genomics Bioinforma. 3, lqaa108 (2021).
136. Meunier, L., Baurain, D. & Cornet, L. AMAW: automated gene annotation for non-model eukaryotic genomes. F1000Research 12, 186 (2023).
137. Belmann, P. et al. Metagenomics-Toolkit: the flexible and efficient cloud-based metagenomics workflow featuring machine learning-enabled resource allocation. NAR Genomics Bioinforma. 7, lqaf093 (2025).
138. Mainguy, J. et al. metagWGS, a comprehensive workflow to analyze metagenomic data using Illumina or PacBio HiFi reads. Preprint at https://doi.org/10.1101/2024.09.13.612854 (2024).
139. Mainguy, J. & Hoede, C. Binette: a fast and accurate bin refinement tool to construct high quality Metagenome Assembled Genomes. J. Open Source Softw. 9, 6782 (2024).
140. Hildebrand, F. et al. Dispersal strategies shape persistence and evolution of human gut bacteria. Cell Host Microbe 29, 1167-1176.e9 (2021).
141. Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
142. Liu, C.-C. et al. MetaDecoder: a novel method for clustering metagenomic contigs. Microbiome 10, 46 (2022).
143. Parks, D. H. et al. GTDB: an ongoing census of bacterial and archaeal diversity through a phylogenetically consistent, rank normalized and complete genome-based taxonomy. Nucleic Acids Res. 50, D785–D794 (2022).
144. Kanehisa, M., Sato, Y., Kawashima, M., Furumichi, M. & Tanabe, M. KEGG as a reference resource for gene and protein annotation. Nucleic Acids Res. 44, D457–D462 (2016).
145. Overbeek, R. et al. The SEED and the Rapid Annotation of microbial genomes using Subsystems Technology (RAST). Nucleic Acids Res. 42, D206 (2014).
146. Drula, E. et al. The carbohydrate-active enzyme database: functions and literature. Nucleic Acids Res. 50, D571–D577 (2022).
147. Saier, M. H., Jr et al. The Transporter Classification Database (TCDB): 2021 update. Nucleic Acids Res. 49, D461–D467 (2021).
148. Cokelaer, T., Desvillechabrol, D., Legendre, R. & Cardon, M. ‘Sequana’: a Set of Snakemake NGS pipelines. J. Open Source Softw. 2, 352 (2017).
149. Espinoza, J. L. et al. Unveiling the microbial realm with VEBA 2.0: a modular bioinformatics suite for end-to-end genome-resolved prokaryotic, (micro)eukaryotic and viral multi-omics from either short- or long-read sequencing. Nucleic Acids Res. 52, e63 (2024).
150. Suzek, B. E. et al. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics 31, 926–932 (2015).
151. Zdouc, M. M. et al. MIBiG 4.0: advancing biosynthetic gene cluster curation through global collaboration. Nucleic Acids Res. 53, D678–D690 (2025).
152. Mistry, J. et al. Pfam: The protein families database in 2021. Nucleic Acids Res. 49, D412–D419 (2021).
153. Feldgarden, M. et al. AMRFinderPlus and the Reference Gene Catalog facilitate examination of the genomic links among antimicrobial resistance, stress response, and virulence. Sci. Rep. 11, 12728 (2021).
154. Eberhardt, R. Y. et al. AntiFam: a tool to help identify spurious ORFs in protein annotation. Database 2012, bas003 (2012).
155. Newell, R. J. P., Aroney, S. T. N., Zaugg, J., Sternes, P., Tyson, G. W., & Woodcroft, B. J. Aviary: Hybrid assembly and genome recovery from metagenomes with Aviary (v0.12.0). Zenodo. https://doi.org/10.5281/zenodo.15208119 (2025).
156. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Unicycler: Resolving bacterial genome assemblies from short and long sequencing reads. PLOS Comput. Biol. 13, e1005595 (2017).
157. Newell, R. J. P., Tyson, G. W., & Woodcroft, B. J. Rosella: Metagenomic binning using UMAP and HDBSCAN (v0.5.3). Zenodo. https://doi.org/10.5281/zenodo.10460259 (2024).
158. Newell, R. J. P., McMaster, E. S., Craig, P., Boden, M., Tyson, G. W., & Woodcroft, B. J. Lorikeet: strain-resolved metagenome analysis using local reassembly (v0.8.2). Zenodo. https://doi.org/10.5281/zenodo.10275469 (2023).
159. Damme, R. van et al. Metagenomics workflow for hybrid assembly, differential coverage binning, metatranscriptomics and pathway analysis (MUFFIN). PLOS Comput. Biol. 17, 1–13 (2021).
160. Grabherr, M. G. et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat. Biotechnol. 29, 644–652 (2011).
161. Patro, R., Duggal, G., Love, M. I., Irizarry, R. A. & Kingsford, C. Salmon provides fast and bias-aware quantification of transcript expression. Nat. Methods 14, 417–419 (2017).
162. Krakau, S., Straub, D., Gourlé, H., Gabernet, G. & Nahnsen, S. nf-core/mag: a best-practice pipeline for metagenome hybrid assembly and binning. NAR Genomics Bioinforma. 4, (2022).
163. Wick, R. R., Judd, L. M., Gorrie, C. L. & Holt, K. E. Completing bacterial genome assemblies with multiplex MinION sequencing. Microb. Genomics 3, e000132 (2017).
164. Haveman, N. J. et al. Evaluating the lettuce metatranscriptome with MinION sequencing for future spaceflight food production applications. Npj Microgravity 7, 22 (2021).
165. De Coster, W. & Rademakers, R. NanoPack2: population-scale evaluation of long-read sequencing data. Bioinformatics 39, btad311 (2023).
166. Schubert, M., Lindgreen, S. & Orlando, L. AdapterRemoval v2: rapid adapter trimming, identification, and read merging. BMC Res. Notes 9, 88 (2016).
167. Antipov, D., Korobeynikov, A., McLean, J. S. & Pevzner, P. A. hybridSPAdes: an algorithm for hybrid assembly of short and long reads. Bioinformatics 32, 1009–1015 (2016).
168. von Meijenfeldt, F. A. B., Arkhipova, K., Cambuy, D. D., Coutinho, F. H. & Dutilh, B. E. Robust taxonomic classification of uncharted microbial sequences and bins with CAT and BAT. Genome Biol. 20, 217 (2019).
169. Levy Karin, E., Mirdita, M. & Söding, J. MetaEuk—sensitive, high-throughput gene discovery, and annotation for large-scale eukaryotic metagenomics. Microbiome 8, 48 (2020).
170. Borry, M., Hübner, A., Rohrlach, A. B. & Warinner, C. PyDamage: automated ancient damage identification and estimation for contigs in ancient DNA de novo assembly. PeerJ 9, e11845 (2021).
171. Karlicki, M., Antonowicz, S. & Karnkowska, A. Tiara: deep learning-based classification system for eukaryotic sequences. Bioinformatics 38, 344–350 (2022).
172. Camargo, A. P. et al. Identification of mobile genetic elements with geNomad. Nat. Biotechnol. 42, 1303–1312 (2024).
173. Almeida, F. M. de, Campos, T. A. de & Pappas, G. J. Scalable and versatile container-based pipelines for de novo genome assembly and bacterial annotation. F1000Research 12, 1205 (2023).
174. Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
175. Schwengers, O. et al. Bakta: Rapid and standardized annotation of bacterial genomes via alignment-free sequence identification. Microb. Genomics 7, 000685 (2021).
176. Jolley, K. A. & Maiden, M. C. BIGSdb: Scalable analysis of bacterial genome variation at the population level. BMC Bioinformatics 11, 595 (2010).
177. Graham, E. D., Heidelberg, J. F. & Tully, B. J. Potential for primary productivity in a globally-distributed bacterial phototroph. ISME J. 12, 1861–1866 (2018).
178. Blin, K. et al. antiSMASH 7.0: new and improved predictions for detection, regulation, chemical structures and visualisation. Nucleic Acids Res. 51, W46–W50 (2023).
179. Hu, K., Huang, N., Zou, Y., Liao, X. & Wang, J. MultiNanopolish: refined grouping method for reducing redundant calculations in Nanopolish. Bioinformatics 37, 2757–2760 (2021).
180. Tamames, J. & Puente-Sánchez, F. SqueezeMeta, a highly portable, fully automatic metagenomic analysis pipeline. Front. Microbiol. 10, 3349 (2019).
181. Bushmanova, E., Antipov, D., Lapidus, A. & Prjibelski, A. D. rnaSPAdes: a de novo transcriptome assembler and its application to RNA-Seq data. GigaScience 8, giz100 (2019).
182. Caspi, R. et al. The MetaCyc database of metabolic pathways and enzymes - a 2019 update. Nucleic Acids Res. 48, D445–D453 (2020).
183. Olson, R. D. et al. Introducing the Bacterial and Viral Bioinformatics Resource Center (BV-BRC): a resource combining PATRIC, IRD and ViPR. Nucleic Acids Res. 51, D678–D689 (2023).
184. Gillespie, J. J. et al. PATRIC: the Comprehensive Bacterial Bioinformatics Resource with a Focus on Human Pathogenic Species. Infect. Immun. 79, 4286–4298 (2011).
185. Brettin, T. et al. RASTtk: A modular and extensible implementation of the RAST algorithm for building custom annotation pipelines and annotating batches of genomes. Sci. Rep. 5, 8365 (2015).
186. Wang, S., Sundaram, J. P. & Spiro, D. VIGOR, an annotation program for small viral genomes. BMC Bioinformatics 11, 451 (2010).
187. The Galaxy Community et al. The Galaxy platform for accessible, reproducible and collaborative biomedical analyses: 2022 update. Nucleic Acids Res. 50, W345–W351 (2022).
188. Kalantar, K. L. et al. IDseq—An open source cloud-based pipeline and analysis service for metagenomic pathogen detection and monitoring. GigaScience 9, giaa111 (2020).
189. Chen, I.-M. A. et al. The IMG/M data management and analysis system v.7: content updates and new features. Nucleic Acids Res. 51, D723–D732 (2023).
190. Kanehisa, M., Furumichi, M., Sato, Y., Ishiguro-Watanabe, M. & Tanabe, M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 49, D545–D551 (2021).
191. Galperin, M. Y. et al. COG database update 2024. Nucleic Acids Res. 53, D356–D363 (2025).
192. Haft, D. H. et al. TIGRFAMs and Genome Properties in 2013. Nucleic Acids Res. 41, D387–D395 (2013).
193. Arkin, A. P. et al. KBase: The United States Department of Energy Systems Biology Knowledgebase. Nat. Biotechnol. 36, 566–569 (2018).
194. Seaver, S. M. D. et al. The ModelSEED Biochemistry Database for the integration of metabolic annotations and the reconstruction, comparison and analysis of metabolic models for plants, fungi and microbes. Nucleic Acids Res. 49, D575–D588 (2021).
195. Richardson, L. et al. MGnify: the microbiome sequence data analysis resource in 2023. Nucleic Acids Res. 51, D753–D759 (2023).
196. Finn, R. D., Clements, J. & Eddy, S. R. HMMER web server: interactive sequence similarity searching. Nucleic Acids Res. 39, W29–W37 (2011).
197. Weber, N. et al. Nephele: a cloud platform for simplified, standardized and reproducible microbiome data analysis. Bioinformatics 34, 1411–1413 (2018).
198. Standeven, F. J., Dahlquist-Axe, G., Speller, C. F., Meehan, C. J. & Tedder, A. An efficient pipeline for creating metagenomic-assembled genomes from ancient oral microbiomes. Preprint at https://doi.org/10.1101/2024.09.18.613623 (2024).
199. Jónsson, H., Ginolhac, A., Schubert, M., Johnson, P. L. F. & Orlando, L. mapDamage2.0: fast approximate Bayesian estimates of ancient DNA damage parameters. Bioinformatics 29, 1682–1684 (2013).
200. Zhao, D. et al. Eukfinder: a pipeline to retrieve microbial eukaryote genome sequences from metagenomic data. mBio 16, e00699-25 (2025).
201. Van Nguyen, H. & Lavenier, D. PLAST: parallel local alignment search tool for database comparison. BMC Bioinformatics 10, 329 (2009).
202. Lin, H.-H. & Liao, Y.-C. Accurate binning of metagenomic contigs via automated clustering sequences using information of genomic signatures and marker genes. Sci. Rep. 6, 24175 (2016).