Sequencing Vertebrate Diversity

50 species from the VGP assembly catalog

Abramis brama
Acanthisitta chloris
Acanthopagrus latus
Acomys russatus
Alca torda
Amblyraja radiata
Ammospiza caudacuta
Anguilla anguilla
Antennarius maculatus
Apus apus
Argentina silus
Ascidiella aspersa
Aythya ferina
Barbus barbus
Borostomias antarcticus
Bucephala clangula
Callithrix jacchus
Candoia aspera
Carcharodon carcharias
Cervus elaphus
Chelmon rostratus
Chiroxiphia lanceolata
Ciconia maguari
Cnephaeus nilssonii
Coregonus lavaretus
Cottoperca gobio
Cynocephalus volans
Dendropsophus ebraccatus
Diceros bicornis
Dryobates pubescens
Electrona antarctica
Elgaria multicarinata
Equus caballus
Erpetoichthys calabaricus
Eubalaena glacialis
Falco biarmicus
Gadus morhua
Gastrophryne carolinensis
Geotrypetes seraphini
Gobius niger
Grus americana
Harpia harpyja
Heptranchias perlo
Hippopotamus amphibius
Hydrolagus colliei
Hyperoodon ampullatus
Labrus bergylta
Lampetra fluviatilis
Lemur catta
Leucopleurus acutus

Building the Tree of Life

The Vertebrate Genomes Project (VGP) aims to generate near error-free, chromosome-level reference genome assemblies for all ~70,000 extant vertebrate species.

Galaxy provides the computational infrastructure to achieve this at scale, with standardized, reproducible workflows accessible to researchers worldwide.

"95% of the main discoveries that have driven biotechnology came from studying things that were not model organisms at the time." - Giulio Formenti

VGP Phylogenetic Tree

274 genomes and counting

VGP Workflow Modules

Pre-Assembly & Assembly

VGP0: Mitogenome Assembly
Inputs: HiFi reads, species name, genetic code
Outputs: GenBank file, annotation images
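VGP0 takes a genetic code as input because vertebrate mitochondria do not use the standard code: NCBI translation table 2 (vertebrate mitochondrial) reassigns four codons. The sketch below is a minimal illustration of why that parameter matters, not the annotation tool the workflow runs; the names `codon_table`, `translate` and `TABLES` are hypothetical, and input is assumed to contain only A/C/G/T.

```python
from itertools import product

BASES = "TCAG"
# NCBI tables in compact form: the 64 codons enumerated with the first,
# second and third base each cycling through T, C, A, G ('*' = stop).
STANDARD = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
VERT_MITO = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"

def codon_table(amino_acids: str) -> dict[str, str]:
    """Map each of the 64 codons to its one-letter amino acid."""
    codons = ("".join(bases) for bases in product(BASES, repeat=3))
    return dict(zip(codons, amino_acids))

# Keyed by NCBI table number: 1 = standard, 2 = vertebrate mitochondrial.
TABLES = {1: codon_table(STANDARD), 2: codon_table(VERT_MITO)}

def translate(seq: str, genetic_code: int = 2) -> str:
    """Translate codon by codon; a trailing partial codon is ignored.

    Assumes seq contains only A/C/G/T (any other symbol raises KeyError).
    """
    table = TABLES[genetic_code]
    seq = seq.upper()
    return "".join(table[seq[i:i + 3]] for i in range(0, len(seq) - 2, 3))

# Codons whose meaning differs between the two tables.
differences = {
    codon: (TABLES[1][codon], TABLES[2][codon])
    for codon in TABLES[1]
    if TABLES[1][codon] != TABLES[2][codon]
}
print(differences)
print(translate("ATGATATGAAGA", 1), translate("ATGATATGAAGA", 2))
```

The same four codons (ATA, TGA, AGA, AGG) translate differently under the two tables, which is why annotating a mitogenome with the wrong code would misplace stop codons.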

VGP1: K-mer Profiling
Inputs: HiFi reads, k-mer length, ploidy
Outputs: Meryl database, GenomeScope plots
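K-mer profiling counts every length-k substring in the reads and inspects the histogram of how often each k-mer occurs. The sketch below shows that idea in pure Python under simplifying assumptions: it stands in for neither Meryl nor GenomeScope, ignores the ploidy input, and uses a crude genome-size estimate (solid k-mers divided by the depth of the tallest peak) instead of GenomeScope's model fit. All function names are hypothetical.

```python
from collections import Counter

# Maps each base to its complement; used to build reverse complements.
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def canonical(kmer: str) -> str:
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    reverse_complement = kmer.translate(COMPLEMENT)[::-1]
    return min(kmer, reverse_complement)

def count_kmers(reads: list[str], k: int) -> Counter:
    """Count canonical k-mers across all reads, skipping k-mers with non-ACGT symbols."""
    counts: Counter = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts

def kmer_histogram(counts: Counter) -> Counter:
    """Map each multiplicity to the number of distinct k-mers seen that many times."""
    return Counter(counts.values())

def naive_genome_size(hist: dict[int, int], min_depth: int = 2) -> float:
    """Total solid k-mers divided by the depth of the tallest histogram peak.

    Depths below min_depth are treated as sequencing errors and dropped.
    Unlike GenomeScope, this ignores heterozygosity and ploidy, so the
    tallest peak may be the heterozygous one in diploid data.
    """
    solid = {depth: n for depth, n in hist.items() if depth >= min_depth}
    peak_depth = max(solid, key=solid.get)
    total_kmers = sum(depth * n for depth, n in solid.items())
    return total_kmers / peak_depth
```

Counting canonical k-mers makes a k-mer and its reverse complement share one entry, so reads from either strand contribute to the same count.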

VGP2: K-mer Profiling Trio
Inputs: HiFi reads, parental Illumina reads
Outputs: Meryl DBs, GenomeScope profiles

VGP3: HiFi Assembly
Inputs: HiFi reads, Meryl DB, GenomeScope
Outputs: Primary assembly, alternate assembly

VGP4: HiFi+HiC Assembly
Inputs: HiFi reads, HiC reads, Meryl DB
Outputs: Haplotype 1, haplotype 2

VGP5: Trio Assembly
Inputs: HiFi reads, parental reads, Meryl DBs
Outputs: Paternal haplotype, maternal haplotype
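The three assembly modules differ mainly in which extra data they take: VGP3 needs only HiFi reads, VGP4 adds HiC reads, and VGP5 adds parental reads (profiled by VGP2 rather than VGP1). The sketch below encodes that choice as a small function. The preference order (trio, then HiFi+HiC, then HiFi-only) is an assumption for illustration, not an official VGP decision rule, and `Inputs` and `plan_pre_assembly` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Inputs:
    """Which read types are available for a species (hypothetical record)."""
    hifi_reads: bool = True
    hic_reads: bool = False
    parental_reads: bool = False

def plan_pre_assembly(data: Inputs) -> list[str]:
    """Pick the k-mer profiling and assembly modules for the available data.

    Assumption: trio data are preferred when present, then HiC, then HiFi only.
    """
    if not data.hifi_reads:
        raise ValueError("every assembly module listed here requires HiFi reads")
    if data.parental_reads:
        # Trio assembly uses parental Meryl DBs from the trio profiling module.
        return ["VGP2 K-mer Profiling Trio", "VGP5 Trio Assembly"]
    if data.hic_reads:
        return ["VGP1 K-mer Profiling", "VGP4 HiFi+HiC Assembly"]
    return ["VGP1 K-mer Profiling", "VGP3 HiFi Assembly"]

print(plan_pre_assembly(Inputs(hic_reads=True)))
```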

Post-Assembly Processing

VGP6: Purge Duplicates
Inputs: Assemblies, trimmed HiFi reads, Meryl DB
Outputs: Purged primary, purged alternate

VGP6b: Purge Single
Inputs: Assembly, trimmed HiFi reads, Meryl DB
Outputs: Purged assembly

VGP7: Bionano Scaffolding
Inputs: Assembly (GFA), Bionano cmap
Outputs: Scaffolds, QC plots

VGP8: HiC Scaffolding
Inputs: Assembly (GFA), HiC reads
Outputs: Scaffolded assembly, contact map
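The post-assembly modules form a short chain: one purging step, then optional scaffolding with Bionano optical maps and with HiC reads. The sketch below records the module names and outputs listed above and builds that chain from the data available. Running VGP6 for a primary/alternate pair and VGP6b for a single assembly, and running VGP7 before VGP8, are assumptions for illustration; `POST_ASSEMBLY` and `plan_post_assembly` are hypothetical names.

```python
# Module IDs, names and outputs as listed in the post-assembly section.
POST_ASSEMBLY = {
    "VGP6": ("Purge Duplicates", ["Purged primary", "Purged alternate"]),
    "VGP6b": ("Purge Single", ["Purged assembly"]),
    "VGP7": ("Bionano Scaffolding", ["Scaffolds", "QC plots"]),
    "VGP8": ("HiC Scaffolding", ["Scaffolded assembly", "Contact map"]),
}

def plan_post_assembly(primary_and_alternate: bool,
                       has_bionano_cmap: bool,
                       has_hic_reads: bool) -> list[str]:
    """Return the module IDs to run, in order (hypothetical decision rule)."""
    # Assumption: a primary/alternate pair is purged together by VGP6,
    # a single assembly by VGP6b.
    steps = ["VGP6" if primary_and_alternate else "VGP6b"]
    # Scaffolding steps run only when their extra input data exist.
    if has_bionano_cmap:
        steps.append("VGP7")
    if has_hic_reads:
        steps.append("VGP8")
    return steps

for module_id in plan_post_assembly(True, False, True):
    name, outputs = POST_ASSEMBLY[module_id]
    print(f"{module_id}: {name} -> {', '.join(outputs)}")
```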

VGP Workflows by Compute

Core hours used by workflows

Top Assembly Tools

Core hours used by individual tools

Peak Memory by Tool

Assembly needs memory!
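Core hours and peak memory, the two quantities charted above, can both be derived from per-job accounting records: core hours are cores multiplied by wall-clock hours, summed per tool, and peak memory is the maximum over a tool's jobs. The sketch below assumes a hypothetical `Job` record with those fields; it is not the source of the charts, and the tool names in any usage are placeholders.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Job:
    """One finished job (hypothetical accounting record)."""
    tool: str
    cores: int
    runtime_seconds: float
    peak_memory_gb: float

def summarize(jobs: list[Job]) -> tuple[list[tuple[str, float]], dict[str, float]]:
    """Return tools ranked by total core hours, plus each tool's peak memory."""
    core_hours: dict[str, float] = defaultdict(float)
    peak_memory: dict[str, float] = defaultdict(float)
    for job in jobs:
        # Core hours = cores x wall-clock hours for this job.
        core_hours[job.tool] += job.cores * job.runtime_seconds / 3600
        # Peak memory is the largest value seen for any job of this tool.
        peak_memory[job.tool] = max(peak_memory[job.tool], job.peak_memory_gb)
    ranked = sorted(core_hours.items(), key=lambda item: item[1], reverse=True)
    return ranked, dict(peak_memory)
```

Ranking by summed core hours and by maximum memory can give different orderings, which is why the two views are charted separately.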

Powered by Galaxy + ACCESS-CI

VGP leverages the US national cyberinfrastructure through ACCESS-CI, running on resources including Jetstream2, TACC, PSC, NCSA, and SDSC.

These resources provide the large memory (up to 4 TB) and CPU counts (up to 128 cores) that large vertebrate genome assemblies require.

Galaxy provides the equivalent of more than $2,000,000 per year in free computational infrastructure to genomics researchers.
ACCESS-CI Infrastructure

gxy.io/what-is-vgp
