Lecture 17: DNA Sequencing Technologies
Learning Objectives
By the end of this lecture, you will be able to:
- Explain the principle of sequencing-by-synthesis and how it differs from sequencing-by-ligation
- Compare historical sequencing platforms (454, SOLiD, Ion Torrent) and explain why they were displaced
- Describe Illumina’s dye chemistry evolution from 4-channel to 2-channel to 1-channel imaging
- Explain Element Biosciences’ avidite chemistry and why decoupling detection from incorporation matters
- Describe PacBio’s zero-mode waveguides and how circular consensus sequencing produces HiFi reads
- Explain nanopore sequencing, direct RNA sequencing, and adaptive sampling
- Choose an appropriate sequencing platform given a specific experimental goal
1. Introduction: The Sequencing Revolution
DNA sequencing has undergone extraordinary transformations since Frederick Sanger’s chain-termination method in 1977. That original approach – using dideoxynucleotides to terminate growing DNA chains at random positions – dominated for nearly three decades and was the workhorse behind the Human Genome Project (completed 2003). The human genome cost approximately $2.7 billion to sequence using Sanger technology.
The transition from Sanger to next-generation sequencing (NGS) began in 2005 and triggered a cost reduction that outpaced Moore’s Law by orders of magnitude. We now routinely sequence a human genome for under $200 in a matter of hours.
Generations of Sequencing
| Generation | Era | Key Feature | Examples |
|---|---|---|---|
| 1st | 1977-2004 | Chain termination | Sanger (ABI 3730) |
| 2nd | 2005-present | Massively parallel short reads | 454, Illumina, SOLiD, Ion Torrent, Element |
| 3rd | 2011-present | Single-molecule, long reads | PacBio, Oxford Nanopore |
Nearly all modern sequencing platforms share the sequencing-by-synthesis (SBS) principle: watch a DNA polymerase incorporate nucleotides one at a time and detect which base was added at each step. The platforms differ in how they detect incorporation – light, pH change, electrical current, or fluorescence duration. (SOLiD is a notable exception, using oligonucleotide ligation rather than polymerase activity.)
2. Historical Technologies
Three platforms dominated the early NGS era. All have been largely superseded, but understanding them illuminates the design trade-offs that shaped current technology.
2.1 454 Pyrosequencing
454 Life Sciences launched the first commercial NGS platform in 2005 (Margulies et al. 2005, Nature 437:376-380). Roche acquired the company in 2007 and discontinued the platform in 2013.
How it worked:
- Fragment DNA and ligate adapters
- Capture single fragments on beads via emulsion PCR – each bead carries ~10 million copies of one fragment
- Deposit beads into picotiter plate wells (one bead per well)
- Flow nucleotides sequentially (T, then A, then G, then C, repeat)
- When a nucleotide is incorporated, pyrophosphate (PPi) is released
- PPi is converted to ATP by sulfurylase; ATP drives luciferase to produce a flash of light
- A CCD camera captures the light; signal intensity is proportional to the number of bases incorporated
Key characteristics:
- Read length: up to ~1000 bp (longest of any NGS platform at the time)
- Throughput: ~700 Mb per run
- Main error type: homopolymer insertion/deletions – runs of identical bases (e.g., AAAA) produce a single light pulse whose intensity must be quantified, and intensity estimation degrades for runs >6 bp
- Run time: ~10 hours
454 Pyrosequencing Animation
Watch nucleotides flow over a well. When the correct nucleotide is incorporated, pyrophosphate triggers a light flash.
2.2 SOLiD (Sequencing by Oligonucleotide Ligation and Detection)
Applied Biosystems launched SOLiD in 2007, building on work by Shendure et al. 2005, Science 309:1728-1732.
How it worked:
- DNA fragments amplified on beads via emulsion PCR (similar to 454)
- Beads deposited on a glass slide
- Sequencing proceeds by ligation, not polymerase synthesis
- Short fluorescently-labeled di-base probes (8-mers) hybridize to the template adjacent to a primer
- DNA ligase joins probe to the primer
- Fluorophore is imaged, then cleaved
- Process repeats with offset primers – each base is interrogated twice from different ligation rounds
Two-base encoding (color space):
Each fluorescent label encodes a dinucleotide, not a single base. The read is recorded as a series of colors (0, 1, 2, 3) rather than bases. Decoding requires knowing the first base, and a single error propagates through the entire read. However, true SNPs change only one color, while sequencing errors change two consecutive colors – enabling powerful error correction.
Key characteristics:
- Read length: ~75 bp (short)
- Accuracy: very high (~99.94%) after color-space error correction
- Throughput: up to 30 Gb per run
- Main limitation: short reads, complex analysis pipeline, slow adoption
2.3 Ion Torrent
Life Technologies (now Thermo Fisher) launched Ion Torrent in 2010 (Rothberg et al. 2011, Nature 475:348-352). This was the first post-light sequencing technology.
How it worked:
- DNA fragments on beads via emulsion PCR, loaded into microwells on a semiconductor chip
- Nucleotides flowed sequentially (like 454)
- When a nucleotide is incorporated, a hydrogen ion (H+) is released, lowering the pH
- An ISFET (ion-sensitive field-effect transistor) sensor beneath each well detects the pH change
- No optics, no cameras – purely electronic detection on a standard CMOS chip
Key characteristics:
- Read length: up to 400 bp (Ion S5)
- Run time: 2-4 hours (fastest of early NGS)
- Main error: homopolymer indels (same problem as 454 – voltage proportional to number of incorporations)
- Still used in clinical settings (Ion GeneStudio S5) for targeted panels
Both 454 and Ion Torrent measure the magnitude of a signal (light intensity or voltage) to determine how many identical bases were incorporated in a single flow. Distinguishing 5 incorporations from 6 is inherently imprecise because signal scales sub-linearly and noise increases. Illumina avoids this entirely by incorporating one base at a time using reversible terminators.
Comparison of Historical Platforms
| Feature | 454 | SOLiD | Ion Torrent |
|---|---|---|---|
| Year launched | 2005 | 2007 | 2010 |
| Detection method | Light (pyrosequencing) | Fluorescence (ligation) | pH (semiconductor) |
| Read length | ~1000 bp | ~75 bp | ~400 bp |
| Max throughput/run | ~700 Mb | ~30 Gb | ~50 Gb |
| Primary error type | Homopolymer indels | Substitutions | Homopolymer indels |
| Status | Discontinued 2013 | Discontinued ~2015 | Still sold (clinical) |
3. Illumina
Illumina has dominated sequencing since approximately 2010, commanding >80% of the global sequencing market at its peak. The company’s technology traces back to Solexa (founded 1998, acquired by Illumina 2007) and the foundational paper by Bentley et al. 2008, Nature 456:53-59.
Core Principle: Bridge Amplification + Reversible Terminators
Library preparation and cluster generation:
- Fragment DNA and ligate adapters containing P5 and P7 sequences
- Denature and load single-stranded fragments onto a flow cell coated with complementary oligos
- Each fragment hybridizes to a surface oligo and is copied by a polymerase
- The original template is washed away; the copied strand bends over and hybridizes to an adjacent oligo – forming a bridge
- Bridge amplification repeats ~35 cycles, creating a clonal cluster of ~1,000 identical copies
- Forward strands are cleaved (or reverse strands, depending on read); single-stranded clusters remain
Sequencing by synthesis:
- A sequencing primer hybridizes to the adapter
- All four nucleotides are added simultaneously – each carries:
- A fluorescent dye (color identifies the base)
- A 3’-O-azidomethyl blocking group (prevents further incorporation)
- After one nucleotide incorporates, excess is washed away
- The flow cell is imaged to capture fluorescence from every cluster
- Both the dye and the 3’-block are chemically cleaved
- The cycle repeats
This ensures exactly one base per cycle – no homopolymer ambiguity.
3.1 Four-Channel Chemistry (4-dye)
The original Illumina chemistry used on HiSeq, MiSeq, and GA platforms.
- 4 distinct fluorescent dyes, one per base
- 4 images per cycle (one filter per dye)
- Each base has a unique emission wavelength
3.2 Two-Channel Chemistry (2-dye)
Introduced on NextSeq and NovaSeq 6000 to reduce imaging time.
- Only 2 dyes are used
- A = Red + Green (both channels bright)
- C = Red only
- T = Green only
- G = No dye (“dark” – unlabeled)
- 2 images per cycle instead of 4
Illumina Dye Chemistry Comparison
Compare how the same cluster appears under 4-channel, 2-channel, and 1-channel imaging. Click a base to see its signature.
3.3 One-Channel Chemistry (1-dye)
Used on the iSeq 100 with an integrated CMOS sensor:
- Only 1 dye, but 2 chemistry steps per cycle
- Step 1: A and T are labeled; image taken
- Step 2: Chemically modify so only C and T are labeled; image taken
- T = bright in both images; A = bright in step 1 only; C = bright in step 2 only; G = dark in both
- Extremely compact instrument (benchtop, ~$20k list price)
3.4 XLEAP-SBS Chemistry
Illumina’s latest chemistry, deployed on NovaSeq X/X Plus and NextSeq 1000/2000:
- Redesigned polymerase and dye-linker system
- 2x faster incorporation kinetics
- ~50% lower error rate than previous chemistry
- Reduced phasing/pre-phasing artifacts
- 2-channel imaging, but with different base-color assignments than standard 2-channel: C is now the dual-color base (both channels), A is single-channel (blue only), T = green only, G = dark. This change affects index color-balancing requirements
Over many cycles, some DNA strands in a cluster fall behind (incomplete extension = phasing) or jump ahead (incorporation without blocking = pre-phasing). This makes clusters progressively out of sync, degrading quality scores toward the end of the read. XLEAP’s improved blocking chemistry reduces both effects.
Current Illumina Platforms
| Platform | Chemistry | Max Output | Max Read Length | Key Use Case |
|---|---|---|---|---|
| MiSeq | 4-channel | 15 Gb | 2 x 300 bp | Amplicons, small genomes |
| NextSeq 2000 | 2-channel (XLEAP) | 360 Gb (P3, 2x150) | 2 x 300 bp (P1/P2, up to 240 Gb) | Mid-throughput, flexible |
| NovaSeq X Plus | 2-channel (XLEAP) | 16 Tb | 2 x 150 bp | Production-scale WGS |
4. Element Biosciences
Element Biosciences (founded 2017, San Diego) introduced a fundamentally different approach to short-read sequencing with the AVITI system (launched 2022). The key innovation is described in Arslan et al. 2024, Nature Biotechnology 42:132-138.
Sequencing by Binding (SBB) / Avidite Chemistry
The central insight: decouple detection from incorporation.
In Illumina’s approach, the fluorescent dye is covalently attached to the nucleotide that gets incorporated. Even after cleaving the dye, a molecular “scar” remains on the growing strand. These scars accumulate over cycles and can interfere with polymerase activity and base calling.
Element’s approach uses a two-step process:
- Detection step: A multivalent construct called an avidite – consisting of a fluorescently-labeled streptavidin core connected via flexible linkers to multiple identical nucleotides – binds cooperatively (by avidity) to polymerase-template complexes across the polony. The avidite identifies the next correct base through polymerase selectivity. The cluster is imaged.
- Incorporation step: The avidite is washed away (it was never covalently incorporated). Then, an unlabeled 3’-blocked nucleotide is added and incorporated by the polymerase. The block is removed for the next cycle.

Because the incorporated nucleotide never carried a dye, there is no scarring. The DNA strand being synthesized contains only natural nucleotides with a removable 3’ block. This contributes to Element’s very high quality scores – most bases achieve Q40+ (99.99% accuracy).


Additional Innovations
- Rolling Circle Amplification (RCA) instead of bridge PCR: Generates polony clusters from circular DNA templates. This eliminates PCR-induced errors, optical duplicates, and index hopping.
- Two independent flow cells per AVITI run, each producing ~1 billion read pairs
- No index hopping: Because RCA does not involve free adapter-ligated fragments in solution, cross-contamination between indexed samples is essentially zero
AVITI Specifications
| Feature | AVITI 24x24 |
|---|---|
| Read pairs per flow cell | ~1 billion |
| Max output per flow cell | 300 Gb |
| Read length | 2 x 150 bp (2 x 300 bp available) |
| Quality | >90% bases Q30; Q40+ with Cloudbreak UltraQ chemistry |
| Run time | ~24-48 hr |
| List price | ~$289,000 |
In late 2024, Element launched the AVITI24, an upgraded platform delivering up to 1.5 billion reads per flow cell, multiomics capability (co-detection of RNA, protein, and morphology alongside sequencing), and Cloudbreak UltraQ kits achieving >70% of bases at Q50. List price: $424,000 ($150,000 as an upgrade from the original AVITI).
5. PacBio
Pacific Biosciences (founded 2004) pioneered single-molecule real-time (SMRT) sequencing, first described in Eid et al. 2009, Science 323:133-138. The enabling technology – the zero-mode waveguide – was developed by Levene et al. 2003, Science 299:682-686.
Zero-Mode Waveguides (ZMWs)
A ZMW is a cylindrical hole approximately 70 nm in diameter drilled through a ~100 nm metal film (aluminum) deposited on a glass substrate. The diameter is smaller than the wavelength of the excitation light (~500-600 nm), creating a condition where light cannot propagate through the hole. Instead, an evanescent field illuminates only the bottom ~30 nm of the well.

How sequencing works:
- A single DNA polymerase molecule is attached to the bottom of each ZMW
- The template DNA (with SMRTbell adapters) is loaded and binds to the polymerase
- Fluorescently labeled dNTPs (each base has a different color dye attached to the phosphate chain, not the base) diffuse freely in solution
- When a dNTP enters the ZMW and is incorporated, the polymerase holds it in the detection zone for milliseconds – long enough to capture the fluorescent signal
- Incorporation cleaves the phosphodiester bond, releasing the dye-labeled pyrophosphate
- The polymerase advances to the next position
- All of this is captured as a continuous movie of fluorescence pulses

Zero-Mode Waveguide (ZMW) Sequencing
A single polymerase at the bottom of the ZMW incorporates fluorescent nucleotides. The dye is on the phosphate group — released after incorporation.
Unlike Illumina (where the dye is attached to the base and must be cleaved), PacBio attaches the dye to the terminal phosphate of the dNTP. During incorporation, the natural phosphodiester bond formation releases the dye-labeled pyrophosphate. The incorporated nucleotide is completely natural – no cleavage step needed.
Circular Consensus Sequencing (CCS) and HiFi Reads
PacBio’s raw single-pass accuracy is approximately 85-90% (Q10-Q13), dominated by insertion and deletion errors. The breakthrough that made PacBio competitive for accuracy-sensitive applications is circular consensus sequencing (CCS), now marketed as HiFi reads.
How CCS works:
- SMRTbell library: Hairpin adapters are ligated to both ends of a double-stranded DNA fragment, creating a circular (topologically closed) molecule
- The polymerase begins at one adapter and reads through the insert, across the other adapter, back through the insert on the complementary strand, and so on
- Each complete traverse of the insert is called a subread
- With a 15 kb insert and a polymerase that reads ~100 kb before dissociation, you get ~6-7 passes
- A consensus algorithm aligns the subreads and calls a consensus base at each position
- Because insertion and deletion errors are essentially random, they cancel out across passes
Accuracy scaling:
\[Q_{\text{CCS}} \approx Q_{\text{subread}} + 10 \cdot \log_{10}(n)\]
where \(n\) is the number of passes. With ~10 passes and Q10 subreads:
\[Q_{\text{CCS}} \approx 10 + 10 \cdot \log_{10}(10) = 20 \text{ (per pass; actual consensus methods achieve Q30+)}\]
In practice, modern HiFi reads routinely achieve Q30-Q40 (99.9-99.99% accuracy) with median read lengths of 15-20 kb.


Kinetics: Detecting Base Modifications
Because PacBio records the entire movie of incorporation, it captures two kinetic parameters:
- IPD (inter-pulse duration): Time between consecutive incorporation events
- PW (pulse width): Duration of each incorporation event
Modified bases (e.g., 5mC, 6mA, 4mC) alter the polymerase kinetics, changing IPD and PW in characteristic patterns. This enables direct detection of DNA methylation without bisulfite conversion.

Current PacBio Platforms
| Platform | ZMWs per Cell | HiFi Output | Read Length | Price |
|---|---|---|---|---|
| Revio | 25 million | ~90 Gb/cell (4 cells/run) | 10-25 kb | ~$779,000 |
| Vega | 1 SMRT Cell | up to ~60 Gb | 10-25 kb | ~$169,000 |
6. Oxford Nanopore Technologies (ONT)
Oxford Nanopore Technologies (founded 2005) launched the MinION in 2014, introducing a radically different sequencing paradigm. Key references include Jain et al. 2015, Nature Methods 12:351-356 and Deamer et al. 2016, Nature Biotechnology 34:518-524.
Principle: Ionic Current Through a Protein Pore

- A protein nanopore (derived from CsgG, an E. coli curli secretion channel) is embedded in a synthetic polymer membrane
- A voltage is applied across the membrane, driving an ionic current through each pore
- A motor protein (helicase) is ligated to the DNA via sequencing adapters
- The motor protein ratchets single-stranded DNA through the pore one base at a time
- As each k-mer passes through the constriction(s) of the pore, it partially blocks the ionic current. The original R9 pore read ~5 bases at a time; the current R10 pore with dual constrictions reads ~9 bases simultaneously
- Each k-mer produces a characteristic current level – with \(4^9 = 262{,}144\) possible levels for R10
- The raw current signal (“squiggle”) is basecalled by a neural network (Dorado)
The original R9 pore (CsgG mutant) had a single constriction point and ~90-97% raw simplex accuracy (depending on basecaller version). The current R10 pore features a longer barrel with dual constriction points, dramatically improving homopolymer resolution and pushing simplex accuracy above 99%. All current ONT flow cells ship with R10.4.1 chemistry.
Nanopore Sequencing Visualization
DNA is threaded through a protein pore. Each ~5-mer modulates the ionic current differently, producing a characteristic "squiggle" signal.
Direct RNA Sequencing
One of ONT’s most distinctive capabilities is direct RNA sequencing – reading native RNA molecules without reverse transcription or PCR amplification.
How it works:
- A sequencing adapter with a motor protein is ligated directly to the poly(A) tail of mRNA (or to any RNA with an adapter)
- The motor protein feeds the RNA strand through the pore 3’ to 5’
- The current signal reflects the RNA bases (A, C, G, U) passing through the pore
- Dorado basecalls the RNA signal using RNA-specific models
RNA modifications detected:
Direct RNA sequencing preserves all chemical modifications on the RNA molecule. Current modification detection models can identify:
- m6A (N6-methyladenosine) – the most abundant mRNA modification
- m5C (5-methylcytidine)
- pseudouridine (\(\Psi\))
- inosine (from A-to-I editing)
The latest RNA004 chemistry (2025) provides significantly improved accuracy and throughput for direct RNA applications.
Every other RNA sequencing method requires reverse transcription to cDNA, which introduces biases (template switching, length bias, loss of modifications). Direct RNA sequencing reads the native molecule, preserving the full modification landscape and enabling study of the epitranscriptome at single-molecule resolution.
Adaptive Sampling (ReadUntil)
ONT’s software can make real-time decisions about individual molecules:
- As a read begins, the first few hundred bases are basecalled in real time
- The software maps this fragment against a reference or target list
- If the molecule is on-target, sequencing continues normally
- If the molecule is off-target, the voltage is briefly reversed, ejecting the strand from the pore
- The pore is immediately available for a new molecule
This enables targeted sequencing without physical enrichment – no PCR, no capture probes. In practice, adaptive sampling can achieve 5-10x enrichment of target regions.

Duplex Sequencing
When both strands of a double-stranded DNA molecule are sequenced consecutively through the same pore (the complement strand follows the template strand), the two reads can be combined into a duplex read with accuracy exceeding Q30 (>99.9%). This is analogous to PacBio’s CCS concept but achieved differently.

Current ONT Devices
| Device | Flow Cells | Channels per Cell | Max Output | Use Case |
|---|---|---|---|---|
| MinION | 1 | 512 (2,048 pores) | ~50 Gb | Portable, field work, teaching |
| P2 Solo | 2 | 2,675 | ~290 Gb/cell | Benchtop production |
| PromethION 48 | 48 | 2,675 | ~14 Tb | Large-scale production |
Data Formats
- POD5: Current raw signal format (Apache Arrow-based), replacing the legacy FAST5 format (which was HDF5-based)
- Dorado: ONT’s basecaller, uses neural networks to convert raw signal to sequence
- Output: BAM (with move table for signal-to-base mapping) or FASTQ
7. Technology Comparison
| Feature | Illumina (NovaSeq X) | Element (AVITI) | PacBio (Revio) | ONT (PromethION) |
|---|---|---|---|---|
| Read length | 2 x 150 bp | 2 x 150 bp | 10-25 kb (HiFi) | Ultra-long possible (>100 kb) |
| Single-read accuracy | ~99.9% (Q30) | ~99.99% (Q40) | ~99.9% (Q30 HiFi) | ~99% (Q20 simplex); Q30+ duplex |
| Max output/run | 16 Tb | 600 Gb | 360 Gb/day (4 cells) | ~14 Tb |
| Cost per Gb | ~$2 | ~$2-5 | ~$8 | ~$8-15 |
| Time to first results | ~24-48 hr | ~24-48 hr | ~24 hr | Minutes (real-time) |
| Primary error type | Substitutions | Substitutions | Indels (raw); balanced (HiFi) | Indels in homopolymers (improved with R10) |
| Detect base modifications | No (need bisulfite/EM-seq) | No | Yes (kinetics) | Yes (current signal) |
| Direct RNA | No | No | No | Yes |
| Adaptive sampling | No | No | No | Yes |
| Portability | Lab only | Lab only | Lab only | Field-deployable (MinION) |
Platform choice depends on the experiment. Illumina and Element dominate for high-throughput short-read applications (WGS, RNA-seq, ATAC-seq). PacBio excels at genome assembly, structural variant detection, and methylation. ONT uniquely enables real-time, portable, and direct RNA sequencing. Many projects now combine short and long reads for optimal results.
8. Practicalities: Data Formats and Exploration
Hands-on data exploration will be covered in companion Jupyter notebooks. Here we briefly introduce the key data formats you will encounter.
FASTQ (Illumina, Element, ONT after basecalling)
The universal short-read format. Four lines per read:
@read_id optional_description
ACGTACGTACGTACGT...
+
FFFFFFFFFFFFFFFF...
- Line 1: Header (starts with
@) - Line 2: DNA sequence
- Line 3: Separator (
+) - Line 4: Quality scores (ASCII-encoded Phred scores;
F= Q37)
# Peek at a FASTQ file
import gzip
with gzip.open("sample.fastq.gz", "rt") as f:
for i, line in enumerate(f):
print(line.strip())
if i >= 7: # show 2 reads
breakBAM (PacBio HiFi, also general alignment format)
Binary Alignment/Map format. For PacBio, unaligned BAM files store HiFi reads along with per-base kinetics (IPD and pulse width) in auxiliary tags.
# Peek at a PacBio HiFi BAM
samtools view -h sample.hifi.bam | head -20Key PacBio BAM tags:
rq: Read quality (float, 0-1)np: Number of full passesfi/ri: Forward/reverse IPD valuesfp/rp: Forward/reverse pulse width values
POD5 (ONT raw signal)
HDF5-based format storing the raw electrical signal from each nanopore read. This is the native ONT format, replacing the legacy FAST5 format.
# Peek at a POD5 file
import pod5
with pod5.Reader("sample.pod5") as reader:
for read in reader.reads():
print(f"Read ID: {read.read_id}")
print(f"Signal length: {len(read.signal)} samples")
print(f"Sample rate: {read.run_info.sample_rate} Hz")
breakONT raw data is not DNA sequence – it is a time series of electrical current measurements (picoamps) sampled at ~4-5 kHz. Basecalling with Dorado converts this signal into nucleotide sequence. You can re-basecall the same POD5 file with updated models to improve accuracy without re-sequencing.
Format Summary
| Format | Type | Platform | Contains | Typical Size |
|---|---|---|---|---|
| FASTQ(.gz) | Text (compressed) | All | Sequence + quality | Varies widely (1 GB to hundreds of GB) |
| BAM | Binary | PacBio, alignments | Sequence + quality + kinetics | ~10-50 GB per cell |
| POD5 | Apache Arrow | ONT | Raw electrical signal | ~50-200 GB per flow cell |
| FAST5 | HDF5 (legacy) | ONT | Raw signal (single/multi) | Deprecated; convert to POD5 |
Choosing a Platform
- Population-scale WGS or RNA-seq → Illumina NovaSeq X or Element AVITI
- De novo genome assembly → PacBio HiFi (Revio)
- Structural variant detection → PacBio HiFi or ONT long reads
- Field diagnostics or rapid surveillance → ONT MinION
- Epitranscriptomics / RNA modifications → ONT direct RNA sequencing
- Targeted panels or amplicons → Illumina MiSeq or ONT adaptive sampling
9. On the Horizon: Roche Sequencing by Expansion (SBX)
Roche – which exited sequencing in 2013 after discontinuing 454 – re-entered the field in February 2025 with a radically different approach: Sequencing by Expansion (SBX) (Fuller et al. 2025, bioRxiv).
The Xpandomer Concept
SBX overcomes a fundamental limitation of nanopore sequencing: the difficulty of resolving individual bases as they pass through a pore. Instead of reading native DNA directly, SBX first converts each DNA template into a surrogate polymer called an Xpandomer that is ~50x longer than the original.
How it works:
- DNA polymerase copies the template using expandable nucleotide triphosphates (X-NTPs) – synthetic building blocks that each encode one base but contain a long, expandable linker and a high-signal reporter code
- After synthesis, the Xpandomer is chemically expanded, stretching the reporter codes apart
- Each reporter includes a translocation control element that pauses the Xpandomer in the nanopore – one reporter at a time
- The reporter blocks ionic current in one of four distinct levels, each uniquely identifying A, C, G, or T
- Voltage pulses advance the Xpandomer through the pore one step at a time
In conventional nanopore sequencing (ONT), ~5-9 bases simultaneously occupy the pore constriction, creating a complex signal that requires neural network deconvolution and struggles with homopolymers. SBX eliminates this problem entirely: each base is read individually through a single, unambiguous current level. This is a fundamentally different approach to the signal-to-noise problem.
Hardware: CMOS Nanopore Arrays
The SBX platform uses a CMOS-based sensor chip containing ~8 million microwells, each housing a single nanopore. Each microwell integrates the electrode, detection circuit, and analog-to-digital converter on-chip – similar in concept to Ion Torrent’s semiconductor approach but reading current blockade rather than pH.
Performance
- Accuracy: Demonstrated F1 scores >99.80% (SNV) and >99.7% (InDel) for whole-genome sequencing
- Speed: In collaboration with Broad Clinical Labs and Boston Children’s Hospital, SBX achieved a Guinness World Record for fastest DNA sequencing – a complete genome in under 4 hours
- Applications demonstrated: Whole-genome sequencing, bulk RNA-seq, methylation mapping, and spatial multiomics
Platform: AXELIOS 1
The first commercial instrument, AXELIOS 1, is planned for Research Use Only (RUO) launch in 2026. Roche aims to eventually bring SBX into clinical diagnostics, leveraging its existing global diagnostics infrastructure.
Roche’s return to sequencing is notable: the company that pioneered commercial NGS with 454 pyrosequencing (2005-2013) has now developed a technology that combines biochemical innovation (Xpandomers) with nanopore readout and semiconductor manufacturing – drawing on ideas from all three generations of sequencing.
Key References
- Sanger F, Nicklen S, Coulson AR. DNA sequencing with chain-terminating inhibitors. PNAS. 1977;74(12):5463-5467.
- Margulies M, et al. Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005;437:376-380. doi:10.1038/nature03959
- Shendure J, et al. Accurate multiplex polony sequencing of an evolved bacterial genome. Science. 2005;309:1728-1732. doi:10.1126/science.1117389
- Rothberg JM, et al. An integrated semiconductor device enabling non-optical genome sequencing. Nature. 2011;475:348-352. doi:10.1038/nature10242
- Bentley DR, et al. Accurate whole human genome sequencing using reversible terminator chemistry. Nature. 2008;456:53-59. doi:10.1038/nature07517
- Arslan S, et al. Sequencing by avidity enables high accuracy with low reagent consumption. Nature Biotechnology. 2024;42:132-138. doi:10.1038/s41587-023-01750-7
- Eid J, et al. Real-time DNA sequencing from single polymerase molecules. Science. 2009;323:133-138. doi:10.1126/science.1162986
- Levene MJ, et al. Zero-mode waveguides for single-molecule analysis at high concentrations. Science. 2003;299:682-686. doi:10.1126/science.1079700
- Jain M, et al. Improved data analysis for the MinION nanopore sequencer. Nature Methods. 2015;12:351-356. doi:10.1038/nmeth.3290
- Deamer D, Akeson M, Branton D. Three decades of nanopore sequencing. Nature Biotechnology. 2016;34:518-524. doi:10.1038/nbt.3423
- Fuller CW, et al. Sequencing by Expansion (SBX) – a novel, high-throughput single-molecule sequencing technology. bioRxiv. 2025. doi:10.1101/2025.02.19.639056