Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation

Kolmogorov, Mikhail; Billingsley, Kimberley J.; Mastoras, Mira; Meredith, Melissa; Monlong, Jean; Lorig-Roach, Ryan; Asri, Mobin; Alvarez Jerez, Pilar; Malik, Laksh; Dewan, Ramita; Reed, Xylena; Genner, Rylee M.; Daida, Kensuke; Behera, Sairam; Shafin, Kishwar; Pesout, Trevor; Prabakaran, Jeshuwin; Carnevali, Paolo; Yang, Jianzhi; Rhie, Arang; Scholz, Sonja W.; Traynor, Bryan J.; Miga, Karen H.; Jain, Miten; Timp, Winston; Phillippy, Adam M.; Chaisson, Mark; Sedlazeck, Fritz J.; Blauwendraat, Cornelis; Paten, Benedict

doi:10.1038/s41592-023-01993-x

Article
Published: 14 September 2023

Nature Methods volume 20, pages 1483–1492 (2023)

Abstract

Long-read sequencing technologies substantially overcome the limitations of short reads but have not been considered a feasible replacement for population-scale projects, being some combination of too expensive, not scalable enough or too error-prone. Here we develop an efficient and scalable wet lab and computational protocol, Napu, for Oxford Nanopore Technologies long-read sequencing that seeks to address those limitations. We applied our protocol to cell lines and brain tissue samples as part of a pilot project for the National Institutes of Health Center for Alzheimer’s and Related Dementias. Using a single PromethION flow cell, we can detect single nucleotide polymorphisms with F1-score comparable to Illumina short-read sequencing. Small indel calling remains difficult within homopolymers and tandem repeats, but achieves good concordance to Illumina indel calls elsewhere. Further, we can discover structural variants with F1-score on par with state-of-the-art de novo assembly methods. Our protocol phases small and structural variants at megabase scales and produces highly accurate, haplotype-specific methylation calls.

**Fig. 1: Single flow cell ONT sequencing protocol.**

**Fig. 2: Small variant calling performance evaluation.**

**Fig. 4: Combined, phased small variants and SVs improve the profiling of complex genomic regions.**

**Fig. 6: Haplotype-specific methylation profiling.**

Data availability

The cell line data (HG002, HG00733 and HG02723) are openly available through the AnVIL workspace: https://anvil.terra.bio/#workspaces/anvil-datastorage/ANVIL_NIA_CARD_Coriell_Cell_Lines_Open. Human brain sequencing datasets are under controlled access and require a dbGaP application (phs001300.v4). Once the application is approved, the data are available through the restricted AnVIL workspace: https://anvil.terra.bio/#workspaces/anvil-datastorage/ANVIL_NIA_CARD_LR_WGS_NABEC_GRU. Matching Illumina data used for cell line evaluations are available at: https://www.internationalgenome.org/data-portal/data-collection/30x-grch38. HPRC assemblies are available at: https://github.com/human-pangenomics/HPP_Year1_Data_Freeze_v1.0. GIAB benchmarks are available at: https://www.nist.gov/programs-projects/genome-bottle.

Code availability

The Napu implementation in WDL is available at: https://github.com/nanoporegenomics/napu_wf. Hapdup is available as a standalone tool at: https://github.com/KolmogorovLab/hapdup. Hapdiff is available at: https://github.com/KolmogorovLab/hapdiff.

References

DePristo, M. A. et al. A framework for variation discovery and genotyping using next-generation DNA sequencing data. Nat. Genet. 43, 491–498 (2011).
1000 Genomes Project Consortium et al. An integrated map of genetic variation from 1,092 human genomes. Nature 491, 56–65 (2012).
100,000 Genomes Project Pilot Investigators et al. 100,000 Genomes pilot on rare-disease diagnosis in health care—preliminary report. N. Engl. J. Med. 385, 1868–1880 (2021).
Karczewski, K. J. et al. The mutational constraint spectrum quantified from variation in 141,456 humans. Nature 581, 434–443 (2020).
Huang, K.-L. et al. Pathogenic germline variants in 10,389 adult cancers. Cell 173, 355–370.e14 (2018).
ICGC/TCGA Pan-Cancer Analysis of Whole Genomes Consortium. Pan-cancer analysis of whole genomes. Nature 578, 82–93 (2020).
Sedlazeck, F. J., Lee, H., Darby, C. A. & Schatz, M. C. Piercing the dark matter: bioinformatics of long-range sequencing and mapping. Nat. Rev. Genet. 19, 329–346 (2018).
Chen, X. et al. Manta: rapid detection of structural variants and indels for germline and cancer sequencing applications. Bioinformatics 32, 1220–1222 (2016).
Mahmoud, M. et al. Structural variant calling: the long and the short of it. Genome Biol. 20, 246 (2019).
Zarate, S. et al. Parliament2: accurate structural variant calling at scale. Gigascience 9, giaa145 (2020).
Zook, J. M. et al. A robust benchmark for detection of germline large deletions and insertions. Nat. Biotechnol. 38, 1347–1355 (2020).
Wagner, J. et al. Curated variation benchmarks for challenging medically relevant autosomal genes. Nat. Biotechnol. 40, 672–680 (2022).
Wagner, J. et al. Benchmarking challenging small variants with linked and long reads. Cell Genom. 2, 100128 (2022).
Lee, H. & Schatz, M. C. Genomic dark matter: the reliability of short read mapping illustrated by the genome mappability score. Bioinformatics 28, 2097–2105 (2012).
Martin, M. et al. WhatsHap: fast and accurate read-based phasing. Preprint at bioRxiv https://doi.org/10.1101/085050 (2016).
Loh, P.-R. et al. Reference-based phasing using the Haplotype Reference Consortium panel. Nat. Genet. 48, 1443–1448 (2016).
Sedlazeck, F. J. et al. Accurate detection of complex structural variations using single-molecule sequencing. Nat. Methods 15, 461–468 (2018).
Jiang, T. et al. Long-read-based human genomic structural variation detection with cuteSV. Genome Biol. 21, 189 (2020).
Shafin, K. et al. Haplotype-aware variant calling with PEPPER-Margin-DeepVariant enables high accuracy in nanopore long-reads. Nat. Methods 18, 1322–1332 (2021).
Lin, J.-H., Chen, L.-C., Yu, S.-C. & Huang, Y.-T. LongPhase: an ultra-fast chromosome-scale phasing algorithm for small and large variants. Bioinformatics 38, 1816–1822 (2022).
Mahmoud, M., Doddapaneni, H., Timp, W. & Sedlazeck, F. J. PRINCESS: comprehensive detection of haplotype resolved SNVs, SVs, and methylation. Genome Biol. 22, 268 (2021).
Logsdon, G. A., Vollger, M. R. & Eichler, E. E. Long-read human genome sequencing and its applications. Nat. Rev. Genet. 21, 597–614 (2020).
Rhie, A. et al. Towards complete and error-free genome assemblies of all vertebrate species. Nature 592, 737–746 (2021).
Nurk, S. et al. The complete sequence of a human genome. Science 376, 44–53 (2022).
Liao, W.-W. et al. A draft human pangenome reference. Nature 617, 312–324 (2023).
Jarvis, E. D. et al. Automated assembly of high-quality diploid human reference genomes. Nature 611, 519–531 (2022).
Koren, S. et al. Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation. Genome Res. 27, 722–736 (2017).
Jain, M. et al. Nanopore sequencing and assembly of a human genome with ultra-long reads. Nat. Biotechnol. 36, 338–345 (2018).
Kolmogorov, M., Yuan, J., Lin, Y. & Pevzner, P. A. Assembly of long, error-prone reads using repeat graphs. Nat. Biotechnol. 37, 540–546 (2019).
Shafin, K. et al. Nanopore sequencing and the Shasta toolkit enable efficient de novo assembly of eleven human genomes. Nat. Biotechnol. 38, 1044–1053 (2020).
Rautiainen, M. et al. Verkko: telomere-to-telomere assembly of diploid chromosomes. Nat. Biotechnol. https://doi.org/10.1038/s41587-023-01662-6 (2023).
Billingsley, K. J. et al. Processing human frontal cortex brain tissue for population-scale Oxford Nanopore long-read DNA sequencing SOP v2. protocols.io https://doi.org/10.17504/protocols.io.kxygxzmmov8j/v2 (2022).
Baker, B. et al. Processing human frontal cortex brain tissue for population-scale SQK-LSK114 Oxford Nanopore long-read DNA sequencing SOP v1. protocols.io https://doi.org/10.17504/protocols.io.kxygx3zzog8j/v1 (2022).
Alvarez Jerez, P. et al. Processing frozen cells for population-scale Oxford Nanopore long-read DNA sequencing SOP v1. protocols.io https://doi.org/10.17504/protocols.io.5jyl8pnk7g2w/v1 (2022).
Gibbs, J. R. et al. Abundant quantitative trait loci exist for DNA methylation and gene expression in human brain. PLoS Genet. 6, e1000952 (2010).
Schatz, M. C. et al. Inverting the model of genomics data sharing with the NHGRI Genomic Data Science Analysis, Visualization, and Informatics Lab-space. Cell Genom. 2, 100085 (2022).
Li, H. yak: yet another k-mer analyzer. GitHub https://github.com/lh3/yak (2023).
Smolka, M. et al. Comprehensive structural variant detection: from mosaic to population-level. Preprint at bioRxiv https://doi.org/10.1101/2022.04.04.487055 (2022).
English, A. C., Menon, V. K., Gibbs, R. A., Metcalf, G. A. & Sedlazeck, F. J. Truvari: refined structural variant comparison preserves allelic diversity. Genome Biol. 23, 271 (2022).
Yang, J. & Chaisson, M. J. P. TT-Mars: structural variants assessment based on haplotype-resolved assemblies. Genome Biol. 23, 110 (2022).
Vollger, M. R. et al. Long-read sequence and assembly of segmental duplications. Nat. Methods 16, 88–94 (2019).
Kirsche, M. et al. Jasmine: population-scale structural variant comparison and analysis. Nat. Methods 20, 408–417 (2023).
Chowdhury, M., Pedersen, B. S., Sedlazeck, F. J., Quinlan, A. R. & Layer, R. M. Searching thousands of genomes to classify somatic and novel structural variants using STIX. Nat. Methods 19, 445–448 (2022).
Li, H. et al. A synthetic-diploid benchmark for accurate variant-calling evaluation. Nat. Methods 15, 595–597 (2018).
Byrska-Bishop, M. et al. High-coverage whole-genome sequencing of the expanded 1000 Genomes Project cohort including 602 trios. Cell 185, 3426–3440.e19 (2022).
Li, H. Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics 34, 3094–3100 (2018).
Lin, Y. et al. Assembly of long error-prone reads using de Bruijn graphs. Proc. Natl Acad. Sci. USA 113, E8396–E8405 (2016).
Mikheenko, A., Prjibelski, A., Saveliev, V., Antipov, D. & Gurevich, A. Versatile genome assembly evaluation with QUAST-LG. Bioinformatics 34, i142–i150 (2018).
Cheng, H., Concepcion, G. T., Feng, X., Zhang, H. & Li, H. Haplotype-resolved de novo assembly using phased assembly graphs with hifiasm. Nat. Methods 18, 170–175 (2021).
Cheng, H. et al. Haplotype-resolved assembly of diploid genomes without parental data. Nat. Biotechnol. 40, 1332–1335 (2022).
Li, H., Feng, X. & Chu, C. The design and construction of reference pangenome graphs with minigraph. Genome Biol. 21, 265 (2020).
Wick, R. R., Schultz, M. B., Zobel, J. & Holt, K. E. Bandage: interactive visualization of de novo genome assemblies. Bioinformatics 31, 3350–3352 (2015).
Heller, D. & Vingron, M. SVIM-asm: structural variant detection from haploid and diploid genome assemblies. Bioinformatics 36, 5519–5521 (2020).
Razaghi, R. et al. Modbamtools: analysis of single-molecule epigenetic data for long-range profiling, heterogeneity, and clustering. Preprint at bioRxiv https://doi.org/10.1101/2022.07.07.499188 (2022).

Acknowledgements

This work was supported in part by the Intramural Research Program of the National Cancer Institute (M.K.), the National Human Genome Research Institute (A.M.P.), the National Institute on Aging (B.J.T.) and the Center for Alzheimer’s and Related Dementias (C.B.), within the Intramural Research Program of the NIA and the National Institute of Neurological Disorders and Stroke (grant nos. ZIANS003154, ZIAAG000538), National Institutes of Health (grant no. AG000538). The Brain and Body Donation Program has been supported by the National Institute of Neurological Disorders and Stroke (grant no. U24 NS072026 National Brain and Tissue Resource for Parkinson’s Disease and Related Disorders), the National Institute on Aging (grant nos. P30AG19610 and P30AG072980, Arizona Alzheimer’s Disease Center), the Arizona Department of Health Services (contract 211002, Arizona Alzheimer’s Research Center), the Arizona Biomedical Research Commission (contracts 4001, 0011, 05–901 and 1001 to the Arizona Parkinson’s Disease Consortium) and the Michael J. Fox Foundation for Parkinson’s Research. B.P. was partly supported by NIH grant nos. R01HG010485, U24HG010262, U24HG011853, OT3HL142481, U01HG010961 and OT2OD033761. M. Mastoras was supported by NIH grant no. T32HG012344. K.D. was supported by the JSPS Research Fellowship for Japanese Biomedical and Behavioral Researchers at NIH. The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript. We acknowledge the support of Oxford Nanopore Technologies staff in generating this dataset, in particular A. Markham. We acknowledge the support of the Circulomics Inc. team in generating this protocol, in particular K. Liu, J. Burke, M. Kim and D. Kilburn. We also acknowledge the Terra support team for their help with the data storage and cloud computing solutions. This work utilized the computational resources of the NIH HPC Biowulf cluster (https://hpc.nih.gov). 
We thank members of the North American Brain Expression Consortium (NABEC) for providing samples derived from brain tissue. We are grateful to the Banner Sun Health Research Institute Brain and Body Donation Program of Sun City, Arizona for the provision of human biological materials.

Author information

These authors contributed equally: Mikhail Kolmogorov, Kimberley J. Billingsley.

Authors and Affiliations

Center for Cancer Research, National Cancer Institute, National Institutes of Health, Bethesda, MD, USA
Mikhail Kolmogorov & Jeshuwin Prabakaran
Center for Alzheimer’s and Related Dementias, National Institute on Aging and National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
Kimberley J. Billingsley, Pilar Alvarez Jerez, Laksh Malik, Xylena Reed, Rylee M. Genner, Kensuke Daida & Cornelis Blauwendraat
Laboratory of Neurogenetics, National Institute on Aging, National Institutes of Health, Bethesda, MD, USA
Kimberley J. Billingsley, Ramita Dewan, Kensuke Daida, Bryan J. Traynor & Cornelis Blauwendraat
UC Santa Cruz Genomics Institute, Santa Cruz, CA, USA
Mira Mastoras, Melissa Meredith, Jean Monlong, Ryan Lorig-Roach, Mobin Asri, Trevor Pesout, Karen H. Miga & Benedict Paten
Human Genome Sequencing Center, Baylor College of Medicine, Houston, TX, USA
Sairam Behera & Fritz J. Sedlazeck
Google LLC, Mountain View, CA, USA
Kishwar Shafin
Department of Biological Sciences, University of Maryland Baltimore County, Baltimore, MD, USA
Jeshuwin Prabakaran
Chan Zuckerberg Initiative, Redwood City, CA, USA
Paolo Carnevali
Department of Quantitative and Computational Biology, University of Southern California, Los Angeles, CA, USA
Jianzhi Yang & Mark Chaisson
Genome Informatics Section, Computational and Statistical Genomics Branch, National Human Genome Research Institute, National Institutes of Health, Bethesda, MD, USA
Arang Rhie & Adam M. Phillippy
Neurodegenerative Diseases Research Unit, National Institute of Neurological Disorders and Stroke, National Institutes of Health, Bethesda, MD, USA
Sonja W. Scholz
Department of Neurology, Johns Hopkins University Medical Center, Baltimore, MD, USA
Sonja W. Scholz & Bryan J. Traynor
Department of Bioengineering, Northeastern University, Boston, MA, USA
Miten Jain
Department of Physics, Northeastern University, Boston, MA, USA
Miten Jain
Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA
Winston Timp
Department of Computer Science, Rice University, Houston, TX, USA
Fritz J. Sedlazeck

Contributions

M.K., K.J.B., C.B. and B.P. conceptualized and designed the study. K.J.B., P.A.J., L.M., R.D., X.R., R.M.G., K.D. and M.J. were responsible for protocol optimization and sequencing. M.K., M. Mastoras, M. Meredith, J.M., M.A., K.S., T.P., J.P. and P.C. were responsible for algorithmic development. M.K., K.J.B., M. Mastoras, M. Meredith, J.M., R.L.-R., M.A., P.A.J., R.M.G., K.D., S.B., K.S., T.P., P.C., J.Y., A.R., M.J., W.T., M.C., F.J.S., C.B. and B.P. performed data analysis. M.K., K.J.B., S.W.S., B.J.T., K.H.M., M.J., W.T., A.M.P., M.C., F.J.S., C.B. and B.P. interpreted data and oversaw the study. M.K. and B.P. drafted the manuscript. All authors provided feedback and helped revise the manuscript.

Corresponding authors

Correspondence to Mikhail Kolmogorov, Kimberley J. Billingsley, Cornelis Blauwendraat or Benedict Paten.

Ethics declarations

Competing interests

K.S. is an employee of Google LLC and owns Alphabet stock as part of the standard compensation package; authors from Google LLC did not have access to the cell line and brain tissue sample data. W.T. has two patents (8,748,091 and 8,394,584) licensed to Oxford Nanopore Technologies. F.J.S. received research support from Illumina, Pacific Biosciences and Oxford Nanopore Technologies. S.W.S. serves on the Scientific Advisory Council of the Lewy Body Dementia Association and the Multiple System Atrophy Coalition. S.W.S. and B.J.T. receive research support from Cerevel Therapeutics. B.J.T. holds patents on the clinical testing and therapeutic implications of the C9orf72 repeat expansion. The remaining authors declare no competing interests.

Peer review

Peer review information

Nature Methods thanks Justin Zook and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Peer reviewer reports are available. Primary Handling Editors: Hui Hua and Lei Tang, in collaboration with the Nature Methods team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Variant calling and methylation analysis using Napu.

Raw ONT sequencing reads are basecalled by Guppy 6.1.2, which simultaneously produces methylation tags. A diploid, de novo phased assembly is produced using a combination of Shasta and Hapdup. These assemblies are used to call SVs with Hapdiff. Small variants are called against a reference genome with PEPPER-Margin-DeepVariant. The phased alignment file generated by Margin is used to produce haplotype-resolved methylation calls. Small variants and SVs are jointly phased by Margin, producing a single harmonized VCF.
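The dependencies between the steps in this caption can be sketched as a small directed graph. The following is only an illustration in Python: the step names are made-up labels rather than task names from the Napu WDL, and it assumes that a single Margin step both performs the joint phasing and yields the phased alignment used for methylation calling, which the caption does not state explicitly.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Each step maps to the set of steps whose output it consumes.
# Step names are illustrative labels, not task names from the Napu WDL.
PIPELINE = {
    "basecall_guppy": set(),
    "assemble_shasta": {"basecall_guppy"},
    "phase_assembly_hapdup": {"assemble_shasta"},
    "call_svs_hapdiff": {"phase_assembly_hapdup"},
    "call_small_variants_pmdv": {"basecall_guppy"},
    # Assumption: one Margin step phases both variant sets and also
    # produces the phased alignment used downstream.
    "phase_jointly_margin": {"call_small_variants_pmdv", "call_svs_hapdiff"},
    "call_methylation_by_haplotype": {"phase_jointly_margin"},
}


def execution_order(pipeline):
    """Return one valid order in which the steps can run."""
    return list(TopologicalSorter(pipeline).static_order())


if __name__ == "__main__":
    for step in execution_order(PIPELINE):
        print(step)
```

Several orders are valid because the assembly branch and the small-variant branch are independent until Margin combines them; a workflow engine could run them in parallel.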

Extended Data Fig. 2 Assemblies of 14 brain tissues and 3 cell lines generated by Shasta+Hapdup.

(A) NG50 and NGA50 contiguity measured using QUAST. Sample 06_66 had the lowest contiguity due to the decreased sequencing yield. (B) Assembly length. (C) Mean assembly QV computed using yak. (D) Contiguity of phased blocks, broken at phase switches. An increased value for HG02723 suggests an increased heterozygosity rate. Cell lines are marked with asterisks.
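For readers unfamiliar with the contiguity statistic in panel A, NG50 is the contig length at which contigs of that length or longer together cover at least half of the expected genome size. A minimal Python sketch follows; the function name and the toy contig lengths are illustrative and are not taken from QUAST.

```python
def ng50(contig_lengths, genome_size):
    """Return NG50 for a set of contig lengths.

    Contigs are visited from longest to shortest; the length of the contig
    that brings the running total to at least half of genome_size is the
    NG50. Returns 0 if the assembly never covers half of the genome.
    """
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if 2 * covered >= genome_size:
            return length
    return 0


if __name__ == "__main__":
    contigs = [50, 40, 30, 20, 10]  # toy lengths, not real data
    print(ng50(contigs, genome_size=200))
    # Using the assembly's own total length as genome_size gives N50.
    print(ng50(contigs, genome_size=sum(contigs)))
```

NGA50 is the same statistic computed on aligned blocks, that is, after contigs have been broken at misassemblies relative to a reference, so it cannot exceed NG50.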

Extended Data Fig. 3 Assembly metrics comparison against HG002 assemblies produced in Jarvis et al. (2022).

Our assemblies are highlighted in green. Flye (ONT + trio) assemblies were produced using standard ONT reads at 60x coverage and Illumina parental information; Flye (ONT UL + trio) is similar, but uses ultra-long ONT extraction. HiCanu and hifiasm used 34x HiFi reads and Illumina parental sequencing. DipAsm used 34x HiFi reads and 60x Hi-C reads. Original evaluations from Jarvis et al. are shown. See Supplementary Table 5 for more detail.

Extended Data Fig. 4 TT-Mars evaluation of Hapdup and Sniffles2 calls.

SV calls from Hapdup and Sniffles2 were compared to the assemblies from the HPRC for HG002 (top), HG00733 (middle), and HG02723 (bottom) with TT-Mars. The calls were either validated by the alignment (green), not validated (orange), or couldn’t be annotated by TT-Mars (blue). We evaluated all SVs across the genome (left), as well as the subset of SVs that don’t overlap centromeres or segmental duplications larger than 10 Kbp (right).

Extended Data Fig. 5 Flagger results based on HiFi alignments to cell line CARD and HPRC-Y1 assemblies.

The y-axis of each panel indicates the unreliability percentages which are the total number of bases flagged as misassembly divided by the total assembly length and multiplied by one hundred.

Extended Data Fig. 6 Flagger results based on ONT alignments to cell line CARD and HPRC-Y1 assemblies.

The y-axis of each panel indicates the unreliability percentages which are the total number of bases flagged as misassembly divided by the total assembly length and multiplied by one hundred.
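The unreliability percentage defined in the two Flagger captions is a simple ratio. A minimal Python sketch is shown here, assuming the flagged regions are given as non-overlapping, half-open (start, end) intervals; the function name and the example numbers are hypothetical and are not Flagger output.

```python
def unreliability_percent(flagged_intervals, assembly_length):
    """Percentage of assembly bases flagged as misassembled.

    flagged_intervals: iterable of (start, end) pairs, half-open and
    assumed not to overlap each other.
    assembly_length: total assembly length in bases.
    """
    if assembly_length <= 0:
        raise ValueError("assembly_length must be positive")
    flagged_bases = sum(end - start for start, end in flagged_intervals)
    return 100.0 * flagged_bases / assembly_length


if __name__ == "__main__":
    # Toy example: 30 Mbp flagged in a 3 Gbp assembly.
    flagged = [(0, 10_000_000), (50_000_000, 70_000_000)]
    print(unreliability_percent(flagged, 3_000_000_000))
```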

Extended Data Fig. 7 Lenient SV catalog.

Similar to Fig. 5a, but including the SVs close to centromeres, telomeres or within segmental duplications that were removed there. Number of SVs across samples. In the left panel, SVs were annotated with three SV catalogs (the gnomAD-SV database, a long-read-based SV catalog and the HPRC v1.0 SV catalog). SVs are matched if they have at least 10% genomic overlap. The colors highlight the maximum frequency across these catalogs; the lighter blue shows SVs that are ‘rare’ in the catalogs (allele frequency below 1%) or unmatched. SVs may be unmatched either because they are novel or because of difficulties in the database comparison. The right panel shows the number of rare SVs in protein-coding genes, grouped by their impact on the gene structure.
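The 10% overlap rule used for catalog matching can be illustrated with a short Python sketch. The caption does not say whether the overlap is reciprocal, so this version assumes reciprocal overlap (shared bases divided by the longer interval); it also ignores chromosome and SV type, and it only applies to SVs with a positive reference span, so insertions would need separate handling. All names are hypothetical.

```python
def overlap_fraction(a, b):
    """Reciprocal overlap of two half-open (start, end) intervals.

    Shared bases are divided by the length of the longer interval, which
    equals the smaller of the two per-interval overlap fractions.
    """
    a_start, a_end = a
    b_start, b_end = b
    shared = min(a_end, b_end) - max(a_start, b_start)
    if shared <= 0:
        return 0.0
    return shared / max(a_end - a_start, b_end - b_start)


def is_match(a, b, min_overlap=0.10):
    """True if the two intervals overlap by at least min_overlap."""
    return overlap_fraction(a, b) >= min_overlap


if __name__ == "__main__":
    print(is_match((0, 100), (90, 190)))   # 10 shared bases out of 100
    print(is_match((0, 100), (95, 195)))   # 5 shared bases out of 100
```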

Extended Data Fig. 8 IGV view of a 4.2 Kbp heterozygous deletion of a transcription start site and exon of RBFOX1.

The coverage histogram (dark grey) shows the drop in read coverage. The alignment of about half of the reads, labelled by strand (red/blue), support the deletion. The GENCODE track, ENCODE candidate cis-regulatory elements, and conservation tracks are shown at the bottom.

Extended Data Fig. 9 Comparison of R9 and R10 sequencing runs using three cell lines.

Benchmarks were performed similarly to those described in Figs. 2–4. ‘Indel no HP/TD’ corresponds to indels outside of homopolymers and tandem repeats. Assembly SV F1 scores were computed outside of centromeres and segmental duplications. Additional statistics are given in Supplementary Table 13.
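The F1 scores reported in these benchmarks are the harmonic mean of precision and recall. A minimal Python sketch from true-positive, false-positive and false-negative counts follows; the counts in the example are invented for illustration.

```python
def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall.

    tp, fp, fn: counts of true-positive, false-positive and
    false-negative calls. Returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Invented counts: 90 correct calls, 10 spurious, 10 missed.
    print(round(f1_score(tp=90, fp=10, fn=10), 3))
```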

Extended Data Fig. 10 F1-score for SV inside clusters of different sizes.

The HiFi calls for the HG002 genome were used as reference, and calls within 2 kbp were clustered using single linkage clustering. The number of true positive calls in each category is shown as text. When VNTR grouping is enabled, all insertions and deletions within the same haplotype in a single VNTR are combined into a single call. A substantial portion of the reduced Sniffles2 concordance is explained by the differences in representation of SV clusters by the assembly-based and mapping-based approaches.
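Single linkage clustering of calls along a chromosome reduces to splitting the sorted positions wherever the gap between neighbours exceeds the threshold. The Python sketch below illustrates this under simplifying assumptions: each call is represented by a single integer position on one chromosome, and "within 2 kbp" is read as a gap of at most 2,000 bp. The function name is hypothetical.

```python
def cluster_calls(positions, max_gap=2000):
    """Group positions into single-linkage clusters.

    Two calls share a cluster if they are linked by a chain of calls in
    which each consecutive pair is at most max_gap apart.
    """
    clusters = []
    for pos in sorted(positions):
        if clusters and pos - clusters[-1][-1] <= max_gap:
            clusters[-1].append(pos)  # extends the current chain
        else:
            clusters.append([pos])    # gap too large: start a new cluster
    return clusters


if __name__ == "__main__":
    calls = [100, 1500, 3400, 10000, 11999, 20000]  # toy positions
    for cluster in cluster_calls(calls):
        print(len(cluster), cluster)
```

Note the chaining effect: in the toy data, positions 100 and 3400 are more than 2 kbp apart but end up in the same cluster because 1500 links them.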

Supplementary information

Supplementary Information

Supplementary methods.

Reporting Summary

Peer Review File

Supplementary Tables

Supplementary Tables 1–22.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Kolmogorov, M., Billingsley, K.J., Mastoras, M. et al. Scalable Nanopore sequencing of human genomes provides a comprehensive view of haplotype-resolved variation and methylation. Nat Methods 20, 1483–1492 (2023). https://doi.org/10.1038/s41592-023-01993-x

Received: 28 January 2023
Accepted: 04 August 2023
Published: 14 September 2023
Issue Date: October 2023
DOI: https://doi.org/10.1038/s41592-023-01993-x
