Sequential regulatory activity prediction across chromosomes with convolutional neural networks.

  1. Jasper Snoek (3)

  1. Calico Labs
  2. Harvard University
  3. Google Brain
  • * Corresponding author; email: drk{at}calicolabs.com
  • Abstract

    Functional genomics approaches to better model genotype-phenotype relationships have important applications toward understanding genomic function and improving human health. In particular, thousands of noncoding loci associated with diseases and physical traits lack mechanistic explanation. Here, we develop the first machine-learning system to predict cell type-specific epigenetic and transcriptional profiles in large mammalian genomes from DNA sequence alone. Using convolutional neural networks, this system identifies promoters and distal regulatory elements and synthesizes their content to make effective gene expression predictions. We show that model predictions for the influence of genomic variants on gene expression align well to causal variants underlying eQTLs in human populations and can be useful for generating mechanistic hypotheses to enable GWAS loci fine mapping.

    • Received July 17, 2017.
    • Accepted March 23, 2018.

    This manuscript is Open Access.

    This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International license), as described at http://creativecommons.org/licenses/by/4.0/.
