
Disease-causing mutations are discovered in unexplored regions of the genome

Feb 11 / 16

Transformational new possibilities are opening up for medicine. The human splicing code reveals unexpected insights into the genetic origins of diseases such as cancers, spinal muscular atrophy and autism.


This study is the first serious attempt to decode the portions of the human genome outside the genes that are typically considered in genetic disease studies. Combining their pioneering work on deep learning with novel techniques in genetics, the researchers seek out mutations that cause changes in gene splicing.


Since the genome was sequenced in 2003, scientists, engineers and doctors have struggled to understand which DNA mutations actually cause disease. Most existing methods examine mutations in exons, the segments of DNA that encode proteins; however, scientists have found that many diseases cannot be explained by these mutations. Other sections of DNA, the introns, encode instructions for splicing, or how to cut and paste exons together. This process determines which proteins will be produced. When mutations alter splicing, genes may produce no protein, the wrong protein or a protein that is otherwise defective, any of which could lead to disease.

To find mutations in these less explored segments, recent approaches such as genome-wide association studies take disease data and compare the mutations of patients with those of healthy individuals, seeking out patterns. However, these studies do not explain why a particular mutation is problematic and causes disease.

This study seeks to provide that explanation by taking a completely different approach. It trains computers to mimic how the cell directs splicing by detecting patterns within DNA sequences – the “splicing code”. It searches for mutations deep within introns and ranks them according to how much they change splicing, and thus how likely they are to cause disease.


Deep learning can detect mutations in a broader region of the human genome than previously explored. Deep learning can make sense of incredibly complex relationships, such as those found in living systems in biology and medicine. The researchers successfully used the latest deep learning methods to train computers to detect tens of thousands of mutations that cause changes in gene splicing, and potentially lead to disease. The model was able to search for mutations deep within introns, which are outside of previously explored DNA segments that code for protein.

Many of the genetic determinants found for autism, colon cancer and spinal muscular atrophy were unexpected. The model was trained to predict how the mutations detected would affect gene splicing. This allowed it to accurately assess and rank the likelihood that the mutations would cause disease. For example, it correctly predicted 94 per cent of the genetic origins of well-studied diseases such as spinal muscular atrophy, a leading genetic cause of infant mortality, and colon (colorectal) cancer. More importantly, it made accurate predictions for mutations that had never been seen before. Using DNA sequences from children with autism, the model also identified 39 new genes that could be implicated in autism spectrum disorder, a 40 per cent increase from about 100 previously known autism genes.


The research team used deep learning computer algorithms to build and train a system to read the genome: to scan DNA sequences, read the genetic instructions on how to splice together the exons that code for proteins, and determine which proteins will be produced. 

The model was used to search for and classify the mutations that cause splicing to go wrong, both within and outside of the protein-coding exons. It ranked the mutations according to how much they change splicing, thus predicting their likelihood of causing well-studied diseases such as colon cancer and spinal muscular atrophy.
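The ranking step can be illustrated with a minimal sketch. It assumes the model outputs a predicted exon inclusion level between 0 and 1 for each DNA sequence, with and without the mutation, and that the size of the difference is used as the score. The `Variant` type, the field names and the example numbers are hypothetical and are not taken from the study.

```python
from typing import List, NamedTuple


class Variant(NamedTuple):
    name: str             # hypothetical identifier for a mutation
    psi_reference: float  # assumed predicted exon inclusion without the mutation (0..1)
    psi_mutant: float     # assumed predicted exon inclusion with the mutation (0..1)


def splicing_change(variant: Variant) -> float:
    """Size of the predicted change in splicing, regardless of direction."""
    return abs(variant.psi_mutant - variant.psi_reference)


def rank_by_splicing_change(variants: List[Variant]) -> List[Variant]:
    """Order mutations from the largest predicted splicing change to the smallest."""
    return sorted(variants, key=splicing_change, reverse=True)


# Made-up predictions, for illustration only.
variants = [
    Variant("variant_a", 0.90, 0.85),
    Variant("variant_b", 0.80, 0.20),
    Variant("variant_c", 0.50, 0.75),
]

ranked = rank_by_splicing_change(variants)
for v in ranked:
    print(f"{v.name}: predicted change {splicing_change(v):.2f}")
```

In this toy example the mutation with the largest predicted disruption is listed first. The real system derives its predictions from the DNA sequence itself, which this sketch does not attempt to reproduce.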

The researchers then applied their method to autism spectrum disorder, a complex genetic condition. They examined mutations that were found in the whole-genome sequences of five children with autism but not in controls. First they used the traditional approach of studying protein-coding exons, and then they compared the results with those from their deep learning ranking system for mutations.
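The idea of keeping only mutations seen in the children with autism and not in controls can be sketched as a simple set difference. This is an illustration under stated assumptions rather than the study's pipeline: each mutation is represented here as a hypothetical tuple of chromosome, position, reference base and altered base, and the example data are invented.

```python
from typing import Iterable, Set, Tuple

# Hypothetical representation: (chromosome, position, reference base, altered base).
Mutation = Tuple[str, int, str, str]


def case_only_mutations(
    case_mutations: Iterable[Mutation],
    control_mutations: Iterable[Mutation],
) -> Set[Mutation]:
    """Return the mutations observed in cases that never appear in controls."""
    return set(case_mutations) - set(control_mutations)


# Invented example data, for illustration only.
cases = [
    ("chr1", 1000, "A", "G"),
    ("chr2", 2500, "C", "T"),
    ("chr7", 4200, "G", "A"),
]
controls = [
    ("chr2", 2500, "C", "T"),
    ("chr9", 9100, "T", "C"),
]

candidates = case_only_mutations(cases, controls)
for mutation in sorted(candidates):
    print(mutation)
```

The remaining candidates would then be passed to whichever scoring approach is being compared, either the exon-focused analysis or the splicing-based ranking.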



The contribution of this study to a better understanding of how mutations alter gene expression is an enormous step in advancing whole genome precision, or personalized, medicine. Capable of determining the effects of common, rare and even spontaneous mutations, this new method could be used to understand the genetic bases for a wide array of diseases and disorders beyond autism, colon cancer and spinal muscular atrophy. 


University of Toronto: Brendan Frey (CIFAR Senior Fellow), Timothy Hughes (CIFAR Senior Fellow), Stephen Scherer (CIFAR Senior Fellow), Hui Xiong, Babak Alipanahi, Leo Lee, Hannes Bretschneider, Ben Blencowe.

Microsoft Research: Nebojsa Jojic


H. Y. Xiong et al., “The human splicing code reveals new insights into the genetic determinants of disease.” Science, 347:6218 (2015).
