Researchers William Balch (standing) and Chao Wang led the new Cell Reports study

Want to know which mutations are really risky? New Scripps Research method may help

Study bridges basic science and techniques to advance therapeutics.

August 28, 2018


“We’re all mutants,” explains William Balch, PhD, a professor at Scripps Research. Scientists have long assumed that a person who appeared healthy had a relatively “normal” genome. “But we’re now realizing that variation is rampant and makes us who we are—uniquely,” he says. 

That’s why Balch and his lab have designed a way to look at a strand of DNA to find out what trait it codes for, or what deleterious mutation it might harbor. “We need to understand how these variants contribute to a trait—like the color of your hair—or, importantly, the harmful ones, a disease,” says Balch.

Until recently, there was no way to compare variants across large groups of people. Now, with the technology to sequence the human genome getting cheaper and cheaper, and with the launch of the national All of Us Research Program, Balch thinks we finally have enough data to determine how variants shape the population and each of us as individuals—and how variants could inform drug design.

Balch and study first author Chao Wang, PhD, a postdoctoral associate at Scripps Research, describe their tool for studying human genomes in a paper published recently in the journal Cell Reports. Using a statistical method called Gaussian process-based machine learning, scientists can now take any gene and, using only a few of the many variants found in the population, discover the many shapes of the protein it encodes and the many functions of those shapes.

“This tells us how the entire protein is put together from a dynamic and biologically relevant perspective—and what happens when there is a defect,” Balch says.

As a proof of concept, the team looked at genetic variants in people with cystic fibrosis. The sparse collection of variants the researchers focused on sits at different spots in the DNA code. When the cell reads this code to make a protein, the variants code for specific amino acids, the building blocks of the protein.

Now, with their new ‘variation spatial profiling’ method that captures sequence meaning through its functions, the DNA code takes on a new physical structure—a phenotype landscape, as Balch sees it, that can be mined.

On that landscape sit the data points of collected variants that tell something about the protein’s many roles in the worldwide population. These data points are like boreholes delineated on a physical map that, when analyzed together, help the petroleum industry determine where to drill for oil. “We’ve adapted that to biology,” Balch says.

On the protein, these trusted variant locations told Wang and Balch where the protein was vulnerable to mutations that cause cystic fibrosis. Applying a new biological principle they termed ‘spatial covariance,’ by analogy to its use in physics, geology and machine learning, the researchers then predicted the likelihood that uncharacterized mutations elsewhere in the protein could also cause disease.
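For readers who want a feel for the borehole analogy, the general idea of predicting unmeasured locations from a few trusted ones can be sketched with a textbook Gaussian process regression. This is a minimal illustration, not the authors' variation spatial profiling method: the residue positions, the functional scores, the squared-exponential kernel and its length scale, and the function names `rbf_kernel`, `solve` and `gp_predict` are all assumptions made up for this example.

```python
import math

def rbf_kernel(a, b, length_scale=200.0, variance=1.0):
    """Squared-exponential covariance: nearby positions co-vary strongly."""
    return variance * math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def solve(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x

def gp_predict(train_x, train_y, query_x, noise=1e-6):
    """Return the posterior (mean, variance) at query_x.

    The prior mean is taken to be the average of the observed scores.
    """
    n = len(train_x)
    mean_y = sum(train_y) / n
    centered = [y - mean_y for y in train_y]
    # Covariance among measured positions, with a small noise term on the diagonal.
    K = [[rbf_kernel(train_x[i], train_x[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    # Covariance between each measured position and the query position.
    k_star = [rbf_kernel(x, query_x) for x in train_x]
    alpha = solve(K, centered)
    mean = mean_y + sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)
    var = rbf_kernel(query_x, query_x) - sum(k * w for k, w in zip(k_star, v))
    return mean, max(var, 0.0)

# Hypothetical data: residue positions of five characterized variants and a
# made-up functional score for each (1.0 = normal function, 0.0 = none).
positions = [100, 400, 700, 1000, 1300]
scores = [0.9, 0.2, 0.6, 0.1, 0.8]

for query in (400, 550, 5000):
    mean, var = gp_predict(positions, scores, query)
    print(f"position {query}: predicted score {mean:.3f}, variance {var:.3f}")
```

In this toy setup the predicted uncertainty is smallest at positions that were actually measured, grows between them, and the prediction falls back to the average score far from any data, which is the sense in which a few "boreholes" constrain the rest of the map.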

“From a sparse amount of information, we can measure the consequences of variation across the entire protein,” says Balch. A second test showed the method could also assess variants linked to Alzheimer’s disease to predict onset. The approach is universal in design and can be applied to many challenging questions in understanding human disease, from neurodegeneration to cancer.

The new method could prove useful in assessing future therapies, as Wang and Balch showed how spatial covariance can predict the response of variation in the population to a therapeutic. Many drugs fail in clinical testing because they help only a fraction of people and are developed using disease models with poor relevance to human biology. A better understanding of human genetic variation in the early stages of drug development may now reveal how different pathways in the population will respond to drug treatment, and how drugs can be tailored for personalized medicine.

“We are using evolution’s rules to understand ‘you’ by looking at how the rules are played out in ‘all of us,’” says Balch.

The work in this study is part of a growing effort to use machine learning to inform biology and medicine. For example, Oxford’s Mihaela van der Schaar, PhD, is harnessing a different form of machine learning, using clinical data directly to make recommendations for patient care. And Caltech’s Frances Arnold, PhD, is pioneering the use of Gaussian processes to engineer proteins for practical industrial processes.

Balch and Wang are now taking advantage of rapid advances in artificial intelligence to understand how variants interact across the entire genome to predict and manage human biology.

“This is just step one as we recognize the approach will enable a deeper understanding of central dogma [DNA to RNA to protein] in a new spatial framework, its role in natural selection and—perhaps—our origins,” Balch says.

The study, “Bridging Genomics to Phenomics at Atomic Resolution through Variation Spatial Profiling,” was supported by the National Institutes of Health (grants HL095524, DK051870 and AG049665), the Tobacco-Related Disease Research Program (grant TRDRP 23RT-0012) and Cystic Fibrosis Foundation Therapeutics.


For more information, contact press@scripps.edu