Finding significant DNA mutations
NEW HAVEN, Conn.—A Yale University-led team of researchers has delved deeply into previously underexplored regions of the human genome, resulting in the identification of dozens of cancer triggers hidden among sections of non-coding DNA that vary little within the population.
The study was based on the work of two major ongoing genomic projects. The first is the 1000 Genomes Project, an international effort to catalogue and sequence the genomes of at least 1,000 individuals from around the world. These data gave the researchers insight into conserved regions: stretches of DNA that vary little even across a diverse population, which suggests that they are important to human health.
“The fact that these regions of DNA vary little within the population tells us that they are critically important, and maybe even functional in an evolutionary sense,” says Dr. Mark Gerstein, the Albert L. Williams Professor of Biomedical Informatics at Yale.
The second project that formed the basis of the Yale-led study is the Encyclopedia of DNA Elements (ENCODE) Project, a public research consortium that the National Human Genome Research Institute launched in 2003 to begin cataloging all functional elements of the human genome. The project has found that much of the human genome consists of non-coding DNA, which does not code for proteins but instead influences other genes.
Several of the researchers on these projects believed the two investigations had an obvious intellectual connection. Combining the datasets allowed the Yale-led team to explore non-coding DNA elements from the ENCODE project and focus on those that are identified by the 1000 Genomes Project data as highly conserved.
Starting from these combined datasets, the team compared them with mutations in tumor samples from nearly 100 patients with breast or prostate cancer, identifying dozens of mutations in these samples that fell in areas of DNA that ordinarily vary little.
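The filtering step described above, keeping only non-coding elements flagged as conserved and then asking which tumor mutations fall inside them, can be pictured as a simple interval overlap. The sketch below is a hypothetical illustration, not the study's software: the `Element` and `Mutation` types, the element names and the sample coordinates are all made up, and it assumes 0-based, half-open intervals (start inclusive, end exclusive).

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Element(NamedTuple):
    """Hypothetical non-coding element (e.g. an enhancer) on one chromosome."""
    chrom: str
    start: int  # 0-based, inclusive
    end: int    # 0-based, exclusive
    name: str


class Mutation(NamedTuple):
    """Hypothetical somatic mutation observed in one tumor sample."""
    chrom: str
    pos: int    # 0-based position
    sample: str


def mutations_in_conserved_elements(
    elements: Iterable[Element],
    conserved_names: Set[str],
    mutations: Iterable[Mutation],
) -> List[Tuple[Mutation, Element]]:
    """Return (mutation, element) pairs where a mutation falls inside an
    element that is also flagged as conserved in the population data."""
    conserved = [e for e in elements if e.name in conserved_names]
    hits = []
    for m in mutations:
        for e in conserved:
            # Same chromosome and position within the half-open interval.
            if m.chrom == e.chrom and e.start <= m.pos < e.end:
                hits.append((m, e))
    return hits


elements = [
    Element("chr1", 100, 200, "enhancerA"),
    Element("chr1", 300, 400, "enhancerB"),
    Element("chr2", 50, 80, "promoterC"),
]
conserved_names = {"enhancerA", "promoterC"}  # hypothetical conservation flags
mutations = [
    Mutation("chr1", 150, "T1"),  # inside conserved enhancerA
    Mutation("chr1", 350, "T2"),  # inside enhancerB, which is not conserved
    Mutation("chr2", 79, "T3"),   # last base of conserved promoterC
    Mutation("chr2", 80, "T4"),   # just past promoterC's exclusive end
]

hits = mutations_in_conserved_elements(elements, conserved_names, mutations)
for m, e in hits:
    print(m.sample, e.name)
```

The nested loop keeps the idea visible; a genome-scale pipeline would more likely sort intervals or use a dedicated interval-overlap tool rather than compare every mutation with every element.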
The researchers mapped natural patterns of variation and identified regions resistant to mutation, which allowed them to rank disease mutations in order of significance. For example, a mutation in a region that rarely varies would rank more highly than one in an area where variation is more common. Mutations in close proximity to major regulatory-network hubs also rank highly. Ranking mutations this way lets researchers single out the few DNA anomalies that are most significant and find the genes associated with those regions of DNA.
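To make the ranking idea concrete, here is a minimal sketch of how two of the signals named above (how rarely a region varies, and proximity to a regulatory-network hub) could be combined into one score and sorted. It is an assumption-laden illustration, not the team's actual scoring scheme: the `AnnotatedMutation` fields, the additive formula and the weights `conservation_weight` and `hub_weight` are all hypothetical.

```python
from typing import List, NamedTuple


class AnnotatedMutation(NamedTuple):
    mutation_id: str
    # Hypothetical: fraction of positions in the surrounding region that vary
    # in the population (0.0 = never varies, 1.0 = always varies).
    region_variation_rate: float
    # Hypothetical: True if the mutation lies near a regulatory-network hub.
    near_network_hub: bool


def score(m: AnnotatedMutation,
          conservation_weight: float = 2.0,
          hub_weight: float = 1.0) -> float:
    """Additive score: rarely varying regions and hub proximity both raise it."""
    rate = min(max(m.region_variation_rate, 0.0), 1.0)  # clamp to [0, 1]
    s = conservation_weight * (1.0 - rate)
    if m.near_network_hub:
        s += hub_weight
    return s


def rank(mutations: List[AnnotatedMutation]) -> List[AnnotatedMutation]:
    """Return mutations ordered from most to least significant by score()."""
    return sorted(mutations, key=score, reverse=True)


candidates = [
    AnnotatedMutation("m1", 0.01, False),  # rarely varying region
    AnnotatedMutation("m2", 0.30, True),   # commoner variation, but near a hub
    AnnotatedMutation("m3", 0.30, False),  # commoner variation, no hub
    AnnotatedMutation("m4", 0.01, True),   # rarely varying and near a hub
]

for m in rank(candidates):
    print(m.mutation_id, round(score(m), 2))
```

With these made-up weights, a mutation that is both in a rarely varying region and near a hub comes first, and one with neither property comes last; changing the weights changes how the two signals trade off against each other.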
Based on this system, the team built a software tool to flag high-ranking mutations for further examination. The team used this tool to screen about 100 cancer genomes and identified many non-coding drivers for these diseases. They followed up with experimental work to validate the importance of these drivers.
“For example, with prostate cancer we identified a mutation that recurs in an independent cohort of patients,” says Gerstein, who is the study’s co-senior author. “We then were able to connect it to its downstream gene, which was perturbed in expression in cancers.”
“It’s very satisfying,” says Gerstein. “We had a basic-science idea going into it, but we also had the opportunity to translate it into a practical software tool. With the tool we created, you give it the mutations you want to examine, and it ranks them so that researchers can quickly identify the few that are the most significant.”
The research was funded by the U.S. National Institutes of Health and the Williams Professorship Funds. The paper appears in the Oct. 4 issue of Science. Ekta Khurana, an associate research scientist in Gerstein’s lab, is a first author of the study; other Yale authors include Yao Fu, Xinmeng Jasmine Mu, Lucas Lochovsky, Jieming Chen, Arif Harmanci, Alexej Abyzov, Suganthi Balasubramanian, Declan Clarke, Yong Kong, Cristina Sisu and Michael Wilson.