![]()
|
|
|
All about the algorithms
07-11-2012
SHARING OPTIONS:
CAMBRIDGE, Mass.—Researchers from the Massachusetts
Institute of Technology (MIT) and Harvard University have described a new
algorithm that offers a significant reduction in the amount of time required to
locate a particular gene sequence in a genome database. Not only that, but the
algorithm offers greater speedup the more genomes it’s searching. The
development has the potential to offer some relief, given that genome
sequencing is becoming a staple in research, with a full-genome sequencing now
able to be completed in a couple of weeks. Though the rate at which genomes can
be sequenced has been doubling roughly every four months since 2002, the
computing power for such sequencing has only been doubling roughly every 18
months.
“You have all this data, and clearly, if you want to store
it, what people would naturally do is compress it,” Bonnie Berger, a professor
of applied math and computer science at MIT and senior author on the paper,
said in a press release. “The problem is that eventually you have to look at
it, so you have to decompress it to look at it. But our insight is that if you
compress the data in the right way, then you can do your analysis directly on
the compressed data. And that increases the speed while maintaining the
accuracy of the analyses.”
The algorithm takes advantage of the fact that there is
often extensive overlap in the genomes of closely related species, and even
some overlap in distantly related ones. Berger, Michael Baym, Ph.D., and Po-Ru
Loh, her former and current grad students respectively, developed a way to
mathematically represent different species’ genomes so that overlapping data is
only stored once. This allows searches of multiple genomes to focus on the
differences between the genomes and save time.
When the team experimented on a database of 36 yeast
genomes, they compared their new algorithm to BLAST, the Basic Local Alignment
Search Tool, which is one of the most commonly used genomic-search algorithms
in biology. The experiment saw the new algorithm performing twice as fast as
BLAST in a search of 10 genomes, and four times as fast in a search of all 36
genomes.
“If I want to run a computation on my genome, it takes a
certain amount of time,” said Baym in a press release. “If I then want to run
the same computation on your genome, the fact that we’re so similar means that
I’ve already done most of the work.”
The new algorithm is expected to have potential in a variety
of applications in which researchers are looking to identify similar genomic
sequences, such as the identification of microbes. It could aid researchers in
determining the causes of infections, to characterize “microbiomes,” to
characterize microbes in fertile or infertile soil and possibly even in the
study of physical evidence in forensics.
Berger, Baym and Loh are now working to extend their
technique to also work with information on proteins and RNA sequences.
SOURCE:
MIT press release Code: E07101201 Back |
|
||
|
Home |
FAQs |
Search |
Submit News Release |
Site Map |
About Us |
Advertising |
Resources |
Contact Us |
Terms & Conditions |
Privacy Policy
|