It’s a numbers game: Is it an industry of biology and chemistry or ones and zeros?
The learning curve for me continues to be steep, but one unexpected reality in the industry that has enthralled me from the very beginning is the vital and ever-increasing role played by computing and informatics providers. The work being done by folks such as Nallatech (see this month’s guest commentary by Dr. Malachy Devlin), The European Bioinformatics Institute, the Institute for Systems Biology, Silicon Graphics, IBM and countless other public and private organizations to provide useful data sets, software tools and hardware has created the vibrant, in silico world of the drug discovery market.
The scope of in silico work being conducted runs the gamut from simply capturing, sorting and screening data, to robust operations such as the Human Proteome Folding Project and the so-called “Blue Brain” project headed by Henry Markram at Ecole Polytechnique Federale de Lausanne (EPFL), which has the lofty goal of creating a detailed computer model of the function of the neocortex. The complexity of the EPFL project is astounding. The IBM supercomputer employed by Markram and his team to crunch their data has a maximum computing power of 22.8 teraflops. That’s a mere 22.8 trillion floating-point operations per second. If the average human has a difficult time imagining the enormity of the number 1 billion, the idea of 22.8 trillion calculations per second should be enough to cause a cerebral meltdown and for me to ditch my anachronistic, romantic notion of drug discovery.
Yet it makes me wonder if, perhaps, the industry could also be headed for a data meltdown. As scientists imagine more exquisite algorithms to run at ever-faster computing speeds, the volume of data being generated is enormous. As Devlin points out in the introduction of this month’s guest column, “while the raw computational power as predicted by ‘Moore’s Law’ has led to the number of transistors that can be integrated doubling every 18 months, the genomic data at GenBank is doubling every 6 months.”
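To put Devlin’s two doubling rates side by side, here is a rough back-of-the-envelope sketch. It assumes both rates hold steady, which is my simplification rather than anything Devlin claims, and it treats transistor count as a loose stand-in for computing power. The function and constant names are made up for illustration.

```python
def growth_factor(months: float, doubling_period_months: float) -> float:
    """Multiplicative growth after `months`, assuming a constant doubling period."""
    return 2 ** (months / doubling_period_months)


# Doubling periods quoted from Devlin; holding them constant is an assumption.
TRANSISTOR_DOUBLING_MONTHS = 18
GENBANK_DOUBLING_MONTHS = 6

for years in (1.5, 3, 6):
    months = years * 12
    compute = growth_factor(months, TRANSISTOR_DOUBLING_MONTHS)
    data = growth_factor(months, GENBANK_DOUBLING_MONTHS)
    # "gap" is how many times faster the data has grown than the transistor count.
    print(f"{years} years: transistors x{compute:.0f}, "
          f"data x{data:.0f}, gap x{data / compute:.0f}")
```

Under those assumptions, three years brings a fourfold increase in transistors against a 64-fold increase in data, so the gap itself widens 16-fold.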
Does this mean that the sheer amount of data available will soon outstrip computing power advances? Well, that would be hard to argue, since a good portion of the work done today on a wide range of data sets has been focused on collecting data from older published resources and working it into a coherent data set for scientists to use.
But while data is being either created or collected at breakneck speed and scientists are giddily imagining how they can use it, there remain a couple of stumbling blocks.
First, access to various data sets and the hardware and software tools needed to work with them isn’t cheap, and that puts it out of reach for many smaller companies and their researchers. There is hope for these smaller organizations, however. As Christopher Hogue, principal investigator for the Blueprint Initiative, points out: “Often, similar databases and bioinformatics packages are freely available through academic or government institutions.”
Which brings me to the second potential stumbling block, the data itself. Collecting data from thousands upon thousands of sources is no small task in itself. But combine that with needing to convert and store the data in a usable format and determining, somehow, that every single piece of data is both correct and valid, and suddenly what seemed like a fun trip to the library is now like a climb up Mount Everest. And it doesn’t take someone capable of imagining 22.8 trillion to see how even one small error could corrupt the data and turn your in silico experiment into a massive waste of time.
The good news here is that there are many, many data sets available that have been assembled with meticulous detail by private and public organizations alike, so when you need a source of good, “clean” data, you should be able to find what you need with a little homework.
After all this, though, in silico modeling is only one part of the process. As Gregory Barnick, general manager of informatics at Bio-Rad, points out, speaking specifically about toxicity testing, “in silico work is only meant to narrow the field and is not meant to replace in vivo or in vitro work.”