Guest commentary: Bioinformatics research enables better biological information
January 2010
Until the early 1990s, biology researchers primarily studied
one or two genes at a time, often for their entire careers; now, they might
study hundreds of thousands of DNA segments in one experiment. In addition,
researchers can probe complementary non-coding parts of the genome and profile
relevant proteins and metabolites to seek new understanding and treatments of
disease. Ongoing bioinformatics research is critical to make sense of an almost
limitless supply of biomolecular information.
The National Institutes of Health (NIH) defines
bioinformatics as "research, development or application of computational tools
and approaches for expanding the use of biological, medical, behavioral or
health data, including those to acquire, store, organize, archive, analyze or
visualize such data." The NIH acknowledges the overlap with computational
biology: "the development and application of data-analytical and theoretical
methods, mathematical modeling and computational simulation techniques to the
study of biological, behavioral and social systems."
Continuing advances in electronics and our increasingly
digital world open opportunities to analyze and leverage vast amounts of
information across research boundaries. For example, although the physical analog
measurements of a cell and of an integrated circuit are very different, processing
these measurement signals is similar once they're converted to the digital
domain. Following analog to digital conversion, the tools to generate insight
are the same: fast digital signal processing, measurement systems software,
data management, visualization and decision analysis.
With life sciences evolving so quickly, bioinformatics
research is critical to building information infrastructures that can match
that evolution, allow researchers to access data quickly and easily, and develop
new informatics tools as needed. It is no longer enough to generate the best
measurement data; researchers also require the best meaning and insight from
that data.
Bioinformatics research enables new insights from platforms
and analysis. Bioinformatics tools are required to create and evolve
measurement platforms and data analysis methods to reach new biological
insight. Each new measurement can change the questions that scientists can
answer. Researchers start with a hypothesis about what they want to measure,
and they understand initial measurements will be imperfect. They consider what
mathematical model will describe the behavior being measured, what the signal
should look like, how they will implement the measurement in a fast, efficient
way, and ultimately how they will communicate their results to the worldwide
scientific community.
Oligonucleotide DNA microarrays, for example, have depended
on bioinformatics for initial development and ongoing advances in measurement
tools and analysis methods. Researchers started with the hypothesis that it would be possible to
measure gene expression patterns in a tissue sample using complementary probes
of nucleic acid polymers on a glass slide. By taking advantage of DNA
building blocks that act like two sides of a zipper, each probe could test for
the presence of a specific DNA sequence.
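The "zipper" pairing that makes a probe specific can be sketched in a few lines of Python. This is only an illustration of the base-pairing principle; the probe and sample sequences below are made up.

```python
# Minimal sketch of complementary-probe hybridization.
# A probe detects a target if the probe's reverse complement
# appears in the target sequence (both read 5' to 3').

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}

def reverse_complement(seq):
    """Return the sequence that would base-pair with `seq`."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def probe_hybridizes(probe, target):
    """True if the probe's reverse complement occurs in the target."""
    return reverse_complement(probe) in target

probe = "ATGCCGTA"                      # hypothetical probe sequence
sample = "TTTACGGCATTT"                 # contains TACGGCAT, its reverse complement
print(probe_hybridizes(probe, sample))  # True
```

Real probe design must also weigh melting temperature, cross-hybridization and sequence uniqueness, which is where the prior sequencing and bioinformatics work comes in.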
Initial microarray development required prior bioinformatics
work on sequencing before researchers could design the hundreds of thousands of
probes for an array.
Bioinformatics research was essential to develop the
mathematical models, software, visualizations and processes to analyze the data
because initially, no tools were available.
Bioinformatics enabled extensions of microarray platforms to
explore new layers of biology, such as array comparative genomic hybridization (aCGH)
to identify duplicated and missing pieces of chromosomes in cancer cells compared
with normal cells. With each new platform or modification, researchers depend
on bioinformatics for probe development, analysis methods and visualization
software.
The process of platform development from measurement to
analysis often becomes iterative, as researchers push the envelope to get a
more complete "picture" of the sample being analyzed. Experiments that yield
incomplete information often motivate efforts to improve platforms and
algorithms to achieve better-defined and more focused results. For example,
early aCGH research indicated that even outside of cancer biology, genes are
present in varying numbers of copies. This led to studies of copy number variation as a
key element of genetic structural variation across multiple diseases.
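The basic reduction of aCGH data to copy-number calls can be sketched as follows. aCGH compares a test sample's fluorescence at each probe against a normal reference, and the log2 ratio suggests gain or loss; the intensities, gene names and threshold below are illustrative, not values from any real platform.

```python
import math

def call_copy_number(test_intensity, ref_intensity, threshold=0.3):
    """Classify a probe as 'gain', 'loss' or 'normal' from its log2 ratio."""
    ratio = math.log2(test_intensity / ref_intensity)
    if ratio > threshold:
        return "gain"
    if ratio < -threshold:
        return "loss"
    return "normal"

# Hypothetical probe intensities: (test sample, normal reference)
probes = {"geneA": (2000, 1000),   # twice the reference signal -> gain
          "geneB": (500, 1000),    # half the reference signal  -> loss
          "geneC": (1050, 1000)}   # near 1:1                   -> normal
calls = {name: call_copy_number(t, r) for name, (t, r) in probes.items()}
print(calls)
```

In practice, calls are made from many consecutive probes with noise modeling and segmentation algorithms rather than a single per-probe threshold.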
Bioinformatics research has improved platform results
independent of instrument improvements. Mass spectrometers, for example, help
identify proteins by matching the mass/charge ratios of their constituent peptide
fragments against a database. New spectral clustering techniques and better
spectrum-to-peptide matching algorithms have improved peptide identification
without any change to the mass spectrometer.
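The database-matching step can be sketched in simplified form: observed peptide masses are compared against theoretical masses computed from candidate sequences, within an instrument tolerance. The residue masses are standard monoisotopic values in daltons, but the peptide database and observed mass below are invented for illustration.

```python
# Hedged sketch of peptide identification by mass matching.
# Monoisotopic residue masses (Da) for a few amino acids:
RESIDUE_MASS = {"G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
                "V": 99.06841, "L": 113.08406, "K": 128.09496}
WATER = 18.01056  # H2O added to the residue sum for a free peptide

def peptide_mass(seq):
    """Theoretical monoisotopic mass of a peptide sequence."""
    return sum(RESIDUE_MASS[r] for r in seq) + WATER

def match_mass(observed, database, tolerance=0.01):
    """Return peptides whose theoretical mass is within `tolerance` Da."""
    return [p for p in database if abs(peptide_mass(p) - observed) <= tolerance]

database = ["GASP", "VLK", "KKG", "PAVL"]   # hypothetical candidate peptides
observed = peptide_mass("VLK")              # pretend this came off the instrument
print(match_mass(observed, database))       # ['VLK']
```

Real search engines score entire fragmentation spectra against predicted fragments rather than matching a single intact mass, which is exactly where improved spectrum-to-peptide algorithms pay off.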
Bioinformatics research enables integration of information
from multiple sources. Bioinformatics enables scientists to transcend data-type
boundaries and begin to view the cell as a system of complex interactions.
Study of the whole provides insights into emergent systems-level behavior that
isn't visible by looking at individual genes and proteins separately.
Learning how a cell responds to stimuli requires integration
of data from multiple experiments and measurement platforms. In the cascade of
biological events, proteins interact with receptors that also interact with
proteins and genes. Integration of gene expression and protein networks, for
example, can reveal pathways that potentially govern disease progression and might
help identify people who would most likely respond to a particular
therapy.
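A minimal sketch of such integration: overlay gene-expression fold changes on a protein-interaction edge list, and keep the interactions whose partners are both strongly changed. The genes named below are real signaling proteins, but the interactions and fold-change values are invented for illustration.

```python
# Hedged sketch of integrating two data types: an interaction network
# plus per-gene expression fold changes, filtered to an "active" subnetwork.

interactions = [("EGFR", "GRB2"), ("GRB2", "SOS1"), ("SOS1", "KRAS"),
                ("KRAS", "BRAF"), ("TP53", "MDM2")]
fold_change = {"EGFR": 3.1, "GRB2": 2.4, "SOS1": 2.8, "KRAS": 2.2,
               "BRAF": 1.1, "TP53": 0.9, "MDM2": 1.0}

def active_subnetwork(edges, expression, cutoff=2.0):
    """Keep interactions where both partners exceed the expression cutoff."""
    return [(a, b) for a, b in edges
            if expression.get(a, 0) >= cutoff and expression.get(b, 0) >= cutoff]

print(active_subnetwork(interactions, fold_change))
# -> [('EGFR', 'GRB2'), ('GRB2', 'SOS1'), ('SOS1', 'KRAS')]
```

The surviving chain of interacting, up-regulated proteins is the kind of candidate pathway that integration is meant to surface; production tools apply statistical scoring over much larger networks.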
Bioinformatics is becoming more collaborative as researchers
integrate information from multiple sources and put it into a contextual view
for new knowledge and worldwide availability.
Cytoscape, a tool that encourages information sharing, is an
open-source bioinformatics software platform that enables researchers to
visualize molecular-interaction networks and integrate these interactions with
experimental data and information from other sources, such as pathway databases
and scientific literature. Putting data into a biological context helps
increase understanding of molecular networks, interactions and pathways
involved in biological processes. Cytoscape allows users to query biological
networks to derive computational models and to view, manipulate and analyze
their data to reach biological insight.
The Cancer Genome Atlas Project to characterize molecular
alterations in cancer is another example of integrating data from multiple
sources. This collaborative effort, led by the National Cancer Institute and
the National Human Genome Research Institute, has demonstrated the feasibility
of using integrated genomic strategies.
Scientists are generating new data,
sharing it with researchers worldwide, and developing innovative bioinformatics
tools and technologies to study cancer with greater precision and efficiency.
Findings already are influencing treatment; investigators have reported that
genetic alterations in patients with glioblastoma (a form of brain cancer) are
linked with resistance to a drug that is commonly used for treatment.
Bioinformatics research is also expanding to computer
modeling to simulate and calculate, for example, gene expression over time.
Researchers are creating models with the goal of feeding data through them to
predict how a living cell will react. Further research will involve validating
such models experimentally.
New directions in bioinformatics research include synthetic
biology and visual analytics. Synthetic biology is a growing
field. The redesign of biological systems and component parts
for useful and practical purposes has many parallels to the electronics
industry. Standardized, integrated electronic parts, devices and tools have
enabled a well-developed, mature industry. Advocates of synthetic biology
similarly champion development of tools and processes that will enable
standardized, integrated biological parts and devices to create synthetic
genomes. While synthetic biology requires a revolution in tools and technology,
these approaches may address significant challenges in healthcare, energy and
the environment.
The Artemisinin Project uses synthetic biology to make safe,
effective anti-malarial medicines accessible to people in developing countries.
Representatives from academia and the biotechnology, pharmaceutical and nonprofit
sectors are developing semi-synthetic artemisinin because the natural source,
the sweet wormwood plant, is too expensive for extensive use. Other synthetic biology efforts
involve research to generate energy-rich fuels by engineering the enzymes in the
pathways that create these molecules, inserting them into bacteria and growing
the bacteria at large scale.
Visual analytics is an emerging field. Although all
sciences are improving the ability to collect and analyze information, new
tools are required to analyze massive, complex, incomplete and uncertain
worldwide information. IEEE recognized this challenge and in 2006 founded the
Symposium on Visual Analytics Science and Technology. The symposium focuses on the R&D
agenda for visual analytics, developed under the leadership of the Pacific
Northwest National Laboratory, which defines the directions and priorities for
future R&D programs focused on visual analytics tools.
IEEE defines visual analytics as "the science of analytical
reasoning supported by highly interactive visual interfaces. People use visual
analytics tools and techniques to synthesize information into knowledge; derive
insight from massive, dynamic and often conflicting data; detect the expected
and discover the unexpected; provide timely, defensible and understandable
assessments; and communicate assessments effectively for action." This
interdisciplinary science includes statistics, mathematics, knowledge
representation, management and discovery technologies, cognitive and perceptual
sciences, decision sciences and more.
The advances in bioinformatics research in some ways
parallel the history of astronomy. In the 16th century, Tycho Brahe collected precise measurements
on the positions of planets. Johannes Kepler made Tycho's data more meaningful
by using it to develop his laws of planetary motion. Sir Isaac Newton extended
the value further by developing principles of physics, such as universal
gravitation and laws of motion.
While Newton's principles intuitively match everyday
experience, Albert Einstein's 20th century discoveries pushed science into the
non-intuitive realm. Now bioinformatics research is tackling data complexity
and interrelatedness that describe a new world. We don't have the answers yet,
but we are getting in touch with the right questions.
Bioinformatics research gives us new glimpses of processes
that have been going on for millions of years, the billions of molecular events
happening in our bodies that enable us to function. Those who reduce these
data to explanatory laws and principles will take great steps forward.
Darlene J.S. Solomon is chief technology officer for Agilent
Technologies in Santa Clara, Calif. She holds a B.S. degree in chemistry from
Stanford University and a Ph.D. in bioinorganic chemistry from the
Massachusetts Institute of Technology.