GENET archive


9-Misc: Genomics - data rich but information poor

genet-news mailing list

-------------------------------- GENET-news --------------------------------

TITLE:  Data glut
SOURCE: The Boston Globe, USA, by John Dodge
DATE:   Feb 24, 2003

------------------ archive: ------------------

Data glut

As gene research yields information counted in terabytes, researchers
struggle to visualize and process it while technology businesses scramble
to profit from it.

The sequencing of the human genome over the past decade was supposed to
help revive the flagging fortunes of the information technology industry.

To some extent, the life sciences market, which relies heavily on
computational biology, has lived up to the promise. Research centers in
both the private and public sectors placed orders last year for thousands
of servers and storage systems capable of handling terabytes of the new
genomic, proteomic, drug, and health care data generated hourly.

That's the good news. The bad news is that, during the past year,
companies that develop software tools for managing and exploiting all of
the new data struggled mightily. Red ink, consolidation, and layoffs were
the norm. Welcome to the tumultuous world of "bioinformatics," the
underachieving wonder child of a genomics revolution-in-waiting.

Bioinformatics is where computing intersects with biotechnology. The
field is made up of software, database, visualization, and other
companies, often funded by pharmaceutical giants, whose technology
handles and analyzes genomic bits and bytes along with other biological
and clinical data. (The field is sometimes known simply as informatics.)
But selling life sciences software, services, and information has proven
tougher than anticipated, so much so that many purveyors have sought to
transform themselves into drug companies, betting that tangible products
will be their salvation.

Incyte Genomics Inc. closed operations and pared its work force by 37
percent in November as it lessened its reliance on technology and moved
deeper into drug discovery. Celera Genomics Group went through a similar
reorganization in June. Others like InforMax and NetGenics were lucky
enough to find merger partners in Invitrogen and LION Biosciences AG,
respectively. The late Doubletwist Inc., a once hot start-up, closed its
doors last spring.

"Pharmaceutical companies are realizing they are not making as much
progress as they thought investing in genomics, proteomics, and
informatics," says Phillips Kuhl, president of the Cambridge Healthtech
Institute research firm. "They're not getting the returns on investment."

Investment has been shifted to obtaining promising drug candidates from
which pharmaceutical companies, desperate to replenish depleted
pipelines, see a quicker payoff. Johnson & Johnson just agreed to pay
$2.4 billion in cash for the biopharmaceutical company Scios Inc.
Pharmacia Corp.'s powerful lineup of cancer drugs lured Pfizer Inc. into
a merger. Bristol-Myers Squibb Co. has paid dearly for its alliance with
ImClone Systems Inc.

"You see very large sums of money being paid for later-stage assets
[drugs in development]," says David Block, chief operating officer at
Celera Genomics. "Big pharmas are more cautious in spending on informatics."

Few dispute the long-term potential of mining genomic data to attack the
root causes of disease, but the commercial glow from sequencing the human
genome has dimmed. Now, the intense pressure to come up with revenue-
producing drugs has spread from big pharmaceutical companies to the
smaller companies and the once-promising start-ups whose fortunes were to
be made selling technology to bring genome-derived drugs to market
quickly and inexpensively.

"We've got to figure out ways to reduce time and cost of drug
development," says Eric Neumann, vice president of bioinformatics at
Beyond Genomics Inc. in Waltham. "Informatics is the key. The question
is: How do you shrink-wrap great insight into commercial systems?" The
company is advancing the notion of systems biology, which investigates
entire biological systems instead of just individual molecules and cells.

Another barrier lies in the very advances made in genomic and proteomic
research. Researchers have found that the more they know, the more they
realize they don't know.

"We are in a data-rich environment, but the fact is we are information
poor," says Peter Sorger, an associate professor of biology at MIT and
co-chairman of the school's budding systems biology initiative. "You look
at biological systems with much more complexity than before.
Bioinformatics has concentrated entirely on sequence information, but
it's only a tiny piece of the puzzle."

The myth that drug discovery is on track to becoming primarily
computational hasn't helped either. "It sounds wonderful if you can do
everything on computers," Neumann says. "You don't have to pay as much
[as when conducting real lab experiments]. We really have a poor
understanding of what a gene actually does and where and when it should
do it. You can understand the entire genome and [still] understand less
than 1 percent about what is going on in a cell."

Celera Genomics, which used its now-legendary "shotgun approach" for
sequencing the human genome several chunks at a time, changed gears last
year to focus on drug discovery as it became clear that sales of
sequencing information alone would not sustain the company. Marketing for
the Celera Discovery System, a Web-based source of public and private
biological data, was transferred to sister company Applied Biosystems in
June. Celera trimmed 132 jobs as a result, while Applied Biosystems later
trimmed 400.

Companies today have to offer a broad spectrum of integrated biological,
chemical, and clinical information, not just novel sequencing data, says
Tony Kerlavage, senior director of bioinformatics applications at Celera.
"Research scientists and bioinformatics groups want complete unencumbered
access to every bit of information."

As a customer for such products, Beyond Genomics's Neumann concurs.
"There's been a shift from just data to knowledge extraction," he says.

Compounding Celera's problems, its exclusivity for information about the
human genome lasted only a short time as sequencing information moved
into the public domain.

"There was very little information that could not be extracted from the
public databases. The six- to nine-month jump on it Celera had was not
enough to make a big difference," says Sorger, whose bioengineering lab
produces a terabyte of data in a typical month. In the future, Sorger
says, personalized medicine could mean each individual's electronic
medical records will amount to several terabytes of data.

The economy, of course, has depressed technology buying. But more than
that, off-the-shelf bioinformatics products are often not what the doctor
ordered.

"Everybody says they have a total information package and none of them
actually do," says William S. Hayes, a bioinformatics scientist at
AstraZeneca Group's research and development site in Waltham.
"Applications are cobbled together from incompatible systems. There's so
many mergers taking place [that] it takes quite a while for products from
smaller companies to get integrated, if they ever do."

The irony is that AstraZeneca would rather buy off-the-shelf software
than develop it internally. "Anything we can buy costs a lot
less than anything we can develop," Hayes says. "Some say if we don't
develop it in-house, we don't have an advantage. That's ridiculous. The
only times we do internal [development] is when we can't find something

The problem is twofold: The bioinformatics industry often fails to give
researchers what they want and, when it does, incompatibilities between
databases, tools, and whatever else is in the picture can be overwhelming.

Sorger and his MIT colleagues have taken many informatics applications
off the shelf, only to put them back shortly afterward. His biggest
complaint is that they tend to be scientifically lacking.

"We bought a lot of software and noticed the underlying science was
simplistic," he says. "So [researchers] would choose to use buggy public
software instead."

When the software doesn't work, scientists conducting experiments in both
commercial and academic labs simply squirrel away their findings in a
paper notebook or Excel spreadsheet, says Sorger.

"Think about experimental science," he says. "You're trying to adapt your
experiment to Mother Nature and it's extremely difficult. Once it's
working and your database is not set up, you ignore the database. It's
the most demoralizing part of all of this. After two to three years of
installing large database systems, people ignore them because they don't
really help solve the problems they have with their experiments."

So it's not uncommon for valuable data to be sitting in Excel
spreadsheets, accessible only by a handful of people if not a single
researcher. "You end up with 5,000 Excel spreadsheets instead of one
consistent data source," adds Sorger.
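
To make the contrast between scattered spreadsheets and "one consistent data source" concrete, here is a minimal sketch in Python. It assumes each researcher exports a sheet as CSV with slightly different column headers. The file names, the column aliases, and the shared schema are all hypothetical and chosen only for illustration; nothing here describes a system used at MIT or anywhere else.

```python
import csv
import io

# Hypothetical mapping from each lab member's column headers to one shared schema.
COLUMN_ALIASES = {
    "gene": "gene",
    "Gene Name": "gene",
    "expr": "expression",
    "Expression Level": "expression",
}


def load_sheet(text, source):
    """Read one CSV export and rename its columns to the shared schema."""
    rows = []
    for raw in csv.DictReader(io.StringIO(text)):
        row = {
            COLUMN_ALIASES[name]: value.strip()
            for name, value in raw.items()
            if name in COLUMN_ALIASES
        }
        row["expression"] = float(row["expression"])
        row["source"] = source  # keep track of where each record came from
        rows.append(row)
    return rows


def consolidate(sheets):
    """Merge every sheet into one list, sorted by gene and then by source."""
    merged = []
    for source, text in sheets.items():
        merged.extend(load_sheet(text, source))
    return sorted(merged, key=lambda r: (r["gene"], r["source"]))


# Two made-up exports with inconsistent headers, standing in for real files.
sheets = {
    "alice.csv": "gene,expr\nTP53,2.5\nBRCA1,0.75\n",
    "bob.csv": "Gene Name,Expression Level\nTP53, 3.0\n",
}
table = consolidate(sheets)
for row in table:
    print(row)
```

The hard part in practice is agreeing on the shared schema, not the merging itself, which is the point Sorger makes about databases that do not match how experiments are actually run.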

These problems don't necessarily preclude a healthy informatics industry
at some point, according to Hayes and the others. Hayes sees the XML
markup language as a positive step toward reducing incompatibilities
and making data easier to disseminate.
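
As a rough illustration of why a shared XML format eases data exchange, the sketch below writes a few records to XML and reads them back using Python's standard `xml.etree.ElementTree` module. The element names (`measurements`, `measurement`, `sample`, `expression`) and the sample values are invented for this example; they are an assumption, not a schema from any standards body or vendor mentioned in the article.

```python
import xml.etree.ElementTree as ET


def to_xml(records):
    """Serialize a list of dicts into a small XML document (hypothetical schema)."""
    root = ET.Element("measurements")
    for rec in records:
        item = ET.SubElement(root, "measurement", gene=rec["gene"])
        ET.SubElement(item, "sample").text = rec["sample"]
        ET.SubElement(item, "expression").text = str(rec["expression"])
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Parse the XML produced by to_xml back into a list of dicts."""
    root = ET.fromstring(text)
    return [
        {
            "gene": item.get("gene"),
            "sample": item.findtext("sample"),
            "expression": float(item.findtext("expression")),
        }
        for item in root.findall("measurement")
    ]


# Made-up records standing in for one lab's results.
records = [
    {"gene": "TP53", "sample": "lab-A-001", "expression": 2.5},
    {"gene": "BRCA1", "sample": "lab-A-002", "expression": 0.75},
]
xml_text = to_xml(records)
round_tripped = from_xml(xml_text)
print(xml_text)
```

Any tool that agrees on the same element names can read the file without knowing which program wrote it, which is the kind of compatibility Hayes is hoping for.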

"XML is a start," he said. "If we can transfer the data, we could deploy
new [biological] algorithms very quickly." A standards body called "I3C"
was formed in 2001 to facilitate universal data exchange in the life
sciences, though it's not always easy to reach consensus. Progress comes

Despite the optimism that bioinformatics will prosper some day, the
numbers describe the current situation. Significant bioinformatics deals,
from mergers to alliances, last year dropped 41 percent to 194, from 331
in 2001, according to research from Cambridge Healthtech.

"The role of informatics is going to grow astronomically, but the kind of
things that have to be produced are not clear," says Neumann. "Nobody has
a clear understanding of the tools that will be needed. We have hunches."

John Dodge is executive editor of BioIT World, a monthly IDG publication
about technology in the life sciences. He can be reached at