Science Talk: After the Genome
"The Genome,
We Are Sure, Is Packed with Subtleties"
Paul Schimmel, Professor, Department of Molecular Biology
It's very exciting. None of us know what treasures lie beneath
the sequence.
There has been a huge capital investment, not only by the government
through the National Institutes of Health, the National Cancer Institute,
and the National Science Foundation, but also through private foundations,
like the American Cancer Society, the Howard Hughes Medical Institute,
the Wellcome Trust, [and] private industry, particularly on the
entrepreneurial side.
All of [these organizations] have large programs trying to understand,
at the beginning, the function of the proteins that are encoded
by the human genome. Rarely have we seen capital investment coming
from so many corners focused on one problem.
Over the next 10 years, I believe probably 90 percent of the proteins
will have an assigned function. Maybe that's too optimistic, but
it's certainly within reach.
What is much harder to come to grips with is how [to] put it all
together to make an organism. How does this all fit together? There
are approaches being used by diverse groups trying to knock out
genes and relate them to phenotypes, particularly related to embryonic
development and differentiation.
The genome, we are sure, is packed with subtleties: the expression
on your face, body language, intuitive faculties, gestures, the
things that we do that we don't even think about. These are
things that we don't understand at all in a detailed sense as they
relate to the genome, but more and more we're getting the feeling
that [these subtleties] are genetically encoded. They are part of
this array that we just don't understand.
That's where the advances need to be made. What are these genes?
Even if you know the proteins, how do they work to generate a highly
sophisticated organism?
I think that we will have all the pieces to the jigsaw puzzle figured
out ("this must be part of a lake and this must be part of a forest
over here, and this must be part of a house over here"). Putting
them together to get the whole picture is very difficult.
How long that will take is harder [to say]. Will it be in the next 100 years?
That's a good question. I do believe the end result will be that
humans will have a sense of how you go from a puffer fish to a mouse
to a human: organisms with similar numbers of genes and many
of the same genes, but obviously [leading to] very different outcomes.
Functional Analysis and Genetic Diversity in Yeast and Malaria
Elizabeth Winzeler, Assistant Professor, Department of Cell Biology
One of the big areas of investigation in the post-genome era will
be assigning function to the genes that are predicted in the genome
project. One of the techniques that I am most familiar with is expression
profiling. As genome sequences become available, it's easy to create
arrays [of various nucleotide sequences] that can then interrogate
every gene in the genome. Then, by hybridizing the RNA from different
tissues or disease states or different stages of an organism's life
cycle, you can start determining when a gene is probably transcriptionally
active, and that actually gives you quite a bit of information about
the potential functional role for that gene.
This can really go a long way towards narrowing down the list of
potentially interesting targets that you might want to concentrate
on if you are involved in the drug discovery process.
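The expression-profiling logic described above can be sketched in a few lines of Python. This is a minimal illustration, not the lab's actual analysis: it assumes each gene already has one summarized hybridization signal per condition, and the function names, gene names, conditions, and the fixed threshold are all hypothetical.

```python
from typing import Dict, List, Sequence

def active_conditions(signal: Dict[str, Sequence[float]],
                      conditions: Sequence[str],
                      threshold: float) -> Dict[str, List[str]]:
    """For each gene, list the conditions where its signal exceeds the threshold.

    A signal above threshold is treated as 'probably transcriptionally active'
    (a simplifying assumption; real analyses normalize data and model noise).
    """
    return {
        gene: [cond for cond, value in zip(conditions, values) if value > threshold]
        for gene, values in signal.items()
    }

def shortlist(signal: Dict[str, Sequence[float]],
              conditions: Sequence[str],
              condition_of_interest: str,
              threshold: float) -> List[str]:
    """Narrow the gene list to genes active only in the condition of interest."""
    calls = active_conditions(signal, conditions, threshold)
    return sorted(gene for gene, active in calls.items()
                  if active == [condition_of_interest])

# Hypothetical example: three genes measured in healthy vs. diseased tissue.
conditions = ["healthy", "diseased"]
signal = {"geneA": [50.0, 900.0], "geneB": [800.0, 850.0], "geneC": [40.0, 30.0]}
print(active_conditions(signal, conditions, threshold=200.0))
print(shortlist(signal, conditions, "diseased", threshold=200.0))  # ['geneA']
```

Requiring a gene to be active only in the condition of interest is one way to shrink a candidate list; a looser rule would keep any gene that is active there at all.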
I started working on post-genome functional analysis in [the budding
yeast] Saccharomyces in 1996, right after the genome sequence
was released, and I became involved in a number of different projects: developing
tools for expression profiling as well as creating knockout strains
for every gene in the yeast genome. I'm still doing a little bit
of yeast research. For example, we [also] recently used oligonucleotide
arrays to map all of the chromosomal origins of DNA replication (there
are about 400 in yeast) by isolating DNA fractions that were
enriched for origin activity and then hybridizing the fractions
to high density oligonucleotide arrays.
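One way to picture the origin-mapping step: compare each probe's signal in the origin-enriched fraction with a reference hybridization, and call origins where adjacent probes are consistently enriched. The sketch below assumes a positive reference signal for every probe and a fixed ratio cutoff; the function name, cutoff, and toy values are hypothetical, and the statistics actually used in the study are not described here.

```python
from typing import List, Optional, Tuple

def enriched_regions(positions: List[int],
                     enriched: List[float],
                     reference: List[float],
                     min_ratio: float) -> List[Tuple[int, int]]:
    """Return (first, last) probe positions of runs whose enriched/reference ratio >= min_ratio.

    Probes must be sorted by chromosomal position; reference values are assumed > 0.
    """
    regions: List[Tuple[int, int]] = []
    run_start: Optional[int] = None
    run_end: Optional[int] = None
    for pos, e, r in zip(positions, enriched, reference):
        if e / r >= min_ratio:
            if run_start is None:
                run_start = pos  # open a new enriched run
            run_end = pos        # extend the current run
        elif run_start is not None:
            regions.append((run_start, run_end))  # close the run at the last enriched probe
            run_start = None
    if run_start is not None:
        regions.append((run_start, run_end))  # the run reaches the last probe
    return regions

# Toy data: six probes, two enriched runs.
positions = [0, 100, 200, 300, 400, 500]
enriched = [10.0, 40.0, 45.0, 10.0, 10.0, 50.0]
reference = [10.0] * 6
print(enriched_regions(positions, enriched, reference, min_ratio=3.0))  # [(100, 200), (500, 500)]
```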
We have also used oligonucleotide arrays to study genetic diversity
in yeast. Usually, only one strain or individual representative
from a particular organism is sequenced. By comparing the patterns
which result when genomic DNA is hybridized to arrays, we can find
out how closely related different strains are. I've looked at 10
or 11 different yeast isolates. I think this technology is going
to be very interesting to population geneticists in the future.
You can get a much more descriptive look at the genome, and you
can find regions of the genome that are evolving at faster rates.
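A minimal sketch of how hybridization patterns could be turned into a relatedness measure, assuming each probe has already been called as either hybridizing like the sequenced reference strain (True) or losing signal, which suggests a polymorphism under the probe (False). The distance used here, the fraction of probes whose calls differ, is an illustrative choice, and the isolate names and calls are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Tuple

def pattern_distance(a: List[bool], b: List[bool]) -> float:
    """Fraction of probes at which two strains' hybridization calls disagree."""
    if len(a) != len(b):
        raise ValueError("patterns must cover the same probes")
    return sum(x != y for x, y in zip(a, b)) / len(a)

def closest_pair(patterns: Dict[str, List[bool]]) -> Tuple[str, str]:
    """Return the pair of strains with the most similar hybridization patterns."""
    return min(combinations(sorted(patterns), 2),
               key=lambda pair: pattern_distance(patterns[pair[0]], patterns[pair[1]]))

# Hypothetical calls for four probes in three isolates.
patterns = {
    "isolate1": [True, True, False, True],
    "isolate2": [True, True, False, False],
    "isolate3": [False, False, True, False],
}
print(pattern_distance(patterns["isolate1"], patterns["isolate2"]))  # 0.25
print(closest_pair(patterns))  # ('isolate1', 'isolate2')
```

Probes that lose signal in many isolates would also point to the faster-evolving regions mentioned above, though that is an extension of this sketch.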
In the past couple years, I've been working on applying this type
of technology to organisms that are more difficult to work with
and are more relevant to human health. The malaria parasite has
a genome size that is about two times as large as yeast. The sequence
has been done for about six months, and the annotations should become
available [soon]. The parasite also has both haploid and diploid
phases, like Saccharomyces, but has a complex life cycle
involving both humans and mosquitoes, is difficult to maintain in
culture, and has gene function that cannot be studied using classical
forward genetics.
Malaria is a major health problem worldwide. There are 300 million
cases a year, and there has been a resurgence in the number of cases
because of drug resistance. Many inexpensive anti-malarials are
no longer effective.
While genetic studies are difficult, it's relatively easy to get
RNA from all the different stages of the parasite's life cycle, and
this offers us new ways to study gene function in the parasite.
In the past year, I've designed an oligonucleotide array that contains
about 500,000 probes to two different Plasmodium genomes
[a mouse strain, and the human strain]. The array we designed at
TSRI arrived a month or two ago, and what we are doing now is collecting
RNA samples from many different conditions. We're exposing parasites
to drugs to identify new genes involved in [resistance] pathways.
We're hybridizing genomic DNA in order to characterize genetic diversity
in different field isolates and find out how similar or different
the isolates are. Eventually, we'd like to take these tools into
the field and map the spread of drug resistance.
If you start doing longitudinal studies after you introduce a new
drug, you might be able to identify the drug targets or the mechanisms
of resistance, because we predict we will see pockets of variability
developing within the genome over time that are associated with
the drug's target. This may lead to new knowledge about the mechanisms
of drug resistance. If you can start finding the mutations that
are associated with drug resistance, then that tells you how to
treat patients in the field.
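The longitudinal idea can be illustrated with a toy scan: tile the probes into fixed windows along the genome and flag windows where the fraction of field isolates showing a polymorphism rises between two sampling times. This is a hypothetical sketch; the window size, the cutoff, and the per-probe polymorphism frequencies are assumptions, and a real study would need a statistical test rather than a fixed threshold.

```python
from typing import List

def windows_gaining_variability(before: List[float],
                                after: List[float],
                                window: int,
                                min_increase: float) -> List[int]:
    """Return start indices of non-overlapping probe windows whose mean
    polymorphism frequency rose by at least min_increase between samplings.

    before/after: fraction of isolates polymorphic at each probe, in genome order.
    Trailing probes that do not fill a whole window are ignored.
    """
    if len(before) != len(after):
        raise ValueError("both time points must cover the same probes")
    flagged = []
    for start in range(0, len(before) - window + 1, window):
        gain = sum(after[start:start + window]) - sum(before[start:start + window])
        if gain / window >= min_increase:
            flagged.append(start)
    return flagged

# Hypothetical frequencies at six probes, before and after a new drug is introduced.
before = [0.0, 0.1, 0.0, 0.0, 0.1, 0.0]
after = [0.0, 0.1, 0.6, 0.7, 0.1, 0.0]
print(windows_gaining_variability(before, after, window=2, min_increase=0.25))  # [2]
```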
"The
Main Reason to Sequence The Genome Was to Facilitate Positional
Cloning"
Bruce Beutler, Professor, Department of Immunology
It will take a very long time to close the phenotype gap. The fact
is, there are about 34,000 genes, give or take a few thousand. If
you add up all the phenotypes known from mutations in humans and
from knockouts in mice, you come up with about 5,000. So something
like six out of seven genes don't have an essential function attached
to them yet.
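The "six out of seven" figure follows from the two rough counts quoted above, treating each known phenotype as attached to a distinct gene (a simplifying assumption):

```latex
\frac{5{,}000}{34{,}000} \approx 0.15 \approx \frac{1}{7},
\qquad
1 - \frac{5{,}000}{34{,}000} \approx 0.85 \approx \frac{6}{7}
```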
The way that people go about identifying phenotypes now is to mutate
every gene in the genome and keep certain phenotypes of interest
to them under surveillance. In this way, in principle, one can find
every gene that is required for a particular function. Once you
have a phenotype, then comes the problem of finding the particular
mutation that caused it. That's done by positional cloning. That's
where sequencing the genome has been particularly useful.
In fact, the main reason to sequence the genome was to facilitate
positional cloning. I think a lot of people don't realize that.
It's a rapid way to find the function of genes.
In the old days, when you positionally cloned something, you first
had to map the mutation. By following meiosis, you would confine
the mutation to a point between two markers on the chromosome, hopefully
a very small area, less than a million base pairs long. Second,
you would have to clone all the DNA from end-to-end across that
area. Third, you would have to find all the genes that were candidates
in that area. And finally, you would have to find the mutation.
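The mapping step can be sketched as a search for the chromosome segment that every affected animal inherited from the mutant line. This is a simplified illustration, not a description of any particular lab's software: it assumes a recessive mutation, markers sorted by position along one chromosome, error-free genotype calls, and a boolean per marker that is True when an affected animal is homozygous for the mutant-line allele. The function name and the toy coordinates are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

def critical_interval(positions: List[int],
                      calls: Dict[str, List[bool]]
                      ) -> Optional[Tuple[Optional[int], Optional[int]]]:
    """Return the flanking marker positions (left, right) that bound the mutation.

    positions: marker coordinates in base pairs, sorted along the chromosome.
    calls: per affected animal, True at each marker where it is homozygous
           for the mutant-line allele (assumes a recessive trait).
    A bound is None when the shared block runs to the end of the marker map.
    Returns None if no marker is shared by every affected animal.
    """
    n = len(positions)
    # Markers at which no affected animal shows a recombination.
    shared = [all(animal[i] for animal in calls.values()) for i in range(n)]
    if not any(shared):
        return None
    first = shared.index(True)
    last = n - 1 - shared[::-1].index(True)
    if not all(shared[first:last + 1]):
        raise ValueError("shared markers are not contiguous; check the genotypes")
    # The mutation lies between the nearest markers where some animal has recombined.
    left = positions[first - 1] if first > 0 else None
    right = positions[last + 1] if last + 1 < n else None
    return left, right

positions = [100_000, 400_000, 600_000, 900_000, 1_300_000]
calls = {
    "mouse1": [False, True, True, True, True],   # crossover between the 1st and 2nd markers
    "mouse2": [True, True, True, False, False],  # crossover between the 3rd and 4th markers
    "mouse3": [True, True, True, True, False],   # crossover between the 4th and 5th markers
}
print(critical_interval(positions, calls))  # (100000, 900000)
```

Here the shared block covers the markers at 400,000 and 600,000 bp, so the critical interval is the roughly 800 kb between the flanking markers, within the "less than a million base pairs" the text hopes for.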
The sequencing of the genome has made it so that you don't have
to do steps two and three anymore. You no longer have to clone all
the DNA across the area, because the sequence is known. And you
no longer have to look for genes because, in principle, they've
all been found and annotated. Now the limiting factor in finding
mutations is doing the genetic mapping, and that might take about
a year. Then finding the gene, in theory, should be trivial. It
used to be that cloning the critical region and identifying
candidates would, by themselves, take several years. So things have
gotten a lot easier.
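With an annotated genome, the candidate-gene step reduces to an overlap query against the annotation. A minimal sketch, assuming a single chromosome, gene coordinates in base pairs, and a critical interval given by its two flanking marker positions; the `Gene` record, gene names, and coordinates are hypothetical.

```python
from typing import List, NamedTuple

class Gene(NamedTuple):
    name: str
    start: int  # first base of the annotated gene
    end: int    # last base of the annotated gene

def candidate_genes(annotation: List[Gene], left: int, right: int) -> List[str]:
    """Names of annotated genes that overlap the interval between the flanking markers."""
    return [g.name for g in annotation if g.start < right and g.end > left]

annotation = [
    Gene("geneA", 50_000, 120_000),   # straddles the left flanking marker
    Gene("geneB", 300_000, 350_000),  # lies entirely inside the interval
    Gene("geneC", 950_000, 990_000),  # lies beyond the right flanking marker
]
print(candidate_genes(annotation, left=100_000, right=900_000))  # ['geneA', 'geneB']
```

Each candidate would then be sequenced in the mutant to find the lesion, the last of the four steps listed earlier.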
|
"You
Can't Get Too Hung Up On Any One Protein"
Ian Wilson, Professor, Department of Molecular Biology
The overall plan for the Joint Center for Structural Genomics is
to try to produce as many new structures as possible. By "new" we
mean ones for which you can't predict the fold from the sequence.
However, a lot of these will turn out to be similar structures to
others. For example, we have recently worked on a protein that is
less than 15 percent identical to anything in the Protein Data Bank,
and we found out its structure is [almost] identical to another
protein.
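The "percent identical" figure comes from a pairwise sequence alignment. Several definitions are in use; the sketch below assumes the alignment has already been made and divides identical positions by the number of columns where neither sequence has a gap. The toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of ungapped alignment columns at which the two residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    # Keep only columns where both sequences have a residue.
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if x != gap and y != gap]
    if not columns:
        return 0.0
    identical = sum(x == y for x, y in columns)
    return 100.0 * identical / len(columns)

print(percent_identity("MKV-LA", "MRVQLG"))  # 60.0
```

Identity below about 15 percent is usually too low for sequence comparison alone to reveal a shared fold, which is why such a match only shows up once the structure is solved.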
To start off, we've been concentrating on one organism, Thermotoga
maritima, to see how much of it we can clone, express, purify,
crystallize, collect synchrotron data, determine the structure,
and deposit in the databank. In collaboration with Scott Lesley
of GNF, we're trying to see how many proteins from that one organism
we can pass through the various steps of the pipeline that are required
[for] high-throughput structural genomics.
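The pipeline described above can be thought of as a funnel, with each target counted by how far it gets. A hypothetical sketch: the stage names follow the steps listed in the text, while the target identifiers and their outcomes are made up for illustration.

```python
from collections import Counter
from typing import Dict

# Stages in the order named in the text.
STAGES = ["cloned", "expressed", "purified", "crystallized",
          "data_collected", "structure_determined", "deposited"]

def funnel(furthest_stage: Dict[str, str]) -> Dict[str, int]:
    """Count how many targets reached at least each pipeline stage."""
    index = {stage: i for i, stage in enumerate(STAGES)}
    stopped_at = Counter(index[stage] for stage in furthest_stage.values())
    counts: Dict[str, int] = {}
    running = 0
    # Walk backwards: a target that reached stage i also passed every earlier stage.
    for i in reversed(range(len(STAGES))):
        running += stopped_at[i]
        counts[STAGES[i]] = running
    return {stage: counts[stage] for stage in STAGES}

targets = {"t1": "deposited", "t2": "expressed", "t3": "crystallized", "t4": "cloned"}
print(funnel(targets))
```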
The other organism that we're currently working on is C. elegans.
These are likely to be much more difficult proteins to express.
They're more complex, but they're more representative of eukaryotic
organisms, such as mouse and human. Here,
we are concentrating on proteins that are likely to have novel folds
or at least have folds that cannot be predicted at present.
For proteins that we are really interested in, we can also look
for homologues and orthologues in other organisms. But in structural
genomics, you can't get too hung up on any one protein, because
it's a numbers game. The goal, which the NIH suggests that we should
be able to achieve, is, in year four [of the project], to produce
100 to 200 structures per year. That comes down to nearly one every
working day. And within four to six weeks from the time we have
finished refining the structure, we have to deposit it into the
Protein Data Bank.
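The "nearly one every working day" estimate can be checked with a rough calculation, assuming about 250 working days per year (an assumption, not a figure from the interview):

```latex
\frac{100\ \text{structures}}{250\ \text{days}} = 0.4\ \text{per day},
\qquad
\frac{200\ \text{structures}}{250\ \text{days}} = 0.8\ \text{per day}
```

So the upper end of the target comes close to one structure per working day.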
That's what we're working towards and that's what we're trying
to achieve. And since everything is deposited in the public domain,
that information is accessible to everybody. Thus, the structures
produced by structural genomics should enable the work of biologists,
molecular biologists, and cell biologists worldwide.