Drosophila White Paper 2003
August 13, 2003
Explanatory Note: The first Drosophila White Paper was written in 1999.
Revisions to this document were made in 2000 and the final version was
published as the Drosophila White Paper 2001
(http://flybase.bio.indiana.edu/docs/news/announcements/drosboard/Whitepaper2001.html).
In 2003, the Drosophila Board of Directors voted to write a new White Paper to take
stock of the progress made in the preceding two years and to assess current and
future needs of the Drosophila research community. A draft prepared by the Board was circulated to the
community-at-large through FlyBase and directed email. With the input of the community
included, a revised version was submitted for formal approval by the Drosophila
Board. The final version will be
provided to the Trans-NIH Genomics Resources Group, a committee including
representatives of the various NIH institutes that oversees broad resource and
infrastructure initiatives for genome research. It will also be available as a resource to other
agencies and interested parties to inform them of recent progress and
priorities of the Drosophila research community. This Drosophila White Paper 2003 and a summary of
community input are posted on FlyBase at: http://flybase.bio.indiana.edu/docs/news/announcements/drosboard/
The contributions of Drosophila as a model system for understanding basic
biological mechanisms are even more evident today than in previous
years. This is in large part due
to the advances in genomic technologies, which when combined with the powerful
genetic manipulations possible with Drosophila, allow researchers to dissect
complex biological problems that could not have been successfully approached in
the past. In addition, the
translation of Drosophila research to other arenas, including studies of human
population dynamics, development, and disease mechanisms, continues to yield
impressive successes. To name a
few recent examples, we note how the signaling pathway for dorsal/ventral
pattern formation in Drosophila embryos has quite unexpectedly provided a
crucial paradigm for signaling in human inflammation and innate immunity.
Counterparts for over 70% of human disease genes are found in Drosophila and
many of these fly genes are being extensively studied. Hence the number of examples showing
that Drosophila can serve as an excellent disease model continues to
increase. Indeed, a growing number of researchers
studying human biology are collaborating with Drosophila researchers to apply
the powerful genetics of Drosophila to understand the mechanisms of Huntington
disease, Parkinson disease, spinocerebellar ataxia, early-onset Alzheimer disease,
and other genetic disorders. Overall, more than 60% of
human genes have homologs in Drosophila.
Thus most cellular and developmental processes are functionally
conserved. Key insights have been
gained in recent years into the genetic and cellular mechanisms of processes
such as neurodegeneration, vasculogenesis, stem cell determination, cell and
tissue polarity, signal transduction, growth control and organogenesis. Models proposed for the function of
many newly described mammalian proteins are based on the mutant phenotypes
associated with the Drosophila homologs, and Drosophila is now widely used for
in vivo functional analyses that are difficult to carry out in vertebrates. The availability of the Drosophila genomic
sequence and its integration with the well-studied biology of flies have provided
a boost to the power of comparative genomics. A recent example is how the identification of genes that
play a role in malaria transmission is relying heavily on comparisons of the
genomes of Anopheles and Drosophila.
Studies of Drosophila have provided a fertile testing ground for new approaches in genomic
research. Continued and even
greater success relies on the maintenance or expansion of key projects and
facilities and on the development of new technologies. To this end, the Drosophila research
community has identified current bottlenecks to rapid progress and defined its
most critical priorities for the next two years. We begin by noting recent achievements that have been
most important for the community-at-large:
• High-quality finished sequence of the euchromatin of Drosophila melanogaster
• Reannotation of the euchromatin (Release 3.1)
• An expanding library of complete cDNAs
• An expanding collection of mutant strains with transposable element insertions in
newly annotated genes
• Progress toward the goal of complete coverage of the genome with chromosomal
deficiencies
• Progress on a heterochromatin genome project
• Development of RNA-interference technologies for cultured cells and flies
• Transcriptional profiling of the complete life cycle and many tissue types
• Database development to integrate genome and genetic resources for Drosophila
melanogaster
• Sequence and partial assembly of the euchromatin of Drosophila pseudoobscura
There is overwhelming agreement that the following three resources must be supported to
serve the entire community of Drosophila researchers.
1) A well-funded stock center with a carrying capacity of at least 20,000 strains. This number takes into account current
efforts to accumulate at least one mutant allele for every gene, deficiencies
that provide extensive coverage of the genome, and the lines being generated by
the ongoing gene disruption projects.
The Bloomington Stock Center, which is serving the community extremely
well, can accommodate this immediate goal if it is provided adequate funding.
It is important to note that the community anticipates a need to
house 10,000 to 20,000 additional strains in the near future. This number includes at least
two different mutant alleles of each gene, a refined set of molecularly mapped
deficiencies and duplications (particularly needed for mapping X chromosomal
genes), and sets of widely used transgenic marker strains for inducible gene
expression or protein trapping.
Given current ongoing efforts to generate these strains,
well-characterized collections should be available to the community in three to
five years. This expansion will
require either a significant expansion of the physical facilities and personnel
at the Bloomington Stock Center, or the identification of a second national
facility.
2) Expanded and improved electronic databases to capture and organize Drosophila data, and
integrate the information with databases used by other research communities. It is essential to support efforts that
can keep pace with the enormous rate and increasing complexity of data being
generated by Drosophila researchers, including up-to-date gene annotations and
the characterization of mutant phenotypes, RNA and protein expression profiles,
and interacting gene, protein, RNA and small-molecule networks. These efforts must also include
effectively linking Drosophila databases with those of other organisms,
including other well-established model systems and emerging systems for genome
research. Not only will this
development promote more rapid progress in Drosophila research, it should
significantly enhance progress in functional genomics overall by promoting
crosstalk among scientists working in different fields. Up-to-date and well-organized
electronic databases are essential conduits to translate information from fly
research to human research.
3) A molecular stock center that would provide the community with fair and equal access to key
molecular resources at affordable costs. These resources include commonly used vectors, cDNA and
genomic libraries, and quality-controlled cDNA or oligo-based microarrays and
genomic tiling arrays. Reliance on
commercial companies to provide microarrays may not be an adequate long-term
solution as it limits the widespread use and data distribution of important
technologies and information. We
believe that a molecular center that could generate and distribute these
reagents, particularly cDNA and genomic arrays, and serve as a technological
advice center would do much to advance the use of functional genomics by
individual investigators. Finally,
we point out that a well-run molecular stock center would be a cost-effective use of
grant dollars and could serve multiple research communities.
In addition to the resources described above, certain research projects that require large
infrastructures and investments over several years must be in place to realize
the full potential of Drosophila as a model system for functional and
comparative genomics. Several of
these projects are ongoing, use existing technologies, and require adequate
funding for their successful completion.
Others are projects that require the development of new
technologies. The research
community considers the following to be high-priority projects.
4) Sequencing of a set of complete cDNAs representing the vast majority, if not all, of the
genes of Drosophila melanogaster.
The cDNAs will be of enormous use to the community of researchers for
gene annotations and expression studies at the level of individual genes or on
global scales by microarrays. We
understand that NIH has made a 3-year commitment to the BDGP to sequence ~5000
new cDNAs with full-length ORFs.
Together with the previous work, this should provide an estimated 80%
coverage. We emphasize the
importance of full funding of this project and the need to identify alternative
transcripts for many genes to understand the added complexity of multiple gene
products.
5) Insertion of the complete cDNA set into appropriate vectors for proteome and ribonome
studies. Such studies may
include analysis of protein-protein, DNA-protein and RNA-protein interactions. In addition to these studies, the
complete cDNA set could be used as a tool for large-scale production of
antibodies against Drosophila proteins.
Well-characterized cDNAs, which have been corrected for
amplification-mediated mutations, need to be placed in vectors that can be
manipulated for various proteomics applications. This would allow these tools
to be efficiently produced and made available to the community at reasonable
costs.
6) Gene disruption for a mutational analysis of the genes of Drosophila melanogaster.
An ongoing NIH-funded project will provide for the generation and
sequencing of nearly 10,000 unique P-element insertions for an anticipated 75%
coverage of the annotated genes.
Because many genes will be refractory to mutagenesis by transposable
elements, alternatives to P element gene disruption techniques should also be
considered a high priority.
Developing technologies such as TILLING, PCR-based deletion screening,
and SNP mapping of point mutants are important to accomplish the functional
analysis of the entire genome by mutations.
7) Completion of a Drosophila heterochromatin genome project. The sequence analysis of heterochromatin remains the major
roadblock toward the completion of the genome projects of essentially all
multi-cellular organisms.
Developing and testing technologies to tackle heterochromatin can best be accomplished in Drosophila melanogaster, where a variety of experimental tools
can be brought to bear on the challenges of dealing with highly repetitive
DNAs. In addition, a
heterochromatin genome project is necessary to completely understand the
informational content and molecular organization of the Drosophila genome.
8) The sequencing of additional Drosophila species. The sequencing of D. pseudoobscura has recently been completed and
researchers worldwide are reaping the benefits for functional annotation of
coding sequences, for prediction of DNA enhancer sequences and RNA cis-regulatory sequences, and for identification of
non-coding RNAs. D. simulans and D. yakuba remain the top priorities for
sequencing in the next year. In
March 2003, the Drosophila Board asked a group of colleagues, with expertise in
the areas of ecology, phylogeny, evolutionary biology, developmental biology, and bioinformatics, for advice
on the number and identity of species that should be considered top priorities
for the next sequencing projects. After careful consideration, the expert
group recommended the following eight species, in addition to D. simulans and D. yakuba, for genome
sequencing in the next two years: D. willistoni, D. erecta, D. ananassae,
D. virilis, D. grimshawi, and D. mojavensis at 8X coverage; D. sechellia and D. persimilis at 3X coverage.
This proposal received enthusiastic endorsement by the Drosophila Board
and widespread community support.
A White Paper proposal that incorporated community input was submitted
to the NHGRI in June 2003 and is currently under review. Applications of the proposed
comparative genomics project include improving D. melanogaster gene annotations, identification of
conserved non-coding and coding regions of genes (including non-coding RNAs),
and tracking changes associated with gene and chromosome evolution. Because of
the vast knowledge of the phylogeny and biology of the drosophilids, we are
confident that the investment in these genome projects will be considered an
outstanding success, not only by Drosophila researchers, but by all who are
interested in comparative genomics and molecular evolution. Beyond the benefits to the Drosophila
community, this project will lead to the development of bioinformatic tools
that can be applied subsequently to the comparison and annotation of larger
vertebrate genomes. Improvements
in genome sequencing technologies over the last several years have lowered the
costs involved considerably to an estimated $3 million for a genome the size of
D. pseudoobscura. Thus, the total cost of this project
should be a fraction of the cost of sequencing a mammalian genome.
9) Capturing spatial expression patterns for all Drosophila genes. Particularly powerful is the protein-trap technology using a
transposable element with a GFP-containing exon to mark proteins and analyze
tissue and sub-cellular distribution of proteins in vivo.
Support to generate, maintain and provide these lines to the community
is considered a high priority since in vivo applications are broad and
powerful. Ongoing efforts have
also demonstrated the utility of genome-wide analysis of RNA expression
patterns using RNA in situ hybridization to embryos.
Thus far, 2500 genes have been analyzed and these efforts have
demonstrated an economy of scale.
This analysis should be completed for all genes and extended to other
tissues at different stages of the life cycle. The development of sophisticated imaging methods that
could capture dynamic expression patterns in multiple dimensions and with sub-cellular resolution will add
substantially to the utility of this information.
Below we list additional needs of the community that are judged to be best met by
R01 investigator-initiated efforts or pilot grants, rather than by large
project grants.
1) An efficient means of cryopreservation of Drosophila at any stage of development. There is no question that renewed
efforts to develop a suitable cryopreservation technique remain a high
priority for Drosophila researchers.
Successful application would reduce the stress on the national stock
center, ensure that valuable genetic resources are not lost, and curtail
the costs of running fly kitchens and constantly maintaining laboratory
stocks in all Drosophila labs.
2) Continued development of technologies for RNAi in whole flies. RNAi is now being used with high
success in cultured cell lines using simple delivery methods. However, efficient delivery in whole
flies remains a major challenge.
3) Molecular mapping of chromosomal deficiencies and duplications. The community uses chromosomal
deletions extensively to map genes of interest and to identify dosage-sensitive
modifiers of phenotypes.
Currently, an estimated 85 to 91% of the euchromatic portion of the
Drosophila genome is covered and subdivided by existing chromosomal
deletions. A project to
molecularly map the endpoints of the existing set of deletions would be
straightforward to carry out. The
results would immediately define molecular intervals for mutations of interest
and tie cytogenetic breakpoints of these heavily used chromosomes to the genome
sequence. This would complement
the DrosDel project currently being carried out by a consortium in Europe. In
addition, the molecular characterization of existing and newly generated genomic
duplications is needed and is particularly relevant for the analysis of X-linked genes.
4) Development of new cell lines. Cell lines
have found increasing use in Drosophila research, but only a limited number of Drosophila
cell lines are available. In
particular, there is a need for tissue-specific cell lines that could be used
in RNAi screens (for example, epithelial cells to screen for genes involved in
epithelial cell polarity) and for cell-cell interaction studies (e.g., cell
lines that fail to express a certain signaling pathway). Having access to a diverse set of
cell lines should facilitate the biochemical purification and analysis of
molecular complexes and would complement whole organism approaches.