FlyBase Reference Manual
B. Detailed Descriptions of FlyBase Structure and Data
This section Last Updated: 10 November 2005
B.1. Genes
The Genes
section of FlyBase contains information on Drosophila genes that has been curated from the literature and sequence databases. Data
from all species of the family Drosophilidae are included. The initial data
set was produced by merging the genes data in the text of Lindsley
and Zimm (1992) with the old LOCI table of Ashburner, and Merriam's Genevent
database. Information from all three sources has, however, been considerably
revised and reformatted. New gene and allele records are added through FlyBase's
curation of the literature and sequence databases. The curation of phenotypic
data, a particularly complex class of Genes data, is discussed in Phenotypic
Data in FlyBase, Drysdale (2001).
Some of the records in Genes
will be transient. As more data become available some gene records will merge
with others. Furthermore, some of these records are based on minimal data, for
example, the annotation to an EMBL
or GenBank
sequence record. Our policy is to include data wherever we can. As records merge
(or split) they will always be traceable by their secondary gene identifier
numbers and by their synonyms.
One of the major differences between
Lindsley and Zimm (1992) on the one
hand, and Lindsley and Grell (1968)
and Bridges and Brehme (1944), on
the other, is that the 1944 and 1968 books were very much catalogs of mutations,
rather than of genes. Bridges
and Brehme (1944) and Lindsley and
Grell (1968) were allele based, while Lindsley
and Zimm (1992) is largely, although not entirely, gene based. FlyBase is
a gene based database, and Genes
reflects this change. Having said that, it will be apparent that the transition
is by no means complete in Genes. For the majority of genes, mutant phenotypes
are described in the respective allele records. In many cases where, as far
as we know, all mutant alleles have a similar phenotype, this description
will be found in the record for the first allele in Genes. Many genes in Lindsley
and Zimm (1992) had no alleles specified, although it is clear that these
genes were identified by one or more mutant alleles. In these cases we have
arbitrarily designated an allele with the superscript 1. (Likewise, where an
allele is referred to in text with a gene designation, we have regarded this
as implying allele 1, where this seems reasonable, and made the change to state
allele 1 explicitly). There remain, in Genes, many cases where phenotypic information
is to be found within the gene record itself. This is especially so for genes
for which there is a great amount of data.
Errors in Genes.
Genes data will not be free of errors,
typographical, of fact, or of interpretation. Please inform FlyBase when you
find any error in these data. It will then be corrected. E-mail to flybase-updates
at morgan.harvard.edu (reformat to standard e-mail address) or contact
a member of the FlyBase group, whose addresses and phone/fax numbers are given
in Reference Manual I: The FlyBase Project.
B.1.1.
General description of Genes data
The Genes file contains
a set of Drosophila gene records, the data of each record being organized into
many different fields. As far as possible, we have implemented controlled vocabularies
for the descriptions. These are indicated by [cv]. The controlled vocabularies
are to be found in controlled-vocabularies.txt.
This process is by no means complete, except for some of the simpler fields,
such as mutagen. For example all X ray induced alleles are described as 'X ray'
(without the quotes) in the allele origin field, never 'X rays', 'X-ray' or
'X-rays'.
The use of controlled vocabularies
will increase in the future. This will allow users to more easily search the
database and retrieve genes or alleles with particular properties.
Overall syntax: The maximum
line length is 255 characters; there are no blank lines; all lines begin with
either * or #; lines that begin with # have no other characters; lines that
begin with * have a letter in column 2, a space in column 3 and at least one
more character beginning in column 4. The character # appears nowhere else in
the file. The character * does, unfortunately, but the string *[A-Z,a-z] does
not.
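The overall syntax rules above can be checked line by line. The following Python sketch is an illustration only, not FlyBase code: the function name is_valid_line is hypothetical, and "a letter in column 2" is assumed to mean any ASCII letter.

```python
import re

# A data line: '*', one letter (column 2), a space (column 3),
# then at least one more character starting in column 4.
_DATA_LINE = re.compile(r"^\*[A-Za-z] .+$")

def is_valid_line(line):
    """Return True if a single line obeys the overall syntax rules."""
    line = line.rstrip("\n")
    if not line or len(line) > 255:   # no blank lines, at most 255 characters
        return False
    if line == "#":                   # an end-of-record line has no other characters
        return True
    # '#' appears nowhere else in the file
    return "#" not in line and _DATA_LINE.match(line) is not None
```

For example, is_valid_line("*z FBgn0001234") is accepted, while a line such as "*a" (nothing in column 4) is rejected.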
Record structure: The lines
that are just '#' identify the end of record for a gene. All other lines hold
data for a gene; each field is one or more lines that have the same character
in column 2. This character identifies the field and, sometimes, its position
within a record (see below).
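As a minimal sketch of the record structure just described, the following Python generator splits a sequence of lines into records at each '#' line. The name split_records is hypothetical, and the sketch assumes its input already satisfies the overall syntax rules; each data line is kept as a (field code, text) pair without attempting to join multi-line fields.

```python
def split_records(lines):
    """Yield one list of (field_code, text) pairs per gene record."""
    record = []
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "#":               # end of the current record
            if record:
                yield record
            record = []
        else:
            # column 2 holds the field code, the data start in column 4
            record.append((line[1], line[3:]))
    if record:                        # tolerate a missing final '#'
        yield record
```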
B.1.2.
List of Genes field descriptions
These are the current field designations
in alphabetical order:
*a gene symbol
*b genetic location
*c cytological location
*d biological role of gene product [cv]
*e full name of gene or allele
*f cellular compartment of which gene product is a component
[cv]
*g nucleic acid sequence databank and other DNA accession
number
*h polymorphism data
*i symbol synonym(s)
*j xenogenetic interaction information on alleles
*k phenotypic information on alleles
*l transposable element data
*m protein database accession number
*n aberrations causing position-effect variegation of gene
[cv]
*o origin/mutagen [cv]
*p phenotypic information on genes
*q information concerning functional relationships between
genes
*r information on wild-type biological role
*s molecular information for genes and alleles
*t class of gene [cv]
*u miscellaneous information on genes and alleles
*v information on availability
*w discoverer
*x reference(s)
*y secondary FlyBase identifier number(s)
*z primary FlyBase identifier number
*A allele symbol
*B alternative genetic location
*C comments on cytology associated with allele
*D comments on cytological location
*E a duplicate of a *x field, used to tie data to
a reference
*F function of gene product [cv]
*G insertion chromosome associated with allele
*H date record entered or updated
*I transgene construct that carries allele
*J protein domain information
*K arguably most useful aneuploids for this gene
*L synonym for transgene construct symbol
*M probable ortholog in reference species of drosophilid
*N synonym for insertion symbol
*O progenitor allele or chromosome if relevant to
allele
*P aberration causing the allele
*Q complementation information concerning alleles
*R comments on origin, including progenitor genotype
if irrelevant to allele
*S genetic interaction information on alleles
*T recent review article that discusses this gene
*U nickname
*V name synonym
*Y name of gene product
Field structure: The first
line of each record is the *a field. There is only one of these per record.
Other fields may appear in any order, and most can appear more than once, not
necessarily consecutively. All fields before the first *A field (if any *A)
refer to the gene. All fields between two *A fields (or between an *A field
and a #) refer to the immediately preceding allele. Thus, for example, *b fields
always appear before any *A fields, but *e fields can appear anywhere (e.g.,
"*e white" and "*e white-apricot"). Fields before the first *A are in a defined
order:
aHiezyCbcwBDdJUltrfvFghmnpqsuxE
In pretty outputs the *-codes are
replaced by a text term describing the field.
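The rule that fields before the first *A belong to the gene, and fields after an *A belong to that allele, can be sketched as follows. This is an illustration under assumptions: group_by_allele is a hypothetical name, the input is one record given as (field code, text) pairs, and the symbols in the usage note are chosen only to match the "*e white" and "*e white-apricot" example above.

```python
def group_by_allele(record):
    """Split one record into gene-level fields and per-allele field lists."""
    gene_fields = []
    alleles = []                      # list of (allele_symbol, [fields])
    for code, text in record:
        if code == "A":               # an *A field opens a new allele block
            alleles.append((text, []))
        elif alleles:                 # belongs to the immediately preceding allele
            alleles[-1][1].append((code, text))
        else:                         # before the first *A: refers to the gene
            gene_fields.append((code, text))
    return gene_fields, alleles
```

Given the pairs ("a", "w"), ("e", "white"), ("A", "w[a]"), ("e", "white-apricot"), the first *e field is attached to the gene and the second to the allele.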
Special characters: There
are no special characters used in this file. Superscripts are enclosed between
square brackets []; subscripts between double square brackets [[]]. Greek letters
are written out, e.g. alpha, beta.
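For display purposes the bracket conventions can be turned into markup. The sketch below converts them to HTML-style tags; the function name and the choice of <sup>/<sub> tags are assumptions for illustration, and the sketch is meant for symbols only (square brackets are also used for estimated map positions in the *b and *c fields).

```python
import re

def brackets_to_markup(symbol):
    """Convert [[x]] to a subscript and [x] to a superscript (illustrative)."""
    # Handle double brackets first so they are not read as nested superscripts.
    symbol = re.sub(r"\[\[([^\[\]]+)\]\]", r"<sub>\1</sub>", symbol)
    return re.sub(r"\[([^\[\]]+)\]", r"<sup>\1</sup>", symbol)
```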
B.1.3.
Detailed description of the Genes fields
In this description the fields are
grouped logically, rather than alphabetically. Links in the list of field designations
in section B.1.2. above go to the relevant detailed field descriptions below.
- *H. Dating of
records and updates. All gene records have two date fields. The first, 'Date
entered', is the date a gene record was entered into the Sybase tables. The
second is 'Last updated', the date the record was last updated. When entered
the two dates will be the same. The 'zero' date of all records then
extant was 16 May 1994. FlyBase dates are represented as dd mm yy, mm being
the initial 3-letter abbreviation of the month, and yy being the last two
digits of the year (e.g., 01 Jul 94).
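A date in this format can be read with Python's standard library, as in the sketch below. The function name is hypothetical. Two caveats are assumptions of the sketch rather than statements about FlyBase: the month abbreviation is matched according to the current locale (an English or C locale is assumed), and Python's two-digit year rule maps 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068.

```python
from datetime import datetime

def parse_flybase_date(text):
    """Parse a 'dd Mon yy' date such as '01 Jul 94' into a date object."""
    # %d = day of month, %b = abbreviated month name, %y = two-digit year
    return datetime.strptime(text, "%d %b %y").date()
```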
- *z, *y. Each
gene and allele record in FlyBase has a unique identifier number (see section
F.1. of Reference Manual F: Links To and From FlyBase).
The primary identifier number is in the *z field, secondary identifier numbers
are in *y fields.
Syntax: *z FBgn_integer
e.g., *z FBgn0001234
- *a. This is the
standard abbreviation (gene symbol) for the name of the gene. In the genes
file, gene records are sorted alphabetically. The order of precedence is:
all-Greek symbols (in alphabetical order), symbols that begin with a number
(in numerical order, secondarily sorted on suffix, i.e., 1, 2, 2a,
2b, 3), symbols that begin with a letter (lower case having precedence over
upper, and numerals precedence over letters, i.e., b, B, b1, ba).
Syntax: *a <Nnnn\>symbol
e.g., *a bb
*a Dhyd\Minos
Nnnn is an abbreviation for the species. The default species is D. melanogaster,
in which case there is no species abbreviation. If a gene is from another
species of drosophilid then this is indicated by Nnnn, where N is normally
the initial letter of the genus, and nnn are normally the first three letters
of the specific epithet. A list of species
abbreviations is in the Nomenclature
section of FlyBase.
Syntax: *e <Nnnn\>name
e.g., *e bobbed
*e Dhyd\Minos
Genes encoded by the mitochondrial genome all have the prefix Nnnn\mt:.
The D. melanogaster gene encoding the cytochrome oxidase subunit
II is, therefore, mt:CoII, the D. simulans gene encoding
the mitochondrial proline tRNA is Dsim\mt:tRNA:P. The record MT:DNA
is used for data concerning the mitochondrial genome and its products that
cannot be assigned to any single mitochondrial gene. The symbol mt:ori
is used for the non-coding A+T rich region of the mitochondrial origin of
replication.
FlyBase includes data on artificial gene constructs, for example fusions between
different genes. Fusion genes are named using the gene symbols of their components
separated by a double colon, e.g., Antp::Scr. The components are
listed in alphabetical order. When a component of a construct is from a species
other than D. melanogaster then its symbol is prefixed by Nnnn\
to indicate the species of origin. For example the lexA gene from E. coli
has the symbol Ecol\lexA. A list
of the species abbreviations used is to be found in the Nomenclature
section of FlyBase.
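The symbol conventions described for the *a field (optional species prefix, 'mt:' for mitochondrial genes, '::' between fusion components) can be taken apart with a few string operations. The sketch below is illustrative only: parse_symbol is a hypothetical name, a missing prefix is reported as None (meaning D. melanogaster), and any text before the first backslash is assumed to be a species abbreviation, which does not hold for symbols that use a backslash for other purposes.

```python
def parse_symbol(symbol):
    """Split a gene symbol into its components (one per fusion partner)."""
    components = []
    for part in symbol.split("::"):           # fusion genes: Antp::Scr
        prefix, sep, rest = part.partition("\\")
        if not sep:                           # no backslash: default species
            prefix, rest = None, part
        components.append({
            "species": prefix,                # e.g. 'Dsim', or None for D. melanogaster
            "symbol": rest,
            "mitochondrial": rest.lower().startswith("mt:"),
        })
    return components
```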
- *e. This is the
full name of the gene or allele. FlyBase takes a minimalist definition of
a gene. As an example, Notch is regarded as a gene, but facet,
Confluens, split etc. are not. These phenotypically distinct
allelic forms that have, in the past, been named as if they were genetic loci
are included as gene synonyms.
FlyBase is not entirely consistent in the way directly duplicated genes are
handled: for example the five HSP70 encoding genes at Hsp70A and
Hsp70B and the five larval cuticle protein encoding genes at 44D
are all listed independently, whereas each of the five major histone protein
coding regions, tandemly repeated at the base of 2L, is listed as a separate
gene only once.
Some loci have only been identified by molecular methods, not having been
mapped. Such loci are included in genes. Other "loci" included
in this file have not been genetically mapped or characterized but are assumed
to exist on the basis of, for example, a purified protein. Some loci have
been impossible to name in any logical way, due to a lack of data. As a temporary
expedient these are named as anon-*, where the * indicates a code.
These loci will be renamed as and when more data become available.
STS sequences identified by Drosophila genome projects appear in the nucleic
acid sequence data archive, and in the NCBI's dbSTS database. These short
sequences are routinely matched
against the universe of public sequence data and often have 'significant'
matches to genes identified in species other than Drosophila. Such matches
are clues that similar genes may occur in D. melanogaster. For this
reason STS sequences with significant matches are identified as 'genes' in
this file, and have the temporary name ESTSn (for STS sequences from
the European project) or BSTSn (for those from Berkeley), where n is the code
used by the Genome Project (e.g., ESTS100F7T, BSTSDm0092). STS sequences that
match known Drosophila genes will be linked to the relevant gene record by
their accession numbers in the GenBank/EMBL/DDBJ and dbSTS data archives.
STS sequences that have no matches whatsoever are only linked to their parental
clone in the clones tables. All STSs with matches are similarly linked to
their parental clones in these tables.
- *b, *B. Genetic
map position. Given as Chromosome number-map position, e.g. 3-10. If a gene
has not been mapped within a chromosome, then only the chromosome is indicated
as, for example, 2-. This implies '2- (not located)'. Many genes have been
mapped cytogenetically but not genetically. Their map positions have been
estimated and are enclosed in []. (Not {} as in Lindsley
and Zimm (1992).) The published map positions of some genes are clearly
at variance with their cytogenetic positions. In such cases we have estimated
their genetic position and indicate this by enclosing the estimate in [].
*B is used to store comments on genetic map positions, including unresolved
differences between some genetic map positions in Lindsley
and Zimm (1992) and those in Ashburner's original files.
To estimate genetic map positions from cytogenetic we use a standard table
made by plotting all of the available data and then interpolating. Estimated
genetic positions are normally only made to the nearest whole number. The
exceptions to this rule are in regions of very low recombination relative
to the cytogenetic map. The table of cytogenetic
vs. genetic map positions used is available in the Maps
section of FlyBase.
Syntax: *b chromosome_symbol-number
e.g., *b 1-66.0
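A *b value can be split into chromosome, map position and an "estimated" flag, as in the sketch below. The function name is hypothetical, and the exact placement of the square brackets around an estimated position is an assumption: the sketch simply treats any bracket in the value as marking an estimate.

```python
def parse_genetic_location(value):
    """Return (chromosome, position or None, estimated) for a *b value."""
    estimated = "[" in value                  # [] marks a FlyBase estimate
    cleaned = value.replace("[", "").replace("]", "").strip()
    chromosome, _, position = cleaned.partition("-")
    # '2-' means the gene is on chromosome 2 but not located within it
    return chromosome, (float(position) if position else None), estimated
```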
- *c. Cytogenetic
map positions. These are given as extreme left and right hand limits. In the
case where one of these limits is said to be a doublet, e.g., "35D1,2", then
only the outermost band (in this case 35D1 if this was the left-hand end of
the range) is given. The limits are separated by a double hyphen (--).
Syntax: *c left_hand_limit--right_hand_limit
e.g., *c 25C--25D
Many genes have been mapped genetically but not cytogenetically. Their map
positions have been estimated and are enclosed in [].
Following the cytogenetic range there may be a statement regarding how it
was established, e.g., by in situ hybridization. When a cytogenetic range
or a statement of how it was derived appears "unattributed", i.e., not in
a block headed "Data from ref. nnnn", it is computed from all available data
and the tightest deducible range is shown. In cases where different reports
give conflicting data, FlyBase has made a decision to mark one or more statements
as suspect by prefacing them with "???". Such statements are excluded from
the computations that give rise to CytoSearch
data. If you find that an error has been made in this process, please inform
us by email to flybase-updates at morgan.harvard.edu.
- *D. *D is used
to store comments on cytological map positions. This may include text giving,
for example, information that a weaker in situ signal was seen elsewhere.
- *K. Arguably
most useful aneuploids for this gene. This is the algorithm for identifying
the listed aberrations:
1) Admissible aberrations are ones that have no progenitor (too hard to work
out what's missing) and whose class is one of Deficiency, Deficient translocation,
Deficiency (first two listed breaks) plus Inversion, Tandem duplication or
the three insertional duplication classes, plus separable components of aberrations
that have no progenitor and whose class is one of the insertional transposition
classes (this may be extended to inversion recombinants and translocation
segregants in the future).
2) Aberrations are first prioritized into the following categories:
- those available at Bloomington
- those with a 2000 reference
- those with a 1990-9 ref not including L&Z
- those with a 1980-9 ref
- those with a 1970-9 ref
- those with a 1960-9 ref not including L&G
and then each category is sorted by distance between first two listed breaks
(number of bands, smallest aberration first, taking the minimum size). This
is the "league table" of aberrations.
3) The first aberration in the league table that is stated (in the aberration
record) to delete the relevant gene is listed as:
*K Deficiency: <Df symbol>
Similarly the first aberration in the league table that is stated (in the aberration
record) to be duplicated for the relevant gene is listed as:
*K Duplication: <Dp symbol>
4) The first aberration in the league table whose minimum deleted region extends
at least two bands either side of the gene's region of uncertainty is listed
as:
*K Deficiency: <Df symbol> (inferred from cytology)
but only if it appears earlier in the league table than the one (if any) listed
in step 3. Similarly for duplications, as:
*K Duplication: <Dp symbol> (inferred from cytology)
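Steps 2 to 4 above can be sketched as a sort followed by two lookups. The Python below is an illustration under stated assumptions, not the FlyBase implementation: the class and function names are hypothetical, admissibility (step 1) is assumed to have been decided already, the priority rank (0 for the first category, 1 for the second, and so on) and the two boolean flags are assumed to be supplied by the caller, and the symbols in the usage note are invented placeholders.

```python
from dataclasses import dataclass

@dataclass
class Aberration:
    symbol: str
    priority: int        # rank of the category in step 2 (0 = first category)
    size_in_bands: int   # distance between the first two listed breaks
    stated: bool         # aberration record states the gene is deleted/duplicated
    inferred: bool       # cytology extends at least two bands either side of the gene

def k_lines(aberrations, label="Deficiency"):
    """Build the *K lines for one gene from its admissible aberrations."""
    # Step 2: the "league table", by category and then smallest aberration first.
    league = sorted(aberrations, key=lambda a: (a.priority, a.size_in_bands))
    # Step 3: first aberration stated to affect the gene.
    stated = next((i for i, a in enumerate(league) if a.stated), None)
    # Step 4: first aberration inferred from cytology to affect the gene.
    inferred = next((i for i, a in enumerate(league) if a.inferred), None)
    lines = []
    if stated is not None:
        lines.append(f"*K {label}: {league[stated].symbol}")
    # Listed only if it appears earlier in the league table than the step 3 entry.
    if inferred is not None and (stated is None or inferred < stated):
        lines.append(f"*K {label}: {league[inferred].symbol} (inferred from cytology)")
    return lines
```

With three placeholder deficiencies, one stated but low in the table and one inferred but higher up, both a plain and an "(inferred from cytology)" line are produced; calling k_lines with label="Duplication" gives the duplication lines in the same way.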
- *i. Symbol synonyms.
As mentioned above FlyBase takes a very liberal view of synonyms, and the
table gene-synonyms.txt in the Genes
section is provided as a tool to allow the identification of the name, and
symbol, that FlyBase uses for each gene or allele. In Genes
these data are kept in the *i field, for both gene and allele synonyms.
Syntax: *i synonym_symbol: synonym name <text, e.g. a reference>
e.g., *i ho: heldout
- *U. Nickname.
Nicknames are valid alternative symbols for a gene or allele. Nicknames support
the use in Drosophila genotypes of foreign gene symbols sans the species identifier,
for example, lacZ rather than Ecol\lacZ. Nicknames are assigned
only to foreign genes that frequently appear in Drosophila transgene constructs.
- *V. Name synonyms. This field records
full names that correspond to symbols that have become synonyms of both genes and alleles. No effort is made
to represent the relationships between symbol synonyms and their corresponding name synonyms. Not all symbol
synonyms have a name synonym, and vice versa.
- *Y. Name of
the gene product. This field is moderately controlled. The suffix '-like'
is used to indicate that a gene product has been named by similarity.
- *d. Biological
role of gene product. This field gives information concerning the biological
role(s) of the gene product. The terms used are from the process ontology
of the Gene Ontology Consortium
database and include the GO identifier number. The 'evidence' for an attribution
may follow the term after a 'pipe' (i.e., after the character |). Statements
of evidence are drawn from a small controlled vocabulary:
inferred from mutant phenotype
inferred from genetic interaction
inferred from physical interaction
inferred from sequence similarity
inferred from direct assay
inferred from expression pattern
inferred from electronic annotation
traceable author statement
non-traceable author statement
Note about 'inferred from mutant phenotype': The GO consortium regards alterations
of gene expression as 'phenotype' in the context of this evidence code. The
description of mutant phenotypes in the FlyBase Allele data (see section on
*k), however, is restricted to alterations of the anatomy or organismal function
of the mutant, and does not include expression pattern data. For more about
the GO evidence codes see http://www.geneontology.org/doc/GO.terms_and_ids.
- *F. Function
of gene product. This field gives information about the function(s) of the
gene product. The terms used are from the function ontology of the Gene
Ontology Consortium database and include the GO identifier number. GO
function terms also include cross-reference to the ENZYME
database. Statements of evidence are drawn from a small controlled vocabulary:
inferred from mutant phenotype
inferred from genetic interaction
inferred from physical interaction
inferred from sequence similarity
inferred from direct assay
inferred from expression pattern
inferred from electronic annotation
traceable author statement
non-traceable author statement
Note about 'inferred from mutant phenotype': The GO consortium regards alterations
of gene expression as 'phenotype' in the context of this evidence code. The
description of mutant phenotypes in the FlyBase Allele data (see section on
*k), however, is restricted to alterations of the anatomy or organismal function
of the mutant, and does not include expression pattern data. For more about
the GO evidence codes see http://www.geneontology.org/doc/GO.terms_and_ids.
- *J. Description
of the structural features of gene products. These data are not curated by
FlyBase but are from the InterPro
database. InterPro provides an integrated view of the commonly used protein
domain or signature databases. Release 3.1 (May 2001) was built from Pfam
6.0, PRINTS 30.0,
PROSITE 16.35, ProDom
2001.1, SMART 3.1 and the current SWISS-PROT
+ TrEMBL data.
Syntax for InterPro cross references:
*J InterPro_number == InterPro_accession_name
e.g., *J IPR000014 == PAS domain.
- *f. Cellular
compartment of which gene product is a component. This field gives information
about the cellular compartment(s) of which the gene product is a component.
These include not only the obvious parts of a cell (nucleus, mitochondrion),
but also all defined supra-molecular complexes (e.g., small ribosomal subunit,
proteasome). The terms used are from the cellular component ontology of the
Gene Ontology Consortium database
and include the GO identifier number. Statements of evidence are drawn from
a small controlled vocabulary:
inferred from mutant phenotype
inferred from genetic interaction
inferred from physical interaction
inferred from sequence similarity
inferred from direct assay
author said so
not available
- *g. Nucleic acid
sequences. In these fields FlyBase stores pointers to nucleic acid sequence
data, usually in the form of EMBL/GenBank/DDBJ/NCBI accession (AC) numbers.
If a sequence has been published but is not yet in one of these data banks
a brief journal reference is given instead (the full reference will be found
in References). Data from the three
nucleic acid sequence databases are received on a daily basis by FlyBase.
FlyBase is also cross-referenced to a number of other sequence databases.
These cross-references are stored in the *g line (if nucleic acid) or *m line
(if protein). These other databases and the database code used in FlyBase
to identify links to those databases are listed in Reference
Manual F.3. The accession numbers for all external sequence links are
listed in the file external-databases.txt.
The EMBL/NCBI/DDBJ sequence accession numbers have no database code prefix.
Syntax: *g <database_code/>accession_number
e.g., *g X12345
*g EPD/23023
If the nucleic acid sequence accession includes coding regions then each coding
region has a unique PID number. These are appended to the nucleic acid sequence
accession number, following a semi-colon, e.g.,
*g U42989; g1150983
Note that the number of PIDs attached to a sequence record may be more than
one for two reasons. The first is that the EBI and NCBI often assign PID numbers
independently to the same object; the other is that there is more than one
protein product from a single gene (as the result, for example, of alternative
splicing).
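A *g value following the syntax above can be separated into database code, accession number and PID numbers. The sketch below is illustrative: parse_g_field is a hypothetical name, multiple PIDs are assumed to be separated by further semi-colons, and values that hold a journal reference instead of an accession number are not handled.

```python
def parse_g_field(value):
    """Split a *g value into database code, accession number and PIDs."""
    accession_part, *pid_parts = [p.strip() for p in value.split(";")]
    database, sep, accession = accession_part.partition("/")
    if not sep:
        # EMBL/NCBI/DDBJ accession numbers carry no database code prefix
        database, accession = None, accession_part
    return {
        "database": database,
        "accession": accession,
        "pids": [p for p in pid_parts if p],
    }
```

The same database_code/accession_number split applies to *m values such as SWP/P12428, where the prefix is always present.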
- *r. The *r field
is used for information about the wild-type biological role of a gene. The
objective is for each gene record to have a *r field in which information
about the gene's biological role is summarized. The present situation, however,
is that for the majority of genes this information is still to be found in
the *p field of the gene record. FlyBase is systematically rewriting these
*p fields (historically derived from the 'Phenotype' field of Lindsley
and Zimm (1992)) so that the summary of wild-type function is moved to
the *r field.
- *n. Aberrations
causing position-effect variegation of gene. This is a controlled field to
indicate aberrations that cause position-effect variegation of a gene.
Syntax: *n recessive PEV in: <aberration_symbol>
*n dominant PEV in: <aberration_symbol>
*n no PEV in: <aberration_symbol>
- *m. Protein sequence
data. The *m field stores pointers to protein sequence data, usually in the
form of SWISS-PROT/TREMBL/PIR protein sequence databank accession (AC) numbers.
Because accession numbers can clash between databases,
the AC numbers are prefixed "SWP/", "TREMBL/" or "PIR/".
These fields are also used for cross-references between FlyBase and structural
data on Drosophila proteins held on PDB (Protein Data Bank, Brookhaven),
the NRL_3D databank and the G protein-coupled receptor database (GCRDb). These
records have the prefixes PDB/, NRL_3D/ and GCR/ respectively. Cross-references
to the 'factors' table of the TRANSFAC database (E. Wingender, J. Biotechnol.
35:273-280, 1994) have the prefix TF/.
Syntax: *m database_code/accession_number
e.g. *m SWP/P12428
- *M. Probable
ortholog in other species of drosophilid. The *M field is a pointer between
"orthologous" genes in another species of drosophilid. A single species (D.
melanogaster when possible) is treated as the "reference" for a given
gene, and links are made with *M fields between the gene of the reference
species and probable orthologs. No direct *M links are made between the non-reference
genes.
Links are only made where there is good genetic or phenotypic (including sequence)
evidence for homology of entire genes. It is not uncommon for a gene to be
present once in species a but twice (or more) in species b
(e.g., Adh in D. melanogaster vs. D. mulleri).
In such cases all possible pair-wise links are made via *M fields.
Syntax: *M <Nnnn\>gene_symbol
Although genes in different species of Drosophila characterized by
sequencing generally have the same gene symbol as the presumed homolog in
D. melanogaster this is by no means true for genes characterized
by mutations in these species. In these instances 'homology' is usually deduced
from mutant phenotype and linkage group. No attempt has (yet) been made to
impose homologies, over and above suggestions made in the literature.
- *p. Phenotype.
The *p field holds phenotypic information about a gene (or, as explained above,
about its mutant alleles in some cases). This field is free text and, by and
large, has not yet been standardized with respect to its vocabulary. One special
use of the *p field is to hold information on gene interactions. These are
expressed as follows:
*p Interacts genetically with: [gene_symbol]
- *u. The *u field
is for miscellaneous information concerning a gene, as free text. Notes concerning
the identification of the gene, or the derivation of the gene symbol/name
are stored following the corresponding 'Identification:' or 'Etymology:' prefix.
- *s. Molecular
data. These fields keep molecular data about genes and alleles. The *s field
at the gene level is subdivided into five categories. In addition to the free
text category there are four additional categories distinguished by a set
of controlled prefixes:
Gene order: Accommodates gene order/orientation data derived
by molecular, rather than genetic, means. The data will be presented in the
format 'Gene order: In direction of increasing cytology: Dredd- su(s)+' where
+ indicates 5'-3' proceeds with increasing cytological location, - the opposite,
and ? where the direction of transcription is not declared. Where orientation
with respect to the chromosome is not known, gene sequence is preceded by
the statement "Overall orientation not stated" and + and - simply reflect
orientation of the transcripts with respect to each other. Where a 'Gene order'
line begins or ends with an ellipsis (...) this indicates that the complete
gene order described in the publication is more extensive than this subset
reported for the gene in question. Gene reports for genes at either end of
the reported line will continue the molecular gene order over a greater extent.
Maps to clone: Accommodates positive relationships between
a gene and clones (P1, BAC, YAC) as used by large scale public genome projects.
Does not map to clone: Accommodates negative relationships
between a gene and clones (P1, BAC, YAC) as used by large scale public genome
projects.
Identified with: Accommodates relationships between a gene
and ESTs or STSs as generated by large scale public genome projects.
The *s field at the allele level is free text but for the following three
controlled prefixes.
Construct: Used to denote an 'allele' engineered in vitro by recombinant
DNA technology and assayed in the genome after germline transformation or
in transient assays in the whole organism or cell culture.
Amino acid replacement: prefixes a standard format statement about
the nature of the mutation. Format is 'letterNletter' where each letter refers
to the standard amino acid single letter code, and N is the residue of the
encoded protein that is altered. Thus C67Y denotes that the cysteine at position
67 is replaced by a tyrosine. Stop codons are represented by @. Question marks
? represent uncertainty or lack of information about the amino acid or position
in question.
Nucleotide substitution: prefixes a standard format statement about
the nature of the mutation. Format is 'letterNletter' where each letter refers
to the nucleotide, and N is the position of the affected nucleotide. Thus
C313T denotes that the C at position 313 is replaced by a T. Note that the
numbers in "Nucleotide substitution" data reflect author statement and do
not necessarily have any significance with respect to "Nucleotide substitution" statements from other authors.
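The 'letterNletter' format shared by "Amino acid replacement:" and "Nucleotide substitution:" statements can be read with one regular expression. The sketch below is an illustration, not FlyBase code: parse_substitution is a hypothetical name, and it assumes that an uncertain position is written as a single '?' in place of the number.

```python
import re

# old residue/nucleotide, position, new residue/nucleotide;
# '@' is a stop codon, '?' marks uncertainty or missing information
_SUBSTITUTION = re.compile(r"^([A-Za-z@?])(\d+|\?)([A-Za-z@?])$")

def parse_substitution(text):
    """Return (old, position or None, new), or None if the format does not match."""
    match = _SUBSTITUTION.match(text.strip())
    if match is None:
        return None
    old, position, new = match.groups()
    return old, (None if position == "?" else int(position)), new
```

For instance, C67Y is read as cysteine at position 67 replaced by tyrosine, and C313T as a C at position 313 replaced by a T.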
- *q. The *q field
holds data about genes or groups of alleles that pertain to the relationship
between that gene and other genes. For example, statements that alleles of
gene A complement alleles of gene B, that, in addition to explicitly named
alleles of this locus, a further ten alleles had been isolated, or that the
gene may be the same as another, would be kept in this field. This field accommodates
data stored with several controlled prefixes:
"Source for merge: gene1 gene2" statements mark publications as containing
the evidence that the named gene1 and gene2, previously recorded as being
distinct, correspond to the same gene, giving rise to the merging of the two
gene records in FlyBase into one.
Other controlled prefixes for this field deal with functional complementation
relationships between the gene in question and genes of other species. Prefixes
are:
Functionally complemented by:
Does not functionally complement:
Is not functionally complemented by:
Partially functionally complements:
Partially functionally complemented by:
Gain of function effect when expressed in:
No gain of function effect when expressed in:
- *l. Information
about the nature and molecular characteristics of transposable elements is
contained in *l field.
*l element type:
*l terminal repeat length in bp:
*l total length in bp:
*l target site duplication length in bp:
*l number of copies in genome:
*l component genes:
The allowed values of 'element type:' are:
LINE, LINE-like retrotransposons
SINE, SINE-like elements
LTR, retroviral-like elements with long terminal repeats
IR, elements with inverted repeat termini
FB, fold-back elements
- *h. Polymorphism
data. The *h fields store data from population studies. These data are subdivided
into categories.
variability: a (more or less) quantitative statement of variability
at the locus.
sampled from: the geographic locations of the populations sampled.
sample size: the number of populations/strains analyzed.
no. of KB assayed: the extent of the region assayed.
type of assay: method used to measure variability (see CV).
comments: comments on the results and conclusions of the analysis.
- *t. Class of
gene. This field holds information about the class of the genetic element.
The default is a protein-coding gene carried by the nuclear genome of a species
of drosophilid.
The following classes of nuclear non-protein-coding gene are recognized:
*t nuclear_non-protein-coding_RNA_gene: the parent class of the following:
*t cytosolic_tRNA_gene: for tRNA encoding genes.
*t cytosolic_ribosomal_RNA_gene: for rRNA encoding genes.
*t nuclear_small_nucleolar_RNA_gene: for snoRNA encoding genes.
*t nuclear_snRNA_gene: for small nuclear RNA (snRNA) encoding genes.
*t nuclear_untranslated_RNA_gene: for other nuclear chromosomal genes none
of whose transcripts encode a protein.
*t small_intermediate_RNA_encoding_gene: for genes reported to encode siRNAs.
*t microRNA_encoding_gene: for miRNA encoding genes.
Mitochondrial genes. Genes encoded by the mitochondrial genome have the symbol
prefix 'mt:' or 'Nnnn\mt:' if from a species other than D. melanogaster.
The following classes of mitochondrial_gene are recognized:
*t mitochondrial_gene: the parent class of the following and used only for
generic MT:DNA records and for the mitochondrial replication origin, mt:ori.
*t mitochondrial_protein-coding_gene: for protein coding genes of the mitochondrial
genome.
*t mitochondrial_non-protein-coding_gene: the parent class of the following:
*t mitochondrial_tRNA_gene: for mitochondrial
encoded tRNA genes.
*t mitochondrial_ribosomal_RNA_gene: for mitochondrial
encoded rRNA genes.
*t pseudogene: Nonfunctional loci with sequence identity to a functional gene.
*t microsatellite: Loci composed of tandem repeats of short (1 to 10 bp)
nucleotide sequences.
*t transposable_element. A natural transposable element of a drosophilid.
Information concerning the class of the element is held in the *l field.
*t transposable_element_gene. A gene carried by a natural transposable element
of a drosophilid. The symbol of this gene will be of the form 'N\m', where
'N' is the symbol of the transposable element and 'm' is the symbol of the
particular gene.
*t repetitive_element. A natural non-coding repetitive element of a drosophilid.
This is used for non-coding elements for which evidence that they are transposable
is lacking. Includes satellite DNA sequences (satDNA).
*t virus_symbiont_pathogen: Viruses, symbionts, parasites and pathogens of
Drosophila. Includes components of such entities.
*t safe_element: Structural and/or non-coding functional elements. Includes
telomeres, centromeres, DNA amplification sites, scaffold sites, and boundary
elements. Does not include non-coding elements of other classes, e.g., promoters,
enhancers, introns, which are considered to be components of the default class
of genes.
*t sire_element: Synthetic and/or isolated regulatory elements, restricted
to regulatory elements widely used in an isolated context, such as mobile
activating elements. Does not include regulatory elements used to drive reporter
genes. An example is the synthetic GMR (glass multimer reporter) element,
as used in transgene constructs designed to activate adjacent endogenous genes.
*t fusion_gene: Genes synthesized as a fusion of two, or more, coding regions,
at least one being a Drosophila gene. Each component of a fusion gene has
a single gene entry as either a normal gene, foreign_gene or a fusion_gene.
*t foreign_gene: A gene from a non-drosophilid.
*t foreign_fusion: A fusion gene, as defined above, that includes a coding
region from a foreign gene.
*t foreign_transposon: Used for foreign transposons brought into Drosophila
for the purposes of analysis or transgene generation.
*t foreign_transposable_element_gene: A gene carried by a transposable element
of a non-drosophilid.
*t safe_element.f: A structural and non-coding functional element from a species
other than D. melanogaster, frequently used in D. melanogaster
transgene constructs.
*t sire_element.f: A SIRE (see definition above) from another species.
*t uncertain: Many genes in FlyBase have information
that is only of historical interest, because they were identified by mutations
that are now lost, were never sequenced, etc. It is important that searches
of FlyBase genes not return an oppressive number of hits to such genes. Hence,
we have developed a complex criterion by which genes can be classified as "uncertain", and such genes are only included in search hits if this is specifically
requested on the Genes query form.
This criterion is purely rule-based, so the set of "uncertain" genes is recomputed
at each genes update. The rules that comprise the criterion may be modified
in the future, in the light of experience of how well they describe only the
appropriate genes. The current criterion is that a gene is marked uncertain
if and only if:
(it is a Drosophila melanogaster standard gene, not a virus, transposable
element, etc.)
AND ( (it appeared in a prior, but not the current, release of the genome)
OR ( (it has no references dated post-1989 except for Lindsley
and Zimm and/or FlyBase curation)
AND (it has no GO (*d, *f or *F) data)
AND (it has no DNA/RNA or protein sequence or gene
order data)
AND (it has no alleles in any stock lists held by
FlyBase, either held by public stock centers or the community)
AND (its most specific mutant phenotype is shared
by alleles of at least nine other genes)
AND ( ( (it has no complementation data against aberrations)
AND ( (it has no cytological or within-chromosome
meiotic mapping data)
OR ( (its cytological range
of uncertainty exceeds two lettered subdivisions)
AND (its most
recent reference is pre-1970) ) ) )
OR
( (its gene symbol is an anonymous lethal or sterile)
AND ( (its cytological range of
uncertainty exceeds two lettered subdivisions)
OR (its most recent reference is
pre-1970) ) ) ) ) )
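The nesting of the criterion above is easy to misread, so the following is a minimal Python sketch of the same boolean structure. It is an illustration only, not FlyBase's own implementation: the `GeneFacts` class and all of its field names are hypothetical, with one flag per clause of the rule.

```python
from dataclasses import dataclass, replace


@dataclass
class GeneFacts:
    """Hypothetical flags, one per clause of the 'uncertain' criterion."""
    is_dmel_standard_gene: bool
    only_in_prior_genome_release: bool
    has_post_1989_refs: bool               # ignoring Lindsley and Zimm / FlyBase curation
    has_go_data: bool
    has_sequence_or_gene_order_data: bool
    has_alleles_in_stock_lists: bool
    other_genes_sharing_phenotype: int     # genes sharing the most specific mutant phenotype
    has_aberration_complementation_data: bool
    has_cyto_or_meiotic_mapping: bool
    cyto_range_exceeds_two_subdivisions: bool
    latest_ref_pre_1970: bool
    is_anonymous_lethal_or_sterile: bool


def is_uncertain(g: GeneFacts) -> bool:
    """Evaluate the criterion clause by clause, following the parentheses above."""
    if not g.is_dmel_standard_gene:
        return False
    if g.only_in_prior_genome_release:
        return True
    poorly_documented = (
        not g.has_post_1989_refs
        and not g.has_go_data
        and not g.has_sequence_or_gene_order_data
        and not g.has_alleles_in_stock_lists
        and g.other_genes_sharing_phenotype >= 9
    )
    poorly_mapped = not g.has_aberration_complementation_data and (
        not g.has_cyto_or_meiotic_mapping
        or (g.cyto_range_exceeds_two_subdivisions and g.latest_ref_pre_1970)
    )
    vague_anonymous = g.is_anonymous_lethal_or_sterile and (
        g.cyto_range_exceeds_two_subdivisions or g.latest_ref_pre_1970
    )
    return poorly_documented and (poorly_mapped or vague_anonymous)


# A made-up gene known only from an old, unmapped mutation.
old_gene = GeneFacts(
    is_dmel_standard_gene=True,
    only_in_prior_genome_release=False,
    has_post_1989_refs=False,
    has_go_data=False,
    has_sequence_or_gene_order_data=False,
    has_alleles_in_stock_lists=False,
    other_genes_sharing_phenotype=12,
    has_aberration_complementation_data=False,
    has_cyto_or_meiotic_mapping=False,
    cyto_range_exceeds_two_subdivisions=False,
    latest_ref_pre_1970=True,
    is_anonymous_lethal_or_sterile=False,
)
```

Writing the rule as three named sub-conditions makes it visible that any single piece of modern evidence (GO data, sequence, a stock) is enough to keep a gene out of the "uncertain" set, unless it has been dropped from the current genome release.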
*t multicopy_xxx (where "xxx" is another *t). Some genes are present in the
Drosophila genome as clusters of genes, whose products are so similar that they
are traditionally referred to by a single name. This is true of various
RNA-encoding genes such as 5SrRNA and bb, and also of the histones in 39D. It is
necessary in some circumstances to refer to individual members of such clusters.
Hence, the "gene" 5SrRNA is given the gene class "multicopy_cytosolic_ribosomal_RNA_gene" to indicate its composite nature, and
individual members of the 5SrRNA cluster are given the gene class "cytosolic_ribosomal_RNA_gene". The individual genes, as and when they are
instantiated, are given symbols of the form "x:y", where "x" is the symbol of the multicopy gene and "y" is a unique identifier, e.g. "5SrRNA:CR33353". The
multicopy gene and its member genes are linked by "relationship to other genes"
data of the form "component genes: 5SrRNA:CR33353, ..." and "member gene of:
5SrRNA".
*t xxx_cassette (where "xxx" is another *t). There are various types of
"composite gene" which are defined as such not because all their
members are virtually identical, but because of some functional or
structural relationship.
Two types of "cassette" are currently defined: a cluster of closely
related genes with similar function and gene expression, for example
the histone complex HIS-C, and a natural transposable element, whose
component genes are those that it carries. (In the case of
transposable elements we retain "transposable_element" as the gene
class, as opposed to "transposable_element_gene_cassette").
As with the multi-copy genes, it is necessary to link the cassette to
its parts, and this is done with "relationship to other genes" data of
the form "encoded by: HMS-Beagle" and "encoded genes: HMS-Beagle\gag,
HMS-Beagle\pol".
Also, it should be noted that "multicopy_xxx" and "xxx_cassette" can be
combined. The existing cases of this are bb, Ybb and HIS-C. For example, bb has
*t multicopy_cytosolic_ribosomal_RNA_gene_cassette and links to the genes
2SrRNA, 5.8SrRNA, 18SrRNA and 28SrRNA by "encoded genes" lines; both bb
and its components also -- potentially -- have member genes. Moreover,
the RNAs are encoded genes of Ybb as well as of bb.
- *A. Alleles.
Each allele record begins with a *A field with the gene and allele symbol.
*e and *i fields, for the full allele name and synonyms, are used as for
the gene records.
Syntax: *A gene_symbol<up>allele_symbol</up>
*e allele_name
e.g. *A bb<up>G2</up>
*e bobbed of Goldschmidt
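As an illustration of the *A syntax, the sketch below splits such a line into its gene and allele symbols. The function name and the regular expression are assumptions made for this example; they are not part of any FlyBase tool, and the pattern assumes the allele symbol is always wrapped in a single <up>...</up> pair.

```python
import re

# Assumed shape of an *A line: "*A gene_symbol<up>allele_symbol</up>".
ALLELE_LINE = re.compile(r"^\*A\s+(?P<gene>.+?)<up>(?P<allele>.+)</up>\s*$")


def parse_allele_line(line: str) -> tuple:
    """Return (gene_symbol, allele_symbol) from an *A line."""
    match = ALLELE_LINE.match(line)
    if match is None:
        raise ValueError(f"not an *A line: {line!r}")
    return match.group("gene"), match.group("allele")
```

For the example above, `parse_allele_line("*A bb<up>G2</up>")` yields the pair `("bb", "G2")`.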
For some loci Lindsley and Zimm
(1992) gave only cross-references to Lindsley
and Grell (1968) or Bridges and
Brehme (1944) for lost alleles. FlyBase has included the data as published
in these earlier catalogs.
There is one class of 'allele' that FlyBase treats in a non-traditional way,
that of alleles named as a consequence of a variegating position effect. By
definition, these do not affect the structure of the gene, only its expression.
For this reason position effect alleles are not included in the genes file.
The aberration which gives rise to the position effect is, of course, in the
aberrations file and the fact that it causes a position effect (or not) is
noted in the *V lines of that file.
There are a few exceptions to this policy. A handful of alleles
may or may not be due to a position effect; the absence of any cytological
description of their chromosomes makes it impossible to tell. In these cases
their records will include a *k line as follows: *k may be due to position
effect variegation of normal allele.
- *v. Information
on availability. If a publication reports that an allele is lost, that information
is recorded in the *v field. Note that not all such reports in the literature
are authoritative.
- *o, *O, *R.
Origin of alleles. The *o field holds the data on the 'origin' of an allele,
usually the mutagen used to induce it, but the origin may well be 'natural
variant'. A controlled vocabulary is used in *o. This controlled
vocabulary includes the CAS
Registry Numbers of chemicals.
Syntax: *o mutagen
e.g. *o spontaneous
*o ethyl methane sulfonate
Where the value in *o begins 'in vitro construct' this field is bipartite,
reflecting the type of in vitro mutagenesis employed to create that allele:
*o in vitro construct | regulatory fusion
*o in vitro construct | site directed
The legal entries
for this field are listed in controlled-vocabularies.txt within the Documents
section, along with all other mutagen terms.
The *O field is for the chromosome on which the mutation was induced or the
progenitor allele name (e.g., for revertants). This field is only used if
the progenitor is relevant to the derivative. The values in this field will
be valid FlyBase allele or aberration or transposon insertion symbols. Where
a *O field houses more than one value, each followed by " \?", this signifies
that the progenitor chromosome is one of the named alternatives.
*R is miscellaneous data about an allele's origin, for example that it was
simultaneously induced with another mutation, or information about the genotype
of the progenitor which is irrelevant to the derivative. This is a formatted
free text field.
- *Q. Carries
miscellaneous inter-allele information as free text.
- *C. Cytology
of alleles. The *C field holds the information about the cytology of the allele,
either that the 'Polytene chromosomes are normal' or comments about possible
cytological abnormalities.
- *P. Associated
aberration. Holds the symbol of the aberration for those alleles caused by
an aberration break. If an allele is associated with but separable from an
aberration then that data will be in the *R field. If an allele was induced
in an aberrant chromosome, then that is indicated in the *O field.
- *G. Insertion
chromosome associated with allele. Transposons or transgene constructs thought
to be responsible for a mutation are recorded in the *G field. Transposons
and transgene constructs are named according to the rules set out in the FlyBase
nomenclature document.
For example, an unmarked P-element is named P{}, the lArB
transgene construct is P{lArB}, a copia element, copia{}.
Insertions of unidentified transposons have the symbol *{}. Following
the closing brace is the allele symbol (identical to the preceding *A field);
the complete symbol (e.g., P{lArB}wgNZ) is the designation
of the insertion chromosome.
- *N. Synonym
for insertion recorded in *G.
- *I. Transposon
or transgene construct that carries an allele. An allele being carried on
a transposon/transgene construct, as opposed to being caused by its insertion,
is denoted by the symbol of the transposon/transgene construct appearing in
a *I field under the allele, e.g., *I P{lArB} under Adh+t3.2.
- *L. Synonym
for transposon or transgene construct recorded in *I.
- *k. Mutant phenotype.
This holds the phenotypic description of the mutant allele. This description
is restricted to alterations of the anatomy and organismal function of the
mutant, and does not include gene expression pattern data. (This contrasts
with the use of 'phenotype' in the GO term evidence code 'inferred from mutant
phenotype' which does encompass expression pattern data - see *d, *F and *f).
The *k field is free text, except for the following classes of information:
*k Phenotypic class: This field can be multi-component, storing information
about the recessive/dominant, conditional and stage-specific aspects of
the allele in addition to the phenotypic class into which the allele falls. Vertical
bars separate the components:
*k Phenotypic class: lethal | embryonic | maternal effect | recessive
An allele can legitimately have multiple '*k Phenotypic class:' lines.
*k Phenotypic class: lethal | recessive
*k Phenotypic class: flightless | dominant
Where a genotype appears in curly brackets at the end of the line, that phenotypic
class of the allele is dependent on the {second site} genotype in the brackets.
*k Phenotypic class: visible | dominant { Scer\GAL4how-24B }
Where a '(with allele)' statement appears at the beginning of the line that
phenotypic class is particular to the allelic combination of the allele that
is the subject of the report and the allele (of the same gene) stated in the
'(with allele)' statement.
*k Phenotypic class: (with fafFO8) visible
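The three forms of '*k Phenotypic class:' line described above (plain components, a trailing {second site} genotype, and a leading '(with allele)' statement) can be taken apart as in the following sketch. The names `PhenotypicClass` and `parse_phenotypic_class` are hypothetical, and the code assumes each statement sits on a single line.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

PREFIX = "*k Phenotypic class:"


@dataclass
class PhenotypicClass:
    components: List[str]            # vertical-bar separated terms
    with_allele: Optional[str]       # from a leading "(with allele)" statement
    second_site: Optional[str]       # from a trailing "{ genotype }"


def parse_phenotypic_class(line: str) -> PhenotypicClass:
    if not line.startswith(PREFIX):
        raise ValueError(f"not a Phenotypic class line: {line!r}")
    body = line[len(PREFIX):].strip()

    # Optional "(with allele)" statement at the beginning of the line.
    with_allele = None
    match = re.match(r"\(with\s+([^)]+)\)\s*", body)
    if match:
        with_allele = match.group(1).strip()
        body = body[match.end():]

    # Optional genotype in curly brackets at the end of the line.
    # Splitting at the first "{" keeps symbols such as P{...} intact.
    second_site = None
    if body.endswith("}") and "{" in body:
        body, _, rest = body.partition("{")
        second_site = rest[:-1].strip()

    components = [term.strip() for term in body.split("|") if term.strip()]
    return PhenotypicClass(components, with_allele, second_site)
```

Because an allele can have several such lines, a caller would normally collect one `PhenotypicClass` per line rather than merging them.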
*k Phenotype manifest in: This field describes the body part affected by the
mutant allele, using the body part terms as listed in the controlled vocabulary.
*k Phenotype manifest in: wing vein L5
Where a genotype appears in curly brackets at the end of the line, the phenotype
in that body part is dependent on the {second site} genotype in the brackets.
*k Phenotype manifest in: wing { Scer\GAL4dpp.blk1 }
The presence of a term in this field means simply that the named structure
can demonstrate a mutant phenotype as a consequence of the mutant allele.
Thus for maternal effect alleles, the embryo in which the named body part
is affected is not necessarily mutant for that allele in question, though
its mother was. Also, the phenotype need not be 100% penetrant and expressed
for the affected body part to be recorded in a 'Phenotype manifest in:' field.
Terms can be combined using an & symbol:
Phenotype manifest in: cuticle & procephalon
Phenotype manifest in: scutellum & macrochaetae
Where a '(with allele)' statement appears at the beginning of the line, the
phenotype in that body part is particular to the allelic combination of the allele which
is the subject of the report and the allele (of the same gene) stated in the
'(with allele)' statement.
*k Phenotype manifest in: (with fafFO8) eye
*k Mode of assay: This field is mandatory for all alleles that have '*o in
vitro construct'. The possible entries in this field are:
*k Mode of assay: In transgenic Drosophila
*k Mode of assay: Whole-organism transient assay
*k Mode of assay: Drosophila cell culture
*k Mode of assay: In transgenic Drosophila (allele of one drosophilid species in genome of another drosophilid)
*k Mode of assay: Whole-organism transient assay (allele from one drosophilid species assayed in another drosophilid)
*k Mode of assay: In transgenic Drosophila (allele of foreign species in genome of drosophilid)
*k Mode of assay: Whole-organism transient assay (allele of foreign species assayed in drosophilid)
The capture, storage and reporting of phenotypic data is discussed in Phenotypic
Data in FlyBase, Drysdale (2001).
- *S. Genetic
interaction information on alleles
*S Genetic interaction (class, effect):
*S Genetic interaction (anatomy, effect):
*S Genetic interaction (effect, class):
*S Genetic interaction (effect, anatomy):
*S Genetic interaction: free text
These 'Genetic interaction' fields store information about phenotypic class
and affected body parts for mutant combinations of genetically interacting
alleles. The interacting allele is indicated in the curly brackets {}. Phenotypic
class and Anatomical term values are as for *k fields.
*S Genetic interaction (class, effect): visible, enhanceable { ml[1] }
*S Genetic interaction (anatomy, effect): eye, enhanceable { ml[1] }
*S Genetic interaction (effect, class): enhancer, visible { S[1] }
*S Genetic interaction (effect, anatomy): enhancer, eye { S[1] }
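In these lines the two labels in parentheses give the order of the two values that follow the colon. The sketch below uses that to turn a structured '*S Genetic interaction' line into a small dictionary. The function name and regular expression are assumptions for illustration; the code also assumes one statement per line and that interacting alleles inside the curly brackets are separated by commas.

```python
import re

# "(first, second): value1, value2 { interacting allele(s) }"
INTERACTION = re.compile(
    r"^\*S Genetic interaction \((?P<first>\w+), (?P<second>\w+)\):\s*"
    r"(?P<values>[^{]*?)\s*\{\s*(?P<partners>.*?)\s*\}\s*$"
)


def parse_genetic_interaction(line: str) -> dict:
    match = INTERACTION.match(line)
    if match is None:
        raise ValueError(f"not a structured interaction line: {line!r}")
    values = [v.strip() for v in match.group("values").split(",", 1)]
    if len(values) != 2:
        raise ValueError(f"expected two comma-separated values: {line!r}")
    return {
        match.group("first"): values[0],
        match.group("second"): values[1],
        "partners": [p.strip() for p in match.group("partners").split(",")],
    }
```

Keying the result by the labels means that '(class, effect)' and '(effect, class)' lines remain distinguishable only by key order, so a caller who needs the direction of the interaction should also record which label came first.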
The capture, storage and reporting of phenotypic data is discussed in Phenotypic
Data in FlyBase, Drysdale (2001).
- *j. Xenogenetic interaction information on alleles
*j Xenogenetic interaction (class, effect):
*j Xenogenetic interaction (anatomy, effect):
*j Xenogenetic interaction (effect, class):
*j Xenogenetic interaction (effect, anatomy):
*j Xenogenetic interaction: free text
These 'Xenogenetic interaction' fields store information about
phenotypic class and affected body parts for mutant combinations of
genetically interacting alleles where one of the interaction
participants is from a different species from the other, or where both
are from a species other than the one in which the
assay is being performed. Examples include tests for functional
complementation between candidate homologs from different species. The
format of these fields is the same as for '*S Genetic interaction'
fields. The interacting allele is indicated in the curly brackets {}.
Phenotypic class and Anatomical term values are as for *k fields.
*j Xenogenetic interaction (class, effect): cell death defective, suppressible { Cele\ced-9[hs.PH] }
*j Xenogenetic interaction (anatomy, effect): leg, enhanceable { Mmus\eed[hs.PW] }
*j Xenogenetic interaction (effect, class): suppressor, visible { Hsap\MAPT[GMR.Ex.PJ] }
*j Xenogenetic interaction (effect, anatomy): enhancer, vMP2 neuron { Ggal\MLCK[ct.Scer\UAS], Scer\GAL4[ftz.ng] }
- *x, *T, *E.
References. *x fields, in both gene and allele records, are references.
Syntax: *x FBrfnnnnnnn == abbreviated_reference
e.g., *x FBrf0036029 == Saigo et al., 1981, Cold Spring Harbor Symp. Quant.
Biol. 45:815--827
The FBrf number is the unique reference identifier number from references,
which also includes the full reference.
*T lists recent review(s). For each gene, this is the list of all the reviews
published in the last four years which were determined by FlyBase curators
as having that gene as a significant topic, except that the list is truncated
to more recent years when that still leaves at least three references (for
example, if there are two dated 1999, two dated 1998, two dated 1997 and two
dated 1996, then only the two from 1999 and the two from 1998 are listed).
The most recent are placed first.
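The truncation rule for *T can be read as: walk back year by year from the most recent, and stop as soon as at least three reviews have been collected. The following sketch implements that reading. It is one interpretation of the rule, not FlyBase's code; the function name and parameters are hypothetical, and the four-year window is assumed to include the current year.

```python
from typing import List


def recent_review_years(review_years: List[int], current_year: int,
                        window: int = 4, minimum: int = 3) -> List[int]:
    """Return the publication years of the reviews that would be listed, newest first."""
    oldest_allowed = current_year - window + 1
    in_window = [y for y in review_years if oldest_allowed <= y <= current_year]

    kept: List[int] = []
    for year in sorted(set(in_window), reverse=True):
        # Reviews are kept in whole-year groups, most recent year first.
        kept.extend(y for y in in_window if y == year)
        if len(kept) >= minimum:
            break
    return kept
```

With two reviews in each of 1996 to 1999 and `current_year=1999`, the 1999 group alone holds only two reviews, so the 1998 group is added and the walk stops there, which matches the example in the text.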
The *E field is always a duplicate of a *x field within the same record. It
is a device to tie particular data to a particular reference. The data fields
then immediately follow the *E field.
The referenced block of fields is terminated by the next *E or *A field, or
the end of record line (#).
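The way *E ties data fields to a reference can be sketched as a simple grouping pass over the lines of a record. This is an illustrative reading of the rule above, with a hypothetical function name; it assumes one field per line, and the second reference in the usage note is a placeholder rather than a real FBrf entry.

```python
from typing import List, Tuple


def group_by_reference(lines: List[str]) -> List[Tuple[str, List[str]]]:
    """Group data fields under the *E reference that precedes them.

    A block is closed by the next *E field, the next *A field,
    or the end of record line (#).
    """
    blocks: List[Tuple[str, List[str]]] = []
    current = None
    for line in lines:
        if line.startswith("*E "):
            current = (line[3:].strip(), [])
            blocks.append(current)
        elif line.startswith("*A ") or line.strip() == "#":
            current = None
        elif current is not None:
            current[1].append(line)
    return blocks
```

For example, given the lines `*A bb<up>G2</up>`, `*E FBrf0036029 == Saigo et al., 1981`, `*o spontaneous`, `*E FBrf0000000 == placeholder`, `*C Polytene chromosomes are normal`, `#`, the function returns two blocks, one per *E field, each carrying the single data field that follows it.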
- *w. Discoverer.
This field contains the name of the individual who identified the allele,
or the name of the leader of the group that identified the allele.
B.1.4. Nontraditional alleles
In addition to 'alleles' in the traditional
sense, FlyBase now names and curates further classes of allele so that phenotypic
or expression pattern data can be captured for in vitro construct alleles and
alleles of reporter (e.g., Ecol\lacZ), effector (e.g., Scer\FLP)
or toxin (e.g., Rcom\DT-A) genes. Since these alleles have not historically
been named by researchers, and have been named by FlyBase, their presentation
in FlyBase requires some explanation:
B.1.4.1. Alleles of reporter genes
Alleles of reporter genes currently
fall into two main classes, those resulting from enhancer trap experiments,
and those resulting from promoter (or other regulatory region) analysis, where
a fragment is used to drive the expression of a reporter gene. Ecol\lacZ
will be used for illustration.
Enhancer trap results:
- The enhancer trap construct causes
an allele of a gene and is expressed in a pattern consistent with insertion
in that gene. The resulting aberration will be described with the format P{A92}hL43a,
and the Ecol\lacZ allele symbol is of the format Ecol\lacZh-L43a.
- The reporter gene reflects the
expression of a gene without causing a mutant allele of that gene. The resulting
aberration will be described with the format P{PZ}P2023-44, where
P2023-44 reflects the insertion identifier, and the Ecol\lacZ
allele symbol is of the format Ecol\lacZhh-P2023-44.
- The reporter gene reflects the
expression of an undescribed gene/enhancer. The resulting aberration will
be described with the format P{lacW}1.28, and the Ecol\lacZ
allele symbol is of the format Ecol\lacZ1.28.
Promoter analysis results:
- Generally some fragment of a gene
promoter/intron/3'-region is fused to the reporter gene. In this case the
allele symbol is of the form 'gene symbol.fragment descriptor' e.g., Ecol\lacZeve.prox54.
The fragment descriptor reflects that used in the publication, even though
this may be long and cumbersome (this may not be strictly true for such alleles
curated early in the FlyBase project).
- Where a reporter gene is simply
described in a publication as being driven by, e.g., an arm promoter,
the symbol of the Ecol\lacZ allele is 'arm.PI', where I
is the first letter of the surname of the first author of the paper, e.g.,
Ecol\lacZarm.PV for 'Ecol\lacZ arm promoter construct
of Vincent'.
- For logistical reasons some promoter
fusions involving reporter genes such as Ecol\lacZ, though technically
protein fusions, are simply treated as alleles of the reporter gene. The symbol
for the additional gene(s) contributing to the fusion is indicated as part
of a superscript, e.g., Ecol\lacZP\T.A92. In these special
cases there is no distinction made between promoter fusions and protein fusions
in the gene name.
B.1.4.2. Alleles of ectopically expressed Drosophila gene products
Products of genes may be ectopically
expressed due either to juxtaposition with different regulatory sequences in
the genome (as a result of being inserted into different-than-wild-type locations
by chromosome rearrangement or P element transposition) or due to in vitro construction
creating a different constellation of regulatory sequences than in wild type.
By analogy with alleles of Ecol\lacZ
for enhancer traps, P-element-borne insertions of genes e.g., w or
ve that have a qualitatively distinct _position-dependent_ mutant phenotype
will be curated as new alleles of e.g., w or ve, e.g., veStg
caused by a particular insertion of P{HS-rho}, P{HS-rho}Stg.
The 'in vitro construct' ectopic
expression alleles currently fall into two main classes, one component or two
component systems:
One component systems:
Gene A is expressed from a promoter of gene B. The allele is typically generated
by in vitro construction. In such cases the allele symbol is of the format 'gene-Agene-B.PI',
e.g., phylsev.PC or 'gene-Agene-B.fragment descriptor'
where the author includes a promoter fragment descriptor, e.g., phylninaE.GMR.
An occasional exception is made for
promoter fusions that are widely used to provide essentially wild-type gene
function; these alleles have the mini-gene '+m construct' designation
(see below) prepended to an, e.g., heat shock designation, e.g., w+mW.hs.
It is common that authors report
a construct where e.g., ftz is expressed under a 'heat shock' or Hsp70
promoter, while providing no further details about the nature of the promoter.
For these cases the allele symbol hs.PI is employed, e.g., Antphs.PZ
for 'Antp heat shock construct of Zeng'. An 'hs' designation should be reserved
for when the heat inducible, not just the minimal, promoter fragment is used.
Where the allele is both altered
in its coding region and being expressed from an ectopic promoter the sequence
'alteration.promoter' is used in the allele designation, e.g., tor13D.hs.sev
to denote the coding sequence of tor13D expressed from a
heat shock (undefined) promoter with a sev enhancer. An exception to
this rule is made for Tags, which appear as the last component of the allele
symbol (see below).
Two component systems:
- GAL4-UAS The allele symbol
for the gene whose expression is dependent upon Scer\GAL4 shall include
'Scer\UAS' and an identifier. The identifier should reflect the construct
as named by author e.g., l(1)scDeltaB.Scer\UAS. In the
absence of any other identifier '.cIa' is used, where 'c' stands for construct,
I for the first author's last name initial and 'a' for the first in the series
(subsequent ones will be b, c, etc). e.g., ase Scer\UAS.cBa
for 'Scer\UAS construct a of Brand'.
- FLP-FRT Alleles of Scer\FLP
are named as outlined above for reporter genes, and allele symbols of genes
whose expression is dependent upon that of Scer\FLP include 'Scer\FRT'.
B.1.4.3. Alleles of ectopically expressed non-Drosophila effector products
A note on ribozymes: FlyBase has
a foreign ribozyme gene, symbol LTSV\RBZ. Alleles of LTSV\RBZ
capture the different variants and use the syntax 'promoter.target gene'; e.g., a heat inducible
ftz-targeted ribozyme is named LTSV\RBZhs.ftz.
'+m' minigenes
The minigene allele designation is
used in its narrow sense, i.e., where the only difference between the allele
and the wild type is the removal of more or less non-essential sequences. Thus
the minigene allele symbol designation is reserved for those cases where the gene's
own promoter is driving its expression.
The minigene allele symbols begin
with 'm', for minigene, and are followed by the construct symbol used
in the publication. If no construct symbol has been used, the string 'mIa'
where 'm' stands for minigene, 'I' for the first author's
last name initial and 'a' for the first in the series is used. If the
function of the minigene is stated to be indistinguishable from that of the
wild type allele, the 'm' is preceded by a '+'.
Tags
Genes can be modified by the
addition of a tag allowing the product to be identified, purified, or targeted
to a particular subcellular distribution. Tagged alleles have the syntax 'gene-symbol
x.T:y', where x is an identifier and y is
the name of the tag, e.g., Hsap\MYC, T:Ivir\HA1, SV40\nls2.
An example of a complete symbol is CycBB1.T:Hsap\Myc. Where a tag is artificial, the
species prefix Zzzz is used, e.g. T:Zzzz\His6.
B.1.4.4. Classical alleles engineered into transgene constructs, including rescue constructs
A class of alleles are named to capture
fragments of genomic DNA used in rescue constructs. The symbol for the rescuing
allele begins with '+t'. This is followed by the length as stated
by the authors, or by the construct symbol if the length is not given; if neither length
nor construct symbol is stated, the symbol is '+tIa', where
't' stands for transgene, 'I' for the first author's last
name initial and 'a' for the first in the series. When rescue is incomplete, the construct is
considered as carrying a mutant allele. The allele designator is then the construct symbol,
'length of genomic insert.tIa' if no symbol is given, or 'tIa'
where neither length nor construct symbol is stated.
When a classic allele, e.g., wa,
is put into a transgene construct it will get a new designation, e.g., wa.tIa,
to reflect its transgenic environment, where 't' stands for transgene,
'I' for the first author's last name initial and 'a' for the
first in the series.
FlyBase is, of course, happy to discuss
and advise on use of nomenclature of these non-traditional alleles.
B.1.5. Protein and transcript symbols and exon naming
FlyBase strives to link curated information
to particular protein and transcript species. In order to maintain the data
in this way, it is necessary to assign different symbols to each gene product.
Proteins, transcripts and exons are symbolized as follows.
Protein symbols are of the form cact[+]P482
where the gene symbol and allele designation are followed by a capital P and
the size of the protein in amino acids. When the size in amino acids is not
known, the size in kiloDaltons is used, e.g. grh[+]P120kD. If no size is known,
the symbol is followed by a capital letter to distinguish products that are
known to be different, e.g. Sh[+]PA, Sh[+]PB. If multiple proteins of the same
size and divergent sequence are characterized, the symbols are followed by different
capital letters, e.g. abc[+]P345A, abc[+]P345B. A generic protein symbol, e.g.
cact[+]P, is used to capture properties that cannot be specifically attributed
to one protein product of a gene.
Transcripts are similarly named.
The gene symbol and allele designation are followed by a capital R and the size
in kb, e.g. cact[+]R2.2. Where possible the size as estimated by northern blot
is used. If not, the size of the longest cDNA is used and this is indicated
in the transcript table. For transcripts of unknown size, the symbol is followed
by a capital letter, e.g. grh[+]RA, grh[+]RB. For multiple transcripts of similar
size and divergent sequence, the symbols are followed by different capital letters,
e.g. abc[+]R1.7A, abc[+]R1.7B. A generic transcript symbol, e.g. cact[+]R, is
used to capture properties that cannot be specifically attributed to one particular
transcript of a gene.
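The protein and transcript naming rules above share one shape: gene symbol, allele designation in square brackets, a capital P or R, then an optional size and an optional distinguishing capital letter. The sketch below decomposes such symbols. The `ProductSymbol` type, the function name and the regular expression are assumptions made for illustration, and the pattern covers only the forms shown in the text.

```python
import re
from typing import NamedTuple, Optional


class ProductSymbol(NamedTuple):
    gene: str
    allele: str
    kind: str                 # "protein" or "transcript"
    size: Optional[float]     # None for generic or letter-only symbols
    unit: Optional[str]       # "aa", "kD" or "kb"
    variant: Optional[str]    # distinguishing capital letter, if any


PRODUCT = re.compile(
    r"^(?P<gene>.+)\[(?P<allele>[^\]]+)\]"
    r"(?P<kind>[PR])(?P<size>\d+(?:\.\d+)?)?(?P<kd>kD)?(?P<variant>[A-Z])?$"
)


def parse_product_symbol(symbol: str) -> ProductSymbol:
    m = PRODUCT.match(symbol)
    if m is None:
        raise ValueError(f"unrecognized product symbol: {symbol!r}")
    kind = "protein" if m["kind"] == "P" else "transcript"
    if m["kd"] and (kind == "transcript" or m["size"] is None):
        raise ValueError(f"'kD' is only expected after a protein size: {symbol!r}")
    size = float(m["size"]) if m["size"] else None
    if size is None:
        unit = None
    elif kind == "transcript":
        unit = "kb"                       # transcript sizes are in kb
    else:
        unit = "kD" if m["kd"] else "aa"  # protein sizes default to amino acids
    return ProductSymbol(m["gene"], m["allele"], kind, size, unit, m["variant"])
```

A generic symbol such as cact[+]P parses with size, unit and variant all empty, which is a convenient way for a caller to recognize data that could not be attributed to one specific product.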
In general, all of the exons comprising
a gene are numbered consecutively from 5' to 3'. Where exons partially overlap,
they are given the same number with a suffix, e.g. 2a,2b.
In some cases, it is not possible
to attribute a characteristic to an individual gene product. For example, expression
pattern data is often obtained with probes or antibodies that recognize more
than one product of a gene. It is not rigorously known where each individual
gene product is expressed. In addition, it is often not possible to determine
which transcript observed on a northern blot corresponds to a particular cDNA.
In these cases, the data is linked to a generic protein or transcript entity
for that gene.
B.1.6. FlyBase Genes - Interactive Fly Cross Index
FlyBase has developed a hierarchical
view of the Interactive Fly entitled "Interactive Fly
Hierarchy: cross-index to FlyBase genes". This hierarchy is accessible from
both Allied Data and Genes.
The hierarchy provides an overview of the Interactive Fly with links
to the specific Interactive Fly pages, as well as gene lists with links
to the individual gene records in FlyBase and the Interactive Fly.
This permits searches for genes grouped according to developmental and cellular
pathways and functions.
B.1.7. Differences and omissions from Lindsley and Zimm (1992)
All errors found in Lindsley
and Zimm (1992) have been corrected. A list
of these errors, sorted by page number, is in the file errors.txt in the
Redbook section of FlyBase Documents.
The material in the DELETION
MAP tables in the 'lethals' section of Lindsley
and Zimm (1992) is not included; these tables are available in the Redbook
section of Maps. The
tables of Lindsley and Zimm (1992)
have been broken down and the data incorporated into the text of the relevant
gene record. All references
within the body of a text entry of Lindsley
and Zimm (1992), i.e., not in the references: field, have
been duplicated into the references: field. With a very few
exceptions all references are to be found in the FlyBase Bibliography
and carry FlyBase reference ID numbers. The
molecular map figures in Lindsley
and Zimm (1992) are not included in genes, but are available in Redbook/Images
sections of Documents. Lindsley and Zimm often used introductory sections for groups of genes that are, in some way or other, related (see e.g. the record for ASC, page 50). This structure is not suitable for FlyBase, and this information has, in general, been repeated in each of the relevant individual gene records.
B.2. Synonyms
FlyBase maintains a record of synonyms
for gene, allele, aberration, transposon and transgene construct symbols that
have appeared in the literature and stock center stock lists. Files with tables
of synonyms and their corresponding "valid" symbols are found in the relevant
sections of FlyBase.
Synonyms have several different causes.
Sometimes two workers give the same symbol to two different genes, requiring
one of these to be changed. Sometimes two workers, either by accident or design(1),
give two different symbols to the same gene; in that case the symbol that has priority
should be used. Many of the synonyms arise, however, as a consequence of minor variation
in the way a gene's or aberration's or transposon's or transgene construct's
symbol is written (e.g., with lower case or capital first letter), or by error,
either in the literature or these tables. In some cases it has been difficult
to decide whether a name is a gene synonym or just an allele name (this is especially
so for lethals). We have taken a very liberal attitude to synonyms and, when
in doubt, have included a name as a synonym even when it may more correctly
be an allele name.
The files are:
- Genes/gene-synonyms
-- For genes and their alleles. This plain-text file contains a list of synonyms
and valid symbols as 'synonym-symbol > valid-symbol', one synonym per line.
There are often many synonyms per valid symbol. Superscripts are indicated
in the text by <up> (beginning of superscript) and </up> (end
of superscript). Greek
letters are also encoded in the text (for example, alpha appears as &agr;).
- Aberrations/aberration-synonyms
-- This plain-text file contains a list of synonyms and valid symbols as 'synonym-symbol
> valid-symbol', one synonym per line.
- Transgene-construct/transposon-synonyms
(not yet available)
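The layout of the synonym files described above can be read with a few lines of code. The following Python sketch is an illustration only: it assumes that the separator is a '>' with a space on each side (so that the '>' inside the <up> and </up> markers is not mistaken for it), and the sample lines are invented for the example rather than taken from the real files.

```python
from collections import defaultdict

SEPARATOR = " > "  # assumption: '>' with a space on each side


def parse_synonym_lines(lines):
    """Return {valid_symbol: [synonym, ...]} from 'synonym > valid' lines."""
    table = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or SEPARATOR not in line:
            continue  # skip blank or unrecognised lines
        # Split on the last separator so <up>...</up> markup is left intact.
        synonym, valid = line.rsplit(SEPARATOR, 1)
        table[valid.strip()].append(synonym.strip())
    return dict(table)


# Invented sample lines (not real FlyBase data).
sample = [
    "Abc > abc",
    "abc<up>1</up> > abc",
    "xyz-like > xyz",
]
synonyms = parse_synonym_lines(sample)
```

Grouping by valid symbol reflects the statement that there are often many synonyms per valid symbol; a reverse lookup (synonym to valid symbol) would be an equally reasonable choice.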
1. "Scientists would rather use each other's toothbrushes than each other's
nomenclature.", Keith Yamamoto.
B.3. Species other than D. melanogaster
FlyBase includes data on all species
from the family Drosophilidae. The 'default' species is D. melanogaster
and all symbols and names of genes, alleles, aberrations and clones from other
species have a prefix of the form Nnnn\, where N is the initial
letter of the genus (e.g. D for species in the genus Drosophila)
and nnn is normally the first three letters of the specific epithet
(e.g., sim for simulans). In formal terms all symbols and
names from D. melanogaster have the prefix Dmel\, but this
is usually omitted.
Species prefixes are also used for
non-melanogaster genes introduced into D. melanogaster via a transgene
construct, including Ecol\lacZ, Scer\GAL4 and Avic\GFP.
In addition, genes carried by natural transposable elements have the transposon
symbol as a 'species' prefix, for example, P\T, the gene for P-element
transposase. To find genes such as these in a Genes search, change the 'Species'
option from the default 'Dmel' to 'All'.
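As a small illustration of the prefix convention, the Python sketch below separates a symbol into its species (or transposon) prefix and the bare symbol, treating an unprefixed symbol as D. melanogaster. The function name is hypothetical, and the sketch deliberately ignores more complex symbols such as gene fusions.

```python
DEFAULT_SPECIES = "Dmel"  # the prefix that is usually omitted


def split_species_prefix(symbol):
    """Return (prefix, bare_symbol) for a prefixed or unprefixed symbol."""
    prefix, separator, rest = symbol.partition("\\")
    if not separator:
        # No backslash: the symbol is taken to be from D. melanogaster.
        return DEFAULT_SPECIES, symbol
    return prefix, rest


examples = [r"Ecol\lacZ", r"Scer\GAL4", r"P\T", "elav"]
parsed = [split_species_prefix(s) for s in examples]
```

Note that the prefix of P\T is the transposon symbol rather than a species abbreviation, so the length of the prefix cannot be assumed to be four characters.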
A list of all of the names
and abbreviations used by FlyBase for species is included in the Nomenclature
section of FlyBase. The species-abbreviations.txt file has the syntax:
taxgroup | abbreviation | genus | species name | common name | comment
At present, the following 'taxgroups'
are recognized:
drosophilid (i.e., species in the
family Drosophilidae), non-drosophilid eukaryote, prokaryote, transposable element,
and virus (including viruses of prokaryotes). The file is sorted in this order.
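A line in the format given above can be split into named fields as in the following Python sketch. The class and function names are hypothetical, the sample line is invented (only the abbreviation Dsim follows from the prefix rule described earlier), and the sketch assumes that '|' does not occur inside any field except possibly the final comment.

```python
from typing import NamedTuple


class SpeciesEntry(NamedTuple):
    taxgroup: str
    abbreviation: str
    genus: str
    species: str
    common_name: str
    comment: str


def parse_species_line(line):
    """Split 'taxgroup | abbreviation | genus | species | common | comment'."""
    # maxsplit=5 keeps any further '|' characters inside the comment field.
    parts = [part.strip() for part in line.split("|", 5)]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, found {len(parts)}: {line!r}")
    return SpeciesEntry(*parts)


# Invented sample line; the empty fields are placeholders.
entry = parse_species_line("drosophilid | Dsim | Drosophila | simulans | | ")
```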
We stress that identity of gene symbol
between two species cannot be used to conclude 'homology' of genes. Where known,
or strongly suspected, information concerning homologous genes within the family
is present in a *M field of the genes file.
FlyBase has made only limited efforts
to curate genes, alleles and aberrations from species other than D. melanogaster
for the period before 1989. We have back curated from D.I.S. and some
primary papers and reviews that have come to hand. For four species we have
incorporated the efforts of others:
- D. ananassae
- From a catalog of mutations and chromosome aberrations of Drosophila
ananassae provided to FlyBase by Y.N. Tobari. This was the text of Chapter
11 'Catalog of mutants' by D. Moriwaki and Y.N. Tobari in Y.N. Tobari (editor)
Drosophila ananassae: Genetical and biological aspects (Japan Scientific
Societies Press, Tokyo and Karger, Basel, 1993). We thank Professor Tobari
for his permission to make these data available in FlyBase and for providing
the data on disk.
- D. buzzatii
- From a catalog of the genes and mutations of Drosophila buzzatii
provided to FlyBase by J.S.F. Barker. This was based on Schafer, Fredline,
Knibb, Green and Barker (1993) Genetics and linkage mapping of Drosophila
buzzatii. J. Hered. 84:188--194. Where no phenotypic description
is given, it is similar to that for the mutant of the same name in D.
melanogaster, and is assumed homologous. Unless otherwise specified,
visible mutants were detected through inbreeding to F2 or F3 the progeny of
wild-caught females (Spencer, 1949). Most of the visible mutants are in the
collection of the Tucson Drosophila
Species Stock Center. FlyBase thanks Professor Barker for providing these
data on D. buzzatii.
- D. virilis
- From a list prepared for FlyBase by Professor H. Kress.
- D. subobscura
- From the lists in Krimbas (1993) 'Drosophila subobscura, Biology, Genetics
and Inversion Polymorphism'. Verlag Dr. Kovac, Hamburg.
We would be happy to hear from colleagues who are able to review records from species
other than D. melanogaster. We thank Jerry Coyne for reviewing the
records for D. simulans, D. mauritiana and D. sechellia.
B.4. Genetic objects from non-Drosophila species that are included in Drosophila
Sequences from many other organisms
are often included in artificial constructs introduced into the genome of Drosophila.
FlyBase calls these 'foreign genes' and they have symbols that indicate both
the species of origin and the nature of the element, e.g., Hsap\BMP4,
the BMP4 gene from humans. A list
of the species abbreviations used is to be found in the Nomenclature
section.
Just as two or more different Drosophila
genes can be engineered into a gene fusion so can two or more different foreign
gene coding regions. These are called 'foreign fusion' genes, e.g., Avic\GFP::Ecol\lacZ,
a coding fusion of Aequorea victoria GFP and the E. coli lacZ
gene.
Structural and non-coding elements
('SAFE elements', see B.1.3.) from non-Drosophila species are called foreign
SAFE elements. The most common group of foreign SAFE elements are short sequence
tags used to mark genes or their products (including epitope tags). These have
symbols that begin with 'T:', e.g., T:Hsap\MYC, the 'myc' epitope tag.
Artificial sequences are also classed as SAFE elements, e.g., T:Zzzz\His6
for a DNA sequence encoding a run of six histidine residues.
A limited class of regulatory elements
from foreign species are classified as foreign SIRE elements (synthetic and/or
isolated regulatory elements). This class is restricted to regulatory elements
widely used in an isolated context, for example as mobile activating elements.
Examples are the synthetic multiple UAS[[G]] elements, restricted to cases in
which they are used within transgene constructs designed to activate adjacent
endogenous genes.
The class of element is indicated
in a *t line, which, for the objects described in this section, can have the
following values:
- *t foreign_gene
- *t foreign_fusion
- *t safe_element.f
- *t sire_element.f
Each class, or any combination of
classes, can be extracted from the database by using the complex query form
in Genes with the "Class" option changed from the default "all" to one or more
(ctrl+click to add terms) of these categories.
For each class the origin of the
gene is described in star-coded format in a *u line with the following syntax:
*u Foreign sequence; species == <species_name>; gene|sequence|sequence tag|function tag|epitope tag == <gene symbol>; <database_abbreviation:database_id>.
Attempts are first made to cross-reference
to another genetic database (e.g., OMIM, GDB, MGD). If such a link cannot be
made then we attempt to establish a link with a protein or nucleic acid sequence
database. The database abbreviations used will be found in Reference
Manual F: Links To and from FlyBase. The gene name or symbol will be enclosed
with single quotation marks if no cross-reference to another genetic database
can be found. If no cross-reference can be established then a brief literature
reference to the object will be included within the 'comment' field. In the
case of epitope tags the comment field will normally include the 'name' of the
antibody recognizing the epitope and a literature reference.
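To make the *u syntax concrete, the Python sketch below breaks such a line into its parts. It is an illustration under stated assumptions: the line is taken to be on a single line, the function name is hypothetical, and the database identifier in the sample is a placeholder, not a real accession.

```python
KINDS = ("gene", "sequence", "sequence tag", "function tag", "epitope tag")


def parse_foreign_origin(line):
    """Parse a '*u Foreign sequence; ...' line into a dictionary."""
    body = line.strip()
    if not body.startswith("*u "):
        raise ValueError("not a *u line")
    body = body[3:].rstrip(".")  # drop the field code and the final full stop
    parts = [part.strip() for part in body.split(";")]
    if parts[0] != "Foreign sequence":
        raise ValueError("not a foreign-sequence description")
    record = {"species": None, "kind": None, "symbol": None, "xref": None}
    for part in parts[1:]:
        if "==" in part:
            key, value = (piece.strip() for piece in part.split("==", 1))
            if key == "species":
                record["species"] = value
            elif key in KINDS:
                record["kind"], record["symbol"] = key, value
        elif ":" in part:
            database, identifier = part.split(":", 1)
            record["xref"] = (database.strip(), identifier.strip())
    return record


# Sample line; 'OMIM:000000' is a placeholder identifier.
origin = parse_foreign_origin(
    "*u Foreign sequence; species == Homo sapiens; gene == BMP4; OMIM:000000."
)
```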
B.5. Maps
The Maps
section of FlyBase contains map-based browsing and query tools and data. See
Reference Manual C: Using FlyBase on the Web for
further information on these tools.
FlyBase uses Bridges' revised maps
for the banding patterns of the polytene chromosomes. See:
Bridges, 1938, J. Hered. 29: 11--13
(X chromosome), Bridges and Bridges, 1939, J. Hered. 30: 475--476 (2R), Bridges,
1941, J. Hered. 32: 64--65 (3L), Bridges, 1941, J. Hered. 32: 299--300 (3R),
Bridges, 1942, J. Hered. 33: 403--408 (2L).
B.5.1. Sequence-based Maps
B.5.1.1. Genome Browser, GBrowse
GBrowse (a product of the Generic Model Organism Database Project) provides a Web-based view of a specified region of the genome; the location of that region along the chromosome arm is indicated graphically. The region of interest can be specified by gene symbol, CG identifier, a mapped feature (such as a Drosophila Gene Collection cDNA clone, BAC genomic clone, P element insertion, or protein sequence accession in the SPTR database with BLASTX similarity to the genomic sequence), or a coordinate extent on a scaffold accession or chromosome arm. One can also input a sequence string using the Fly BLAST server and, from the BLAST results list, link to the alignment in the GBrowse view. The extent of the region (from 100 bp to 5 Mbp) can be controlled by the user using the zoom option. Adjacent regions can be viewed using the scroll option. Annotated genes, supporting data, and other sequence-aligned data (e.g., P-element insertion sites and Affymetrix oligos) are shown as color-coded features flanking the central sequence axis. Features can be identified by mousing over the relevant graphic and viewing the feature name in the status bar; when the view is zoomed in sufficiently, or the gene labelling option is selected, the gene annotations are labelled. Included below the GBrowse view of the region are BAC in situ images. The "Display Settings" panel can be used to control the subset of features displayed, the width of the image, and other display options. For example, one can choose to have gene symbols displayed or can choose to have an expanded view of the aligned data. The data behind the GBrowse view, including cytological locations and GO gene function descriptions, can be downloaded in various flat-file formats: tabulated, FASTA, GAME-XML or GFF.
B.5.1.2. Drosophila Genome Overview
The FlyBase tool Drosophila Genome Overview is an extension of GBrowse that allows users to browse entire chromosome arms at once. The default view displays cytological numbered divisions, the tiling BAC genomic clones, and the annotated sequence scaffolds in GenBank. Clicking on the BAC or GenBank scaffolds takes users to the GBrowse view of the region. Users can also choose to display all of the genes along a chromosome arm, as well as cDNAs that align to the genomic sequence, P element insertions, transposable elements, and sequencing gaps. The width of the map can be adjusted, which is necessary when viewing these finer, optional features.
B.5.1.3. Apollo
A more flexible and interactive view
of the same data provided in GBrowse is possible using the Apollo
genome browser and annotator. Use of this tool requires that the Apollo
software be downloaded and installed locally; data are then loaded via a Web
connection from the annotation database. Data can be saved locally in the form
of GAME-XML flat files and subsequently reloaded into Apollo. A detailed and
comprehensive user
guide for Apollo is available. This tool provides several options for viewing
annotations and features down to the sequence level, and allows searches for
specific genomic or amino acid sequence strings. Apollo also provides editing
options, including sequence-level modifications of exon extents, addition of
alternative transcripts, deletion of existing annotations, modifications involving
merging or splitting existing annotations, and addition of comments associated
with specific genes or transcripts. There are many options for customizing the
format of the view and the data sets; these may be saved as user preferences.
B.5.2. Gene Order Maps
The Gene
order maps section contains maps that communicate both gene order and cytological
location. There are two formats: files whose names end '.ps' are suitable for
downloading and printing on a PostScript printer, while those ending '.txt' are
preferable for viewing in a web browser. Their format is documented in detail
in the file geneorder.doc in
the same folder.
Using the Gene Order Maps
The gene-order map communicates both
gene order and cytological location. This is presentationally rather different
on a genome-wide map than on a small, well-mapped region, and a novel format
has been adopted, which is documented here.
1. Cytological range
Each gene whose cytological location is known with a range of uncertainty less
than about two number divisions is written on a vertical line whose extent is
the range of uncertainty. Overlapping lines are staggered. To this extent, in
other words, the format is as in the EofD. A gene whose symbol exceeds nine
characters may cross more than one line; the line it is attached to always goes
through the second character of the symbol.
Bands are drawn with differing sizes,
but this is not in any way related to amount of DNA per band, as it is on the
EofD. It is only a function of how much data we need to place there.
2. "Limiting" genes
In addition, at either end of the line there is the symbol for a gene that is
known to lie to the indicated side of the gene in the middle of the line. Two
points must be emphasized about these "limiting" genes: they are not being stated
to have the same cytological location as the "limited" gene, and they are not
being stated definitely to be the neighboring gene. They are chosen by pragmatic
criteria as being the most informative genes that are known to lie to the indicated
side. These criteria include cytological location and size of range of uncertainty
of that location. This means that it is common, especially in well-mapped regions,
for a gene to appear more than once. A gene can appear as a limiter of any number
of other genes, but it will only be a limited gene on at most one line.
Limiters are identified only by direct
recombination, complementation or molecular map data; cytology (of genes or
of breakpoints) is never used. If a gene has no limiter on one side (or both),
that means that no gene can be placed to that side using direct genetic or molecular
data.
3. Multiple "limited" genes on a single line
In the better-characterized regions, gene order is known to a degree that cannot
be clearly represented by cytological range. This is alleviated by placing two
or more genes "limited" on the same line. So as to maintain completeness of
information, a set of genes is only ever limited on the same line if (a) their
relative order is completely known, and (b) they all have identical cytological
ranges. The limiters of a line with more than one gene are known to lie to the
indicated side of all limited genes.
| y
| |
| |
1B5 |
| svr
| |
| elav
|
| |
| |
1B6 |
| |
| Appl
This says:
- the four genes shown are in the
order y, svr, elav, Appl, going from left
to right along the chromosome.
- svr and elav
lie in either 1B5 or 1B6.
It does not say:
- y and/or Appl
lie in 1B5 or 1B6
- svr lies in 1B5
- etc.
4. Nested or overlapping genes
The software that analyses map data understands the concept of genes within
genes, but this is hard to depict graphically without a generally more confusing
format. Sometimes, therefore, a gene will be shown as its own limiter, or as
both limited by and limiting (to the same side) another gene.
We have incorporated some molecular
data into this map, and will add much more over the coming year, but the bulk
of the information is based on genetic data. Therefore, the definition of overlap
of two genes is not necessarily that the transcription units overlap. For example,
ftz is shown as embedded in Scr, because Scr[-] ftz[+]
deficiencies exist that delete proximal material (including Antp).
5. Genes with cytological extent
A few dozen genes are stated to be deleted by deficiencies which (according
to our data) do not quite overlap, thus implying that the gene occupies the
whole region between the deficiencies (plus a bit on either side). In most cases
the gap between the genes is only one band, so we have fudged the issue by placing
the gene at the interband, e.g. y in 1B1-2:
|
|
1B1
| arth
| |
y
| y |
| | ac
1B2 ac
| |
| sc
Two files related to the correspondence
of the genetic and cytogenetic maps are also in Maps:
- cytotable.txt
is a table showing the genetic map positions that FlyBase infers from
published cytogenetic positions for genes without a known genetic map position.
These inferences were made using the genetic and cytological locations of
Ising's TE inserts. These can be found in the FlyBase Aberrations section
with symbols of the form "Tp(1;n)TE*" (where "n" is 1, 2 or 3).
B.5.3. Computed Aberration Breakpoints and Cytological Locations of Genes
If you see computed cytologies in
FlyBase that you think are incorrect, please contact us at flybase-updates
at morgan.harvard.edu (reformat to standard e-mail address).
Five categories of information regarding
the polytene location of genes and aberration breakpoints are captured by FlyBase:
- Polytene data from chromosome
in situ hybridization of clones
- Polytene localization of aberration
breakpoints (orcein data)
- Genetic (recombination) mapping
data on gene order
- Complementation data between alleles
and aberrations
- Genomic molecular data on gene
order and proximity
Recombination, complementation and molecular information do not reveal polytene locations
directly, but can be combined with orcein and in situ data to derive inferred
polytene locations. This type of analysis is non-trivial when conducted on a
large dataset. FlyBase has produced software which does it automatically, with
some provisos which are explained below (see 'Provisos').
The output of this software is a
'best guess' of the polytene location of each gene or aberration breakpoint
for which any relevant data are known to FlyBase. The guess is presented as
a range of uncertainty, whose ends are either polytene bands (such as 22F1)
or lettered subdivisions (such as 22F). Heterochromatic bands (such as h41)
are also used. This range appears as the polytene location of the gene or breakpoint
in the header section of the gene or aberration report, and is also used as
the underlying data for the various map-based user interfaces, such as the graphical
maps and CytoSearch.
To the extent possible (see 'Provisos'
below), the computed range of uncertainty of a gene or breakpoint is the range
consistent with ALL the data known to FlyBase. Thus, if in one publication a
gene has been reported to lie in 35B1-4, and in another publication it is reported
to lie in 35B3-6, and there is no other relevant information in FlyBase, the
computed location will be 35B3-4. More complex situations arise from complementation
and recombination data. For example, if Df(1)xyz is stated to have
its proximal breakpoint at 15A1-4, and Df(1)pqr is stated to have its
distal breakpoint at 15A3-6, and the Df's are known to overlap (because there
is a gene, abc, that they both delete), then both those breakpoints will be
computed to lie in 15A3-4 -- as will the gene abc itself.
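The simplest case above, two reports that place the same gene in overlapping ranges, amounts to taking the intersection of the two ranges. The Python sketch below illustrates only that case; it is not the FlyBase software. It assumes ranges of the form '35B1-4' (a run of bands within one lettered subdivision), and the function names are hypothetical.

```python
import re

# Assumed form: division number, subdivision letter A-F, first band, last band.
BAND_RANGE = re.compile(r"^(\d+)([A-F])(\d+)-(\d+)$")


def parse_band_range(text):
    """Return (division, letter, first_band, last_band) for e.g. '35B1-4'."""
    match = BAND_RANGE.match(text)
    if match is None:
        raise ValueError(f"unsupported range: {text!r}")
    division, letter, first, last = match.groups()
    return int(division), letter, int(first), int(last)


def intersect(range_a, range_b):
    """Return the range consistent with both reports, or None if they conflict."""
    div_a, let_a, lo_a, hi_a = parse_band_range(range_a)
    div_b, let_b, lo_b, hi_b = parse_band_range(range_b)
    if (div_a, let_a) != (div_b, let_b):
        raise ValueError("sketch only handles one lettered subdivision")
    low, high = max(lo_a, lo_b), min(hi_a, hi_b)
    if low > high:
        return None  # non-overlapping reports need case-by-case resolution
    if low == high:
        return f"{div_a}{let_a}{low}"
    return f"{div_a}{let_a}{low}-{high}"


combined = intersect("35B1-4", "35B3-6")  # the first example in the text
```

Inferences from complementation and recombination data, such as the deficiency example, depend on ordering constraints between different objects and are not covered by this sketch.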
Because of the inherent complexity
of these computations, the basis for the computed range is often far from obvious
at first sight. FlyBase therefore includes, directly following the computed
range in the Full and Abridged (but not Synopsis) gene and aberration reports,
one-line descriptions of the primary data from which each end of the range was
determined. Those from the last example above would be as follows (with arbitrary
data for the other ends of the deficiencies): note that there is no requirement
that any two data items derive from the same reference.
- For gene abc:
- Computed cytological location:
15A3-4
- Left limit from inclusion in Df(1)pqr
(FBrf0012345)
- Right limit from inclusion in
Df(1)xyz (FBrf0054321)
- For Df(1)xyz:
- Computed cytological location:
14D;15A3-4
- Limits of break 1 from polytene
analysis (FBrf0013579)
- Left limit of break 2 from inclusion
of abc (FBrf0056789)
- Right limit of break 2 from polytene
analysis (FBrf0098765)
- For Df(1)pqr:
- Computed cytological location:
15A3-4;15D
- Left limit of break 1 from polytene
analysis (FBrf0034567)
- Limits of break 2 from polytene
analysis (FBrf0097531)
Even this brief explanatory text
is often somewhat opaque, however, so FlyBase is in the process of designing
a 'Map Report', linked from the gene and aberration reports, which explains
in more detail how the various relevant items of data were used in the computation.
B.5.3.1. Notation
Ranges are written as described elsewhere
in the Nomenclature Guidelines, with two exceptions.
The first exception concerns ranges
which are inferred from recombination data (for genes) or complementation (for
breakpoints). These are enclosed in square brackets when no range (even a wider
one) can be determined by other means. This is most commonly found for breakpoints
of cytologically invisible deficiencies and for genes which were mapped by recombination
but never cloned or mapped by complementation. Note that when an entity has
been localized explicitly (such as by in situ hybridization), but a narrower
range has been computed from other data, this narrower range is NOT bracketed:
thus, brackets specifically denote the unavailability of any direct data.
The other case concerns 'one-ended'
limits. The commonest example of this is when a deficiency is stated to delete
certain genes, thus giving it a minimum extent, but no flanking undeleted genes
are specified so no 'maximum extent' can be computed. In such cases, if there
is also no explicit cytology for the deficiency (and if it is also not stated
to be cytologically invisible -- see below) the 'half-open' range is denoted
by 'less than' and 'greater than' signs, as follows:
- For a deficiency that deletes
three genes, all localized to 28D-E:
- Computed cytological location:
<28E;>28D
- Right limit of break 1 from inclusion
of abc (FBrf0076543)
- Left limit of break 2 from inclusion
of abc (FBrf0056789)
Note that there is no 'limit line'
for the left limit of break 1 or the right limit of break 2. Note also the superficially
odd, but logically sound, mention of 28E for the left break and 28D for the
right break.
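The two notational exceptions can be recognized mechanically. The Python sketch below is one possible reading of the notation, with hypothetical function and key names: it splits a computed location at ';' into one entry per breakpoint, records whether the range is enclosed in square brackets, and records a leading '<' or '>' sign.

```python
def parse_computed_location(text):
    """Split a computed location into one description per breakpoint."""
    breakpoints = []
    for part in text.split(";"):
        part = part.strip()
        # Square brackets: range inferred only from recombination or
        # complementation data, with no direct localization available.
        inferred_only = part.startswith("[") and part.endswith("]")
        if inferred_only:
            part = part[1:-1]
        # A leading '<' or '>' marks a 'half-open' range.
        open_bound = None
        if part[:1] in ("<", ">"):
            open_bound, part = part[0], part[1:]
        breakpoints.append(
            {"range": part, "inferred_only": inferred_only, "open_bound": open_bound}
        )
    return breakpoints


half_open = parse_computed_location("<28E;>28D")
```

A gene location, which has no ';', simply yields a list with a single entry.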
B.5.3.2. Proximity rather than order
There are two cases in which locations
are computed based on close proximity of a pair of objects, rather than on their
chromosomal order. One is when two genes are reported to lie within 20 kb of
each other on a molecular map. For example, if a gene xyz is stated to lie
in 22F1-2 and a second gene, pqr, is stated to lie a few kilobases
away from xyz (and there is no other relevant information in FlyBase),
the computed location of pqr will be 22F1-2, even if there is no information
on the chromosomal order of the two genes.
The other case concerns cytologically
invisible deficiencies. If a deficiency is stated to be cytologically invisible,
the computation makes the assumption that it is less than a band in extent,
so that the ranges of uncertainty of the left and right breakpoint should be
identical. For example: if the deficiency in the previous example, which deletes
a gene in 28D-E, were said to be cytologically invisible then its computed data
would appear as follows:
- Computed cytological location:
[28D-E];[28D-E]
- Left limit of break 1 from cytological
invisibility (FBrf0002468)
- Right limit of break 1 from inclusion
of abc (FBrf0076543)
- Left limit of break 2 from inclusion
of abc (FBrf0056789)
- Right limit of break 2 from cytological
invisibility (FBrf0002468)
Note the use of square brackets as
described under "Notation", since this is a case where no explicit cytology
is available. A statement that a deficiency is less than 20kb long is, for this
purpose, treated as a statement that it is cytologically invisible.
B.5.3.3. Provisos
Though we believe that the presentation
of computed map statements is of value to the community, providing an easily
accessible synthesis of the primary data, such statements can -- by their very
brevity -- be interpreted as more authoritative than is really justified. Certain
precautions are advisable.
- Map-based searches of genes and
aberrations, such as by CytoSearch, use only the computed ranges of uncertainty,
not the primary reports. Thus it is always advisable to search using a slightly
broader range than the one of interest, so as to match entities which have
been placed by multiple investigators in slightly varying locations.
- When two reports localize the
same entity to different ranges, but the ranges overlap (such that there is
a narrower range consistent with both reports), that narrower range is what
is presented (as explained above). But when the reported ranges do NOT overlap,
a choice must be made regarding which report to prioritize. This is done case-by-case,
going back to the original literature. Certain guidelines are used: for example,
genetic data on deficiencies are usually favored over cytological data, since
point lesions very near to a deficiency are rare. However, inevitably some
decisions are wrong -- especially when there is nothing to favor one report
over another. Data items that are excluded in this way are never deleted from
FlyBase, but are marked with the phrase '(excluded from computation of map
data)'; this allows them to be restored to the computation if and when the
balance of evidence changes. The "Map Report" currently under development
will include careful explanations of the conflicts (which can sometimes be
highly complex) underlying the suppression of such items. We welcome any community
feedback that can assist in the accuracy of this process.
B.5.3.4. Genome-Derived Cytology
All the predicted genes have now been incorporated into FlyBase with inferred
cytology. The inference system we have used is based on the estimates that Sorsa
published a few years ago of the size in kb of each polytene band. These estimates
can be summed to give the length (according to Sorsa) in kb of a region between
two very well-mapped entities ('anchors') that are also identified on the genome.
The genome sequence gives a different number for that length, of course. So
we then apply a scaling factor, i.e. we calculate the cytology of each predicted
gene in the region between the anchors by interpolation from its sequence coordinates.
The anchors we use are a set of over 1200 P insertions that have been localised
on the genome by sequencing flanking DNA and on polytenes by Todd Laverty of
the BDGP. The scaling works out slightly different for each inter-anchor region,
of course, but we estimate that even in the middle of a region the error in
the computed location should never be more than a band or so. As the remaining
gaps in the genome sequence are filled, some currently unmappable stretches
of sequence (especially near centromeres) will be joined up with the main sequence,
and that will shift all the coordinates. Smaller changes will occur as a result
of other gap-filling in the middle of arms. These will be reflected in updates
to map locations. If you have further questions do not hesitate to mail us at
flybase-help at morgan.harvard.edu (reformat to standard e-mail
address).
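The interpolation described in this section can be sketched as follows. This Python example is an illustration under stated assumptions, not the FlyBase implementation: the function names are hypothetical, each anchor is represented as a pair (sequence coordinate in bp, position in kb along the summed band-size estimates), and the band sizes and anchor positions are invented numbers.

```python
from bisect import bisect_right
from itertools import accumulate


def interpolate_map_kb(gene_coord, anchor_a, anchor_b):
    """Linearly interpolate a gene's map position between two anchors.

    Each anchor is (sequence_coordinate_bp, map_position_kb).
    """
    seq_a, map_a = anchor_a
    seq_b, map_b = anchor_b
    # Scaling: map kb between the anchors per sequenced bp between them.
    return map_a + (gene_coord - seq_a) * (map_b - map_a) / (seq_b - seq_a)


def band_at(position_kb, bands):
    """Return the name of the band that contains position_kb.

    bands is an ordered list of (name, estimated_size_kb) pairs.
    """
    band_ends = list(accumulate(size for _, size in bands))
    index = min(bisect_right(band_ends, position_kb), len(bands) - 1)
    return bands[index][0]


# Invented band sizes and anchors, for illustration only.
bands = [("1A1", 20.0), ("1A2", 30.0), ("1A3", 50.0)]
anchor_a = (100_000, 10.0)  # anchor in 1A1
anchor_b = (300_000, 90.0)  # anchor in 1A3
position = interpolate_map_kb(150_000, anchor_a, anchor_b)
band = band_at(position, bands)
```

In this invented example 200 kb of sequence spans 80 kb of estimated map length, so a gene 50 kb beyond the first anchor is placed 20 kb beyond it on the map, which falls in the second band.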
B.6. Wild genotypes and Chromosomes
Information on wild-type
genotypes and chromosomes is kept in the Wild Stocks section of Genes.
The core of wild-stocks.txt is the information on wild-type stocks from Lindsley
and Grell (1968) (itself derived from Bridges
and Brehme, 1944), supplemented with more recent data. The file not only
includes information on stocks, but also on certain chromosomes, extracted from
natural or laboratory populations, whose genetic properties have been studied
- in particular chromosomes found to induce male recombination or other phenomena
related to the activity of naturally-occurring transposable elements.
The fields in wild-stocks are:
*a Name or symbol of stock or chromosome
*c Description of cytological features
*d Date of origin as a laboratory stock or chromosome
*e Full name
*i Synonym(s)
*o Origin
*p Phenotypic characteristics and properties
*q Notes on how stock or chromosome is maintained
*s Molecular characteristics, including information on transposable elements
*w Collector
*x References
*C Class, e.g., wild-type stock; selected wild-type stock; extracted wild-type
chromosome; laboratory stock
*E A duplicate of a *x field, used to tie data to a reference
*R Collection site
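Star-coded fields of this kind can be gathered into records with a short script. The Python sketch below makes two assumptions that are not stated in this manual: that each record begins with a *a line, and that a line without a leading '*' continues the previous field. The function name and the sample record are invented for illustration.

```python
def parse_star_records(lines):
    """Group star-coded lines into records of {field_code: [values]}."""
    records = []
    current = None
    last_code = None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("*") and len(line) >= 2:
            code, value = line[1], line[2:].strip()
            if code == "a" or current is None:
                # Assumed: a '*a' line opens a new record.
                current = {}
                records.append(current)
            current.setdefault(code, []).append(value)
            last_code = code
        elif line.strip() and current is not None:
            # Assumed: an unmarked line continues the previous field.
            current[last_code][-1] += " " + line.strip()
    return records


# Invented sample record (not real FlyBase data).
sample = [
    "*a Example-stock",
    "*C wild-type stock",
    "*o Collected from a natural",
    "population",
    "*x Hypothetical reference",
]
records = parse_star_records(sample)
```

Values are kept in lists because a field code may occur more than once in a record.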
B.7. Function and Structure of Gene Products
'Function'
FlyBase uses the terms of the Gene
Ontology database to describe 'functional' attributes of gene products.
Three classes of attribute are used: function, process and cellular component.
The information is provided in three formats:
html tables sorted alphabetically by GO term
text tables sorted alphabetically by GO term
tab delimited tables with the following syntax:
DB Gene_id Gene_symbol [NOT] GOid DB:ref evidence with aspect
Where NOT is written in the '[NOT]' column, the GO term does not apply to
the gene it is attached to. This field is used rarely, for cases of conflicting
or unexpected data.
'with' can be used to qualify one of the following evidence codes: IGI, IPI or ISS.
It is in the format:
database:gene_symbol (or protein_symbol or sequence_ID)
or species\gene_symbol (or protein_symbol)
'aspect' is one of: P (process), F (function) or C (cellular component)
'evidence' is one of:
IMP = inferred from mutant phenotype
IGI = inferred from genetic interaction
IPI = inferred from physical interaction
ISS = inferred from sequence similarity
IDA = inferred from direct assay
IEP = inferred from expression pattern
IEA = inferred from electronic annotation
TAS = traceable author statement
NAS = non-traceable author statement
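A line of the tab-delimited table can be checked against the column list and the evidence and aspect codes given above. The Python sketch below is an illustration only: the function and key names are hypothetical, an empty string is assumed to stand for an unused '[NOT]' or 'with' column, and every identifier in the sample line is a placeholder.

```python
COLUMNS = (
    "db", "gene_id", "gene_symbol", "not", "go_id",
    "reference", "evidence", "with", "aspect",
)
EVIDENCE_CODES = {"IMP", "IGI", "IPI", "ISS", "IDA", "IEP", "IEA", "TAS", "NAS"}
ASPECTS = {"P": "process", "F": "function", "C": "cellular component"}


def parse_go_line(line):
    """Split one tab-delimited line into a dictionary keyed by column."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} columns, found {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    if row["evidence"] not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {row['evidence']!r}")
    if row["aspect"] not in ASPECTS:
        raise ValueError(f"unknown aspect: {row['aspect']!r}")
    # True when the GO term is stated NOT to apply to the gene.
    row["negated"] = row["not"] == "NOT"
    return row


# Sample line built from placeholder identifiers.
sample_line = "\t".join(
    ["FB", "FBgn0000000", "abc", "", "GO:0000000", "FB:FBrf0000000", "IMP", "", "P"]
)
row = parse_go_line(sample_line)
```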
'Structure'
The "structure" tables include all genes from Drosophila known to encode a
product with known protein features - for example a zinc finger domain. These
data are from two different databases. The first of these is the INTERPRO
database, a database of protein sequence domains and motifs. INTERPRO
is, in effect, a union of six different protein domain/motif databases: PROSITE,
ProDom, SMART, TIGRFAMs, Pfam and PRINTS. The second, SCOP, is a database of
protein structures.
Syntax: domain <== INTERPRO_identifier>: gene_symbol<; gene_symbol>
Syntax: domain <== SCOP_identifier>: gene_symbol<; gene_symbol>
B.8. Aberrations
Information on chromosomal aberrations
is found in the Aberrations section
of FlyBase. The initial data set was produced by merging the data in the "Chromosomes"
and "Special Chromosomes" sections of the Red Book (Lindsley
and Zimm, 1992) with Ashburner's files (compiled between 1989 and 1992)
and the "TE" transposable elements of Ising, which we feel are most naturally
considered as aberrations. In the process of this merge, a great number of synonyms
and typographical errors in aberration names were identified. New aberration
records are added through FlyBase's curation of the literature.
The representation of aberrations
from species other than D. melanogaster is the same as that for genes,
that is to say the aberration symbol will have the syntax <Nnnn\>symbol,
where Nnnn is an abbreviation
of the species. The default species will always be D. melanogaster,
in which case the species abbreviation will not be shown.
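The species prefix can be separated from the rest of a symbol mechanically. This is a minimal sketch under stated assumptions: that the abbreviation is always one capital letter followed by three lower-case letters (as the pattern Nnnn suggests), and that "Dmel" is a suitable label for the default species. The function name split_species and the symbol Dxxx\In(1)example are hypothetical.

```python
import re
from typing import Tuple

# One capital letter, three lower-case letters, a backslash, then the symbol.
_SPECIES_PREFIX = re.compile(r"^([A-Z][a-z]{3})\\(.+)$")


def split_species(symbol: str, default: str = "Dmel") -> Tuple[str, str]:
    """Return (species abbreviation, bare symbol).

    When no prefix is present the default species is assumed; 'Dmel'
    is an assumed label for D. melanogaster.
    """
    match = _SPECIES_PREFIX.match(symbol)
    if match:
        return match.group(1), match.group(2)
    return default, symbol
```

A symbol without a prefix, such as In(2L)Cy, is returned unchanged with the default species.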
B.8.1.
List of Aberrations field descriptions
*a aberration symbol
*b genetic map position (for some small insertions and transposons/transgene
constructs)
*c comments on cytology
*e full name
*g nucleic acid sequence accession numbers
*i symbol synonym(s)
*n position-effect variegation information
*o origin/mutagen [cv]
*p phenotypic data
*q genetic data with respect to genes
*s molecular data
*u other information
*v information on availability
*w discoverer(s)
*x reference(s)
*y secondary FlyBase aberration identifier number
*z FlyBase aberration identifier number
*A associated allele
*B breakpoints
*C class of aberration [cv]
*E a duplicate of a *x field, used to tie data to
a reference
*F Breakpoints inherited from progenitor(s)
*G formal description of genetic data
*H date record entered or updated
*I genotype variant symbol
*J revised cytological data
*N new cytological order
*O progenitor genotype if relevant to aberration
*P transposon/transgene construct insertion(s)
*Q name synonym
*R comments on origin, including progenitor genotype if
irrelevant to aberration
*S alleles
*T genetic data with respect to other aberrations
*U aberration nickname or balancer short genotype
*V position effect variegation information
*W source of cytological description
*Y separable component
B.8.2.
Detailed description of the Aberrations fields
- *H. Dating
of records and updates.
All aberration records have two date fields. The first, 'Date entered', is
the date an aberration record was entered into the Sybase tables. The second
is 'Last updated', the date the record was last updated. When entered the
two dates will be the same. The 'zero' date of all records then extant
was 16 May 1994. FlyBase dates are represented as dd mm yy, mm being the initial
3-letter abbreviation of the month, and yy being the last two digits of the
year (e.g., 01 Jul 94).
- *I, *S and *U. Genotype variant symbol, allele constitution and balancer
short genotype or aberration nickname.
Many aberrations, especially balancers, are listed in Lindsley
and Zimm under multiple names, each of which carries the same chromosomal
rearrangement but which differ genetically. We have preserved this principle,
but since this is a data set of aberrations we have chosen to list them all
in the same record. Such aberration records thus have a hierarchical structure,
like the gene/allele juxtaposition in the genes file. In the case of balancers,
every variant genotype is assigned a unique symbol (*I), if not by the author
then by FlyBase. The symbol of a particular genotype introduces a block of
data specific to that genotype, terminated either by the end of the whole
record (a # character) or by another genotype symbol. The alleles included
in that genotype are listed in *S. Again in the case of balancers, *U holds
a short genotype appropriate for use in stock lists. Included in the short
genotype is a core balancer symbol (with a very few exceptions these are limited
to the balancer symbols used in Lindsley
and Zimm) plus the additional alleles, transposons/transgene constructs
or aberrations that distinguish this genotype from others with the same core
balancer aberrations and alleles.
- *i and
*Y. Synonyms and separations.
A major effort has been made to tighten up the criteria by which aberration
names are treated as equivalent. We have defined the following two senses
in which two aberrations can be said to be related or equivalent:
(1) Genuine synonymy. The same rearrangement is referred to under both names
in the literature. One name is chosen as the valid symbol of the aberration,
and the other is made a synonym, in a *i field.
(2) Meaningful separability of components. A rearrangement was isolated which
has been divided by non-mutagenic means into two components. These components
have properties when separate that they lack when in combination. (The commonest
case of this, of course, is when a transposition or distal translocation is
divided into aneuploid components.) Thus all three (or sometimes only two,
if only one component can be isolated) get their own valid names. However,
the components' cytology is defined by that of the original aberration, so
it would be against the principle of good data management to duplicate that
data in two or three records. Accordingly, the components are listed in the
same record as the original, using the same hierarchical structure as that
for genotype names described above. Each block begins with the symbol of the
component.
It should particularly be noted that many three-break rearrangements which
are, for example, both deficiencies and translocations, have tended in the
past to be referred to under whichever name is the more appropriate to the
work in hand, with the result that many have been listed twice with no indication
that they are the same thing. They are now collapsed to a single record with
the "losing" name as a synonym, as they can certainly never be separated without
further mutagenesis. Similarly, many transposition segregants which had been
correspondingly orphaned have been restored to separable components of their
progenitor.
- *Q Name synonym. This field records full names that correspond to
symbols that have become synonyms of aberrations. No effort
is made to represent the relationships between symbol synonyms and their
corresponding name synonyms. Not all symbol synonyms have a name
synonym, and vice versa.
- *C. Aberration
class (Note: see the FlyBase Nomenclature
Document for details of aberration nomenclature).
The new cytological order of highly complex aberrations can only be formally
described (if known!) by a pseudo-pictorial notation such as that used in
Lindsley and Zimm. We have retained
such new orders as and when they are necessary. However, they have a drawback
common to many of the datasets we are incorporating into FlyBase, viz. that
they inherently duplicate data present elsewhere in the record -- in this
case, the cytological locations of the breaks. They also have failings of
expressive power: Lindsley and Zimm's
notation, for example, fails to distinguish between a breakpoint range and
a deficient segment, with the result that yet further duplication of data
must be introduced to remove the ambiguity. In order to minimize this problem,
and also in order to render the data more easily manipulable by software,
we have identified a few classes of aberration which are usually represented
by new orders in Lindsley and Zimm
but which are conceptually describable in words, just like the very simple
classes. The class always appears in the line immediately following the list
of aberration breakpoints to which it refers. Here is the list of classes
that appear in the file: all the three-break classes are explained in detail,
and an example is mentioned in which the entry there gives an explicit new
order.
Two-break classes:
- Deficiency
- Tandem duplication
- Inversion
- Translocation
- Ring
- Autosynaptic
- Dextrosynaptic
- Laevosynaptic
- Free duplication
- Free ring duplication, e.g. Dp(2;f)rl+
Three-break classes:
- Deficient translocation, e.g.
T(1;3)ct268-21 A translocation in which one of the four
broken ends loses a segment before re-joining.
- Deficient inversion, e.g. In(1)N264-108
Three breaks in the same chromosome; one central region lost, the other
inverted. The lost section is that between the first two breaks listed in
the breakpoints line (*B).
- Inversion-cum-translocation, e.g.
T(1;2)C324 The first two breaks are in the same chromosome, and the
region between them is rejoined in inverted order to the other side of the
first break, such that both sides of break one are present on the same chromosome.
The remaining free ends are joined as a translocation with those resulting
from the third break.
- Bipartite duplication, e.g. Dp(1;2)K1
The (large) region between the first two breaks listed is lost, and the two
flanking segments (one of them centric) are joined as a translocation to the
free ends resulting from the third break.
- Cyclic translocation, e.g. T(1;2;3)OR14
Three breaks in three different chromosomes. The centric segment resulting
from the first break listed is joined to the acentric segment resulting from
the second, rather than the third.
- Bipartite inversion, e.g. In(3LR)BTD7
Three breaks in the same chromosome; both central segments are inverted in
place (i.e., they are not transposed).
- Uninverted insertional duplication,
e.g. Dp(1;1)hdp-b2 A copy of the segment between the first two breaks
listed is inserted at the third break; the insertion is in cytologically the
same orientation as its flanking segments.
- Uninverted insertional transposition,
e.g. Tp(1;1)B263-48 The segment between the first two
breaks listed is removed and inserted at the third break; the insertion is
in cytologically the same orientation as its flanking segments.
- Inverted insertional duplication,
e.g. Dp(1;1)ybl A copy of the segment between the first
two breaks listed is inserted at the third break; the insertion is in cytologically
inverted orientation with respect to its flanking segments.
- Inverted insertional transposition,
e.g. In(2R)C72 The segment between the first two breaks listed is
removed and inserted at the third break; the insertion is in cytologically
inverted orientation with respect to its flanking segments.
- Unoriented insertional duplication,
e.g. Dp(1;1)hdp-b4 A copy of the segment between the first two breaks
listed is inserted at the third break; the orientation of the insertion with
respect to its flanking segments is not recorded.
- Unoriented insertional transposition,
e.g. Tp(1;2)v+75d The segment between the first two breaks
listed is removed and inserted at the third break; the orientation of the
insertion with respect to its flanking segments is not recorded.
Occasionally an author must
report an aberration whose cytology is ambiguous and/or incompletely characterized.
These aberrations are named as Ab(N)identifier or, when associated
with a named allele, Ab(N)gene[allele]. N may be the chromosome
arm that includes the breaks, or the chromosome number in the case of a breakpoint
within the heterochromatin, when it is not known to which side of the centromere
the break maps. If more than one chromosome is suspected of being involved
then this is indicated with a '?'. Examples: Ab(3)ME178, Ab(2L;?)cli[eya-X9].
- *A. Associated
allele.
When an aberration is associated with one or more mutant alleles (as opposed
to being simply deficient for a gene), a *A field appears which contains a
cross-reference to the allele in Genes.
The allele name is preceded by its FlyBase allele ID, which is listed in the
genes file and will not change in the future even if the allele designation
does (for example, because of newly discovered allelism). In due course, aberrations
will also have FlyBase IDs and this cross-reference will be made bidirectional.
- *x, *E.
References. *x fields, in both gene and allele records, are references.
Syntax: *x FBrfnnnnnnn == abbreviated_reference
e.g. *x FBrf0036029 == Saigo et al., 1981, Cold Spring Harbor Symp. Quant.
Biol. 45: 815--827
The FBrf number is the unique reference identifier number from the references
table, which also includes the full reference.
The *E field is always a duplicate of a *x field within the same record. It
is a device to tie particular data to a particular reference. The data fields
then immediately follow the *E field.
The referenced block of fields is terminated by the next *E, *Y or *I field,
or the end of record line (#).
Publications that discuss a given aberration are listed in the same way as
in genes data, with FlyBase IDs cross-referencing them to the Bibliography
file. Any publication that reports mapping of one or more breakpoints to a
clone is marked out as a "Ref. with molecular data". In most cases, no information
is reported as to where the break lies on the clone; however, the information
that that reference maps the break can then be used, among other things, to
find nearby cloned genes by searching for the same reference in Genes.
In cases where actual distances in kilobases are reported, the fact is given
(with attribution to the reference) in the "molecular data" field (*s).
- *G. Formalized
genetic data.
These lines are computed from the synthesis of map data that underlies all
the cytogenetic map positions reported by the map-based tools. The symbol
"<<" should be read as "lies to the left of". The genes are chosen to
be the most informative based on available data; there is no certainty that
they are definitely the genes flanking the breakpoints, but they are the ones
whose deduced cytological locations provide the tightest localization of the
break.
- *q. Genetic
data with respect to genes.
These fields store, largely in structured form, conclusions about the relationship
between the aberration breakpoints and specified genes, based on genetic complementation
data. These data are used in the generation of the genome map.
- *T. Genetic data with respect
to other aberrations.
Phenotypic data on the interaction of combinations of aberrations when present
in the same fly, when those data do not allow attribution of the phenotype
to particular disrupted genes.
- *V. Position effect information.
Many aberrations cause position effect variegation at one or more genes. This
information is noted in *V fields, which are of three classes:
*V position effect variegation for: [gene_symbol]; [gene_symbol]
*V no position effect variegation for: [gene_symbol]; [gene_symbol]
*V dominant position effect variegation for: [gene_symbol]; [gene_symbol]
If there is some reason to doubt whether or not a statement is true for any
particular gene, then the gene_symbol is qualified by ' \?'.
Free text information may also be added in a *p field.
- *z and
*y. FlyBase aberration identifier numbers.
These fields are for primary and secondary FlyBase aberration identifier numbers
(see section F.1. of Reference Manual F: Links To
and From FlyBase).
- *P. Transposon/transgene
construct insertions.
Natural or synthetic transposable elements carried on an aberration are recorded
here.
- *O. Progenitor
genotypes.
The *O field is for the chromosome on which the aberration was induced. This
field is only used if the progenitor is relevant to the derivative. The values
in this field will be valid FlyBase allele or aberration names. Where a *O
field houses more than one value, each followed by " \?", this signifies that
the progenitor chromosome is one of the named alternatives.
- *F. Breakpoints
inherited from progenitors.
This field may contain multiple lines. The syntax is of the form:
*F 22D1-2;33F5-34A1 (from In(2L)Cy)
*F 21B;40 (from In(2L)DTD27)
- *R. Data
about an aberration's origin.
For example, that it was induced simultaneously with another mutation/aberration,
or information about the genotype of the progenitor which is irrelevant to
the derivative. This is a formatted free text field.
- *v. Information
on availability.
If a publication reports that an aberration is lost, that information is recorded
in the *v field. Note that not all such reports in the literature are authoritative.
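Several of the field formats described in this section are regular enough to be read by program: the *H date (dd Mon yy), the *x reference line, and the *F inherited-breakpoints line. The following Python sketch shows one way to do so. It is not FlyBase software; the function names are hypothetical, and it assumes that each field occupies a single line in exactly the form of the examples given above.

```python
import re
from datetime import date, datetime
from typing import List, Tuple


def parse_flybase_date(text: str) -> date:
    """Parse a date such as '01 Jul 94'."""
    # %b expects an English month abbreviation (default C locale).
    # %y follows the usual convention: 69-99 -> 1969-1999, 00-68 -> 2000-2068.
    return datetime.strptime(text, "%d %b %y").date()


# '*x FBrfnnnnnnn == abbreviated_reference'
_REFERENCE = re.compile(r"^\*x (FBrf\d{7}) == (.+)$")
# '*F breakpoint;breakpoint (from progenitor)'
_INHERITED = re.compile(r"^\*F (\S+) \(from (.+)\)$")


def parse_reference(line: str) -> Tuple[str, str]:
    """Return (FBrf identifier, abbreviated reference) from a *x line."""
    match = _REFERENCE.match(line)
    if match is None:
        raise ValueError(f"not a *x line: {line!r}")
    return match.group(1), match.group(2)


def parse_inherited_breakpoints(line: str) -> Tuple[List[str], str]:
    """Return (list of breakpoints, progenitor symbol) from a *F line."""
    match = _INHERITED.match(line)
    if match is None:
        raise ValueError(f"not a *F line: {line!r}")
    return match.group(1).split(";"), match.group(2)
```

Applied to the *F example given above, parse_inherited_breakpoints("*F 22D1-2;33F5-34A1 (from In(2L)Cy)") separates the two breakpoints from the progenitor symbol In(2L)Cy.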
B.9.
Transgene constructs and insertions
The Transgene
Constructs section of FlyBase contains information on engineered
or synthetic transposons and insertions of natural and synthetic transposons,
related cosmids and plasmids, and cell culture vectors. Data on transgenic constructs
are almost exclusively derived from the literature. Sequence database entries
and personal communications from investigators provide secondary sources of
information. The data sets described below are not yet up to date, and will
be expanding rapidly in the future.
Transgene constructs
Reports on transgene constructs,
including transformation vectors, enhancer traps, and Scer\GAL4/Scer\UAS
constructs, are available through the Transgene
Construct Search page. See Reference Manual C: Using
FlyBase on the Web for information on searching the Transgene
Construct data.
The data categories in these reports
include:
- Synonyms - those used in the literature
are reported; synonyms that are the result of typos or were previously used
by FlyBase are 'silent': they do not appear in reports, but will be seen by
search routines.
- Characteristics - the most significant
characteristics of a transposon or transgene construct are captured in controlled-vocabulary
fields to facilitate searches. Such fields include uses (e.g., 'cloning vector',
'reporter construct'), features (e.g., 'selectable marker', 'complete rescue'),
cloning sites, and progenitors. Links to progenitors, descendants, and related
constructs are provided.
- Associated alleles - links to
transgenic allele records are provided. These allele reports include a brief
molecular description of that particular component of the construct. A given
allele may be associated with more than one construct, if the same fragment
of DNA is carried in each of those constructs.
- Map and sequence data - a subset
of the transgene construct reports, primarily those of general interest such
as transformation vectors and enhancer traps, include map and sequence data.
FlyBase has compiled sequence data for many constructs that are not in the
sequence databases; incomplete sequences are presented if significant portions
are known. Each sequence is broken down into segments of natural contiguous
sequence and junctions of engineered sequence that join such segments. A complete
description of each component segment is provided, including length, links
to sequence database entries, location of endpoints in the database entry,
identity of endpoints (such as restriction sites), and biological features
(such as transcription start sites, transposon termini, etc.).
Transposon
and Transgene Construct Insertions
Transposon and Transgene Construct
Insertions data include insertions of natural and synthetic transposons. Insertion
Reports can be accessed via the Insertions
Search page using a symbol-based query or a browseable listing of insertions
by cytological location.
The data categories in the Insertion
Reports include:
- Cytogenetic location
(when known), allowing access via cytology-based queries, such as CytoSearch.
Locations are based upon explicit in situ hybridization localization or inferred
location based on allelism of insertion-associated mutations with mutations
in genes that already have assigned cytogenetic locations.
- Identity of inserted transposon
or transgene construct.
- Identity of gene affected,
for those insertions that disrupt gene function.
- For enhancer traps, expression
pattern of the reporter gene, using whenever possible an extensive
descriptive controlled vocabulary. Such controlled data capture has been developed
to facilitate searches based upon some aspect of the expression pattern.
Insertion
Reports are extensively hyperlinked, including links to:
- Transgene Construct Reports:
Description of the structure and properties of the inserted transposon; these
reports include, when available, annotated maps and compiled sequence.
- Allele Reports:
Descriptions of phenotypes and other mutational aspects of insertions disrupting
gene function.
- Gene Reports:
General information on affected gene, for those insertions that disrupt a
known gene.
- BFD Reports:
Descriptions of those enhancer trap and activating element (P{EP})
insertions characterized by the Berkeley Genome Project. The BFD reports include
GTS insertion site sequence tag data, if available.
- Balancer Reports:
Descriptions of the properties of balancers containing a specific insertion,
such as for the "blue balancers."
- Stock Reports:
Descriptions of the stocks containing insertions from the public stock centers.
- References.
FlyBase
is developing comprehensive Insertion Reports that will place all relevant data
in one report.
B.10.
Stocks
The Stocks section of FlyBase includes stock lists from both public and private collections
of Drosophila. The Stocks directory contains search options, links to stock center web sites, stock order forms, and help files. Stocks should be requested from individual labs only if a comparable stock is not available from one of the public stock centers.
When the stock description provided by a public center is other than a genotype composed of valid symbols or the name of a wild-type strain, FlyBase creates a genotype where possible based on symbol synonyms. Laboratory stock lists in standardized formats are incorporated into FlyBase as is; FlyBase does not edit laboratory lists to create valid symbols. Laboratory stock lists in non-standard formats are simply posted and are available for browsing. The contents of individual laboratory stock lists are the responsibilities of the laboratories concerned and not of FlyBase. Contact Kathy Matthews (matthewk at indiana.edu, reformat to standard e-mail address) to contribute your own stock list to FlyBase.
Stock center stock information is available through Gene, Allele, Aberration and Transgene Insertion reports as well as directly from the Stocks data section. Laboratory stocks are linked to Gene, Allele, Aberration and Insertion reports when valid symbols are present in a genotype. Recently added stock center stocks may appear in the Stocks section before the links to Alleles, etc. have been updated. See Reference Manual
C: Using FlyBase on the Web for help with stock list searches.
- Stock Centers
- Laboratory lists
- The files in the Labs section are laboratory stock lists contributed to FlyBase. See lab-info.html for information on each list, including contact information for requesting stocks.
- Ordering stocks
- Requests for stocks held at public
stock centers can be submitted to the appropriate stock center using forms available
in Stocks or from the center's web site. Stock ordering options are also built into Stock reports accessed through Allele,
Insertion and Aberration reports, and CytoSearch
results.
B.11.
Genomic Clones and STSs
Genomic clone
data are archived on FlyBase as a set of text files. The Drosophila
Resources list includes information on how to request clones from the various
projects included here. Questions about these data and materials should be directed
to the genome projects themselves.
B.11.1.
Cosmids and cosmid STSs
The cosmids are those from the European
Drosophila Genome Project. The cosmid library was prepared from a Sau3A partial
digest of Oregon-R adults and is in the Lorist 6 vector. The sequence of the
Lorist 6 vector can be obtained by FTP from genome.wustl.edu; the file is
/pub/gsc1/sequence/vector/lorist6.seq. A full description of the techniques,
and of the project as a whole, can be found in the following references:
- Sidén-Kiamos, I., R.D.C.
Saunders, L. Spanos, T. Majerus, J. Trenear, C. Savakis, C. Louis, D.M. Glover,
M. Ashburner and F.C. Kafatos. 1990. Towards a physical map of the Drosophila
melanogaster genome: Mapping of cosmid clones within defined genomic
divisions. Nucleic Acids Research 18:6261-6270.
- Kafatos, F.C., C. Louis, C. Savakis,
D.M. Glover, M. Ashburner, A.J. Link, I. Sidén-Kiamos and R.D.C. Saunders.
1991. Integrated maps of the Drosophila genome: Progress and prospects.
Trends in Genetics 7:155-161.
- Madueno, E. et al. 1995.
A physical map of the X chromosome of Drosophila melanogaster: Cosmid
contigs and sequence tagged sites. Genetics 139:1631--1647.
STS sequences of many cosmids have
been determined from either (or both) the SP6 or T7 promoters flanking the cloning
site. These sequences are available from the EMBL/GenBank/DDBJ nucleic acid
sequence data libraries. These sequences are also available from dbSTS, the
NCBI STS database. The dbSTS records may include information from more recent
matches of the STS sequences against other sequences than are available from
the EMBL/GenBank/DDBJ accessions.
See the file Drosophila
Resources for information on obtaining cosmids.
The following fields are included
in cosmids-sts.txt:
- Cosmid: The name of the cosmid.
- Contig: Information on the contig
which contains the cosmid.
- Polytene: The polytene chromosome
range of any primary in situ hybridization signal.
- Primary_sites: A list of in situ
hybridization signals interpreted as being primary sites.
- Secondary_sites: mapped secondary
in situ hybridization sites.
- Repetitive_sites: A (rough) estimate
of the number of repetitive sites.
- Chromocentral_sites: Hybridization
to chromocenter. Abbreviations are: BH beta-heterochromatin; AH alpha-heterochromatin;
NO nucleolus organizer.
- Aberr_mapping: Describes location
of in situ site with respect to aberrations or genes.
- Aberration: Similar data to Aberr_mapping
but contributed by other workers.
- STS: Name of STS.
- EMBL_AC: EMBL database accession
number of STS.
- dbSTS_AC: NCBI dbSTS database
accession number of STS.
- DB_searched: Database searched
for sequence similarities.
- DB_version: Version of database
searched.
- Search_date: Date of database
search.
- P1: Berkeley P1 clone that is
said to include or overlap cosmid.
- YAC: St. Louis YAC clone that
is said to include or overlap cosmid.
- Accession_of_N_hit: EMBL Accession
number of nucleic acid sequence match.
- BLAST_comment: A comment on the
BLAST match(es).
- HSP_score_of_hit: HSP score from
BLAST search.
- Gene: Gene included in cosmid
(for D. melanogaster) or species: gene or protein matched in a database
search.
- Accession_of_X_hit: SWISS-PROT
accession number of protein sequence match.
This file is an output from the European
Cosmid mapping Consortium's working database, and for this reason includes internal
notes.
B.11.2.
P1 clones and P1 STSs
The P1 library of D. melanogaster
is largely obsolete and the Berkeley Drosophila
Genome Project is discouraging the use of P1 clones. See the FlyBase file
Drosophila Resources for
additional information.
B.11.3.
BAC clones and BAC STSs
Three libraries of BAC clones are
now available. These were all made from DNA of the same y[1]
; cn[1] bw[1] sp[1] stock as was used
for the Berkeley Drosophila Genome Project P1 clones.
The libraries are BACR made for the
BDGP by K. Osoegawa and P. de Jong (Roswell Park), BACE and BACH made for the
EDGP by Alain Billaud at CEPH (Centre d'Etude du Polymorphisme Humain) with
funding provided by an MRC project grant to D.M. Glover and M. Ashburner.
The BACR library is 18,432 clones
in pBACe3.6 and the average clone size is 160 kb. The BACE and BACH libraries
are in pBeloBAC11 and consist of 23,400 clones of size range 75-150 kb.
Information about obtaining BAC clones
is included in the FlyBase file Drosophila
Resources. STS sequences of many BACs have been determined from either (or
both) the TET3 or T7 promoters flanking the cloning site. These sequences are
available from the EMBL/GenBank/DDBJ nucleic acid sequence data libraries. These
sequences are also available from dbSTS, the NCBI STS database. The dbSTS records
may include information from more recent matches of the STS sequences against
other sequences than are available from the EMBL/GenBank/DDBJ accessions.
B.11.4.
Drosophila virilis P1 Clones
The data on P1 clones from D.
virilis were provided by D. Hartl. The clones are described in:
- Lozovskaya, E.R., D.A. Petrov
and D.L. Hartl. 1993. A combined molecular and cytogenetic approach to genome
evolution in Drosophila using large-fragment cloning. Chromosoma
102:253-266.
B.11.5.
YACs
The YACs are those from the St. Louis
and Harvard projects. References for the YACs:
- Garza, D., J.W. Ajioka, D.T. Burke
and D.L. Hartl. 1989. Mapping the Drosophila genome with yeast artificial
chromosomes. Science 246:641--646.
- Ajioka, J.W., D.A. Smoller, R.W.
Jones, J. P. Carulli, A.E.C. Vellek, D. Garza, A.J. Link, I.W. Duncan and
D.L. Hartl. 1991. Drosophila genome project: One-hit coverage in
yeast artificial chromosomes. Chromosoma 100:495--509.
- Cai, H., P. Kiefel, J. Yee and
I.W. Duncan. 1994. A yeast artificial chromosome clone map of the Drosophila
genome. Genetics 136:1385--1401.
- Hartl, D. L. and Lozovskaya, E.
R., 1995, The Drosophila Genome Map: A Practical Guide. R. G. Landes, Georgetown,
Texas.
A complete set of YAC clones is maintained
by Ian Duncan and clones may be requested from him. See Drosophila
Resources for contact information.
B.12.
References - the Drosophila Bibliography
The References
section of FlyBase holds as complete a bibliography of papers, books, etc.,
concerned with the biology and genetics of Drosophila as we can assemble.
The sources of these references are given in section B.12.4.
of the FlyBase Reference Manual. A variety of search options are available (see
Reference Manual C: Using FlyBase on the Web for
information on FlyBase searches) in References and in the
All Searches section.
Reference reports include the bibliographic
citation, the National Library of Medicine's PubMed
abstract if available, and a linked list of genes, alleles and aberrations for
which the paper includes data that have been curated by FlyBase. See for example
the report of Yasuda et al., 1995. Users
should be aware that not all papers in the FlyBase bibliography have been curated
using current practice, thus a sparse list of FlyBase data items does not necessarily
indicate a lack of content in the paper.
B.12.1.
Reference formats
The bibliographic file is distributed
in four different formats:
- *.rpt - a human-readable report
file used in searches (archived)
- *.star - a field delimited text
file
- *.refer - a text file in REFER
format to allow direct import into reference handling software (archived)
- *.csv - a comma-separated-values
format for spreadsheets (archived)
There
are six groups of files for each format, sorted by decade (earlier than 1950,
1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-present). The archived
files (rpt, refer and csv formats) are available by ftp from the Indiana
server.
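The decade grouping can be written as a small function. This sketch only returns the label of the group to which a publication year belongs; the function name decade_group is hypothetical, and the actual file names are not given here, so none are constructed.

```python
def decade_group(year: int) -> str:
    """Return the label of the decade group containing a publication year."""
    if year < 1950:
        return "earlier than 1950"
    if year >= 1990:
        return "1990-present"
    start = year - year % 10  # first year of the decade, e.g. 1977 -> 1970
    return f"{start}-{start + 9}"
```

A reference published in 1977, for instance, falls in the 1970-1979 group.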
references-obsolete.txt
is a list of deleted FlyBase FBrf identifier numbers, with a note on whether
the reference to which this refers has been deleted from the files or merged
with another record.
Files with the extension rpt are
the report format files used for searches. Here is a typical entry:
- Title :Secretion antigens of salivary
glands of larval Drosophila melanogaster.
- Authors :Karakin,E.I.
:Lerner,T.Y.
:Kokoza,V.A.
:Sviridov,S.M.
- Year :1977
- Volume :233
- Pages :698--701
- Languages :Russian
- Issue :1
- Journal :Dokl. Akad. Nauk SSSR
- FlyBase_ID :FBrf0030018
- Also In :FBrf0030017
Complete
information for the journal abbreviation is available through the Journal/Book
Abbreviations Search or the file references-abbreviations.rpt.
The Also In field provides the FlyBase ID of any other appearances
of this paper in the literature.
references.*.star
are field delimited text files. Each record is terminated by a # character on
a line of its own, and all other lines have an * as the first character, followed
by a field-identifier letter, a space, and then the field value starting in
column 4. There are no trailing spaces -- in particular there is no space in
column 3 unless there is something in the field. # and * do not appear anywhere
other than in column 1.
- Field ordering: * means zero or
more
Uab[cd[ef[gh[ij[kl[mn[op(qr)*]]]]]]]tuvwxyzSYLATPBMQIDECZJ#
Fields *U, *v, *w, *x, *y, *z are always present even if null; others are
either absent or non-null.
- Field allocations:
*U Unique FlyBase reference identifier. Never blank.
*a ..*r Authors. Each author's surname is on one line and initials on another.
For historical reasons the first author gets surname before initials and the
rest are the other way round, so the surnames are in fields *a, *d, *f, *h,
*j, *l, *n, *p and *r. Papers with ten or more authors get fields *q and *r
repeated as often as necessary.
*t Year of publication. This is never blank, but can be a range.
*u Title of publication. Never blank.
*v Title of part if one of a series. Blank otherwise.
*w Title of journal or book in which publication appears, unless the whole
book is the publication.
*x Publisher, if *w is not a journal.
*y Volume of journal, or number of chapter in book. If spread over more than
one, these are separated by semicolons. The volume numbers can have letters
in them.
*z Page range. If *y has more than one, so does this, also separated by semicolons.
Within a volume, a page range is either a single page, a contiguous range
written as first--last, or a series of contiguous ranges separated by commas.
The page numbers quite often have letters in them.
*S Series of a journal etc.
*Y Issue number (can include letters).
*L Language(s) of publication.
*A Additional language(s), e.g. of abstracts.
*T Type of publication (book, abstract, thesis etc, see below for list). [cv]
*P Place of publication, if a book.
*B BIOSIS identifier number.
*M Medline identifier number.
*Q Zoological Record identification number.
*I ISBN (books) or ISSN (journals) number.
*D Journal CODEN.
*E for "related publication" (usually, but not always, errata).
*C FlyBase reference ID of another publication of the same article
(this could, for example, refer to a translation).
*Z FlyBase reference ID of any previously released record that has been made
obsolete by this record.
*J Indicates availability of publication in Cambridge; default is 'no'.
This is an example:
*U FBrf0030018
*a Karakin
*b E.I.
*c T.Y.
*d Lerner
*e V.A.
*f Kokoza
*g S.M.
*h Sviridov
*t 1977
*u Secretion antigens of salivary glands of larval Drosophila melanogaster.
*v
*w Dokl. Akad. Nauk SSSR
*x
*y 233
*z 698--701
*Y 1
*L Russian
#
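The rules above (a # line ends a record, every other line is an asterisk, a field letter, a space and a value from column 4) are enough to read these files with a short script. The sketch below is one possible reading, not FlyBase software: the names parse_star and authors are illustrative, and the author pairing assumes that every author has both a surname and an initials line.

```python
def parse_star(text):
    """Yield each record as a list of (field letter, value) pairs."""
    record = []
    for line in text.splitlines():
        if line == "#":
            # A '#' on a line of its own terminates the record.
            yield record
            record = []
        elif line.startswith("*"):
            # Field letter in column 2, value from column 4 (may be empty).
            record.append((line[1], line[3:]))


def authors(record):
    """Rebuild 'Surname, Initials' strings from the *a..*r author fields."""
    values = [value for letter, value in record if "a" <= letter <= "r"]
    if len(values) < 2:
        return []
    # First author: surname line, then initials line.
    names = [(values[0], values[1])]
    # Later authors: initials line, then surname line.
    for i in range(2, len(values) - 1, 2):
        names.append((values[i + 1], values[i]))
    return [f"{surname}, {initials}" for surname, initials in names]
```

Records are kept as lists of pairs rather than dictionaries because fields *q and *r can repeat. For the example record above, authors() would give Karakin, Lerner, Kokoza and Sviridov in that order, each followed by their initials.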
references.*.refer
files are formatted in the Unix REFER format to allow direct import into Refer,
EndNote, Pro-Cite and other reference handling software. This format is a text
file with tags that each begin with the % symbol. Records are separated by a
blank line. In this file we use the EndNote tags. Not all the tags are used.
Note, also, that empty fields are absent from a record.
%A author(s)
%B secondary title
%C place published
%D year
%E secondary author
%F FlyBase reference ID
%G type of publication
%H ISBN (for books) or ISSN (for serials)
%I publisher
%J journal or book reference
%K keyword [not used]
%L journal CODEN
%N issue of journal
%O Medline identifier; BIOSIS identifier; language
%P pages
%Q author
%R title
%S tertiary title
%T title
%U series of journal
%V Volume
%W also published as
%X abstract
%Y tertiary author
%Z errata or reference ID(s) of relevant obsolete records
<blank line>
An example of a reference in REFER
format is:
%A E.I. Karakin
%A Lerner, T.Y.
%A Kokoza, V.A.
%A Sviridov, S.M.
%D 1977
%T Secretion antigens of salivary glands of larval Drosophila melanogaster.
%V 233
%P 698--701
%O Languages: Russian
%N 1
%J Dokl. Akad. Nauk SSSR
%F FBrf0030018
%W also in FBrf0030017
<blank line>
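For users who prefer to process these files with a script rather than import them into reference software, the following minimal sketch reads records separated by blank lines and collects the values of each tag. The function name parse_refer is illustrative, and the sketch assumes that every field occupies a single line beginning with a % tag; lines of any other kind are ignored.

```python
from collections import defaultdict


def parse_refer(text):
    """Return a list of records, each mapping a tag such as '%A' to its values."""
    records = []
    current = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record in progress, if there is one.
            if current:
                records.append(dict(current))
                current = defaultdict(list)
        elif line.startswith("%") and len(line) >= 2:
            # The tag is everything up to the first space; the rest is the value.
            tag, _, value = line.partition(" ")
            current[tag].append(value.strip())
    if current:
        records.append(dict(current))
    return records
```

Values are kept in lists because a tag such as %A can occur several times in one record.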
references.*.csv
files are in comma-separated-values format, which can be used by many spreadsheet
and database programs. The format is:
- primary_author, other_authors,
pub_title, year, volume, publisher, pubplace, pages, volumetitle, language,
language2, series, issue, type, med_uid, biosis, ISBN or ISSN, errata, journal_abbrev,
CODEN, FlyBase_id, also_published_in, relevant_obsolete_id
primary_author :primary author
other_authors :subsequent authors, semicolon separated
pub_title :full title of the publication
year :year of publication
volume :volume number
publisher :publisher
pubplace :place of publication
pages :page range
volumetitle :title of part if one of a series
language :language that publication is written in
language2 :any alternate languages
series :series of journal
issue :issue of journal
type :type of publication, can be Book, Abstract, etc
med_uid :Medline identifier
biosis :Biosis identifier
ISBN or ISSN :ISBN (for books) or ISSN (for serials)
CODEN :CODEN (for periodicals)
errata :if this entry is an erratum (signified by a type of 'E') this field will
provide the FlyBase identifier of the publication to be corrected
journal_abbrev :journal abbreviation or book reference
FlyBase_id :unique FlyBase identifier
also_published_in :papers which appear in more than one place will have FlyBase
UIDs of the other publications given here
An example of a reference in csv
format is:
"Karakin,E.I.","Lerner,T.Y.; Kokoza,V.A.; Sviridov,S.M.","Secretion antigens of salivary glands of larval Drosophila melanogaster.","1977","233","","","698--701","","Russian","","","1","","","","0","","Dokl. Akad. Nauk SSSR","FBrf0030018","FBrf0030017"
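Because author names in these files contain commas inside quoted fields, a line cannot safely be split on commas alone; a reader that understands quoting is needed. The following minimal sketch uses Python's standard csv module to recover the author list from the first two columns (primary_author and the semicolon-separated other_authors). The function name read_authors is illustrative, and the sketch deliberately relies on no column beyond the first two.

```python
import csv
import io


def read_authors(csv_text):
    """Return one author list per row: primary author first, then the others."""
    result = []
    # skipinitialspace tolerates a space between a comma and the next quoted field.
    for row in csv.reader(io.StringIO(csv_text), skipinitialspace=True):
        if len(row) < 2:
            continue
        primary, others = row[0], row[1]
        # other_authors is semicolon separated; drop empty pieces.
        names = [primary] + [n.strip() for n in others.split(";") if n.strip()]
        result.append(names)
    return result
```

The same reader can be used for the remaining columns once their positions have been checked against the file in hand.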
B.12.2. Reference classes
The bibliographic records fall into
several different classes. The great majority are papers in journals, but there
are also papers in edited publications, theses, manuscripts, other electronic
databases and, even, the odd film, archival material and newspaper article.
The following classes are recognized by FlyBase and encoded in the *T field
[cv]:
- abstract
- archive
- audiotape
- bibliographic list
- book
- book review
- booklet
- CD-ROM
- chart
- computer file
- database
- demonstration
- editorial
- erratum
- film
- film strip
- leaflet
- letter
- manuscript
- microfiche
- microscope slides
- newspaper article
- note
- obituary
- patent
- personal communication
- poem
- poster
- press release
- recording
- report
- review
- slides
- spoof
- stock list
- T-shirt
- thesis
- transcript of broadcast
- unpublished
- video
The
default type is a journal article or book chapter (i.e., paper).
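A script that reads the *T field may wish to check its value against this list and to apply the default. The sketch below is illustrative only: it assumes that the field holds exactly the strings listed above, and it uses the label "paper" for the default class when *T is absent or empty. The names REFERENCE_CLASSES and reference_class are hypothetical.

```python
# Classes listed above for the *T field; exact spelling and case are assumed.
REFERENCE_CLASSES = frozenset({
    "abstract", "archive", "audiotape", "bibliographic list", "book",
    "book review", "booklet", "CD-ROM", "chart", "computer file",
    "database", "demonstration", "editorial", "erratum", "film",
    "film strip", "leaflet", "letter", "manuscript", "microfiche",
    "microscope slides", "newspaper article", "note", "obituary", "patent",
    "personal communication", "poem", "poster", "press release", "recording",
    "report", "review", "slides", "spoof", "stock list", "T-shirt",
    "thesis", "transcript of broadcast", "unpublished", "video",
})


def reference_class(type_value):
    """Return the class for a *T value; an absent or empty value means 'paper'."""
    if not type_value:
        return "paper"
    if type_value not in REFERENCE_CLASSES:
        raise ValueError(f"unrecognized reference class: {type_value!r}")
    return type_value
```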
B.12.3. Journals and multi-author works
Because we have collected data for
the reference file from a number of different sources, a variety of abbreviations
have often been used for the same journal or publication. FlyBase refers
consistently to each journal, and to any other publication
for which there is (at least potentially) more than one record in the bibliography
itself. It does this by maintaining a file of reference abbreviations. This
includes not only the abbreviations of journals, but also information on any
work, e.g., edited book, symposium volume, conference proceedings, abstract
book, that includes more than one independently authored contribution.
- Journals
The abbreviations used are those of the World List of Scientific Periodicals
Published in the Years 1900-1960 (4th edition, 1965) by P. Brown and
G. B. Stratton (London, Butterworths Scientific Publications) for those published
before 1960 and World List of Scientific Periodicals: New Periodical Titles
1960-1968 by K. J. Porter and C. J. Koster (London, 1970, Butterworths)
for 1960 until 1968. We have tried to use the same conventions for the abbreviations
of titles as followed in the World List for titles published after
1968, except that we have tried to be less imperialistic, and have made much
use of the List of Serials Indexed for Online Users (National Library
of Medicine, Washington 1992), the British-Union-Catalogue of Periodicals
(London, 1955-1958) and its supplements, the Serial Publications in the
British Museum (Natural History) Library (3rd ed. 1980), the Union
List of Serials (3rd ed. 1965, H. W. Wilson, NY) and its successor the
Library of Congress. New Serial Titles volumes (1950-) and Ulrich's
International Periodical Directory 31st edition (1992-1993. R.R. Bowker,
New Providence). We have used the Directory of Japanese Scientific Periodicals
(National Diet Library, Tokyo, 1979) for the names of Japanese journals and
Half a Century of Soviet Periodicals (R. Smits, Library of Congress,
Washington 1968) for many of those of the Soviet Union (as was) (this publication
includes US library holdings of these journals).
- Edited works
Here FlyBase includes not only edited books, but also proceedings of conferences,
abstract books and a variety of other publications that are not in journals
yet include more than one contribution. Where known, we include not only the
title, place and date of publication, but also the name(s) of the editor(s).
If the publication is also a part of a journal series, or of some other series,
then these data are also included in the record.
- Medline, BIOSIS, Zoological
Record, International Standard Serial and International Standard Book numbers.
All references that we have found in BIOSIS, Zoological Record or Medline
databases have the identifier numbers of these databases attached to them.
Records of books have their ISBNs attached, if published since these were
introduced. Records of journals have their ISSNs and CODENs attached.
The great majority of journal titles and titles of other publications have been
verified by reference to the on-line catalogs of the Library of Congress, the University
of California (Melvyl) or the University of Cambridge.
Many journals have titles in more
than one language. In such cases the title in the second language is enclosed
within square brackets.
The file references-abbreviations.csv
lists alphabetically the journal abbreviations used, and gives the full name(s)
of the journals, place(s) of publication and, where possible, dates and volume
numbers. [The information on volume numbers and dates of publication is useful
in detecting obvious errors in citations.] This file also includes information
on all other multi-author or edited works. These are referred to in the bibliography
itself as if they were journals. Maintaining these references as abbreviations
in this file ensures total consistency. Entries are sorted alphabetically by
their abbreviation. The fields used are:
- *U Unique FlyBase reference identifier.
Never blank.
- *a .. *r Authors. Each author's
surname is on one line and initials on another. For historical reasons the
first author gets surname before initials and the rest are the other way round,
so the surnames are in fields *a, *d, *f, *h, *j, *l, *n, *p and *r. Papers
with ten or more authors get fields *q and *r repeated as often as necessary.
- *s Abbreviation used in *w of
references.star files.
- *u Full title.
- *v Series, and/or volume (or part)
number within a series.
- *S Series abbreviation, appears
in *S of references.star files.
- *T Full name of series.
- *t Date, or date range, of publication.
- *V Volume number, or volume number
range.
- *z Number of pages.
- *x Publisher.
- *P Place(s) of publication.
- *w Parent journal/series that
*s is an issue of. Abbreviations appearing in this field will have a full
entry (as *s) in their own right in this file.
- *Q Series of *w in which *s appeared.
- *y Volume number of *w in which
*s appeared.
- *Y Issue number of *w in which
*s appeared.
- *I ISBN (for books) or ISSN (for
serials).
- *D CODEN (for journals).
This
file is also available in csv and rpt formats.
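Since every abbreviation that appears in *w has its own entry under *s, the parent of an issue or volume can be traced by following *w from record to record. The following minimal sketch assumes that the records have already been read into dictionaries keyed by field letter (without the asterisk); the function names and the abbreviations in the usage note are hypothetical.

```python
def index_by_abbreviation(records):
    """Map each *s abbreviation to its record (a dict keyed by field letter)."""
    return {rec["s"]: rec for rec in records if rec.get("s")}


def parent_chain(abbrev, index):
    """Follow *w links from an abbreviation up to its top-level parent."""
    chain = []
    seen = set()
    # Stop at an abbreviation with no entry, or on a repeat (guards against loops).
    while abbrev in index and abbrev not in seen:
        seen.add(abbrev)
        chain.append(abbrev)
        abbrev = index[abbrev].get("w")
    return chain
```

For example, given an invented record with *s "Symp. Example" and *w "J. Example", and a second record with *s "J. Example" and no *w, parent_chain("Symp. Example", index) returns both abbreviations, the issue first and its parent journal second.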
There remain a few edited publications
and a few journals whose full details have so far proved impossible to find.
These can be recognized by only having an abbreviated title, and (usually) no
other information in references-abbreviations.csv. Any help in tracking these
down will be appreciated.
B.12.4. Reference sources
See Reference
sources for a list of the major sources that have been incorporated into
the FlyBase Bibliography.
B.12.5. Copyright statements
The following statement is with respect
to the copyright of bibliographic entries taken from BIOSIS:
"This database is copyrighted
by Biological Abstracts Inc. (BIOSIS®). All rights reserved.
No part of the information may be reproduced in hard copy, machine-readable
form or other form without advance written permission from BIOSIS. Information
has been obtained from public sources believed to be reliable. BIOSIS makes
a diligent effort to provide complete and accurate representation of the bioscientific
and other literature in its publications and services. However, BIOSIS does
not guarantee the accuracy, adequacy, or completeness of any information and
BIOSIS makes no warranties or representations of any kind, express or implied,
including but not limited to warranties of merchantability or fitness for particular
purpose. BIOSIS disclaims all liability for errors or omissions that may exist
and shall not be liable for any incidental, consequential or other damages (whether
resulting from negligence or otherwise) including, without limitation, exemplary
damages or lost profits arising out of or in connection with the use of this
database. Errors or omissions may be reported to Biological Abstracts Inc.,
2100 Arch Street, Philadelphia, PA 19103-1399."
The following statements are with
respect to the copyright of Parts 5 and 6 of Herskowitz's bibliography:
"Bibliography on the genetics
of Drosophila: Part 5, by Irwin H. Herskowitz is reproduced with the permission
of Macmillan Publishing Company. Copyright ©1969 by Macmillan Publishing
Company."
"Bibliography on the genetics
of Drosophila: Part 6, by Irwin H. Herskowitz is reproduced with the permission
of Macmillan Publishing Company. Copyright ©1974 by Macmillan Publishing
Company."
B.13. People
The People
section of FlyBase provides address and e-mail contacts for Drosophila workers.
The original list of contact information was compiled from five sources: an
e-mail address list compiled and maintained by Dr. John Haynie, the records
of the Bloomington Drosophila Stock Center, the distribution list of Drosophila
Information Newsletter, a subset of the Genetics Society of America's mailing
and membership list, and the mailing list for the European Drosophila Research
Conference.
The People
list is now user maintained via addition and correction forms available in the
People section. The file of updates
is searched along with the master file so new information is immediately available
to FlyBase users. FlyBase encourages you to keep your FlyBase contact information
up to date. Use the Add a New Address
option if there is no listing for you in the People
list. Use the Update Your Current Address
option if you wish to make corrections to an existing record. Until the next
update of the master files, any updates you provide through the correction form
will appear in search results as additional, updated, records, rather than modifying
or replacing the existing record.
The fields in people.*
are:
- Last name
- Given name
- Department
- Institution (e.g., University
or Research Institute)
- Address (e.g., Street #, Building,
Box #, Lab/Office Room #)
- City
- State (or Province/Region)
- Zip (or mail code)
- Country
- E-mail
- Alternate E-mail
- Office phone number
- Lab phone number
- Fax number
- URL (e.g., your group's Web page)
- PI (i.e., is or is not a group
leader)
- ID (FlyBase ID number)
- Date of last update
The
information contained in People is intended for the personal use of the Drosophila
and scientific communities. These lists are the property of the FlyBase Consortium
and they are not to be used for commercial purposes. Permission must be obtained
from FlyBase if they are to be used for any purpose other than that intended
by the Consortium.
B.14. Anatomy and Images
The Anatomy
and Images section of FlyBase contains tools and data that provide
access to genetic information based on anatomy and development. If you want
to know when and where a gene is expressed (including reporter genes such as
Ecol\lacZ and Scer\GAL4), or which genes can affect a given
body part when mutant, this is the place to start. Controlled vocabularies for
anatomical features and developmental stages link, through FlyBase vocabulary
Term Reports, relevant gene, allele, transcript and protein records to stages
of development, a region of the body or to a specific body part. Miscellaneous
images and quick-time films are also accessible from this section.
- TermLink - Use this tool to search or browse for any term, and its associated
Term Report, in the Anatomy, Developmental Stage or Cellular Location controlled vocabularies. Term Reports provide links to Genes, Alleles, Transcripts, Polypeptides and Images that are associated with the term.
- Anatomy
Images Browser - Thumbnail images are organized
by developmental stage and organ systems. Image reports include an annotated
image and a listing of associated vocabulary terms.
- Life
cycle - Access Term Reports based on stages of the life cycle.
- Glossary
- Definitions for selected anatomy and development controlled vocabulary terms.
- Miscellaneous
images
- Drosophila
Species - Drawings of Drosophilidae species
- Mutants
- SEMs of bcd and ftz mutant embryos.
- Animation
- Animations of embryogenesis in wild type and mutant embryos, using both
photographic images and drawings. They should be viewable using standard
movie play applications.
The files are:
embryogenesis.mpg
is a cartoon version of embryogenesis using images from The
Atlas of Drosophila Development by Volker Hartenstein, Cold Spring
Harbor Laboratory Press (1993). The individual images used to make the
movie are here.
gastrulation-lateral.mpg,
gastrulation-ventral.mpg,
gastrulation-dorsal.mpg,
and head-involution.mpg
are animations of gastrulation in wild-type embryos generated from scanning
electron micrographs. The images used to generate these animations are
here.
ftz-gastrulation.mpg
and bcd-gastrulation.mpg
are animations of gastrulation in ftz
and bcd mutant embryos,
respectively. The images used to generate these animations are here.
credits-movies.doc
explains how the animations were generated and by whom.
- Contributed
images - Images dealing with Drosophila that have been contributed
by users of FlyBase. Each directory has its own documentation, and only
short descriptions of the contents of the directories are included here.
The subdirectories are:
brain-k-ito
- a directory of scanned photographs of serial sections of wild-type adult
fly brains in three directions (frontal, horizontal, and sagittal) annotated
with names of major brain structures. It was deposited by Kei Ito (Mitsubishi
Kasei Inst.).
csomes-weeks-etal
- a directory that includes images of figures from the paper by Weeks
et al. (1993) Genes & Development 7:2329--2344
and an image of a portion of the X4m chromosome from the Bg9.61 strain
of John Lis et al. (in the files HS-602A20.*). These images were deposited
by John Weeks (Duke Univ. Medical Center).
dissect-may-etal
- includes images illustrating the dissection of the Drosophila brain
and whole mount staining and mounting of adults. These were deposited
by Sean May (Univ. of Warwick).
Your contributions of images
and pictures dealing with Drosophila are welcome if they will be of interest
to other fly biologists. We recommend jpeg or gif formats for photographic
images. For line drawings, Postscript and Mac pict formats may be more suitable.
Images at 300 dpi and 640 x 480 pixels are preferable, but other formats
can be accommodated. If you have high quality images of Drosophila phenotypes,
chromosomes, gene maps, or other objects of scientific interest, please contact
Thom Kaufman (kaufman at bio.indiana.edu; reformat to a standard
e-mail address). Provide a description of the images you are interested
in contributing to FlyBase including the format(s) of the images and a brief
verbal description of their scientific content. Please do not send the images
themselves until you have heard from Thom.