
FlyBase Reference Manual B. Detailed Descriptions of FlyBase Structure and Data
This section Last Updated: 10 November 2005

B.1. Genes
B.2. Synonyms
B.3. Species other than D. melanogaster
B.4. Genetic objects from non-Drosophila species that are included in Drosophila
B.5. Maps
B.6. Wild genotypes and Chromosomes
B.7. Function and Structure of Gene Products
B.8. Aberrations
- B.8.1. List of Aberrations field descriptions
- B.8.2. Detailed description of the Aberrations fields
B.9. Transgene constructs and insertions
B.10. Stocks
B.11. Genomic Clones and STSs
B.12. References - the Drosophila Bibliography
B.13. People
B.14. Anatomy and Images

B.1. Genes

B.1.1. General description of Genes data
B.1.2. List of Genes field descriptions
B.1.3. Detailed description of the Genes fields
B.1.4. Nontraditional alleles
B.1.5. Protein and transcript symbols and exon naming
B.1.6. FlyBase Genes - Interactive Fly Cross Index
B.1.7. Differences and omissions from Lindsley and Zimm (1992)

The Genes section of FlyBase contains information on Drosophila genes that has been curated from the literature and sequence databases. Data from all species of the family Drosophilidae are included. The initial data set was produced by merging the genes data in the text of Lindsley and Zimm (1992) with the old LOCI table of Ashburner, and Merriam's Genevent database. Information from all three sources has, however, been considerably revised and reformatted. New gene and allele records are added through FlyBase's curation of the literature and sequence databases. The curation of phenotypic data, a particularly complex class of Genes data, is discussed in Phenotypic Data in FlyBase, Drysdale (2001).

Some of the records in Genes will be transient. As more data become available some gene records will merge with others. Furthermore, some of these records are based on minimal data, for example, the annotation to an EMBL or GenBank sequence record. Our policy is to include data wherever we can. As records merge (or split) they will always be traceable by their secondary gene identifier numbers and by their synonyms.

One of the major differences between Lindsley and Zimm (1992) on the one hand, and Lindsley and Grell (1968) and Bridges and Brehme (1944), on the other, is that the 1944 and 1968 books were very much catalogs of mutations, rather than of genes. Bridges and Brehme (1944) and Lindsley and Grell (1968) were allele based, while Lindsley and Zimm (1992) is largely, although not entirely, gene based. FlyBase is a gene based database, and Genes reflects this change. Having said that, it will be apparent that the transition is by no means complete in genes. For the majority of genes, mutant phenotypes are described in the respective allele records. In many cases, where, as far as we know, all mutant alleles have a similar phenotype, then this description will be found in the record for the first allele in genes. Many genes in Lindsley and Zimm (1992) had no alleles specified, although it is clear that these genes were identified by one or more mutant alleles. In these cases we have arbitrarily designated an allele with the superscript 1. (Likewise, where an allele is referred to in text with a gene designation, we have regarded this as implying allele 1, where this seems reasonable, and made the change to state allele 1 explicitly). There remain, in Genes, many cases where phenotypic information is to be found within the gene record itself. This is especially so for genes for which there is a great amount of data.

Errors in Genes.

Genes data will not be free of errors, typographical, of fact, or of interpretation. Please inform FlyBase when you find any error in these data. It will then be corrected. E-mail to flybase-updates at morgan.harvard.edu (reformat to standard e-mail address) or contact a member of the FlyBase group, whose addresses and phone/fax numbers are given in Reference Manual I: The FlyBase Project.

B.1.1. General description of Genes data

The Genes file contains a set of Drosophila gene records, the data of each record being organized into many different fields. As far as possible, we have implemented controlled vocabularies for the descriptions. These are indicated by [cv]. The controlled vocabularies are to be found in controlled-vocabularies.txt. This process is by no means complete, except for some of the simpler fields, such as mutagen. For example all X ray induced alleles are described as 'X ray' (without the quotes) in the allele origin field, never 'X rays', 'X-ray' or 'X-rays'.

The use of controlled vocabularies will increase in the future. This will allow users to more easily search the database and retrieve genes or alleles with particular properties.

Overall syntax: The maximum line length is 255 characters; there are no blank lines; all lines begin with either * or #; lines that begin with # have no other characters; lines that begin with * have a letter in column 2, a space in column 3 and at least one more character beginning in column 4. The character # appears nowhere else in the file. The character * does, unfortunately, but the string *[A-Z,a-z] does not.

Record structure: Lines consisting only of '#' mark the end of a gene record. All other lines hold data for a gene; each field is one or more lines that have the same character in column 2. This character identifies the field and, sometimes, its position within a record (see below).
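
The syntax and record rules above can be sketched as a small reader. This is a minimal illustration, not FlyBase's own software: the function name read_records is hypothetical, and each data line is kept as its own (field code, value) pair rather than being merged with neighbouring lines of the same field.

```python
import re

# A data line is '*', a letter in column 2, a space in column 3,
# and at least one character starting in column 4.
DATA_LINE = re.compile(r"^\*([A-Za-z]) (.+)$")
MAX_LINE_LENGTH = 255


def read_records(lines):
    """Yield one list of (field_code, value) pairs per gene record."""
    record = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if len(line) > MAX_LINE_LENGTH:
            raise ValueError(f"line {number}: longer than {MAX_LINE_LENGTH} characters")
        if line == "#":
            # A bare '#' closes the current record.
            yield record
            record = []
            continue
        match = DATA_LINE.match(line)
        if match is None:
            raise ValueError(f"line {number}: neither a '*' data line nor a '#' terminator")
        record.append((match.group(1), match.group(2)))
    if record:
        raise ValueError("last record is not terminated by '#'")
```

Blank lines are rejected because the format does not allow them.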

B.1.2. List of Genes field descriptions

These are the current field designations in alphabetical order:

*a gene symbol
*b genetic location
*c cytological location
*d biological role of gene product [cv]
*e full name of gene or allele
*f cellular compartment of which gene product is a component [cv]
*g nucleic acid sequence databank and other DNA accession number
*h polymorphism data
*i symbol synonym(s)
*j xenogenetic interaction information on alleles
*k phenotypic information on alleles
*l transposable element data
*m protein database accession number
*n aberrations causing position-effect variegation of gene [cv]
*o origin/mutagen [cv]
*p phenotypic information on genes
*q information concerning functional relationships between genes
*r information on wild-type biological role
*s molecular information for genes and alleles
*t class of gene [cv]
*u miscellaneous information on genes and alleles
*v information on availability
*w discoverer
*x reference(s)
*y secondary FlyBase identifier number(s)
*z primary FlyBase identifier number
*A allele symbol
*B alternative genetic location
*C comments on cytology associated with allele
*D comments on cytological location
*E a duplicate of a *x field, used to tie data to a reference
*F function of gene product [cv]
*G insertion chromosome associated with allele
*H date record entered or updated
*I transgene construct that carries allele
*J protein domain information
*K arguably most useful aneuploids for this gene
*L synonym for transgene construct symbol
*M probable ortholog in reference species of drosophilid
*N synonym for insertion symbol
*O progenitor allele or chromosome if relevant to allele
*P aberration causing the allele
*Q complementation information concerning alleles
*R comments on origin, including progenitor genotype if irrelevant to allele
*S genetic interaction information on alleles
*T recent review article that discusses this gene
*U nickname
*V name synonym
*Y name of gene product

Field structure: The first line of each record is the *a field. There is only one of these per record. Other fields may appear in any order, and most can appear more than once, not necessarily consecutively. All fields before the first *A field (if any *A) refer to the gene. All fields between two *A fields (or between an *A field and a #) refer to the immediately preceding allele. Thus, for example, *b fields always appear before any *A fields, but *e fields can appear anywhere (e.g., "*e white" and "*e white-apricot"). Fields before the first *A are in a defined order:

aHiezyCbcwBDdJUltrfvFghmnpqsuxE
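
The rule that fields before the first *A belong to the gene, and fields after an *A belong to that allele, might be applied as follows. This is a sketch under the assumption that a record has already been read into a list of (field code, value) pairs; the function name and the sample allele symbols are illustrative only.

```python
def split_gene_and_alleles(record):
    """Split one record into gene-level fields and per-allele field lists.

    `record` is a list of (field_code, value) pairs in file order.
    Returns (gene_fields, alleles), where alleles is a list of
    (allele_symbol, fields) tuples.
    """
    if not record or record[0][0] != "a":
        raise ValueError("a record must start with its single *a field")
    gene_fields = []
    alleles = []
    for code, value in record:
        if code == "A":
            # Every *A field opens a new allele block.
            alleles.append((value, []))
        elif alleles:
            # Fields after an *A refer to the immediately preceding allele.
            alleles[-1][1].append((code, value))
        else:
            # Fields before the first *A refer to the gene.
            gene_fields.append((code, value))
    return gene_fields, alleles
```

With this split, an *e field is read as the gene name when it precedes every *A, and as an allele name otherwise.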

In pretty outputs the *-codes are replaced by a text term describing the field.

Special characters: There are no special characters used in this file. Superscripts are enclosed between square brackets []; subscripts between double square brackets [[]]. Greek letters are written out, e.g. alpha, beta.

B.1.3. Detailed description of the Genes fields

In this description the fields are grouped logically, rather than alphabetically. Links in the list of field designations in section B.1.2. above go to the relevant detailed field descriptions below.

*H. Dating of records and updates. All gene records have two date fields. The first, 'Date entered', is the date a gene record was entered into the Sybase tables. The second is 'Last updated', the date the record was last updated. When entered the two dates will be the same. The 'zero' date of all records then extant was 16 May 1994. FlyBase dates are represented as dd mm yy, mm being the initial 3-letter abbreviation of the month, and yy being the last two digits of the year (e.g., 01 Jul 94).
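
As an illustration, a date in this format could be parsed as below. The function name is hypothetical. The century rule is an assumption, not a documented one: since no record predates the 1994 'zero' date, two-digit years of 94 and above are read as 19yy and all others as 20yy.

```python
from datetime import date

MONTHS = {
    name: number
    for number, name in enumerate(
        ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
         "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"],
        start=1,
    )
}


def parse_flybase_date(text):
    """Parse a 'dd mm yy' date such as '01 Jul 94'."""
    day, month, year = text.split()
    two_digit_year = int(year)
    # Assumption: no record is dated before 1994, so 94-99 means 1994-1999.
    century = 1900 if two_digit_year >= 94 else 2000
    return date(century + two_digit_year, MONTHS[month], int(day))
```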

*z, *y. Each gene and allele record in FlyBase has a unique identifier number (see section F.1. of Reference Manual F: Links To and From FlyBase). The primary identifier number is in the *z field, secondary identifier numbers are in *y fields.
Syntax: *z FBgn_integer
e.g., *z FBgn0001234

*a. This is the standard abbreviation (gene symbol) for the name of the gene. In the genes file, gene records are sorted alphabetically. The order of precedence is: all-Greek symbols (in alphabetical order), symbols that begin with a number (in numerical order, secondarily sorted on suffix, i.e., 1, 2, 2a, 2b, 3), symbols that begin with a letter (lower case having precedence over upper, and numerals precedence over letters, i.e., b, B, b1, ba).
Syntax: *a <Nnnn\>symbol
e.g., *a bb
*a Dhyd\Minos

Nnnn is an abbreviation for the species. The default species is D. melanogaster, in which case there is no species abbreviation. If a gene is from another species of drosophilid then this is indicated by Nnnn, where N is normally the initial letter of the genus, and nnn are normally the first three letters of the specific epithet. A list of species abbreviations is in the Nomenclature section of FlyBase.
Syntax: *e <Nnnn\>name
e.g., *e bobbed
*e Dhyd\Minos

Genes encoded by the mitochondrial genome all have the prefix Nnnn\mt:. The D. melanogaster gene encoding the cytochrome oxidase subunit II is, therefore, mt:CoII, the D. simulans gene encoding the mitochondrial proline tRNA is Dsim\mt:tRNA:P. The record MT:DNA is used for data concerning the mitochondrial genome and its products that cannot be assigned to any single mitochondrial gene. The symbol mt:ori is used for the non-coding A+T rich region of the mitochondrial origin of replication.

FlyBase includes data on artificial gene constructs, for example fusions between different genes. Fusion genes are named using the gene symbols of their components separated by a double colon, e.g., Antp::Scr. The components are listed in alphabetical order. When a component of a construct is from a species other than D. melanogaster then its symbol is prefixed by Nnnn\ to indicate the species of origin. For example the lexA gene from E. coli has the symbol Ecol\lexA. A list of the species abbreviations used is to be found in the Nomenclature section of FlyBase.
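
The symbol conventions described above (optional species prefix, mt: prefix for mitochondrial genes, double colon between fusion components) can be decomposed with a few string operations. This is only a sketch: the names Component and parse_symbol are hypothetical, and the default species is represented as None because D. melanogaster symbols carry no abbreviation.

```python
from typing import NamedTuple, Optional


class Component(NamedTuple):
    species: Optional[str]  # None means the default species, D. melanogaster
    name: str               # symbol without the species prefix
    mitochondrial: bool     # True when the symbol carries the mt: prefix


def parse_symbol(symbol):
    """Return one Component per gene in a (possibly fused) gene symbol."""
    components = []
    # Fusion genes join their component symbols with a double colon.
    for part in symbol.split("::"):
        prefix, backslash, rest = part.partition("\\")
        if backslash:
            species, name = prefix, rest
        else:
            species, name = None, part
        components.append(Component(species, name, name.startswith("mt:")))
    return components
```

For example, parse_symbol("Antp::Scr") gives two D. melanogaster components, and parse_symbol("Ecol\\lexA") gives one component with species "Ecol".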

*e. This is the full name of the gene or allele. FlyBase takes a minimalist definition of a gene. As an example, Notch is regarded as a gene, but facet, Confluens, split etc. are not. These phenotypically distinct allelic forms that have, in the past, been named as if they were genetic loci are included as gene synonyms.

FlyBase is not entirely consistent in the way directly duplicated genes are handled: for example, the five HSP70 encoding genes at Hsp70A and Hsp70B and the five larval cuticle protein encoding genes at 44D are all listed independently, whereas each of the five major histone protein coding regions, tandemly repeated at the base of 2L, is listed as a separate gene only once, despite the tandem repetition.

Some loci have only been identified by molecular methods, not having been mapped. Such loci are included in genes. Other "loci" included in this file have not been genetically mapped or characterized but are assumed to exist on the basis of, for example, a purified protein. Some loci have been impossible to name in any logical way, due to a lack of data. As a temporary expedient these are named as anon-*, where the * indicates a code. These loci will be renamed as and when more data become available.

STS sequences identified by Drosophila genome projects appear in the nucleic acid sequence data archive, and in the NCBI's dbSTS database. These short sequences are routinely matched against the universe of public sequence data and often have 'significant' matches to genes identified in species other than Drosophila. Such matches are clues that similar genes may occur in D. melanogaster. For this reason STS sequences with significant matches are identified as 'genes' in this file, and have the temporary name ESTSn (for STS sequences from the European project) or BSTSn (for those from Berkeley), where n is the code used by the Genome Project (e.g., ESTS100F7T, BSTSDm0092). STS sequences that match known Drosophila genes will be linked to the relevant gene record by their accession numbers in the GenBank/EMBL/DDBJ and dbSTS data archives. STS sequences that have no matches whatsoever are only linked to their parental clone in the clones tables. All STSs with matches are similarly linked to their parental clones in these tables.

*b, *B. Genetic map position. Given as Chromosome number-map position, e.g. 3-10. If a gene has not been mapped within a chromosome, then only the chromosome is indicated as, for example, 2-. This implies '2- (not located)'. Many genes have been mapped cytogenetically but not genetically. Their map positions have been estimated and are enclosed in []. (Not {} as in Lindsley and Zimm (1992).) The published map positions of some genes are clearly at variance with their cytogenetic positions. In such cases we have estimated their genetic position and indicate this by enclosing the estimate in []. *B is used to store comments on genetic map positions, including unresolved differences between some genetic map positions in Lindsley and Zimm (1992) and those in Ashburner's original files.

To estimate genetic map positions from cytogenetic we use a standard table made by plotting all of the available data and then interpolating. Estimated genetic positions are normally only made to the nearest whole number. The exceptions to this rule are in regions of very low recombination relative to the cytogenetic map. The table of cytogenetic vs. genetic map positions used is available in the Maps section of FlyBase.
Syntax: *b chromosome_symbol-number
e.g., *b 1-66.0
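
A *b value might be read as sketched below. The exact placement of the square brackets around estimated positions is not specified here, so this sketch assumes only that any bracket in the value marks the position as estimated; the function name is hypothetical.

```python
def parse_genetic_location(value):
    """Return (chromosome, position, estimated) for a *b value.

    position is None when the gene is not located within the chromosome
    (e.g. '2-'). estimated is True when the value contains square brackets.
    """
    # Assumption: brackets may enclose the position or the whole value.
    estimated = "[" in value or "]" in value
    cleaned = value.replace("[", "").replace("]", "").strip()
    chromosome, hyphen, position = cleaned.partition("-")
    if not hyphen or not chromosome:
        raise ValueError(f"not a chromosome-position value: {value!r}")
    return chromosome, (float(position) if position else None), estimated
```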

*c. Cytogenetic map positions. These are given as extreme left and right hand limits. In the case where one of these limits is said to be a doublet, e.g., "35D1,2", then only the outermost band (in this case 35D1 if this was the left-hand end of the range) is given. The limits are separated by a double hyphen (--).
Syntax: *c left_hand_limit--right_hand_limit
e.g., *c 25C--25D
Many genes have been mapped genetically but not cytogenetically. Their map positions have been estimated and are enclosed in [].
Following the cytogenetic range there may be a statement regarding how it was established, e.g., by in situ hybridization. When a cytogenetic range or a statement of how it was derived appears "unattributed", i.e., not in a block headed "Data from ref. nnnn", it is computed from all available data and the tightest deducible range is shown. In cases where different reports give conflicting data, FlyBase has made a decision to mark one or more statements as suspect by prefacing them with "???". Such statements are excluded from the computations that give rise to CytoSearch data. If you find that an error has been made in this process, please inform us by email to flybase-updates at morgan.harvard.edu.

*D. *D is used to store comments on cytological map positions. This may include text giving, for example, information that a weaker in situ signal was seen elsewhere.
*K. Arguably most useful aneuploids for this gene. This is the algorithm for identifying the listed aberrations:
1) Admissible aberrations are ones that have no progenitor (too hard to work out what's missing) and whose class is one of Deficiency, Deficient translocation, Deficiency (first two listed breaks) plus Inversion, Tandem duplication or the three insertional duplication classes, plus separable components of aberrations that have no progenitor and whose class is one of the insertional transposition classes (this may be extended to inversion recombinants and translocation segregants in the future).
2) Aberrations are first prioritized into the following categories:
- those available at Bloomington
- those with a 2000 reference
- those with a 1990-9 ref not including L&Z
- those with a 1980-9 ref
- those with a 1970-9 ref
- those with a 1960-9 ref not including L&G
and then each category is sorted by distance between first two listed breaks (number of bands, smallest aberration first, taking the minimum size). This is the "league table" of aberrations.
3) The first aberration in the league table that is stated (in the aberration record) to delete the relevant gene is listed as:
*K Deficiency: <Df symbol>
Similarly the first ab in the league table that is stated (in the aberration record) to be duplicated for the relevant gene is listed as:
*K Duplication: <Dp symbol>
4) The first aberration in the league table whose minimum deleted region extends at least two bands either side of the gene's region of uncertainty is listed as:
*K Deficiency: <Df symbol> (inferred from cytology)
but only if it appears earlier in the league table than the one (if any) listed in step 3. Similarly for duplications, as:
*K Duplication: <Dp symbol> (inferred from cytology)
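
Steps 2 to 4 can be sketched for deficiencies as follows; duplications would be handled the same way. This is an illustration under several assumptions, not the production algorithm: the Aberration class and its fields are hypothetical, the input is assumed to contain only admissible aberrations (step 1), the priority category is assumed to be precomputed as an integer (0 for "available at Bloomington", increasing down the list), and the cytological test of step 4 is assumed to be precomputed into inferred_deleted_genes.

```python
from dataclasses import dataclass


@dataclass
class Aberration:
    symbol: str
    priority: int        # step 2 category, 0 = highest priority
    size_in_bands: int   # distance between the first two listed breaks
    stated_deleted_genes: frozenset = frozenset()    # stated in the aberration record
    inferred_deleted_genes: frozenset = frozenset()  # passes the two-band test of step 4


def league_table(aberrations):
    """Step 2: order by category, then smallest aberration first."""
    return sorted(aberrations, key=lambda ab: (ab.priority, ab.size_in_bands))


def deficiency_lines(gene, aberrations):
    """Steps 3 and 4: build the *K Deficiency lines for one gene."""
    table = league_table(aberrations)
    stated = next(
        (i for i, ab in enumerate(table) if gene in ab.stated_deleted_genes), None)
    inferred = next(
        (i for i, ab in enumerate(table) if gene in ab.inferred_deleted_genes), None)
    lines = []
    if stated is not None:
        lines.append(f"*K Deficiency: {table[stated].symbol}")
    # The inferred entry is listed only if it ranks above the stated one.
    if inferred is not None and (stated is None or inferred < stated):
        lines.append(f"*K Deficiency: {table[inferred].symbol} (inferred from cytology)")
    return lines
```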

*i. Symbol synonyms. As mentioned above FlyBase takes a very liberal view of synonyms, and the table gene-synonyms.txt in the Genes section is provided as a tool to allow the identification of the name, and symbol, that FlyBase uses for each gene or allele. In Genes these data are kept in the *i field, for both gene and allele synonyms.
Syntax: *i synonym_symbol: synonym name <text, e.g. a reference>
e.g., *i ho: heldout
*U. Nickname. Nicknames are valid alternative symbols for a gene or allele. Nicknames support the use in Drosophila genotypes of foreign gene symbols sans the species identifier, for example, lacZ rather than Ecol\lacZ. Nicknames are assigned only to foreign genes that frequently appear in Drosophila transgene constructs.
*V. Name synonyms. This field records full names that correspond to symbols that have become synonyms of both genes and alleles. No effort is made to represent the relationships between symbol synonyms and their corresponding name synonyms. Not all symbol synonyms have a name synonym, and vice versa.
*Y. Name of the gene product. This field is moderately controlled. The suffix '-like' is used to indicate that a gene product has been named by similarity.

*d. Biological role of gene product. This field gives information concerning the biological role(s) of the gene product. The terms used are from the process ontology of the Gene Ontology Consortium database and include the GO identifier number. The 'evidence' for an attribution may follow the term, separated from it by a 'pipe' character (|). Statements of evidence are drawn from a small controlled vocabulary:
     inferred from mutant phenotype
     inferred from genetic interaction
     inferred from physical interaction
     inferred from sequence similarity
     inferred from direct assay
     inferred from expression pattern
     inferred from electronic annotation
     traceable author statement
     non-traceable author statement
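
Separating a term from its optional evidence statement, and checking the statement against the vocabulary listed above, could look like this. It is a sketch only: the function name is hypothetical, and the part before the pipe is returned unparsed because the layout of the term and its GO identifier is not specified here.

```python
EVIDENCE_TERMS = frozenset({
    "inferred from mutant phenotype",
    "inferred from genetic interaction",
    "inferred from physical interaction",
    "inferred from sequence similarity",
    "inferred from direct assay",
    "inferred from expression pattern",
    "inferred from electronic annotation",
    "traceable author statement",
    "non-traceable author statement",
})


def split_evidence(value):
    """Return (term_text, evidence); evidence is None when no pipe is present."""
    term, pipe, evidence = value.partition("|")
    if not pipe:
        return term.strip(), None
    evidence = evidence.strip()
    if evidence not in EVIDENCE_TERMS:
        raise ValueError(f"evidence not in controlled vocabulary: {evidence!r}")
    return term.strip(), evidence
```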

Note about 'inferred from mutant phenotype': The GO consortium regards alterations of gene expression as 'phenotype' in the context of this evidence code. The description of mutant phenotypes in the FlyBase Allele data (see section on *k), however, is restricted to alterations of the anatomy or organismal function of the mutant, and does not include expression pattern data. For more about the GO evidence codes see http://www.geneontology.org/doc/GO.terms_and_ids.

*F. Function of gene product. This field gives information about the function(s) of the gene product. The terms used are from the function ontology of the Gene Ontology Consortium database and include the GO identifier number. GO function terms also include cross-reference to the ENZYME database. Statements of evidence are drawn from a small controlled vocabulary:
     inferred from mutant phenotype
     inferred from genetic interaction
     inferred from physical interaction
     inferred from sequence similarity
     inferred from direct assay
     inferred from expression pattern
     inferred from electronic annotation
     traceable author statement
     non-traceable author statement

Note about 'inferred from mutant phenotype': The GO consortium regards alterations of gene expression as 'phenotype' in the context of this evidence code. The description of mutant phenotypes in the FlyBase Allele data (see section on *k), however, is restricted to alterations of the anatomy or organismal function of the mutant, and does not include expression pattern data. For more about the GO evidence codes see http://www.geneontology.org/doc/GO.terms_and_ids.

*J. Description of the structural features of gene products. These data are not curated by FlyBase but are from the InterPro database. InterPro provides an integrated view of the commonly used protein domain or signature databases. Release 3.1 (May 2001) was built from Pfam 6.0, PRINTS 30.0, PROSITE 16.35, ProDom 2001.1, SMART 3.1 and the current SWISS-PROT + TrEMBL data.

Syntax for InterPro cross references:
*J InterPro_number == InterPro_accession_name
e.g., *J IPR000014 == PAS domain.

*f. Cellular compartment of which gene product is a component. This field gives information about the cellular compartment(s) of which the gene product is a component. These include not only the obvious parts of a cell (nucleus, mitochondrion), but also all defined supra-molecular complexes (e.g., small ribosomal subunit, proteasome). The terms used are from the cellular component ontology of the Gene Ontology Consortium database and include the GO identifier number. Statements of evidence are drawn from a small controlled vocabulary:
     inferred from mutant phenotype
     inferred from genetic interaction
     inferred from physical interaction
     inferred from sequence similarity
     inferred from direct assay
     author said so
     not available

*g. Nucleic acid sequences. In these fields FlyBase stores pointers to nucleic acid sequence data, usually in the form of EMBL/GenBank/DDBJ/NCBI accession (AC) numbers. If a sequence has been published but is not yet in one of these data banks a brief journal reference is given instead (the full reference will be found in References). Data from the three nucleic sequence databases are received on a daily basis by FlyBase.

FlyBase is also cross-referenced to a number of other sequence databases. These cross-references are stored in the *g line (if nucleic acid) or *m line (if protein). These other databases and the database code used in FlyBase to identify links to those databases are listed in Reference Manual F.3. The accession numbers for all external sequence links are listed in the file external-databases.txt. The EMBL/NCBI/DDBJ sequence accession numbers have no database code prefix.
Syntax: *g <database_code/>accession_number
e.g., *g X12345
*g EPD/23023

If the nucleic acid sequence accession includes coding regions then each coding region has a unique PID number. These are appended to the nucleic acid sequence accession number, following a semi-colon, e.g.,
*g U42989; g1150983
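
A *g value with its optional database code and appended PID numbers might be split as follows. This is a hedged sketch: the function name is hypothetical, and it assumes that when several PIDs are present they are separated by further semi-colons, which is not stated above. Values that hold a journal reference instead of an accession number are not handled.

```python
def parse_nucleic_acid_ref(value):
    """Return (database_code, accession, pids) for a *g value.

    database_code is None for EMBL/NCBI/DDBJ accessions, which carry no prefix.
    """
    accession_part, _, pid_part = value.partition(";")
    # Assumption: additional PIDs, if any, are separated by semi-colons.
    pids = [pid.strip() for pid in pid_part.split(";") if pid.strip()]
    code, slash, accession = accession_part.strip().partition("/")
    if not slash:
        code, accession = None, code
    return code, accession, pids
```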

Note that the number of PIDs attached to a sequence record may be more than one for two reasons. The first is that the EBI and NCBI often assign PID numbers independently to the same object; the other is that there is more than one protein product from a single gene (as the result, for example, of alternative splicing).
*r. The *r field is used for information about the wild-type biological role of a gene. The objective is for each gene record to have a *r field in which information about the gene's biological role is summarized. The present situation, however, is that for the majority of genes this information is still to be found in the *p field of the gene record. FlyBase is systematically rewriting these *p fields (historically derived from the 'Phenotype' field of Lindsley and Zimm (1992)) so that the summary of wild-type function is moved to the *r field.
*n. Aberrations causing position-effect variegation of gene. This is a controlled field to indicate aberrations that cause position-effect variegation of a gene.
Syntax: *n recessive PEV in: <aberration_symbol>
*n dominant PEV in: <aberration_symbol>
*n no PEV in: <aberration_symbol>

*m. Protein sequence data. The *m field stores pointers to protein sequence data, usually in the form of SWISS-PROT/TREMBL/PIR protein sequence databank accession (AC) numbers. Because accession numbers may clash between databases, the AC numbers are prefixed "SWP/", "TREMBL/" or "PIR/".

These fields are also used for cross-references between FlyBase and structural data on Drosophila proteins held on PDB (Protein Data Bank, Brookhaven), the NRL_3D databank and the G protein-coupled receptor database (GCRDb). These records have the prefixes PDB/, NRL_3D/ and GCR/ respectively. Cross-references to the 'factors' table of the TRANSFAC database (E. Wingender, J. Biotechnol. 35:273-280, 1994) have the prefix TF/.
Syntax: *m database_code/accession_number
e.g. *m SWP/P12428
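
Since every *m value carries a database code, a reader can validate the prefix against the codes named in this section. The sketch below uses only those codes; the function name is hypothetical, and the full list of codes in Reference Manual F.3 may be longer.

```python
# Database codes mentioned in this section and what they point to.
PROTEIN_DATABASES = {
    "SWP": "SWISS-PROT",
    "TREMBL": "TREMBL",
    "PIR": "PIR",
    "PDB": "Protein Data Bank",
    "NRL_3D": "NRL_3D databank",
    "GCR": "G protein-coupled receptor database (GCRDb)",
    "TF": "TRANSFAC 'factors' table",
}


def parse_protein_ref(value):
    """Return (database_code, accession) for a *m value such as 'SWP/P12428'."""
    code, slash, accession = value.strip().partition("/")
    if not slash or not accession:
        raise ValueError(f"expected database_code/accession_number: {value!r}")
    if code not in PROTEIN_DATABASES:
        raise ValueError(f"unknown database code: {code!r}")
    return code, accession
```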

*M. Probable ortholog in other species of drosophilid. The *M field is a pointer between "orthologous" genes in another species of drosophilid. A single species (D. melanogaster when possible) is treated as the "reference" for a given gene, and links are made with *M fields between the gene of the reference species and probable orthologs. No direct *M links are made between the non-reference genes.

Links are only made where there is good genetic or phenotypic (including sequence) evidence for homology of entire genes. It is not uncommon for a gene to be present once in species a but twice (or more) in species b (e.g., Adh in D. melanogaster vs. D. mulleri). In such cases all possible pair-wise links are made via *M fields.
Syntax: *M <Nnnn\>gene_symbol

Although genes in different species of Drosophila characterized by sequencing generally have the same gene symbol as the presumed homolog in D. melanogaster this is by no means true for genes characterized by mutations in these species. In these instances 'homology' is usually deduced from mutant phenotype and linkage group. No attempt has (yet) been made to impose homologies, over and above suggestions made in the literature.

*p. Phenotype. The *p field holds phenotypic information about a gene (or, as explained above, about its mutant alleles in some cases). This field is free text and, by and large, has not yet been standardized with respect to its vocabulary. One special use of the *p field is to hold information on gene interactions. These are expressed as follows:
*p Interacts genetically with: [gene_symbol]
*u. The *u field is for miscellaneous information concerning a gene, as free text. Notes concerning the identification of the gene, or the derivation of the gene symbol/name are stored following the corresponding 'Identification:' or 'Etymology:' prefix.
*s. Molecular data. These fields keep molecular data about genes and alleles. The *s field at the gene level is subdivided into five categories. In addition to the free text category there are four additional categories distinguished by a set of controlled prefixes:
Gene order: Accommodates gene order/orientation data derived by molecular, rather than genetic, means. The data will be presented in the format 'Gene order: In direction of increasing cytology: Dredd- su(s)+' where + indicates 5'-3' proceeds with increasing cytological location, - the opposite, and ? where the direction of transcription is not declared. Where orientation with respect to the chromosome is not known, gene sequence is preceded by the statement "Overall orientation not stated" and + and - simply reflect orientation of the transcripts with respect to each other. Where a 'Gene order' line begins or ends with an ellipsis (...) this indicates that the complete gene order described in the publication is more extensive than this subset reported for the gene in question. Gene reports for genes at either end of the reported line will continue the molecular gene order over a greater extent.
Maps to clone: Accommodates positive relationships between a gene and clones (P1, BAC, YAC) as used by large scale public genome projects.
Does not map to clone: Accommodates negative relationships between a gene and clones (P1, BAC, YAC) as used by large scale public genome projects.
Identified with: Accommodates relationships between a gene and ESTs or STSs as generated by large scale public genome projects.
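
A 'Gene order:' statement of the kind described above could be read as sketched here. The function name is hypothetical, and the sketch assumes that gene symbols and any ellipsis are separated by whitespace and that every gene token ends in +, - or ?. Words of the introductory statement are skipped because they do not end in one of those characters.

```python
def parse_gene_order(value):
    """Return (genes, chromosome_oriented, truncated_start, truncated_end).

    genes is a list of (symbol, orientation) pairs, orientation being
    '+', '-' or '?'. chromosome_oriented is False when the statement says
    'Overall orientation not stated'.
    """
    genes = []
    truncated_start = truncated_end = False
    for token in value.split():
        if token == "...":
            # An ellipsis before any gene marks a truncated start, otherwise the end.
            if genes:
                truncated_end = True
            else:
                truncated_start = True
        elif len(token) > 1 and token[-1] in "+-?":
            genes.append((token[:-1], token[-1]))
    chromosome_oriented = "Overall orientation not stated" not in value
    return genes, chromosome_oriented, truncated_start, truncated_end
```
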
The *s field at the allele level is free text but for the following three controlled prefixes.
Construct: Used to denote an 'allele' engineered in vitro by recombinant DNA technology and assayed in the genome after germline transformation or in transient assays in the whole organism or cell culture.
Amino acid replacement: prefixes a standard format statement about the nature of the mutation. Format is 'letterNletter' where each letter refers to the standard amino acid single letter code, and N is the residue of the encoded protein that is altered. Thus C67Y denotes that the cysteine at position 67 is replaced by a tyrosine. Stop codons are represented by @. Question marks ? represent uncertainty or lack of information about the amino acid or position in question.
Nucleotide substitution: prefixes a standard format statement about the nature of the mutation. Format is 'letterNletter' where each letter refers to the nucleotide, and N is the position of the affected nucleotide. Thus C313T denotes that the C at position 313 is replaced by a T. Note that the numbers in "Nucleotide substitution" data reflect author statement and do not necessarily have any significance with respect to "Nucleotide substitution" statements from other authors.
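
Both 'letterNletter' formats can be checked with regular expressions. This is a minimal sketch with hypothetical function names; it assumes the amino acid letters are the twenty standard single letter codes plus @ and ?, and that nucleotide statements use only A, C, G and T.

```python
import re

AMINO_ACID = re.compile(r"^([ACDEFGHIKLMNPQRSTVWY@?])(\d+|\?)([ACDEFGHIKLMNPQRSTVWY@?])$")
NUCLEOTIDE = re.compile(r"^([ACGT])(\d+)([ACGT])$")


def parse_amino_acid_replacement(text):
    """Return (original, position, replacement); '@' is a stop codon.

    position is None when it is given as '?'.
    """
    match = AMINO_ACID.match(text)
    if match is None:
        raise ValueError(f"not an amino acid replacement: {text!r}")
    original, position, replacement = match.groups()
    return original, (None if position == "?" else int(position)), replacement


def parse_nucleotide_substitution(text):
    """Return (original, position, replacement) for a statement such as 'C313T'."""
    match = NUCLEOTIDE.match(text)
    if match is None:
        raise ValueError(f"not a nucleotide substitution: {text!r}")
    original, position, replacement = match.groups()
    return original, int(position), replacement
```

Positions are returned as reported by the authors, so they should not be compared across publications.
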
*q. The *q field holds data about genes or groups of alleles that pertain to the relationship between that gene and other genes. For example, statements that alleles of gene A complement alleles of gene B, that, in addition to explicitly named alleles of this locus, a further ten alleles had been isolated, or that the gene may be the same as another, would be kept in this field. This field accommodates data stored with several controlled prefixes:
"Source for merge: gene1 gene2" statements mark publications as containing the evidence that the named gene1 and gene2, previously recorded as being distinct, correspond to the same gene, giving rise to the merging of the two gene records in FlyBase into one.
Other controlled prefixes for this field deal with functional complementation relationships between the gene in question and genes of other species. Prefixes are:
Functionally complemented by:
Does not functionally complement:
Is not functionally complemented by:
Partially functionally complements:
Partially functionally complemented by:
Gain of function effect when expressed in:
No gain of function effect when expressed in:
*l. Information about the nature and molecular characteristics of transposable elements is contained in the *l field.
*l element type:
*l terminal repeat length in bp:
*l total length in bp:
*l target site duplication length in bp:
*l number of copies in genome:
*l component genes:

The allowed values of 'element type:' are:
LINE, LINE-like retrotransposons
SINE, SINE-like elements
LTR, retroviral-like elements with long terminal repeats
IR, elements with inverted repeat termini
FB, fold-back elements
*h. Polymorphism data. The *h fields store data from population studies. These data are subdivided into categories.
variability: a (more or less) quantitative statement of variability at the locus.
sampled from: the geographic locations of the populations sampled.
sample size: the number of populations/strains analyzed.
no. of KB assayed: the extent of the region assayed.
type of assay: method used to measure variability (see CV).
comments: comments on the results and conclusions of the analysis.
*t. Class of gene. This field holds information about the class of the genetic element. The default is a protein-coding gene carried by the nuclear genome of a species of drosophilid.

The following classes of nuclear non-protein-coding gene are recognized:
*t nuclear_non-protein-coding_RNA_gene: the parent class of the following:
*t cytosolic_tRNA_gene: for tRNA encoding genes.
*t cytosolic_ribosomal_RNA_gene: for rRNA encoding genes.
*t nuclear_small_nucleolar_RNA_gene: for snoRNA encoding genes.
*t nuclear_snRNA_gene: for small nuclear RNA (snRNA) encoding genes.
*t nuclear_untranslated_RNA_gene: for other nuclear chromosomal genes none of whose transcripts encode a protein.
*t small_intermediate_RNA_encoding_gene: for genes reported to encode siRNAs.
*t microRNA_encoding_gene: for miRNA encoding genes.

Mitochondrial genes. Genes encoded by the mitochondrial genome have the symbol prefix 'mt:' or 'Nnnn\mt:' if from a species other than D. melanogaster.
The following classes of mitochondrial_gene are recognized:
*t mitochondrial_gene: the parent class of the following and used only for generic MT:DNA records and for the mitochondrial replication origin, mt:ori.
*t mitochondrial_protein-coding_gene: for protein coding genes of the mitochondrial genome.
*t mitochondrial_non-protein-coding_gene: the parent class of the following:
     *t mitochondrial_tRNA_gene: for mitochondrial encoded tRNA genes.
     *t mitochondrial_ribosomal_RNA_gene: for mitochondrial encoded rRNA genes.

*t pseudogene: Nonfunctional loci with sequence identity to a functional gene.

*t microsatellite: Loci composed of tandem repeats of short (1 to 10 bps) nucleotide sequences.

*t transposable_element. A natural transposable element of a drosophilid. Information concerning the class of the element is held in the *l field.

*t transposable_element_gene. A gene carried by a natural transposable element of a drosophilid. The symbol of this gene will be of the form 'N\m', where 'N' is the symbol of the transposable element and 'm' is the symbol of the particular gene.

*t repetitive_element. A natural non-coding repetitive element of a drosophilid. This is used for non-coding elements for which evidence that they are transposable is lacking. Includes satellite DNA sequences (satDNA).

*t virus_symbiont_pathogen: Viruses, symbionts, parasites and pathogens of Drosophila. Includes components of such entities.

*t safe_element: Structural and/or non-coding functional elements. Includes telomeres, centromeres, DNA amplification sites, scaffold sites, and boundary elements. Does not include non-coding elements of other classes, e.g., promoters, enhancers, introns, which are considered to be components of the default class of genes.

*t sire_element: Synthetic and/or isolated regulatory elements, restricted to regulatory elements widely used in an isolated context, such as mobile activating elements. Does not include regulatory elements used to drive reporter genes. An example is the synthetic GMR (glass multimer reporter) element, as used in transgene constructs designed to activate adjacent endogenous genes.

*t fusion_gene: Genes synthesized as a fusion of two, or more, coding regions, at least one being a Drosophila gene. Each component of a fusion gene has a single gene entry as either a normal gene, foreign_gene or a fusion_gene.
*t foreign_gene: A gene from a non-drosophilid.
*t foreign_fusion: A fusion gene, as defined above, that includes a coding region from a foreign gene.
*t foreign_transposon: Used for foreign transposons brought into Drosophila for the purposes of analysis or transgene generation.
*t foreign_transposable_element_gene: A gene carried by a transposable element of a non-drosophilid.

*t safe_element.f: A structural and non-coding functional element from a species other than D. melanogaster, frequently used in D. melanogaster transgene constructs.

*t sire_element.f: A SIRE (see definition above) from another species.

*t uncertain: Many genes in FlyBase have information that is only of historical interest, because they were identified by mutations that are now lost, were never sequenced, etc. It is important that searches of FlyBase genes not return an oppressive number of hits to such genes. Hence, we have developed a complex criterion by which genes can be classified as "uncertain", and such genes are only included in search hits if this is specifically requested on the Genes query form.

This criterion is purely rule-based, so the set of "uncertain" genes is recomputed at each genes update. The rules that comprise the criterion may be modified in the future, in the light of experience of how well they describe only the appropriate genes. The current criterion is that a gene is marked uncertain if and only if:
(it is a Drosophila melanogaster standard gene, not a virus, transposable element, etc.)
AND ( (it appeared in a prior, but not the current, release of the genome)
   OR ( (it has no references dated post-1989 except for Lindsley and Zimm and/or FlyBase curation)
     AND (it has no GO (*d, *f or *F) data)
     AND (it has no DNA/RNA or protein sequence or gene order data)
     AND (it has no alleles in any stock lists held by FlyBase, either held by public stock centers or the community)
     AND (its most specific mutant phenotype is shared by alleles of at least nine other genes)
     AND ( ( (it has no complementation data against aberrations)
           AND ( (it has no cytological or within-chromosome meiotic mapping data)
               OR ( (its cytological range of uncertainty exceeds two lettered subdivisions)
                  AND (its most recent reference is pre-1970) ) ) )
         OR ( (its gene symbol is an anonymous lethal or sterile)
            AND ( (its cytological range of uncertainty exceeds two lettered subdivisions)
                OR (its most recent reference is pre-1970) ) ) ) ) )
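
The nesting of the criterion above is easier to follow when each clause is a named boolean. The following Python sketch mirrors the stated rule clause by clause; it is not the FlyBase implementation. The GeneFacts class and all of its field names are hypothetical, and the sketch assumes each clause has already been evaluated for the gene.

```python
from dataclasses import dataclass


@dataclass
class GeneFacts:
    """Hypothetical per-gene flags, one per clause of the criterion."""
    is_dmel_standard_gene: bool
    in_prior_release_only: bool
    has_post_1989_refs: bool  # ignoring Lindsley and Zimm and FlyBase curation
    has_go_data: bool
    has_sequence_or_gene_order_data: bool
    has_alleles_in_stock_lists: bool
    phenotype_shared_by_nine_other_genes: bool
    has_complementation_data: bool
    has_cytological_or_meiotic_map: bool
    cyto_range_exceeds_two_subdivisions: bool
    most_recent_ref_pre_1970: bool
    is_anonymous_lethal_or_sterile: bool


def is_uncertain(g: GeneFacts) -> bool:
    # First alternative of the final clause: no complementation data and no usable map.
    poorly_mapped = not g.has_complementation_data and (
        not g.has_cytological_or_meiotic_map
        or (g.cyto_range_exceeds_two_subdivisions and g.most_recent_ref_pre_1970)
    )
    # Second alternative: an anonymous lethal or sterile that is vaguely mapped or old.
    vague_anonymous = g.is_anonymous_lethal_or_sterile and (
        g.cyto_range_exceeds_two_subdivisions or g.most_recent_ref_pre_1970
    )
    historical_only = (
        not g.has_post_1989_refs
        and not g.has_go_data
        and not g.has_sequence_or_gene_order_data
        and not g.has_alleles_in_stock_lists
        and g.phenotype_shared_by_nine_other_genes
        and (poorly_mapped or vague_anonymous)
    )
    return g.is_dmel_standard_gene and (g.in_prior_release_only or historical_only)
```

Keeping one flag per clause means that a future change to the rules, which the text anticipates, would touch only the expression in is_uncertain.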

*t multicopy_xxx (where "xxx" is another *t). Some genes are present in the Drosophila genome as clusters of genes, whose products are so similar that they are traditionally referred to by a single name. This is true of various RNA-encoding genes such as 5SrRNA and bb, and also of the histones in 39D. It is necessary in some circumstances to refer to individual members of such clusters. Hence, the "gene" 5SrRNA is given the gene class "multicopy_cytosolic_ribosomal_RNA_gene" to indicate its composite nature, and individual members of the 5SrRNA cluster are given the gene class "cytosolic_ribosomal_RNA_gene". The individual genes, as and when they are instantiated, are given symbols of the form "x:y", where "x" is the symbol of the multicopy gene and "y" is a unique identifier, e.g. "5SrRNA:CR33353". The multicopy gene and its member genes are linked by "relationship to other genes" data of the form "component genes: 5SrRNA:CR33353, ..." and "member gene of: 5SrRNA".

*t xxx_cassette (where "xxx" is another *t). There are various types of "composite gene" which are defined as such not because all their members are virtually identical, but because of some functional or structural relationship.

Two types of "cassette" are currently defined: a cluster of closely related genes with similar function and gene expression, for example the histone complex HIS-C, and a natural transposable element, whose component genes are those that it carries. (In the case of transposable elements we retain "transposable_element" as the gene class, as opposed to "transposable_element_gene_cassette".) As with the multi-copy genes, it is necessary to link the cassette to its parts, and this is done with "relationship to other genes" data of the form "encoded by: HMS-Beagle" and "encoded genes: HMS-Beagle\gag, HMS-Beagle\pol".

Also, it should be noted that "multicopy_xxx" and "xxx_cassette" can be combined. The existing cases of this are bb, Ybb and HIS-C. For example, bb has *t multicopy_cytosolic_ribosomal_RNA_gene_cassette and links to the genes 2SrRNA, 5.8SrRNA, 18SrRNA and 28SrRNA by "encoded genes" lines; both bb and its components also -- potentially -- have member genes. Moreover, the RNAs are encoded genes of Ybb as well as of bb.
*A. Alleles. Each allele record begins with a *A field with the gene and allele symbol. The *e and *i fields, for the full allele name and synonyms, are used as for the gene records.
Syntax: *A gene_symbol<up>allele_symbol</up>
*e allele_name
e.g. *A bb<up>G2</up>
*e bobbed of Goldschmidt

For some loci Lindsley and Zimm (1992) gave only cross-references to Lindsley and Grell (1968) or Bridges and Brehme (1942) for lost alleles. FlyBase has included the data as published in these earlier catalogs.

There is one class of 'allele' that FlyBase treats in a non-traditional way, that of alleles named as a consequence of a variegating position effect. By definition, these do not affect the structure of the gene, only its expression. For this reason position effect alleles are not included in the genes file. The aberration which gives rise to the position effect is, of course, in the aberrations file and the fact that it causes a position effect (or not) is noted in the *V lines of that file.

There are a few exceptions to this policy. A handful of alleles may or may not be due to a position effect; the absence of any cytological description of their chromosomes makes it impossible to tell. In these cases their records will include a *k line as follows: *k may be due to position effect variegation of normal allele.
*v. Information on availability. If a publication reports that an allele is lost, that information is recorded in the *v field. Note that not all such reports in the literature are authoritative.
*o, *O, *R. Origin of alleles. The *o field holds the data on the 'origin' of an allele, usually the mutagen used to induce it, but the origin may well be 'natural variant'. A controlled vocabulary is used in *o. This controlled vocabulary includes the CAS Registry Numbers of chemicals.
Syntax: *o mutagen
e.g. *o spontaneous
*o ethyl methane sulfonate

Where the value in *o begins 'in vitro construct' this field is bipartite, reflecting the type of in vitro mutagenesis employed to create that allele:
*o in vitro construct | regulatory fusion
*o in vitro construct | site directed

The legal entries for this field are listed in controlled-vocabularies.txt within the Documents section, along with all other mutagen terms.

The *O field is for the chromosome on which the mutation was induced or the progenitor allele name (e.g., for revertants). This field is only used if the progenitor is relevant to the derivative. The values in this field will be valid FlyBase allele or aberration or transposon insertion symbols. Where a *O field houses more than one value, each followed by " \?", this signifies that the progenitor chromosome is one of the named alternatives.

*R is miscellaneous data about an allele's origin, for example that it was simultaneously induced with another mutation, or information about the genotype of the progenitor which is irrelevant to the derivative. This is a formatted free text field.
*Q. This field carries miscellaneous inter-allele information as free text.
*C. Cytology of alleles. The *C field holds the information about the cytology of the allele, either that the 'Polytene chromosomes are normal' or comments about possible cytological abnormalities.
*P. Associated aberration. Holds the symbol of the aberration for those alleles caused by an aberration break. If an allele is associated with but separable from an aberration then that data will be in the *R field. If an allele was induced in an aberrant chromosome, then that is indicated in the *O field.
*G. Insertion chromosome associated with allele. Transposons or transgene constructs thought to be responsible for a mutation are recorded in the *G field. Transposons and transgene constructs are named according to the rules set out in the FlyBase nomenclature document. For example, an unmarked P-element is named P{}, the lArB transgene construct is P{lArB}, and a copia element is copia{}. Insertions of unidentified transposons have the symbol *{}. Following the closing brace is the allele symbol (identical to the preceding *A field); the complete symbol (e.g., P{lArB}wg^NZ) is the designation of the insertion chromosome.
*N. Synonym for insertion recorded in *G.
*I. Transposon or transgene construct that carries an allele. An allele being carried on a transposon/transgene construct, as opposed to being caused by its insertion, is denoted by the symbol of the transposon/transgene construct appearing in a *I field under the allele, e.g., *I P{lArB} under Adh^+t3.2.
*L. Synonym for transposon or transgene construct recorded in *I.
*k. Mutant phenotype. This holds the phenotypic description of the mutant allele. This description is restricted to alterations of the anatomy and organismal function of the mutant, and does not include gene expression pattern data. (This contrasts with the use of 'phenotype' in the GO term evidence code 'inferred from mutant phenotype' which does encompass expression pattern data - see *d, *F and *f). The *k field is free text, except for the following classes of information:

*k Phenotypic class: This field can be multi-component, storing information about the recessive/dominant, conditional and stage-specific aspects of the allele in addition to the phenotypic class into which the allele falls. Vertical bars separate the components:
     *k Phenotypic class: lethal | embryonic | maternal effect | recessive

An allele can legitimately have multiple '*k Phenotypic class:' lines.
     *k Phenotypic class: lethal | recessive
     *k Phenotypic class: flightless | dominant

Where a genotype appears in curly brackets at the end of the line, that phenotypic class of the allele is dependent on the {second site} genotype in the brackets.
     *k Phenotypic class: visible | dominant { Scer\GAL4^how-24B }

Where a '(with allele)' statement appears at the beginning of the line that phenotypic class is particular to the allelic combination of the allele that is the subject of the report and the allele (of the same gene) stated in the '(with allele)' statement.
     *k Phenotypic class: (with faf^FO8) visible
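
The layout of a '*k Phenotypic class:' line (optional '(with allele)' statement, components separated by vertical bars, optional second-site genotype in curly brackets) can be taken apart as in the following Python sketch. The function name parse_phenotypic_class and the keys of the returned dictionary are assumptions made for illustration, as is the premise that allele symbols in a '(with allele)' statement contain no spaces.

```python
import re

PREFIX = "*k Phenotypic class:"


def parse_phenotypic_class(line):
    """Return the parts of a '*k Phenotypic class:' line, or None for other lines."""
    line = line.strip()
    if not line.startswith(PREFIX):
        return None
    body = line[len(PREFIX):].strip()

    # A trailing "{ ... }" names the second-site genotype the class depends on.
    second_site = None
    if body.endswith("}") and "{" in body:
        body, _, genotype = body.partition("{")
        second_site = genotype[:-1].strip()

    # A leading "(with allele)" restricts the class to one allelic combination.
    with_allele = None
    match = re.match(r"\(with (\S+)\)\s*", body)
    if match:
        with_allele = match.group(1)
        body = body[match.end():]

    components = [part.strip() for part in body.split("|")]
    return {
        "with_allele": with_allele,
        "components": components,
        "second_site": second_site,
    }
```

Since an allele may carry several such lines, a caller would normally collect the results into a list per allele rather than expect a single value.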

*k Phenotype manifest in: This field describes the body part affected by the mutant allele, using the body part terms as listed in the controlled vocabulary.
     *k Phenotype manifest in: wing vein L5

Where a genotype appears in brackets at the end of the line, the phenotype in that body part is dependent on the {second site} genotype in the brackets.
     *k Phenotype manifest in: wing { Scer\GAL4^dpp.blk1 }

The presence of a term in this field means simply that the named structure can demonstrate a mutant phenotype as a consequence of the mutant allele. Thus for maternal effect alleles, the embryo in which the named body part is affected is not necessarily mutant for that allele in question, though its mother was. Also, the phenotype need not be 100% penetrant and expressed for the affected body part to be recorded in a 'Phenotype manifest in:' field. Terms can be combined using an & symbol:
     *k Phenotype manifest in: cuticle & procephalon
     *k Phenotype manifest in: scutellum & macrochaetae

Where a '(with allele)' statement appears at the beginning of the line, the phenotype in that body part is particular to the allelic combination of the allele that is the subject of the report and the allele (of the same gene) stated in the '(with allele)' statement.
     *k Phenotype manifest in: (with faf^FO8) eye

*k Mode of assay: This field is mandatory for all alleles that have '*o in vitro construct'. The possible entries in this field are:
     *k Mode of assay: In transgenic Drosophila
     *k Mode of assay: Whole-organism transient assay
     *k Mode of assay: Drosophila cell culture
     *k Mode of assay: In transgenic Drosophila (allele of one drosophilid species in genome of another drosophilid)
     *k Mode of assay: Whole-organism transient assay (allele from one drosophilid species assayed in another drosophilid)
     *k Mode of assay: In transgenic Drosophila (allele of foreign species in genome of drosophilid)
     *k Mode of assay: Whole-organism transient assay (allele of foreign species assayed in drosophilid)

The capture, storage and reporting of phenotypic data is discussed in Phenotypic Data in FlyBase, Drysdale (2001).
*S. Genetic interaction information on alleles
     *S Genetic interaction (class, effect):
     *S Genetic interaction (anatomy, effect):
     *S Genetic interaction (effect, class):
     *S Genetic interaction (effect, anatomy):
     *S Genetic interaction: free text

These 'Genetic interaction' fields store information about phenotypic class and affected body parts for mutant combinations of genetically interacting alleles. The interacting allele is indicated in the curly brackets {}. Phenotypic class and Anatomical term values are as for *k fields.
     *S Genetic interaction (class, effect): visible, enhanceable { ml[1] }
     *S Genetic interaction (anatomy, effect): eye, enhanceable { ml[1] }
     *S Genetic interaction (effect, class): enhancer, visible { S[1] }
     *S Genetic interaction (effect, anatomy): enhancer, eye { S[1] }

The capture, storage and reporting of phenotypic data is discussed in Phenotypic Data in FlyBase, Drysdale (2001).
*j. Xenogenetic interaction information on alleles
     *j Xenogenetic interaction (class, effect):
     *j Xenogenetic interaction (anatomy, effect):
     *j Xenogenetic interaction (effect, class):
     *j Xenogenetic interaction (effect, anatomy):
     *j Xenogenetic interaction: free text

These 'Xenogenetic interaction' fields store information about phenotypic class and affected body parts for mutant combinations of genetically interacting alleles where one of the interaction participants is from a species different from that of the other participant, or where both participants are from a species different from the one in which the assay is performed. Examples include tests for functional complementation between candidate homologs from different species. The format of these fields is the same as for '*S Genetic interaction' fields. The interacting allele is indicated in the curly brackets {}. Phenotypic class and Anatomical term values are as for *k fields.
      *j Xenogenetic interaction (class, effect): cell death defective, suppressible { Cele\ced-9[hs.PH] }
      *j Xenogenetic interaction (anatomy, effect): leg, enhanceable { Mmus\eed[hs.PW] }
      *j Xenogenetic interaction (effect, class): suppressor, visible { Hsap\MAPT[GMR.Ex.PJ] }
      *j Xenogenetic interaction (effect, anatomy): enhancer, vMP2 neuron { Ggal\MLCK[ct.Scer\UAS], Scer\GAL4[ftz.ng] }
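
Because '*S Genetic interaction' and '*j Xenogenetic interaction' lines share one format, a single routine can read both. The following Python sketch is a minimal illustration under stated assumptions: the function name parse_interaction and the dictionary keys are hypothetical, the two values are assumed to contain no commas, and multiple interacting alleles are assumed to be separated by commas as in the last example above. Free text lines are not handled and yield None.

```python
import re

_INTERACTION = re.compile(
    r"^\*(?P<field>[Sj]) (?:Xenogenetic|Genetic) interaction "
    r"\((?P<first>\w+), (?P<second>\w+)\):\s*"
    r"(?P<values>[^{]*?)\s*\{\s*(?P<partners>.*?)\s*\}$"
)


def parse_interaction(line):
    """Parse one structured interaction line into a dictionary, or return None."""
    match = _INTERACTION.match(line.strip())
    if match is None:
        return None
    values = [value.strip() for value in match.group("values").split(",")]
    if len(values) != 2:
        return None
    partners = [p.strip() for p in match.group("partners").split(",")]
    # The header names the order of the two values, e.g. (anatomy, effect).
    return {
        "field": match.group("field"),
        match.group("first"): values[0],
        match.group("second"): values[1],
        "partners": partners,
    }
```

Keying the values by the names in the header means that callers do not need to know which of the four orderings a given line uses.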

*x, *T, *E. References. *x fields, in both gene and allele records, are references.
Syntax: *x FBrfnnnnnnn == abbreviated_reference
e.g., *x FBrf0036029 == Saigo et al., 1981, Cold Spring Harbor Symp. Quant. Biol. 45:815--827

The FBrf number is the unique reference identifier number from references, which also includes the full reference.

*T lists recent review(s). For each gene, this is the list of all the reviews published in the last four years which were determined by FlyBase curators as having that gene as a significant topic, except that the list is truncated to more recent years when that still leaves at least three references (for example, if there are two dated 1999, two dated 1998, two dated 1997 and two dated 1996, then only the two from 1999 and the two from 1998 are listed). The most recent are placed first.
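
The truncation rule for *T can be stated as: keep whole years, most recent first, and stop at the first year boundary at which at least three references have been kept. The following Python sketch implements that reading; it is an interpretation of the description above, not FlyBase code. The function name truncate_reviews and the (year, reference) pair layout are assumptions, and the input is assumed to be already restricted to reviews from the last four years.

```python
def truncate_reviews(reviews, minimum=3):
    """Keep the most recent whole years that together hold at least `minimum` reviews.

    `reviews` is a list of (year, reference) pairs, already limited to the
    last four years. The result is ordered with the most recent year first.
    """
    ordered = sorted(reviews, key=lambda review: review[0], reverse=True)
    kept = []
    for index, review in enumerate(ordered):
        # Stop at a year boundary once enough references have been kept.
        if len(kept) >= minimum and review[0] != ordered[index - 1][0]:
            break
        kept.append(review)
    return kept
```

With two reviews in each of 1999, 1998, 1997 and 1996, this keeps the two from 1999 and the two from 1998, matching the example in the text. If fewer than three reviews exist in total, all of them are kept.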

The *E field is always a duplicate of a *x field within the same record. It is a device to tie particular data to a particular reference. The data fields then immediately follow the *E field.

The referenced block of fields is terminated by the next *E or *A field, or the end of record line (#).
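
The way *E ties data fields to a reference can be sketched as a single pass over the lines of a record. This Python example is illustrative only: the function name group_by_reference and the returned structure are assumptions, and it assumes that each field occupies one line and that the end of record is a line consisting solely of '#'.

```python
def group_by_reference(lines):
    """Collect the fields that follow each *E line until the block is terminated."""
    blocks = []
    current = None
    for raw in lines:
        line = raw.strip()
        if line.startswith("*E "):
            # A new *E line closes any open block and starts another.
            current = {"reference": line[3:].strip(), "fields": []}
            blocks.append(current)
        elif line.startswith("*A ") or line == "#":
            # A new allele record or the end of record terminates the block.
            current = None
        elif current is not None and line:
            current["fields"].append(line)
    return blocks
```

Fields that appear before any *E line, or after a *A line and before the next *E, are not attributed to a reference by this sketch.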
*w. Discoverer. This field contains the name of the individual who identified the allele, or the name of the leader of the group that identified the allele.

B.1.4. Nontraditional alleles

In addition to 'alleles' in the traditional sense, FlyBase now names and curates further classes of allele so that phenotypic or expression pattern data can be captured for in vitro construct alleles and alleles of reporter (e.g., Ecol\lacZ), effector (e.g., Scer\FLP) or toxin (e.g., Rcom\DT-A) genes. Since these alleles have not historically been named by researchers, and have been named by FlyBase, their presentation in FlyBase requires some explanation:

B.1.4.1. Alleles of reporter genes

Alleles of reporter genes currently fall into two main classes, those resulting from enhancer trap experiments, and those resulting from promoter (or other regulatory region) analysis, where a fragment is used to drive the expression of a reporter gene. Ecol\lacZ will be used for illustration.

Enhancer trap results:

The enhancer trap construct causes an allele of a gene and is expressed in a pattern consistent with insertion in that gene. The resulting aberration will be described with the format P{A92}h^L43a, and the Ecol\lacZ allele symbol is of the format Ecol\lacZ^h-L43a.
The reporter gene reflects the expression of a gene without causing a mutant allele of that gene. The resulting aberration will be described with the format P{PZ}P2023-44, where P2023-44 reflects the insertion identifier, and the Ecol\lacZ allele symbol is of the format Ecol\lacZ^hh-P2023-44.
The reporter gene reflects the expression of an undescribed gene/enhancer. The resulting aberration will be described with the format P{lacW}1.28, and the Ecol\lacZ allele symbol is of the format Ecol\lacZ^1.28.

Promoter analysis results:

Generally some fragment of a gene promoter/intron/3'-region is fused to the reporter gene. In this case the allele symbol is of the form 'gene symbol.fragment descriptor' e.g., Ecol\lacZ^eve.prox54. The fragment descriptor reflects that used in the publication, even though this may be long and cumbersome (this may not be strictly true for such alleles curated early in the FlyBase project).
Where a reporter gene is simply described in a publication as being driven by, e.g., an arm promoter, the symbol of the Ecol\lacZ allele is 'arm.PI', where I is the first letter of the surname of the first author of the paper, e.g., Ecol\lacZ^arm.PV for 'Ecol\lacZ arm promoter construct of Vincent'.
For logistical reasons some promoter fusions involving reporter genes such as Ecol\lacZ, though technically protein fusions, are simply treated as alleles of the reporter gene. The symbol for the additional gene(s) contributing to the fusion is indicated as part of a superscript, e.g., Ecol\lacZ^P\T.A92. In these special cases there is no distinction made between promoter fusions and protein fusions in the gene name.

B.1.4.2. Alleles of ectopically expressed Drosophila gene products

Products of genes may be ectopically expressed due either to juxtaposition with different regulatory sequences in the genome (as a result of being inserted into different-than-wild-type locations by chromosome rearrangement or P element transposition) or due to in vitro construction creating a different constellation of regulatory sequences than in wild type.

By analogy with alleles of Ecol\lacZ for enhancer traps, P-element-borne insertions of genes (e.g., w or ve) that have a qualitatively distinct _position-dependent_ mutant phenotype will be curated as new alleles of those genes, e.g., ve^Stg, caused by a particular insertion of P{HS-rho}, P{HS-rho}Stg.

The 'in vitro construct' ectopic expression alleles currently fall into two main classes, one component or two component systems:

One component systems:
Gene A is expressed from a promoter of gene B. The allele is typically generated by in vitro construction. In such cases the allele symbol is of the format 'gene-A^gene-B.PI', e.g., phyl^sev.PC or 'gene-A^{gene-B.fragment descriptor}' where the author includes a promoter fragment descriptor, e.g., phyl^ninaE.GMR.

An occasional exception is made for promoter fusions that are widely used to provide essentially wild-type gene function; these alleles have the mini-gene '+m construct' designation (see below) prepended to the promoter designation (for example, a heat shock designation), e.g., w^+mW.hs.

It is common that authors report a construct where e.g., ftz is expressed under a 'heat shock' or Hsp70 promoter, while providing no further details about the nature of the promoter. For these cases the allele symbol hs.PI is employed, e.g., Antp^hs.PZ for 'Antp heat shock construct of Zeng'. An 'hs' designation should be reserved for when the heat inducible, not just the minimal, promoter fragment is used.

Where the allele is both altered in its coding region and being expressed from an ectopic promoter the sequence 'alteration.promoter' is used in the allele designation, e.g., tor^13D.hs.sev to denote the coding sequence of tor^13D expressed from a heat shock (undefined) promoter with a sev enhancer. An exception to this rule is made for Tags, which appear as the last component of the allele symbol (see below).

Two component systems:

GAL4-UAS: The allele symbol for the gene whose expression is dependent upon Scer\GAL4 shall include 'Scer\UAS' and an identifier. The identifier should reflect the construct as named by the author, e.g., l(1)sc^{DeltaB.Scer\UAS}. In the absence of any other identifier '.cIa' is used, where 'c' stands for construct, 'I' for the first author's last name initial and 'a' for the first in the series (subsequent ones will be b, c, etc.), e.g., ase^Scer\UAS.cBa for 'Scer\UAS construct a of Brand'.
FLP-FRT: Alleles of Scer\FLP are named as outlined above for reporter genes, and allele symbols of genes whose expression is dependent upon that of Scer\FLP include 'Scer\FRT'.

B.1.4.3. Alleles of ectopically expressed non-Drosophila effector products

A note on ribozymes: FlyBase has a foreign ribozyme gene, symbol LTSV\RBZ. Alleles of LTSV\RBZ capture the different variants; e.g., a heat inducible ftz-targeted ribozyme will be named LTSV\RBZ^hs.ftz (syntax 'promoter.target gene').

'+m' minigenes

The minigene allele designation is used in its narrow sense, i.e., where the only difference between the allele and the wild type is the removal of more or less non-essential sequences. Thus the minigene allele symbol designation is reserved for those cases where the gene's own promoter is driving its expression.

The minigene allele symbols begin with 'm', for minigene, and are followed by the construct symbol used in the publication. If no construct symbol has been used, the string 'mIa' where 'm' stands for minigene, 'I' for the first author's last name initial and 'a' for the first in the series is used. If the function of the minigene is stated to be indistinguishable from that of the wild type allele, the 'm' is preceded by a '+'.

Tags: Genes can be modified by the addition of a tag allowing the product to be identified, purified, or targeted to a particular subcellular distribution. Tagged alleles have the syntax 'gene-symbol^x.T:y', where x is an identifier and y is the name of the tag (e.g., Hsap\MYC, T:Ivir\HA1, SV40\nls2); an example is CycB^{B1.T:Hsap\Myc}. Where a tag is artificial, the species prefix Zzzz is used, e.g. T:Zzzz\His6.

B.1.4.4. Classical alleles engineered into transgene constructs, including rescue constructs

A class of alleles is named to capture fragments of genomic DNA used in rescue constructs. The symbol for the rescuing allele begins with '+t'. This is followed by the length as stated by the authors, or by the construct symbol if the length is not given; if neither length nor construct symbol is stated, the symbol is '+tIa', where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series. When rescue is incomplete, the construct is considered as carrying a mutant allele. The allele designator is then the construct symbol; 'length of genomic insert.tIa' if no symbol is given; or 'tIa' where neither length nor construct symbol is stated.

When a classic allele, e.g., w^a, is put into a transgene construct it will get a new designation, e.g., w^a.tIa, to reflect its transgenic environment, where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series.

FlyBase is, of course, happy to discuss and advise on the nomenclature of these non-traditional alleles.

B.1.5. Protein and transcript symbols and exon naming

FlyBase strives to link curated information to particular protein and transcript species. In order to maintain the data in this way, it is necessary to assign different symbols to each gene product. Proteins, transcripts and exons are symbolized as follows.

Protein symbols are of the form cact[+]P482 where the gene symbol and allele designation are followed by a capital P and the size of the protein in amino acids. When the size in amino acids is not known, the size in kiloDaltons is used, e.g. grh[+]P120kD. If no size is known, the symbol is followed by a capital letter to distinguish products that are known to be different, e.g. Sh[+]PA, Sh[+]PB. If multiple proteins of the same size and divergent sequence are characterized, the symbols are followed by different capital letters, e.g. abc[+]P345A, abc[+]P345B. A generic protein symbol, e.g. cact[+]P, is used to capture properties that cannot be specifically attributed to one protein product of a gene.

Transcripts are similarly named. The gene symbol and allele designation are followed by a capital R and the size in kb, e.g. cact[+]R2.2. Where possible the size as estimated by northern blot is used. If not, the size of the longest cDNA is used and this is indicated in the transcript table. For transcripts of unknown size, the symbol is followed by a capital letter, e.g. grh[+]RA, grh[+]RB. For multiple transcripts of similar size and divergent sequence, the symbols are followed by different capital letters, e.g. abc[+]R1.7A, abc[+]R1.7B. A generic transcript symbol, e.g. cact[+]R, is used to capture properties that cannot be specifically attributed to one particular transcript of a gene.
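
The protein and transcript naming rules above can be condensed into one small helper. The following Python sketch is illustrative and not FlyBase code: the function name product_symbol and its parameter names are assumptions. The size is in amino acids for proteins and in kb for transcripts; the kilodalton fallback applies to proteins only.

```python
def product_symbol(gene, allele, kind, size=None, kilodaltons=None, letter=""):
    """Build a gene product symbol such as cact[+]P482 or grh[+]RA.

    kind is 'P' for a protein or 'R' for a transcript. With no size and no
    letter, the generic symbol (e.g. cact[+]P) is returned.
    """
    if kind not in ("P", "R"):
        raise ValueError("kind must be 'P' (protein) or 'R' (transcript)")
    base = f"{gene}[{allele}]{kind}"
    if size is not None:
        return f"{base}{size}{letter}"
    if kilodaltons is not None:
        if kind != "P":
            raise ValueError("a size in kD applies to proteins only")
        return f"{base}{kilodaltons}kD{letter}"
    return f"{base}{letter}"


print(product_symbol("cact", "+", "P", size=482))
print(product_symbol("abc", "+", "R", size=1.7, letter="B"))
```

The optional letter serves both purposes described in the text: distinguishing products of unknown size, and distinguishing products of the same size but divergent sequence.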

In general, all of the exons comprising a gene are numbered consecutively from 5' to 3'. Where exons partially overlap, they are given the same number with a suffix, e.g. 2a, 2b.

In some cases, it is not possible to attribute a characteristic to an individual gene product. For example, expression pattern data is often obtained with probes or antibodies that recognize more than one product of a gene. It is not rigorously known where each individual gene product is expressed. In addition, it is often not possible to determine which transcript observed on a northern blot corresponds to a particular cDNA. In these cases, the data is linked to a generic protein or transcript entity for that gene.

B.1.6. FlyBase Genes - Interactive Fly Cross Index

FlyBase has developed a hierarchical view of the Interactive Fly entitled "Interactive Fly Hierarchy: cross-index to FlyBase genes". This hierarchy is accessible from both Allied Data and Genes. The hierarchy provides an overview of the Interactive Fly with links to the specific Interactive Fly pages, as well as gene lists with links to the individual gene records in FlyBase and the Interactive Fly. This permits searches for genes grouped according to developmental and cellular pathways and functions.

B.1.7. Differences and omissions from Lindsley and Zimm (1992)

All errors found in Lindsley and Zimm (1992) have been corrected. A list of these errors, sorted by page number, is in the file errors.txt in the Redbook section of FlyBase Documents. The material in the DELETION MAP tables in the 'lethals' section of Lindsley and Zimm (1992) is not included; these tables are available in the Redbook section of Maps. The tables of Lindsley and Zimm (1992) have been broken down and the data incorporated into the text of the relevant gene record. All references within the body of a text entry of Lindsley and Zimm (1992), i.e., not in the references: field, have been duplicated into the references: field. With a very few exceptions all references are to be found in the FlyBase Bibliography and carry FlyBase reference ID numbers. The molecular map figures in Lindsley and Zimm (1992) are not included in genes, but are available in Redbook/Images sections of Documents. Lindsley and Zimm often used introductory sections for groups of genes that are, in some way or other, related (see e.g. the record for ASC, page 50). This structure is not suitable for FlyBase, and this information has, in general, been repeated in each of the relevant individual gene records.

B.2. Synonyms

FlyBase maintains a record of synonyms for gene, allele, aberration, transposon and transgene construct symbols that have appeared in the literature and stock center stock lists. Files with tables of synonyms and their corresponding "valid" symbols are found in the relevant sections of FlyBase.

Synonyms have several different causes. Sometimes two workers give the same symbol to two different genes, requiring one of these to be changed. Sometimes two workers, either by accident or design⁽¹⁾, give two different symbols to the same gene; the symbol that has priority should then be used. Many of the synonyms arise, however, as a consequence of minor variation in the way the symbol of a gene, aberration, transposon or transgene construct is written (e.g., with a lower case or capital first letter), or by error, either in the literature or in these tables. In some cases it has been difficult to decide whether a name is a gene synonym or just an allele name (this is especially so for lethals). We have taken a very liberal attitude to synonyms and, when in doubt, have included a name as a synonym even when it may more correctly be an allele name.

The files are:

Genes/gene-synonyms -- For genes and their alleles. This plain-text file contains a list of synonyms and valid symbols as 'synonym-symbol > valid-symbol', one synonym per line. There are often many synonyms per valid symbol. Superscripts are indicated in the text by <up> (beginning of superscript) and </up> (end of superscript). Greek letters are also encoded in the text (for example, alpha appears as &agr;).
Aberrations/aberration-synonyms -- This plain-text file contains a list of synonyms and valid symbols as 'synonym-symbol > valid-symbol', one synonym per line.
Transgene-construct/transposon-synonyms (not yet available)
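
As a rough illustration of how the 'synonym-symbol > valid-symbol' layout might be read, the sketch below groups synonyms under their valid symbol. The function name load_synonyms is hypothetical, and the sketch assumes that the separator is a '>' with a single space on each side; lines without that separator are skipped. The <up> and </up> markers and the Greek letter codes are left untouched.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def load_synonyms(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Map each valid symbol to the list of synonyms that point to it."""
    valid_to_synonyms: Dict[str, List[str]] = defaultdict(list)
    for raw in lines:
        line = raw.rstrip("\n")
        # Assumption: the separator is " > " (with spaces), so that the
        # ">" inside markers such as <up> is not mistaken for it.
        synonym, sep, valid = line.partition(" > ")
        if not sep:
            continue  # blank or malformed line
        valid_to_synonyms[valid.strip()].append(synonym.strip())
    return dict(valid_to_synonyms)


# Invented example lines, not taken from the real file.
example = [
    "Abc > abc",
    "abc-1 > abc",
    "xyz<up>1</up> > pqr",
]
table = load_synonyms(example)
```

With the invented lines above, table maps abc to the two synonyms Abc and abc-1, and pqr to the single synonym xyz<up>1</up>.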

1. "Scientists would rather use each other's toothbrushes than each other's nomenclature.", Keith Yamamoto.

B.3. Species other than D. melanogaster

FlyBase includes data on all species from the family Drosophilidae. The 'default' species is D. melanogaster and all symbols and names of genes, alleles, aberrations and clones from other species have a prefix of the form Nnnn\, where N is the initial letter of the genus (e.g. D for species in the genus Drosophila) and nnn is normally the first three letters of the specific epithet (e.g., sim for simulans). In formal terms all symbols and names from D. melanogaster have the prefix Dmel\, but this is usually omitted.

Species prefixes are also used for non-melanogaster genes introduced into D. melanogaster via a transgene construct, including Ecol\lacZ, Scer\GAL4 and Avic\GFP. In addition, genes carried by natural transposable elements have the transposon symbol as a 'species' prefix, for example, P\T, the gene for P-element transposase. To find genes such as these in a Genes search, change the 'Species' option from the default 'Dmel' to 'All'.
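
The prefix convention can be sketched in a few lines. The function names below are illustrative only. The sketch assumes the normal case in which the prefix is the genus initial plus the first three letters of the specific epithet, and it does not attempt to handle fusion symbols that contain more than one prefix.

```python
from typing import Tuple


def make_prefix(genus: str, epithet: str) -> str:
    """Build the usual four-letter prefix, e.g. Drosophila simulans -> Dsim.

    Assumption: the 'normal' rule applies; exceptions are not modelled.
    """
    return genus[0].upper() + epithet[:3].lower()


def split_species_prefix(symbol: str, default: str = "Dmel") -> Tuple[str, str]:
    """Return (prefix, bare symbol); symbols without a prefix get the default."""
    prefix, sep, rest = symbol.partition("\\")
    if not sep:
        return default, symbol
    return prefix, rest
```

For example, split_species_prefix("Ecol\\lacZ") gives ("Ecol", "lacZ"), the transposon-borne gene "P\\T" gives ("P", "T"), and an unprefixed symbol such as "cact" is attributed to Dmel.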

At present, the following 'taxgroups' are recognized:

drosophilid (i.e., species in the family Drosophilidae), non-drosophilid eukaryote, prokaryote, transposable element and virus (including prokaryote viruses), and the file is sorted in this order.

We stress that identity of gene symbol between two species cannot be used to conclude 'homology' of genes. Where known, or strongly suspected, information concerning homologous genes within the family is present in a *M field of the genes file.

FlyBase has made only limited efforts to curate genes, alleles and aberrations from species other than D. melanogaster for the period before 1989. We have back curated from D.I.S. and some primary papers and reviews that have come to hand. For four species we have incorporated the efforts of others:

D. ananassae - From a catalog of mutations and chromosome aberrations of Drosophila ananassae provided to FlyBase by Y.N. Tobari. This was the text of Chapter 11 'Catalog of mutants' by D. Moriwaki and Y.N. Tobari in Y.N. Tobari (editor) Drosophila ananassae: Genetical and biological aspects (Japan Scientific Societies Press, Tokyo and Karger, Basel, 1993). We thank Professor Tobari for his permission to make these data available in FlyBase and for providing the data on disk.
D. buzzatii - From a catalog of the genes and mutations of Drosophila buzzatii provided to FlyBase by J.S.F. Barker. This was based on Schafer, Fredline, Knibb, Green and Barker (1993) Genetics and linkage mapping of Drosophila buzzatii. J. Hered. 84:188--194. Where no phenotypic description is given, it is similar to that for the mutant of the same name in D. melanogaster, and is assumed homologous. Unless otherwise specified, visible mutants were detected through inbreeding to F2 or F3 the progeny of wild-caught females (Spencer, 1949). Most of the visible mutants are in the collection of the Tucson Drosophila Species Stock Center. FlyBase thanks Professor Barker for providing these data on D. buzzatii.
D. virilis - From a list prepared for FlyBase by Professor H. Kress.
D. subobscura - From the lists in Krimbas (1993) 'Drosophila subobscura, Biology, Genetics and Inversion Polymorphism'. Verlag Dr. Kovac, Hamburg.

We would be happy to hear from colleagues who are able to review records from species other than D. melanogaster. We thank Jerry Coyne for reviewing the records for D. simulans, D. mauritiana and D. sechellia.

B.4. Genetic objects from non-Drosophila species that are included in Drosophila

Sequences from many other organisms are often included in artificial constructs introduced into the genome of Drosophila. FlyBase calls these 'foreign genes' and they have symbols that indicate both the species of origin and the nature of the element, e.g., Hsap\BMP4, the BMP4 gene from humans. A list of the species abbreviations used is to be found in the Nomenclature section.

Just as two or more different Drosophila genes can be engineered into a gene fusion so can two or more different foreign gene coding regions. These are called 'foreign fusion' genes, e.g., Avic\GFP::Ecol\lacZ, a coding fusion of Aequorea victoria GFP and the E. coli lacZ gene.

Structural and non-coding elements ('SAFE elements', see B.1.3.) from non-Drosophila species are called foreign SAFE elements. The most common group of foreign SAFE elements are short sequence tags used to mark genes or their products (including epitope tags). These have symbols that begin with 'T:', e.g., T:Hsap\MYC, the 'myc' epitope tag. Artificial sequences are also classed as SAFE elements, e.g., T:Zzzz\His6 for a DNA sequence encoding a run of six histidine residues.

A limited class of regulatory elements from foreign species are classified as foreign SIRE elements (synthetic and/or isolated regulatory elements). This class is restricted to regulatory elements widely used in an isolated context, for example as mobile activating elements. Examples are the synthetic multiple UAS[[G]] elements, restricted to cases in which they are used within transgene constructs designed to activate adjacent endogenous genes.

The class of element is indicated in a *t line, which, for the objects described in this section, can have the following values:

*t foreign_gene
*t foreign_fusion
*t safe_element.f
*t sire_element.f

Each class, or any combination of classes, can be extracted from the database by using the complex query form in Genes with the "Class" option changed from the default "all" to one or more (ctrl+click to add terms) of these categories.

For each class the origin of the gene is described in star-coded format in a *u line with the following syntax:
*u Foreign sequence; species == <species_name>; gene|sequence|sequence tag|function tag|epitope tag == <gene symbol>; <database_abbreviation:database_id>.

Attempts are first made to cross-reference to another genetic database (e.g., OMIM, GDB, MGD). If such a link cannot be made then we attempt to establish a link with a protein or nucleic acid sequence database. The database abbreviations used will be found in Reference Manual F: Links To and from FlyBase. The gene name or symbol will be enclosed in single quotation marks if no cross-reference to another genetic database can be found. If no cross-reference can be established then a brief literature reference to the object will be included within the 'comment' field. In the case of epitope tags the comment field will normally include the 'name' of the antibody recognizing the epitope and a literature reference.

B.5. Maps

B.5.1. Sequence-based Maps
B.5.2. Gene Order Maps
B.5.3. Computed Aberration Breakpoints and Cytological Locations of Genes

B.5.3.1. Notation
B.5.3.2. Proximity rather than Order
B.5.3.3. Provisos
B.5.3.4. Genome-derived Cytology

The Maps section of FlyBase contains map-based browsing and query tools and data. See Reference Manual C: Using FlyBase on the Web for further information on these tools.

FlyBase uses Bridges' revised maps for the banding patterns of the polytene chromosomes. See:

Bridges, 1938, J. Hered. 29: 11--13 (X chromosome), Bridges and Bridges, 1939, J. Hered. 30: 475--476 (2R), Bridges, 1941, J. Hered. 32: 64--65 (3L), Bridges, 1941, J. Hered. 32: 299--300 (3R), Bridges, 1942, J. Hered. 33: 403--408 (2L).

B.5.1. Sequence-based Maps

B.5.1.1. Genome Browser, GBrowse

GBrowse (a product of the Generic Model Organism Database Project) provides a Web-based view of a specified region of the genome; the location of that region along the chromosome arm is indicated graphically. The region of interest can be specified by gene symbol, CG identifier, a mapped feature (such as a Drosophila Gene Collection cDNA clone, BAC genomic clone, P element insertion, or protein sequence accession in the SPTR database with BLASTX similarity to the genomic sequence), or a coordinate extent on a scaffold accession or chromosome arm. One can also input a sequence string using the Fly BLAST server and, from the BLAST results list, link to the alignment in the GBrowse view. The extent of the region (from 100 bp to 5 Mbp) can be controlled by the user using the zoom option. Adjacent regions can be viewed using the scroll option. Annotated genes, supporting data, and other sequence-aligned data (e.g., P-element insertion sites and Affymetrix oligos) are shown as color-coded features flanking the central sequence axis. Features can be identified by mousing over the relevant graphic and viewing the feature name in the status bar; when the view is zoomed in sufficiently, or the gene labelling option is selected, the gene annotations are labelled. Included below the GBrowse view of the region are BAC in situ images. The "Display Settings" panel can be used to control the subset of features displayed, the width of the image, and other display options. For example, one can choose to have gene symbols displayed or can choose to have an expanded view of the aligned data. The data behind the GBrowse view, including cytological locations and GO gene function descriptions, can be downloaded in various flat-file formats: tabulated, FASTA, GAME-XML or GFF.

B.5.1.2. Drosophila Genome Overview

The FlyBase tool Drosophila Genome Overview is an extension of GBrowse that allows users to browse entire chromosome arms at once. The default view displays cytological numbered divisions, the tiling BAC genomic clones, and the annotated sequence scaffolds in GenBank. Clicking on the BAC or GenBank scaffolds takes users to the GBrowse view of the region. Users can also choose to display all of the genes along a chromosome arm, as well as cDNAs that align to the genomic sequence, P element insertions, transposable elements, and sequencing gaps. The width of the map can be adjusted, which is necessary when viewing these finer, optional features.

B.5.1.3. Apollo

A more flexible and interactive view of the same data provided in GBrowse is possible using the Apollo genome browser and annotator. Use of this tool requires that the Apollo software be downloaded and installed locally; data are then loaded via a Web connection from the annotation database. Data can be saved locally in the form of GAME-XML flat files and subsequently reloaded into Apollo. A detailed and comprehensive user guide for Apollo is available. This tool provides several options for viewing annotations and features down to the sequence level, and allows searches for specific genomic or amino acid sequence strings. Apollo also provides editing options, including sequence-level modifications of exon extents, addition of alternative transcripts, deletion of existing annotations, modifications involving merging or splitting existing annotations, and addition of comments associated with specific genes or transcripts. There are many options for customizing the format of the view and the data sets; these may be saved as user preferences.

B.5.2. Gene Order Maps

The Gene Order Maps section contains maps that communicate both gene order and cytological location. There are two formats: files whose names end in '.ps' are suitable for downloading and printing on a PostScript printer, while those ending in '.txt' are preferable for viewing in a web browser. Their format is documented in detail in the file geneorder.doc in the same folder.

Using the Gene Order Maps

The gene-order map communicates both gene order and cytological location. This is presentationally rather different on a genome-wide map than on a small, well-mapped region, and a novel format has been adopted, which is documented here.

1. Cytological range
Each gene whose cytological location is known with a range of uncertainty less than about two number divisions is written on a vertical line whose extent is the range of uncertainty. Overlapping lines are staggered. To this extent, in other words, the format is as in the EofD. A gene whose symbol exceeds nine characters may cross more than one line; the line it is attached to always goes through the second character of the symbol.

Bands are drawn with differing sizes, but this is not in any way related to amount of DNA per band, as it is on the EofD. It is only a function of how much data we need to place there.

2. "Limiting" genes
In addition, at either end of the line there is the symbol for a gene that is known to lie to the indicated side of the gene in the middle of the line. Two points must be emphasized about these "limiting" genes: they are not being stated to have the same cytological location as the "limited" gene, and they are not being stated definitely to be the neighboring gene. They are chosen by pragmatic criteria as being the most informative genes that are known to lie to the indicated side. These criteria include cytological location and size of range of uncertainty of that location. This means that it is common, especially in well-mapped regions, for a gene to appear more than once. A gene can appear as a limiter of any number of other genes, but it will only be a limited gene on at most one line.

Limiters are identified only by direct recombination, complementation or molecular map data; cytology (of genes or of breakpoints) is never used. If a gene has no limiter on one side (or both), that means that no gene can be placed to that side using direct genetic or molecular data.

3. Multiple "limited" genes on a single line
In the better-characterized regions, gene order is known to a degree that cannot be clearly represented by cytological range. This is alleviated by placing two or more genes "limited" on the same line. So as to maintain completeness of information, a set of genes is only ever limited on the same line if (a) their relative order is completely known, and (b) they all have identical cytological ranges. The limiters of a line with more than one gene are known to lie to the indicated side of all limited genes.

 |      y 
 |      | 
 |      | 
1B5     | 
 |     svr 
 |      | 
 |    elav 
        | 
 |      | 
 |      | 
1B6     | 
 |      | 
 |    Appl

This says:

the four genes shown are in the order y, svr, elav, Appl, going from left to right along the chromosome.
svr and elav lie in either 1B5 or 1B6.

It does not say:

y and/or Appl lie in 1B5 or 1B6
svr lies in 1B5
etc.

4. Nested or overlapping genes
The software that analyses map data understands the concept of genes within genes, but this is hard to depict graphically without a generally more confusing format. Sometimes, therefore, a gene will be shown as its own limiter, or as both limited by and limiting (to the same side) another gene.

We have incorporated some molecular data into this map, and will add much more over the coming year, but the bulk of the information is based on genetic data. Therefore, the definition of overlap of two genes is not necessarily that the transcription units overlap. For example, ftz is shown as embedded in Scr, because Scr[-] ftz[+] deficiencies exist that delete proximal material (including Antp).

5. Genes with cytological extent
A few dozen genes are stated to be deleted by deficiencies which (according to our data) do not quite overlap, thus implying that the gene occupies the whole region between the deficiencies (plus a bit on either side). In most cases the gap between the deficiencies is only one band, so we have fudged the issue by placing the gene at the interband, e.g. y in 1B1-2:

 | 
 | 
1B1 
 |         arth
 |          | 
            y 
 |    y     | 
 |    |     ac 
1B2   ac 
 |    | 
 |    sc

Two files related to the correspondence of the genetic and cytogenetic maps are also in Maps:

cytotable.txt is a table showing the genetic map positions that FlyBase infers from published cytogenetic positions for genes without a known genetic map position. These inferences were made using the genetic and cytological locations of Ising's TE inserts. These can be found in the FlyBase Aberrations section with symbols of the form "Tp(1;n)TE*" (where "n" is 1, 2 or 3).

B.5.3. Computed Aberration Breakpoints and Cytological Locations of Genes

If you see computed cytologies in FlyBase that you think are incorrect, please contact us at flybase-updates at morgan.harvard.edu (reformat to standard e-mail address).

Five categories of information regarding the polytene location of genes and aberration breakpoints are captured by FlyBase:

Polytene data from chromosome in situ hybridization of clones
Polytene localization of aberration breakpoints (orcein data)
Genetic (recombination) mapping data on gene order
Complementation data between alleles and aberrations
Genomic molecular data on gene order and proximity

Recombination, complementation and molecular information do not reveal polytene locations directly, but can be combined with orcein and in situ data to derive inferred polytene locations. This type of analysis is non-trivial when conducted on a large dataset. FlyBase has produced software which does it automatically, with some provisos which are explained below (see 'Provisos').

The output of this software is a 'best guess' of the polytene location of each gene or aberration breakpoint for which any relevant data are known to FlyBase. The guess is presented as a range of uncertainty, whose ends are either polytene bands (such as 22F1) or lettered subdivisions (such as 22F). Heterochromatic bands (such as h41) are also used. This range appears as the polytene location of the gene or breakpoint in the header section of the gene or aberration report, and is also used as the underlying data for the various map-based user interfaces, such as the graphical maps and CytoSearch.

To the extent possible (see 'Provisos' below), the computed range of uncertainty of a gene or breakpoint is the range consistent with ALL the data known to FlyBase. Thus, if in one publication a gene has been reported to lie in 35B1-4, and in another publication it is reported to lie in 35B3-6, and there is no other relevant information in FlyBase, the computed location will be 35B3-4. More complex situations arise from complementation and recombination data. For example, if Df(1)xyz is stated to have its proximal breakpoint at 15A1-4, and Df(1)pqr is stated to have its distal breakpoint at 15A3-6, and the Df's are known to overlap (because there is a gene, abc, that they both delete), then both those breakpoints will be computed to lie in 15A3-4 -- as will the gene abc itself.
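
The simplest case described above, in which two reported ranges are narrowed to the range consistent with both, can be sketched as an interval intersection. This is not the FlyBase software: the function names are hypothetical, and the sketch handles only ranges that lie within a single lettered subdivision (such as 35B1-4), treating each band as a (division, letter, band number) tuple.

```python
import re
from typing import Optional, Tuple

Band = Tuple[int, str, int]          # (numbered division, letter, band)
Range = Tuple[Band, Band]            # inclusive (low end, high end)

# Assumption: only ranges of the form 35B1-4 or 35B3 are handled here.
_RANGE = re.compile(r"^(\d+)([A-F])(\d+)(?:-(\d+))?$")


def parse_range(text: str) -> Range:
    m = _RANGE.match(text)
    if m is None:
        raise ValueError(f"unsupported range: {text!r}")
    div, letter, first, last = m.groups()
    low = (int(div), letter, int(first))
    high = (int(div), letter, int(last or first))
    return low, high


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Return the range consistent with both reports, or None if they conflict."""
    low = max(a[0], b[0])
    high = min(a[1], b[1])
    return (low, high) if low <= high else None


def format_range(r: Range) -> str:
    low, high = r
    if low[:2] != high[:2]:
        raise ValueError("ranges spanning subdivisions are not handled")
    head = f"{low[0]}{low[1]}{low[2]}"
    return head if low == high else f"{head}-{high[2]}"


combined = intersect(parse_range("35B1-4"), parse_range("35B3-6"))
```

Here format_range(combined) gives 35B3-4, matching the first example in the text. A result of None corresponds to reports that do not overlap, which require the case-by-case treatment described under 'Provisos'.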

Because of the inherent complexity of these computations, the basis for the computed range is often far from obvious at first sight. FlyBase therefore includes, directly following the computed range in the Full and Abridged (but not Synopsis) gene and aberration reports, one-line descriptions of the primary data from which each end of the range was determined. Note that there is no requirement that any two data items derive from the same reference. The descriptions for the last example above would be as follows (with arbitrary data for the other ends of the deficiencies):

For gene abc:: Computed cytological location: 15A3-4; Left limit from inclusion in Df(1)pqr (FBrf0012345); Right limit from inclusion in Df(1)xyz (FBrf0054321)
For Df(1)xyz:: Computed cytological location: 14D;15A3-4; Limits of break 1 from polytene analysis (FBrf0013579); Left limit of break 2 from inclusion of abc (FBrf0056789); Right limit of break 2 from polytene analysis (FBrf0098765)
For Df(1)pqr:: Computed cytological location: 15A3-4;15D; Left limit of break 1 from polytene analysis (FBrf0034567); Limits of break 2 from polytene analysis (FBrf0097531)

Even this brief explanatory text is often somewhat opaque, however, so FlyBase is in the process of designing a 'Map Report', linked from the gene and aberration reports, which explains in more detail how the various relevant items of data were used in the computation.

B.5.3.1. Notation

Ranges are written as described elsewhere in the Nomenclature Guidelines, with two exceptions.

The first exception concerns ranges which are inferred from recombination data (for genes) or complementation (for breakpoints). These are enclosed in square brackets when no range (even a wider one) can be determined by other means. This is most commonly found for breakpoints of cytologically invisible deficiencies and for genes which were mapped by recombination but never cloned or mapped by complementation. Note that when an entity has been localized explicitly (such as by in situ hybridization), but a narrower range has been computed from other data, this narrower range is NOT bracketed: thus, brackets specifically denote the unavailability of any direct data.

The other case concerns 'one-ended' limits. The commonest example of this is when a deficiency is stated to delete certain genes, thus giving it a minimum extent, but no flanking undeleted genes are specified so no 'maximum extent' can be computed. In such cases, if there is also no explicit cytology for the deficiency (and if it is also not stated to be cytologically invisible -- see below) the 'half-open' range is denoted by 'less than' and 'greater than' signs, as follows:

For a deficiency that deletes three genes, all localized to 28D-E:: Computed cytological location: <28E;>28D; Right limit of break 1 from inclusion of abc (FBrf0076543); Left limit of break 2 from inclusion of abc (FBrf0056789)

Note that there is no 'limit line' for the left limit of break 1 or the right limit of break 2. Note also the superficially odd, but logically sound, mention of 28E for the left break and 28D for the right break.
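
The two notational exceptions can be made concrete with a small classifier for the breakpoint entries of a computed location. The function names and the four labels below are invented for illustration. The sketch assumes that breakpoints are separated by a semicolon and that a leading '<' means only a right limit is known while a leading '>' means only a left limit is known, as in the deficiency example above.

```python
from typing import List, Tuple


def classify_breakpoint(token: str) -> Tuple[str, str]:
    """Return (label, range text) for one breakpoint entry."""
    token = token.strip()
    if token.startswith("[") and token.endswith("]"):
        # Inferred from recombination or complementation only.
        return "inferred_only", token[1:-1]
    if token.startswith("<"):
        return "right_limit_only", token[1:]
    if token.startswith(">"):
        return "left_limit_only", token[1:]
    return "plain", token


def parse_computed_location(text: str) -> List[Tuple[str, str]]:
    """Split a computed cytological location into classified breakpoints."""
    return [classify_breakpoint(part) for part in text.split(";")]
```

For the half-open example, parse_computed_location("<28E;>28D") reports that break 1 has only a right limit (28E) and break 2 has only a left limit (28D).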

B.5.3.2. Proximity rather than Order

There are two cases in which locations are computed based on close proximity of a pair of objects, rather than on their chromosomal order. One is when two genes are reported to lie within 20kb or less on a molecular map. For example, if a gene xyz is stated to lie in 22F1-2 and a second gene, pqr, is stated to lie a few kilobases away from xyz (and there is no other relevant information in FlyBase), the computed location of pqr will be 22F1-2, even if there is no information on the chromosomal order of the two genes.

The other case concerns cytologically invisible deficiencies. If a deficiency is stated to be cytologically invisible, the computation makes the assumption that it is less than a band in extent, so that the ranges of uncertainty of the left and right breakpoint should be identical. For example: if the deficiency in the previous example, which deletes a gene in 28D-E, were said to be cytologically invisible then its computed data would appear as follows:

Computed cytological location: [28D-E];[28D-E]

Left limit of break 1 from cytological invisibility (FBrf0002468)

Right limit of break 1 from inclusion of abc (FBrf0076543)

Left limit of break 2 from inclusion of abc (FBrf0056789)

Right limit of break 2 from cytological invisibility (FBrf0002468)

Note the use of square brackets as described under "Notation", since this is a case where no explicit cytology is available. A statement that a deficiency is less than 20kb long is, for this purpose, treated as a statement that it is cytologically invisible.

B.5.3.3. Provisos

Though we believe that the presentation of computed map statements is of value to the community, providing an easily accessible synthesis of the primary data, such statements can -- by their very brevity -- be interpreted as more authoritative than is really justified. Certain precautions are advisable.

Map-based searches of genes and aberrations, such as by CytoSearch, use only the computed ranges of uncertainty, not the primary reports. Thus it is always advisable to search using a slightly broader range than the one of interest, so as to match entities which have been placed by multiple investigators in slightly varying locations.
When two reports localize the same entity to different ranges, but the ranges overlap (such that there is a narrower range consistent with both reports), that narrower range is what is presented (as explained above). But when the reported ranges do NOT overlap, a choice must be made regarding which report to prioritize. This is done case-by-case, going back to the original literature. Certain guidelines are used: for example, genetic data on deficiencies are usually favored over cytological data, since point lesions very near to a deficiency are rare. However, inevitably some decisions are wrong -- especially when there is nothing to favor one report over another. Data items that are excluded in this way are never deleted from FlyBase, but are marked with the phrase '(excluded from computation of map data)'; this allows them to be restored to the computation if and when the balance of evidence changes. The "Map Report" currently under development will include careful explanations of the conflicts (which can sometimes be highly complex) underlying the suppression of such items. We welcome any community feedback that can assist in the accuracy of this process.

B.5.3.4. Genome-Derived Cytology

All the predicted genes have now been incorporated into FlyBase with inferred cytology. The inference system we have used is based on the estimates that Sorsa published a few years ago of the size in kb of each polytene band. These estimates can be summed to give the length (according to Sorsa) in kb of a region between two very well-mapped entities ('anchors') that are also identified on the genome. The genome sequence gives a different number for that length, of course. So we then apply a scaling factor, i.e. we calculate the cytology of each predicted gene in the region between the anchors by interpolation from its sequence coordinates. The anchors we use are a set of over 1200 P insertions that have been localised on the genome by sequencing flanking DNA and on polytenes by Todd Laverty of the BDGP. The scaling works out slightly different for each inter-anchor region, of course, but we estimate that even in the middle of a region the error in the computed location should never be more than a band or so. As the remaining gaps in the genome sequence are filled, some currently unmappable stretches of sequence (especially near centromeres) will be joined up with the main sequence, and that will shift all the coordinates. Smaller changes will occur as a result of other gap-filling in the middle of arms. These will be reflected in updates to map locations. If you have further questions do not hesitate to mail us at flybase-help at morgan.harvard.edu (reformat to standard e-mail address).
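
The interpolation described above can be sketched as follows. This is an illustration under stated assumptions, not the FlyBase implementation: the function names are hypothetical, each anchor is represented as a pair of (sequence coordinate in kb, cumulative band-size position in kb), and the band names and sizes in the example are invented placeholders rather than published estimates.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple

Anchor = Tuple[float, float]   # (sequence coordinate in kb, map position in kb)


def interpolate_map_position(gene_kb: float, left: Anchor, right: Anchor) -> float:
    """Scale a sequence coordinate onto the band-size map between two anchors."""
    (seq_l, map_l), (seq_r, map_r) = left, right
    if not seq_l <= gene_kb <= seq_r:
        raise ValueError("gene lies outside the inter-anchor region")
    # Multiply before dividing so the scaling factor is applied in one step.
    return map_l + (gene_kb - seq_l) * (map_r - map_l) / (seq_r - seq_l)


def band_at(map_kb: float, bands: List[Tuple[str, float]]) -> str:
    """Find the band containing a map position; bands are (name, size in kb)."""
    ends = list(accumulate(size for _, size in bands))
    if not 0 <= map_kb <= ends[-1]:
        raise ValueError("position lies outside the listed bands")
    index = min(bisect_right(ends, map_kb), len(bands) - 1)
    return bands[index][0]


# Invented numbers: 100 kb of bands between anchors that are 120 kb apart
# on the sequence, so the scaling factor for this region is 100/120.
bands = [("band_1", 20.0), ("band_2", 30.0), ("band_3", 50.0)]
position = interpolate_map_position(1030.0, (1000.0, 0.0), (1120.0, 100.0))
```

In this invented region a gene 30 kb beyond the left anchor is placed 25 kb along the band map, which band_at(position, bands) assigns to band_2. Because each inter-anchor region has its own scaling factor, the same calculation would be repeated with the pair of anchors flanking each gene.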

B.6. Wild genotypes and Chromosomes

Information on wild-type genotypes and chromosomes is kept in the Wild Stocks section of Genes. The core of wild-stocks.txt is the information on wild-type stocks from Lindsley and Grell (1968) (itself derived from Bridges and Brehme, 1942), supplemented with more recent data. The file not only includes information on stocks, but also on certain chromosomes, extracted from natural or laboratory populations, whose genetic properties have been studied - in particular chromosomes found to induce male recombination or other phenomena related to the activity of naturally-occurring transposable elements.

The fields in wild-stocks are:
*a Name or symbol of stock or chromosome
*c Description of cytological features
*d Date of origin as a laboratory stock or chromosome
*e Full name
*i Synonym(s)
*o Origin
*p Phenotypic characteristics and properties
*q Notes on how stock or chromosome is maintained
*s Molecular characteristics, including information on transposable elements
*w Collector
*x References
*C Class, e.g., wild-type stock; selected wild-type stock; extracted wild-type chromosome; laboratory stock
*E A duplicate of a *x field, used to tie data to a reference
*R Collection site
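
A reader for star-coded lines of this kind might look like the sketch below. The function name is hypothetical, the example records are invented, and two assumptions are made that the text does not state: each field occupies one line of the form '*x value', and a new record starts at each *a line.

```python
import re
from typing import Dict, Iterable, List

FIELD_LABELS = {
    "a": "Name or symbol", "c": "Cytological features", "d": "Date of origin",
    "e": "Full name", "i": "Synonym(s)", "o": "Origin",
    "p": "Phenotypic characteristics", "q": "Maintenance notes",
    "s": "Molecular characteristics", "w": "Collector", "x": "References",
    "C": "Class", "E": "Reference tie", "R": "Collection site",
}

_STAR_LINE = re.compile(r"^\*([A-Za-z])\s+(.*)$")


def parse_star_records(lines: Iterable[str]) -> List[Dict[str, List[str]]]:
    """Group star-coded lines into records keyed by field code."""
    records: List[Dict[str, List[str]]] = []
    current = None
    for raw in lines:
        m = _STAR_LINE.match(raw.rstrip("\n"))
        if m is None:
            continue  # not a star-coded line
        code, value = m.groups()
        # Assumption: every *a line opens a new record.
        if code == "a" or current is None:
            current = {}
            records.append(current)
        current.setdefault(code, []).append(value)
    return records


# Invented example, not taken from wild-stocks.txt.
records = parse_star_records([
    "*a Example-1", "*C wild-type stock", "*i Ex1", "*i Ex-1",
    "*a Example-2", "*R Somewhere",
])
```

Repeated fields such as *i are collected into a list, so the first invented record above carries two synonyms.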

B.7. Function and Structure of Gene Products

'Function'

FlyBase uses the terms of the Gene Ontology database to describe 'functional' attributes of gene products. Three classes of attribute are used: function, process and cellular location. The information is provided in three formats:
     html tables sorted alphabetically by GO term
     text tables sorted alphabetically by GO term
     tab delimited tables with the following syntax:
          DB Gene_id Gene_symbol [NOT] GOid DB:ref evidence with aspect
               In the case where NOT is written in the '[NOT]' column then the GO term does not apply
               to the gene it is attached to. This field is used rarely for cases of conflicting/unexpected data.
               'with' can be used to qualify one of the following evidences:
                   IGI, IPI, ISS and is in the format:
                    database:gene_symbol (or protein_symbol or sequence_ID)
                    or species\gene_symbol (or protein_symbol)
               'aspect' is one of: P (process), F (function) or C (cellular compartment)
               'evidence' is one of:
                    IMP = inferred from mutant phenotype
                    IGI = inferred from genetic interaction
                    IPI = inferred from physical interaction
                    ISS = inferred from sequence similarity
                    IDA = inferred from direct assay
                    IEP = inferred from expression pattern
                    IEA = inferred from electronic annotation
                    TAS = traceable author statement
                    NAS = non-traceable author statement
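
As an illustration of the tab delimited syntax above, the following is a minimal sketch of reading one row. It assumes that every row carries all nine columns, with an empty string where the '[NOT]' or 'with' column is unused; the names GoRow and parse_go_row and the sample row are hypothetical and are not part of any FlyBase software or data.

```python
from typing import NamedTuple

# Evidence codes and aspect letters as listed in the text above.
EVIDENCE_CODES = {"IMP", "IGI", "IPI", "ISS", "IDA", "IEP", "IEA", "TAS", "NAS"}
ASPECTS = {"P": "process", "F": "function", "C": "cellular compartment"}


class GoRow(NamedTuple):
    db: str
    gene_id: str
    gene_symbol: str
    negated: bool      # True when the [NOT] column holds NOT
    go_id: str
    reference: str     # the DB:ref column
    evidence: str
    with_value: str    # may be empty; qualifies IGI, IPI and ISS
    aspect: str


def parse_go_row(line: str) -> GoRow:
    """Split one tab-delimited row into its nine columns."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 columns, found {len(cols)}")
    db, gene_id, symbol, not_col, go_id, ref, evidence, with_value, aspect = cols
    if evidence not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {evidence!r}")
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect!r}")
    return GoRow(db, gene_id, symbol, not_col.strip() == "NOT",
                 go_id, ref, evidence, with_value, aspect)


# Hypothetical row: every identifier below is a placeholder, not real data.
SAMPLE_ROW = "FB\tFBgn0000000\tgeneA\tNOT\tGO:0000000\tFB:FBrf0000000\tIMP\t\tF"
row = parse_go_row(SAMPLE_ROW)
print(row.gene_symbol, row.negated, ASPECTS[row.aspect])
```

Rows flagged as negated should normally be excluded before counting the GO terms that apply to a gene.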

'Structure'

The "structure" tables include all genes from Drosophila known to encode a product with known protein features, for example a zinc finger domain. These data are from two different databases. The first of these is the INTERPRO database, a database of protein sequence domains and motifs. INTERPRO is, in effect, a union of six different protein domain/motif databases: PROSITE, ProDom, SMART, TIGRFAMs, Pfam and PRINTS. The second, SCOP, is a database of protein structures.

Syntax: domain <== INTERPRO_identifier>: gene_symbol<; gene_symbol>
Syntax: domain <== SCOP_identifier>: gene_symbol<; gene_symbol>
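
The two syntax lines can be read as follows; this is a sketch under stated assumptions rather than a definitive parser. It assumes that the angle brackets mark optional parts (the '== identifier' part and any further '; gene_symbol' items), and that the first colon followed by a space separates the domain part from the gene list. The function name and the sample values are hypothetical.

```python
def parse_structure_line(line):
    """Return (domain, identifier or None, list of gene symbols)."""
    # Assumption: the first ": " ends the domain part of the line.
    head, sep, genes_part = line.strip().partition(": ")
    if not sep:
        raise ValueError("no ': ' separator found")
    # Assumption: the database identifier, when present, follows " == ".
    domain, _, identifier = head.partition(" == ")
    genes = [g.strip() for g in genes_part.split(";") if g.strip()]
    return domain.strip(), (identifier.strip() or None), genes


# Placeholder values, not taken from the FlyBase tables.
print(parse_structure_line("Example domain == IPR000000: geneA; geneB"))
```

A domain name that itself contains a colon followed by a space would defeat this simple split and would need special handling.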

B.8. Aberrations

B.8.1. List of Aberrations field descriptions
B.8.2. Detailed description of the Aberrations fields

Information on chromosomal aberrations is found in the Aberrations section of FlyBase. The initial data set was produced by merging the data in the "Chromosomes" and "Special Chromosomes" sections of the Red Book (Lindsley and Zimm, 1992) with Ashburner's files (compiled between 1989 and 1992) and the "TE" transposable elements of Ising, which we feel are most naturally considered as aberrations. In the process of this merge, a great number of synonyms and typographical errors in aberration names were identified. New aberration records are added through FlyBase's curation of the literature.

The representation of aberrations from species other than D. melanogaster is the same as that for genes, that is to say the aberration symbol will have the syntax <Nnnn\>symbol, where Nnnn is an abbreviation of the species. The default species will always be D. melanogaster, in which case the species abbreviation will not be shown.
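
The species prefix can be separated from the rest of a symbol with a short routine such as the following sketch. It assumes that the abbreviation is always one upper-case letter followed by three lower-case letters, and it uses "Dmel" as the label for the default species; both points, and the function name, are assumptions made for illustration.

```python
import re

# Optional species abbreviation (Nnnn) followed by a backslash.
_SPECIES_PREFIX = re.compile(r"^([A-Z][a-z]{3})\\(.+)$")


def split_species(symbol, default="Dmel"):
    """Return (species_abbreviation, symbol_without_prefix)."""
    match = _SPECIES_PREFIX.match(symbol)
    if match:
        return match.group(1), match.group(2)
    # No prefix shown: the default species, D. melanogaster.
    return default, symbol


print(split_species("Scer\\GAL4"))
print(split_species("In(2L)Cy"))
```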

B.8.1. List of Aberrations field descriptions

*a aberration symbol
*b genetic map position (for some small insertions and transposons/transgene constructs)
*c comments on cytology
*e full name
*g nucleic acid sequence accession numbers
*i symbol synonym(s)
*n position-effect variegation information
*o origin/mutagen [cv]
*p phenotypic data
*q genetic data with respect to genes
*s molecular data
*u other information
*v information on availability
*w discoverer(s)
*x reference(s)
*y secondary FlyBase aberration identifier number
*z FlyBase aberration identifier number
*A associated allele
*B breakpoints
*C class of aberration [cv]
*E a duplicate of a *x field, used to tie data to a reference
*F Breakpoints inherited from progenitor(s)
*G formal description of genetic data
*H date record entered or updated
*I genotype variant symbol
*J revised cytological data
*N new cytological order
*O progenitor genotype if relevant to aberration
*P transposon/transgene construct insertion(s)
*Q name synonym
*R comments on origin, including progenitor genotype if irrelevant to aberration
*S alleles
*T genetic data with respect to other aberrations
*U aberration nickname or balancer short genotype
*V position effect variegation information
*W source of cytological description
*Y separable component

B.8.2. Detailed description of the Aberrations fields

*H. Dating of records and updates.
All aberration records have two date fields. The first, 'Date entered', is the date an aberration record was entered into the Sybase tables. The second is 'Last updated', the date the record was last updated. When entered the two dates will be the same. The 'zero' date of all records then extant was 16 May 1994. FlyBase dates are represented as dd mm yy, mm being the initial 3-letter abbreviation of the month, and yy being the last two digits of the year (e.g., 01 Jul 94).
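
A FlyBase date string can be converted to a calendar date as in the sketch below. The two-digit year is expanded on the assumption that no record predates the 'zero' date of 16 May 1994, so that 94 to 99 mean 1994 to 1999 and all other values fall in the 2000s; that rule and the function name are assumptions, not FlyBase policy.

```python
from datetime import date

# Month abbreviations are matched literally to avoid locale-dependent parsing.
MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

ZERO_DATE = date(1994, 5, 16)  # the 'zero' date given in the text


def parse_flybase_date(text):
    """Parse a 'dd Mon yy' string such as '01 Jul 94'."""
    day, month, yy = text.split()
    two_digit = int(yy)
    # Assumption: years 94-99 are 19xx, everything else is 20xx.
    year = 1900 + two_digit if two_digit >= 94 else 2000 + two_digit
    return date(year, MONTHS[month], int(day))


print(parse_flybase_date("01 Jul 94"))                # 1994-07-01
print(parse_flybase_date("01 Jul 94") >= ZERO_DATE)   # True
```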
*I, *S and *U. Genotype variant symbol, allele constitution and balancer short genotype or aberration nickname.
Many aberrations, especially balancers, are listed in Lindsley and Zimm under multiple names, each of which carries the same chromosomal rearrangement but which differ genetically. We have preserved this principle, but since this is a data set of aberrations we have chosen to list them all in the same record. Such aberration records thus have a hierarchical structure, like the gene/allele juxtaposition in the genes file. In the case of balancers, every variant genotype is assigned a unique symbol (*I), if not by the author then by FlyBase. The symbol of a particular genotype introduces a block of data specific to that genotype, terminated either by the end of the whole record (a # character) or by another genotype symbol. The alleles included in that genotype are listed in *S. Again in the case of balancers, *U holds a short genotype appropriate for use in stock lists. Included in the short genotype is a core balancer symbol (with a very few exceptions these are limited to the balancer symbols used in Lindsley and Zimm) plus the additional alleles, transposons/transgene constructs or aberrations that distinguish this genotype from others with the same core balancer aberrations and alleles.
*i and *Y. Synonyms and separations.
A major effort has been made to tighten up the manner in which aberration names are termed equivalent. We have defined the following two classes of sense in which two aberrations can be said to be related or equivalent:
(1) Genuine synonymy. The same rearrangement is referred to under both names in the literature. One name is chosen as the valid symbol of the aberration, and the other is made a synonym, in a *i field.
(2) Meaningful separability of components. A rearrangement was isolated which has been divided by non-mutagenic means into two components. These components have properties when separate that they lack when in combination. (The commonest case of this, of course, is when a transposition or distal translocation is divided into aneuploid components.) Thus all three (or sometimes only two, if only one component can be isolated) get their own valid names. However, the components' cytology is defined by that of the original aberration, so it would be against the principle of good data management to duplicate that data in two or three records. Accordingly, the components are listed in the same record as the original, using the same hierarchical structure as that for genotype names described above. Each block begins with the symbol of the component.

It should particularly be noted that many three-break rearrangements which are, for example, both deficiencies and translocations, have tended in the past to be referred to under whichever name is the more appropriate to the work in hand, with the result that many have been listed twice with no indication that they are the same thing. They are now collapsed to a single record with the "losing" name as a synonym, as they can certainly never be separated without further mutagenesis. Similarly, many transposition segregants which had been correspondingly orphaned have been restored to separable components of their progenitor.
*Q Name synonym. This field records full names that correspond to symbols that have become synonyms of aberrations. No effort is made to represent the relationships between symbol synonyms and their corresponding name synonyms. Not all symbol synonyms have a name synonym, and vice versa.
*C. Aberration class (Note: see the FlyBase Nomenclature Document for details of aberration nomenclature).
The new cytological order of highly complex aberrations can only be formally described (if known!) by a pseudo-pictorial notation such as that used in Lindsley and Zimm. We have retained such new orders as and when they are necessary. However, they have a drawback common to many of the datasets we are incorporating into FlyBase, viz. that they inherently duplicate data present elsewhere in the record -- in this case, the cytological locations of the breaks. They also have failings of expressive power: Lindsley and Zimm's notation, for example, fails to distinguish between a breakpoint range and a deficient segment, with the result that yet further duplication of data must be introduced to remove the ambiguity. In order to minimize this problem, and also in order to render the data more easily manipulable by software, we have identified a few classes of aberration which are usually represented by new orders in Lindsley and Zimm but which are conceptually describable in words, just like the very simple classes. The class always appears in the line immediately following the list of aberration breakpoints to which it refers. Here is the list of classes that appear in the file: all the three-break classes are explained in detail, and an example is mentioned in which the entry there gives an explicit new order.

Two-break classes:
Deficiency
Tandem duplication
Inversion
Translocation
Ring
Autosynaptic
Dextrosynaptic
Laevosynaptic
Free duplication
Free ring duplication, e.g. Dp(2;f)rl⁺

Three-break classes:
Deficient translocation, e.g. T(1;3)ct^268-21 A translocation in which one of the four broken ends loses a segment before re-joining.
Deficient inversion, e.g. In(1)N^264-108 Three breaks in the same chromosome; one central region lost, the other inverted. The lost section is that between the first two breaks listed in the breakpoints line (*B).
Inversion-cum-translocation, e.g. T(1;2)C324 The first two breaks are in the same chromosome, and the region between them is rejoined in inverted order to the other side of the first break, such that both sides of break one are present on the same chromosome. The remaining free ends are joined as a translocation with those resulting from the third break.
Bipartite duplication, e.g. Dp(1;2)K1 The (large) region between the first two breaks listed is lost, and the two flanking segments (one of them centric) are joined as a translocation to the free ends resulting from the third break.
Cyclic translocation, e.g. T(1;2;3)OR14 Three breaks in three different chromosomes. The centric segment resulting from the first break listed is joined to the acentric segment resulting from the second, rather than the third.
Bipartite inversion, e.g. In(3LR)BTD7 Three breaks in the same chromosome; both central segments are inverted in place (i.e., they are not transposed).
Uninverted insertional duplication, e.g. Dp(1;1)hdp-b2 A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments.
Uninverted insertional transposition, e.g. Tp(1;1)B^263-48 The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically the same orientation as its flanking segments.
Inverted insertional duplication, e.g. Dp(1;1)y^bl A copy of the segment between the first two breaks listed is inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.
Inverted insertional transposition, e.g. In(2R)C72 The segment between the first two breaks listed is removed and inserted at the third break; the insertion is in cytologically inverted orientation with respect to its flanking segments.
Unoriented insertional duplication, e.g. Dp(1;1)hdp-b4 A copy of the segment between the first two breaks listed is inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded.
Unoriented insertional transposition, e.g. Tp(1;2)v⁺75d The segment between the first two breaks listed is removed and inserted at the third break; the orientation of the insertion with respect to its flanking segments is not recorded.

Occasionally an author must report an aberration whose cytology is ambiguous and/or incompletely characterized. These aberrations are named as Ab(N)identifier or, when associated with a named allele, Ab(N)gene[allele]. N may be the chromosome arm that includes the breaks, or the chromosome number in the case of a breakpoint within the heterochromatin, when it is not known to which side of the centromere the break maps. If more than one chromosome is suspected of being involved then this is indicated with a '?'. Examples: Ab(3)ME178, Ab(2L;?)cli[eya-X9].
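
The Ab(N) naming pattern lends itself to a simple split into chromosome segments and identifier. The sketch below assumes that segments inside the parentheses are separated by semicolons, as in the examples above; the function name is hypothetical.

```python
import re

# Ab, then the segment list in parentheses, then the identifier.
_AB_SYMBOL = re.compile(r"^Ab\(([^()]+)\)(.+)$")


def parse_ab_symbol(symbol):
    """Return (list of chromosome segments, identifier)."""
    match = _AB_SYMBOL.match(symbol)
    if match is None:
        raise ValueError(f"not an Ab(N) symbol: {symbol!r}")
    # A '?' segment marks a second chromosome that is only suspected.
    return match.group(1).split(";"), match.group(2)


print(parse_ab_symbol("Ab(3)ME178"))
print(parse_ab_symbol("Ab(2L;?)cli[eya-X9]"))
```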
*A. Associated allele.
When an aberration is associated with one or more mutant alleles (as opposed to being simply deficient for a gene), a *A field appears which contains a cross-reference to the allele in Genes. The allele name is preceded by its FlyBase allele ID, which is listed in the genes file and will not change in the future even if the allele designation does (such as because of newly-discovered allelism). In due course, aberrations will also have FlyBase IDs and this cross-reference will be made bidirectional.
*x, *E. References. *x fields, in both gene and allele records, are references.
Syntax: *x FBrfnnnnnnn == abbreviated_reference
e.g. *x FBrf0036029 == Saigo et al., 1981, Cold Spring Harbor Symp. Quant. Biol. 45: 815--827

The FBrf number is the unique reference identifier number from the references table, which also includes the full reference.

The *E field is always a duplicate of a *x field within the same record. It is a device to tie particular data to a particular reference. The data fields then immediately follow the *E field.

The referenced block of fields is terminated by the next *E, *Y or *I field, or the end of record line (#).
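
The block structure introduced by *E can be recovered as in the following sketch. It assumes that each field line has the layout asterisk, field letter, space, value, and that lines not beginning with an asterisk can be ignored; the record shown is invented for illustration and does not correspond to any real aberration or reference.

```python
def reference_blocks(lines):
    """Group the data fields of one record under the *E reference they follow."""
    blocks = []
    current = None                       # fields of the open *E block, if any
    for raw in lines:
        line = raw.rstrip("\n")
        if line == "#":                  # end of record closes any open block
            break
        if not line.startswith("*") or len(line) < 2:
            continue                     # assumption: ignore anything else
        code, value = line[1], line[3:]
        if code == "E":                  # a new referenced block begins
            current = []
            blocks.append((value, current))
        elif code in ("Y", "I"):         # component or genotype symbol ends it
            current = None
        elif current is not None:
            current.append((code, value))
    return blocks


# Invented record: symbols, references and notes are placeholders.
RECORD = [
    "*a Ab(9)example",
    "*x FBrf0000001 == Author, 1990, J. Hypothetical 1: 1--2",
    "*E FBrf0000001 == Author, 1990, J. Hypothetical 1: 1--2",
    "*p phenotype note",
    "*s molecular note",
    "*I Ab(9)example-variant",
    "*S allele-1",
    "*E FBrf0000002 == Other, 1991, J. Hypothetical 2: 3--4",
    "*c cytology note",
    "#",
]
for reference, fields in reference_blocks(RECORD):
    print(reference, fields)
```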

Publications that discuss a given aberration are listed in the same way as in genes data, with FlyBase IDs cross-referencing them to the Bibliography file. Any publication that reports mapping of one or more breakpoints to a clone is marked out as a "Ref. with molecular data". In most cases, no information is reported as to where the break lies on the clone; however, the information that that reference maps the break can then be used, among other things, to find nearby cloned genes by searching for the same reference in Genes. In cases where actual distances in kilobases are reported, the fact is given (with attribution to the reference) in the "molecular data" field (*s).
*G. Formalized genetic data.
These lines are computed from the synthesis of map data that underlies all the cytogenetic map positions reported by the map-based tools. The symbol "<<" should be read as "lies to the left of". The genes are chosen to be the most informative based on available data; there is no certainty that they are definitely the genes flanking the breakpoints, but they are the ones whose deduced cytological locations provide the tightest localization of the break.
*q. Genetic data with respect to genes.
These fields store, largely in structured form, conclusions about the relationship between the aberration breakpoints and specified genes, based on genetic complementation data. These data are used in the generation of the genome map.
*T. Genetic data with respect to other aberrations.
Phenotypic data on the interaction of combinations of aberrations when present in the same fly, when those data do not allow attribution of the phenotype to particular disrupted genes.
*V. Position effect information.
Many aberrations cause position effect variegation at one or more genes. This information is noted in *V fields, which are of three classes:
*V position effect variegation for: [gene_symbol]; [gene_symbol]
*V no position effect variegation for: [gene_symbol]; [gene_symbol]
*V dominant position effect variegation for: [gene_symbol]; [gene_symbol]

If there is some reason to doubt whether or not a statement is true for any particular gene, then the gene_symbol is qualified by ' \?'.
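
The three classes of *V field and the doubt qualifier can be read as in the sketch below. It assumes that the square brackets in the syntax lines are placeholders rather than literal characters, that the leading "*V " has already been removed, and that doubt is marked by a space, a backslash and a question mark after the gene symbol. The class labels, the function name and the gene symbols are hypothetical.

```python
# The longer prefixes all differ at their first word, so order does not matter.
PEV_CLASSES = {
    "dominant position effect variegation for: ": "dominant",
    "no position effect variegation for: ": "none",
    "position effect variegation for: ": "variegation",
}

DOUBT_MARK = " \\?"   # space, backslash, question mark


def parse_pev(value):
    """Return (class label, [(gene_symbol, is_doubtful), ...])."""
    for prefix, label in PEV_CLASSES.items():
        if value.startswith(prefix):
            genes = []
            for item in value[len(prefix):].split(";"):
                item = item.strip()
                if not item:
                    continue
                doubtful = item.endswith(DOUBT_MARK)
                if doubtful:
                    item = item[:-len(DOUBT_MARK)].rstrip()
                genes.append((item, doubtful))
            return label, genes
    raise ValueError(f"unrecognized *V value: {value!r}")


print(parse_pev("no position effect variegation for: geneA; geneB \\?"))
```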

Free text information may also be added in a *p field.
*z and *y. FlyBase aberration identifier numbers.
These fields are for primary and secondary FlyBase aberration identifier numbers (see section F.1. of Reference Manual F: Links To and From FlyBase).
*P. Transposon/transgene construct insertions.
Natural or synthetic transposable elements carried on an aberration are recorded here.
*O. Progenitor genotypes.
The *O field is for the chromosome on which the aberration was induced. This field is only used if the progenitor is relevant to the derivative. The values in this field will be valid FlyBase allele or aberration names. Where a *O field houses more than one value, each followed by " \?", this signifies that the progenitor chromosome is one of the named alternatives.
*F. Breakpoints inherited from progenitors.
This field may contain multiple lines. The syntax is of the form:
*F 22D1-2;33F5-34A1 (from In(2L)Cy)
*F 21B;40 (from In(2L)DTD27)
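
Lines of this form can be split into breakpoints and progenitor as in the following sketch, which assumes that breakpoints are separated by semicolons without spaces and that the progenitor always appears as "(from ...)" at the end of the line. The function name is hypothetical.

```python
import re

# Breakpoint list, then "(from PROGENITOR)"; the progenitor may itself
# contain parentheses, so the final ")" of the line closes the clause.
_INHERITED = re.compile(r"^\*F (\S+) \(from (.+)\)$")


def parse_inherited_breaks(line):
    """Return (list of breakpoints, progenitor symbol) for one *F line."""
    match = _INHERITED.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized *F line: {line!r}")
    return match.group(1).split(";"), match.group(2)


print(parse_inherited_breaks("*F 22D1-2;33F5-34A1 (from In(2L)Cy)"))
print(parse_inherited_breaks("*F 21B;40 (from In(2L)DTD27)"))
```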
*R. Data about an aberration's origin.
This field records, for example, that the aberration was induced simultaneously with another mutation/aberration, or information about the genotype of the progenitor which is irrelevant to the derivative. This is a formatted free text field.
*v. Information on availability.
If a publication reports that an aberration is lost, that information is recorded in the *v field. Note that not all such reports in the literature are authoritative.

B.9. Transgene constructs and insertions

The Transgene Constructs section of FlyBase contains information on engineered or synthetic transposons and insertions of natural and synthetic transposons, related cosmids and plasmids, and cell culture vectors. Data on transgenic constructs are almost exclusively derived from the literature. Sequence database entries and personal communications from investigators provide secondary sources of information. The data sets described below are not yet up to date, and will be expanding rapidly in the future.

Transgene constructs

Reports on transgene constructs, including transformation vectors, enhancer traps, and Scer\GAL4/Scer\UAS constructs, are available through the Transgene Construct Search page. See Reference Manual C: Using FlyBase on the Web for information on searching the Transgene Construct data.

The data categories in these reports include:

Synonyms - those used in the literature are reported; synonyms that are the result of typos or were previously used by FlyBase are 'silent': they do not appear in reports, but will be seen by search routines.
Characteristics - the most significant characteristics of a transposon or transgene construct are captured in controlled-vocabulary fields to facilitate searches. Such fields include uses (e.g., 'cloning vector', 'reporter construct'), features (e.g., 'selectable marker', 'complete rescue'), cloning sites, and progenitors. Links to progenitors, descendants, and related constructs are provided.
Associated alleles - links to transgenic allele records are provided. These allele reports include a brief molecular description of that particular component of the construct. A given allele may be associated with more than one construct, if the same fragment of DNA is carried in each of those constructs.
Map and sequence data - a subset of the transgene construct reports, primarily those of general interest such as transformation vectors and enhancer traps, include map and sequence data. FlyBase has compiled sequence data for many constructs that are not in the sequence databases; incomplete sequences are presented if significant portions are known. Each sequence is broken down into segments of natural contiguous sequence and junctions of engineered sequence that join such segments. A complete description of each component segment is provided, including length, links to sequence database entries, location of endpoints in the database entry, identity of endpoints (such as restriction sites), and biological features (such as transcription start sites, transposon termini, etc.).

Transposon and Transgene Construct Insertions

Transposon and Transgene Construct Insertions data include insertions of natural and synthetic transposons. Insertion Reports can be accessed via the Insertions Search page using a symbol-based query or a browseable listing of insertions by cytological location.

The data categories in the Insertion Reports include:

Cytogenetic location (when known), allowing access via cytology-based queries, such as CytoSearch. Locations are based upon explicit in situ hybridization localization or inferred location based on allelism of insertion-associated mutations with mutations in genes that already have assigned cytogenetic locations.
Identity of inserted transposon or transgene construct.
Identity of gene affected, for those insertions that disrupt gene function.
For enhancer traps, expression pattern of the reporter gene, using whenever possible an extensive descriptive controlled vocabulary. Such controlled data capture has been developed to facilitate searches based upon some aspect of the expression pattern.

Insertion Reports are extensively hyperlinked, including links to:

Transgene Construct Reports: Description of the structure and properties of the inserted transposon; these reports include, when available, annotated maps and compiled sequence.
Allele Reports: Descriptions of phenotypes and other mutational aspects of insertions disrupting gene function.
Gene Reports: General information on affected gene, for those insertions that disrupt a known gene.
BFD Reports: Descriptions of those enhancer trap and activating element (P{EP}) insertions characterized by the Berkeley Genome Project. The BFD reports include GTS insertion site sequence tag data, if available.
Balancer Reports: Descriptions of the properties of balancers containing a specific insertion, such as for the "blue balancers."
Stock Reports: Descriptions of the stocks containing insertions from the public stock centers.
References.

FlyBase is developing comprehensive Insertion Reports that will place all relevant data in one report.

B.10. Stocks

The Stocks section of FlyBase includes stock lists from both public and private collections of Drosophila. The Stocks directory contains search options, links to stock center web sites, stock order forms, and help files. Stocks should be requested from individual labs only if a comparable stock is not available from one of the public stock centers.

When the stock description provided by a public center is other than a genotype composed of valid symbols or the name of a wild-type strain, FlyBase creates a genotype where possible based on symbol synonyms. Laboratory stock lists in standardized formats are incorporated into FlyBase as is; FlyBase does not edit laboratory lists to create valid symbols. Laboratory stock lists in non-standard formats are simply posted and are available for browsing. The contents of individual laboratory stock lists are the responsibilities of the laboratories concerned and not of FlyBase. Contact Kathy Matthews (matthewk at indiana.edu, reformat to standard e-mail address) to contribute your own stock list to FlyBase.

Stock center stock information is available through Gene, Allele, Aberration and Transgene Insertion reports as well as directly from the Stocks data section. Laboratory stocks are linked to Gene, Allele, Aberration and Insertion reports when valid symbols are present in a genotype. Recently added stock center stocks may appear in the Stocks section before the links to Alleles, etc. have been updated. See Reference Manual C: Using FlyBase on the Web for help with stock list searches.

Stock Centers
- Bloomington Drosophila Stock Center at Indiana University
  The most up-to-date list of Bloomington stocks is in bloomington.csv, available from Bloomington's homepage. This file will open in Excel or other spreadsheet application. For many purposes the browsing files available from Bloomington will provide a quicker route to useful stocks than will a search of the complete stock list through FlyBase. Deficiency kits, balancers, mapping stocks, lists of GAL4 and UAS insertions, P and other transposable element (TE) insertions sorted by insertion site, stocks for TE mutagenesis and other lists of preselected stocks are available there for browsing.
- The Szeged Drosophila Stock Centre at University of Szeged (Szeged, Hungary)
- The Drosophila Genetic Resource Center (Kyoto, Japan)
- The Tucson Drosophila Species Stock Center at the University of Arizona
Laboratory lists
- The files in the Labs section are laboratory stock lists contributed to FlyBase. See lab-info.html for information on each list, including contact information for requesting stocks.
Ordering stocks
- Requests for stocks held at public stock centers can be submitted to the appropriate stock center using forms available in Stocks or from the center's web site. Stock ordering options are also built into Stock reports accessed through Allele, Insertion and Aberration reports, and CytoSearch results.

B.11. Genomic Clones and STSs

Genomic clone data are archived on FlyBase as a set of text files. The Drosophila Resources list includes information on how to request clones from the various projects included here. Questions about these data and materials should be directed to the genome projects themselves.

B.11.1. Cosmids and cosmid STSs

The cosmids are those from the European Drosophila Genome Project. The cosmid library was prepared from a Sau3A partial digest of Oregon-R adults and is in the Lorist 6 vector. The sequence of the Lorist 6 vector can be obtained by FTP from genome.wustl.edu; the file is in /pub/gsc1/sequence/vector/lorist6.seq. A full description of the techniques, and of the project as a whole, can be found in the following references:

Sidén-Kiamos, I., R.D.C. Saunders, L. Spanos, T. Majerus, J. Trenear, C. Savakis, C. Louis, D.M. Glover, M. Ashburner and F.C. Kafatos. 1990. Towards a physical map of the Drosophila melanogaster genome: Mapping of cosmid clones within defined genomic divisions. Nucleic Acids Research 18:6261-6270.
Kafatos, F.C., C. Louis, C. Savakis, D.M. Glover, M. Ashburner, A.J. Link, I. Sidén-Kiamos and R.D.C. Saunders. 1991. Integrated maps of the Drosophila genome: Progress and prospects. Trends in Genetics 7:155-161.
Madueno, E. et al. 1995. A physical map of the X chromosome of Drosophila melanogaster: Cosmid contigs and sequence tagged sites. Genetics 139:1631--1647.

STS sequences of many cosmids have been determined from either (or both) the SP6 or T7 promoters flanking the cloning site. These sequences are available from the EMBL/GenBank/DDBJ nucleic acid sequence data libraries. These sequences are also available from dbSTS, the NCBI STS database. The dbSTS records may include information from more recent matches of the STS sequences against other sequences than are available from the EMBL/GenBank/DDBJ accessions.

See the file Drosophila Resources for information on obtaining cosmids.

The following fields are included in cosmids-sts.txt:

Cosmid: The name of the cosmid.
Contig: Information on the contig which contains the cosmid.
Polytene: The polytene chromosome range of any primary in situ hybridization signal.
Primary_sites: A list of in situ hybridization signals interpreted as being primary sites.
Secondary_sites: Mapped secondary in situ hybridization sites.
Repetitive_sites: A (rough) estimate of the number of repetitive sites.
Chromocentral_sites: Hybridization to chromocenter. Abbreviations are: BH beta-heterochromatin; AH alpha-heterochromatin; NO nucleolus organizer.
Aberr_mapping: Describes location of in situ site with respect to aberrations or genes.
Aberration: Similar data to Aberr_mapping but contributed by other workers.
STS: Name of STS.
EMBL_AC: EMBL database accession number of STS.
dbSTS_AC: NCBI dbSTS database accession number of STS.
DB_searched: Database searched for sequence similarities.
DB_version: Version of database searched.
Search_date: Date of database search.
P1: Berkeley P1 clone that is said to include or overlap cosmid.
YAC: St. Louis YAC clone that is said to include or overlap cosmid.
Accession_of_N_hit: EMBL Accession number of nucleic acid sequence match.
BLAST_comment: A comment on the BLAST match(es).
HSP_score_of_hit: HSP score from BLAST search.
Gene: Gene included in cosmid (for D. melanogaster) or species: gene or protein matched in a database search.
Accession_of_X_hit: SWISS-PROT accession number of protein sequence match.

This file is an output from the European Cosmid mapping Consortium's working database, and for this reason includes internal notes.

B.11.2. P1 clones and P1 STSs

The P1 library of D. melanogaster is largely obsolete and the Berkeley Drosophila Genome Project is discouraging the use of P1 clones. See the FlyBase file Drosophila Resources for additional information.

B.11.3. BAC clones and BAC STSs

Three libraries of BAC clones are now available. These were all made from DNA of the same y[1]; cn[1] bw[1] sp[1] stock as was used for the Berkeley Drosophila Genome Project P1 clones.

The BACR library was made for the BDGP by K. Osoegawa and P. de Jong (Roswell Park). The BACE and BACH libraries were made for the EDGP by Alain Billaud at CEPH (Centre d'Etude du Polymorphisme Humain), with funding provided by an MRC project grant to D.M. Glover and M. Ashburner.

The BACR library comprises 18,432 clones in pBACe3.6, with an average clone size of 160 kb. The BACE and BACH libraries are in pBeloBAC11 and consist of 23,400 clones in the size range 75-150 kb.

Information about obtaining BAC clones is included in the FlyBase file Drosophila Resources. STS sequences of many BACs have been determined from either (or both) the TET3 or T7 promoters flanking the cloning site. These sequences are available from the EMBL/GenBank/DDBJ nucleic acid sequence data libraries. These sequences are also available from dbSTS, the NCBI STS database. The dbSTS records may include information from more recent matches of the STS sequences against other sequences than are available from the EMBL/GenBank/DDBJ accessions.

B.11.4. Drosophila virilis P1 Clones

The data on P1 clones from D. virilis were provided by D. Hartl. The clones are described in:

Lozovskaya, E.R., D.A. Petrov and D.L. Hartl. 1993. A combined molecular and cytogenetic approach to genome evolution in Drosophila using large-fragment cloning. Chromosoma 102:253-266.

B.11.5. YACs

The YACs are those from the St. Louis and Harvard projects. References for the YACs:

Garza, D., J.W. Ajioka, D.T. Burke and D.L. Hartl. 1989. Mapping the Drosophila genome with yeast artificial chromosomes. Science 246:641--646.
Ajioka, J.W., D.A. Smoller, R.W. Jones, J. P. Carulli, A.E.C. Vellek, D. Garza, A.J. Link, I.W. Duncan and D.L. Hartl. 1991. Drosophila genome project: One-hit coverage in yeast artificial chromosomes. Chromosoma 100:495--509.
Cai, H., P. Kiefel, J. Yee and I.W. Duncan. 1994. A yeast artificial chromosome clone map of the Drosophila genome. Genetics 136:1385--1401.
Hartl, D. L. and Lozovskaya, E. R., 1995, The Drosophila Genome Map: A Practical Guide. R. G. Landes, Georgetown, Texas.

A complete set of YAC clones is maintained by Ian Duncan and clones may be requested from him. See Drosophila Resources for contact information.

B.12. References - the Drosophila Bibliography

B.12.1. Reference formats
B.12.2. Reference classes
B.12.3. Journals and multi-author works
B.12.4. Reference sources
B.12.5. Copyright statements

The References section of FlyBase holds as complete a bibliography of papers, books, etc., concerned with the biology and genetics of Drosophila as we can assemble. The sources of these references are given in section B.12.4. of the FlyBase Reference Manual. A variety of search options are available (see Reference Manual C: Using FlyBase on the Web for information on FlyBase searches) in References and in the All Searches section.

Reference reports include the bibliographic citation, the National Library of Medicine's PubMed abstract if available, and a linked list of genes, alleles and aberrations for which the paper includes data that have been curated by FlyBase. See for example the report of Yasuda et al., 1995. Users should be aware that not all papers in the FlyBase bibliography have been curated using current practice; thus a sparse list of FlyBase data items does not necessarily indicate a lack of content in the paper.

B.12.1. Reference formats

The bibliographic file is distributed in four different formats:

*.rpt - a human-readable report file used in searches (archived)
*.star - a field delimited text file
*.refer - a text file in REFER format to allow direct import into reference handling software (archived)
*.csv - a comma-separated-values format for spreadsheets (archived)

There are six groups of files for each format, sorted by decade (earlier than 1950, 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-present). The archived files (rpt, refer and csv formats) are available by ftp from the Indiana server.

references-obsolete.txt is a list of deleted FlyBase FBrf identifier numbers, with a note on whether each reference has been deleted from the files or merged with another record.

Files with the extension rpt are the report format files used for searches. Here is a typical entry:

Title :Secretion antigens of salivary glands of larval Drosophila melanogaster.
Authors :Karakin,E.I.
:Lerner,T.Y.
:Kokoza,V.A.
:Sviridov,S.M.
Year :1977
Volume :233
Pages :698--701
Languages :Russian
Issue :1
Journal :Dokl. Akad. Nauk SSSR
FlyBase_ID :FBrf0030018
Also In :FBrf0030017

Complete information for the journal abbreviation is available through the Journal/Book Abbreviations Search or the file references-abbreviations.rpt. The Also In field provides the FlyBase ID of any other appearances of this paper in the literature.
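
The report entry shown above can be read with a few lines of Python. This is only a sketch: the layout it assumes (a field name, then a colon, then the value, with repeated values on continuation lines that begin with a colon) is inferred from the single example, and the names parse_rpt_entry and EXAMPLE_RPT are hypothetical.

```python
# Abridged copy of the example entry above.
EXAMPLE_RPT = """\
Title :Secretion antigens of salivary glands of larval Drosophila melanogaster.
Authors :Karakin,E.I.
:Lerner,T.Y.
:Kokoza,V.A.
:Sviridov,S.M.
Year :1977
Journal :Dokl. Akad. Nauk SSSR
FlyBase_ID :FBrf0030018
Also In :FBrf0030017
"""


def parse_rpt_entry(text):
    """Return a dict mapping each field name to the list of its values."""
    fields = {}
    current = None
    for line in text.splitlines():
        name, colon, value = line.partition(":")
        if not colon:
            continue  # no colon on this line, so it is not a field line
        name = name.strip()
        if name:
            current = name  # a new field starts here
        if current is None:
            continue  # continuation line with nothing to continue
        # An empty name means the line continues the previous field.
        fields.setdefault(current, []).append(value.strip())
    return fields


entry = parse_rpt_entry(EXAMPLE_RPT)
print(entry["Authors"])
```

Values are kept as lists because a field such as Authors can span several lines.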

references.*.star are field delimited text files. Each record is terminated by a # character on a line of its own, and all other lines have an * as the first character, followed by a field-identifier letter, a space, and then the field value starting in column 4. There are no trailing spaces -- in particular there is no space in column 3 unless there is something in the field. # and * do not appear anywhere other than in column 1.

Field ordering: * means zero or more
Uab[cd[ef[gh[ij[kl[mn[op(qr)*]]]]]]]tuvwxyzSYLATPBMQIDECZJ#
Fields *U, *v, *w, *x, *y, *z are always present even if null; others are either absent or non-null.
Field allocations:
*U Unique FlyBase reference identifier. Never blank.
*a .. *r Authors. Each author's surname is on one line and initials on another. For historical reasons the first author gets surname before initials and the rest are the other way round, so the surnames are in fields *a, *d, *f, *h, *j, *l, *n, *p and *r. Papers with ten or more authors get fields *q and *r repeated as often as necessary.
*t Year of publication. This is never blank, but can be a range.
*u Title of publication. Never blank.
*v Title of part if one of a series. Blank otherwise.
*w Title of journal or book in which publication appears, unless the whole book is the publication.
*x Publisher, if *w is not a journal.
*y Volume of journal, or number of chapter in book. If spread over more than one, these are separated by semicolons. The volume numbers can have letters in them.
*z Page range. If *y has more than one, so does this, also separated by semicolons. Within a volume, a page range is either a single page, a contiguous range written as first--last, or a series of contiguous ranges separated by commas. The page numbers quite often have letters in them.
*S Series of a journal etc.
*Y Issue number (can include letters).
*L Language(s) of publication.
*A Additional language(s), e.g. of abstracts.
*T Type of publication (book, abstract, thesis etc, see below for list). [cv]
*P Place of publication, if a book.
*B BIOSIS identifier number.
*M Medline identifier number.
*Q Zoological Record identification number.
*I ISBN (books) or ISSN (journals) number.
*D Journal CODEN.
*E Related publication (usually, but not always, errata).
*C FlyBase reference ID of another publication of the same article (this could, for example, refer to a translation).
*Z FlyBase reference ID of any previously released record that has been made obsolete by this record.
*J Indicates availability of publication in Cambridge; default is 'no'.

This is an example:

*U FBrf0030018
*a Karakin
*b E.I.
*c T.Y.
*d Lerner
*e V.A.
*f Kokoza
*g S.M.
*h Sviridov
*t 1977
*u Secretion antigens of salivary glands of larval Drosophila melanogaster.
*v
*w Dokl. Akad. Nauk SSSR
*x
*y 233
*z 698--701
*Y 1
*L Russian
#
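
The rules above (a lone # ends a record, the field letter is in column 2, the value starts in column 4, and only the first author is stored surname-first) can be sketched in Python as follows. The function names parse_star, authors and page_ranges are hypothetical, and the code assumes input that follows the description exactly; it is an illustration, not a FlyBase tool.

```python
EXAMPLE_STAR = """\
*U FBrf0030018
*a Karakin
*b E.I.
*c T.Y.
*d Lerner
*e V.A.
*f Kokoza
*g S.M.
*h Sviridov
*t 1977
*u Secretion antigens of salivary glands of larval Drosophila melanogaster.
*v
*w Dokl. Akad. Nauk SSSR
*x
*y 233
*z 698--701
*Y 1
*L Russian
#
"""


def parse_star(text):
    """Yield each record as a list of (field_letter, value) pairs."""
    record = []
    for line in text.splitlines():
        if line == "#":
            # A lone '#' terminates the current record.
            yield record
            record = []
        elif line.startswith("*") and len(line) >= 2:
            # Column 2 holds the field letter; the value, if any, starts in column 4.
            record.append((line[1], line[3:]))


def authors(record):
    """Return (surname, initials) pairs from the author fields *a to *r."""
    values = [value for letter, value in record if "a" <= letter <= "r"]
    result = []
    for index in range(0, len(values) - 1, 2):
        first, second = values[index], values[index + 1]
        # Only the first author is stored surname-first.
        result.append((first, second) if index == 0 else (second, first))
    return result


def page_ranges(z_value):
    """Split a *z value into one list of (first, last) pages per volume."""
    volumes = []
    for per_volume in z_value.split(";"):
        ranges = []
        for part in per_volume.split(","):
            part = part.strip()
            if part:
                first, dashes, last = part.partition("--")
                # A single page is reported as a range of length one.
                ranges.append((first, last if dashes else first))
        volumes.append(ranges)
    return volumes


records = list(parse_star(EXAMPLE_STAR))
fields = dict(records[0])
print(fields["U"], authors(records[0]), page_ranges(fields["z"]))
```

Records are kept as lists of pairs rather than dictionaries because *q and *r may repeat, and page numbers are kept as strings because they can contain letters.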

references.*.refer files are formatted in the Unix REFER format to allow direct import into Refer, EndNote, ProCite and other reference handling software. The format is plain text in which each field begins with a tag introduced by the % symbol, and records are separated by a blank line. These files use the EndNote tags, although not all of them; empty fields are absent from a record.

%A author(s)
%B secondary title
%C place published
%D year
%E secondary author
%F FlyBase reference ID
%G type of publication
%H ISBN (for books) or ISSN (for serials)
%I publisher
%J journal or book reference
%K keyword [not used]
%L journal CODEN
%N issue of journal
%O Medline identifier; BIOSIS identifier; language
%P pages
%Q author
%R title
%S tertiary title
%T title
%U series of journal
%V Volume
%W also published as
%X abstract
%Y tertiary author
%Z errata or reference ID(s) of relevant obsolete records
<blank line>

An example of a reference in REFER format is:

%A E.I. Karakin
%A Lerner, T.Y.
%A Kokoza, V.A.
%A Sviridov, S.M.
%D 1977
%T Secretion antigens of salivary glands of larval Drosophila melanogaster.
%V 233
%P 698--701
%O Languages: Russian
%N 1
%J Dokl. Akad. Nauk SSSR
%F FBrf0030018
%W also in FBrf0030017
<blank line>
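
As an illustration of the tag and blank-line conventions described above, the following Python sketch groups REFER lines into records. The name parse_refer is hypothetical, the sample is an abridged copy of the example above, and the sketch assumes that every tag is a % followed by a single letter.

```python
EXAMPLE_REFER = """\
%A E.I. Karakin
%A Lerner, T.Y.
%A Kokoza, V.A.
%A Sviridov, S.M.
%D 1977
%T Secretion antigens of salivary glands of larval Drosophila melanogaster.
%J Dokl. Akad. Nauk SSSR
%F FBrf0030018

"""


def parse_refer(text):
    """Return a list of records; each maps a tag such as '%A' to its values."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record collected so far.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("%") and len(line) >= 2:
            tag, value = line[:2], line[2:].strip()
            # Tags such as %A can occur more than once in a record.
            current.setdefault(tag, []).append(value)
    if current:
        records.append(current)  # the last record may lack a trailing blank line
    return records


refer_records = parse_refer(EXAMPLE_REFER)
print(refer_records[0]["%A"])
```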

references.*.csv files are in comma-separated-values format, which can be read by many spreadsheet and database programs. The format is:

primary_author, other_authors, pub_title, year, volume, publisher, pubplace, pages, volumetitle, language, language2, series, issue, type, med_uid, biosis, ISBN or ISSN, errata, journal_abbrev, CODEN, FlyBase_id, also_published_in, relevant_obsolete_id

primary_author :primary author
other_authors :subsequent authors, semicolon separated
pub_title :full title of the publication
year :year of publication
volume :volume number
publisher :publisher
pubplace :place of publication
pages :page range
volumetitle :title of part if one of a series
language :language that publication is written in
language2 :any alternate languages
series :series of journal
issue :issue of journal
type :type of publication, can be Book, Abstract, etc
med_uid :Medline identifier
biosis :Biosis identifier
ISBN or ISSN :ISBN (for books) or ISSN (for serials)
errata :if this entry is an erratum (signified by a type of 'E') this field will provide the FlyBase identifier of the publication to be corrected
journal_abbrev :journal abbreviation or book reference
CODEN :CODEN (for periodicals)
FlyBase_id :unique FlyBase identifier
also_published_in :papers which appear in more than one place will have the FlyBase identifiers of the other publications given here
relevant_obsolete_id :FlyBase identifier of any previously released record that has been made obsolete by this record

An example of a reference in csv format is:

"Karakin,E.I.","Lerner,T.Y.; Kokoza,V.A.; Sviridov,S.M.","Secretion antigens of salivary glands of larval Drosophila melanogaster.","1977","233","","","698--701","","Russian","","","1","","","","0","","Dokl. Akad. Nauk SSSR", "FBrf0030018","FBrf0030017"

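A row such as the csv example above can be read with Python's standard csv module. This is a sketch only: the name read_references is hypothetical, and it relies on column position for the first four fields alone (primary_author, other_authors, pub_title, year). The positions of later columns should be checked against the actual files before they are used.

```python
import csv
import io

# The example row from the text; one field is preceded by a space.
EXAMPLE_CSV = (
    '"Karakin,E.I.","Lerner,T.Y.; Kokoza,V.A.; Sviridov,S.M.",'
    '"Secretion antigens of salivary glands of larval Drosophila melanogaster.",'
    '"1977","233","","","698--701","","Russian","","","1","","","","0","",'
    '"Dokl. Akad. Nauk SSSR", "FBrf0030018","FBrf0030017"\n'
)


def read_references(handle):
    """Yield one dict per row with the authors, title, year and raw fields."""
    # skipinitialspace lets a quoted field follow a comma and a space.
    reader = csv.reader(handle, skipinitialspace=True)
    for row in reader:
        if len(row) < 4:
            continue  # too short to hold the leading fields
        primary, others, title, year = row[:4]
        # other_authors holds the subsequent authors, separated by semicolons.
        author_list = [primary] + [a.strip() for a in others.split(";") if a.strip()]
        yield {"authors": author_list, "title": title, "year": year, "fields": row}


references = list(read_references(io.StringIO(EXAMPLE_CSV)))
print(references[0]["authors"], references[0]["year"])
```

The quoting matters here: author names contain commas, so splitting the line on commas by hand would break the first two fields.
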
B.12.2. Reference classes

The bibliographic records fall into several different classes. The great majority are papers in journals, but there are also papers in edited publications, theses, manuscripts, other electronic databases and, even, the odd film, archival material and newspaper article. The following classes are recognized by FlyBase and encoded in the *T field [cv]:

abstract
archive
audiotape
bibliographic list
book
book review
booklet
CD-ROM
chart
computer file
database
demonstration
editorial
erratum
film
film strip
leaflet
letter
manuscript
microfiche
microscope slides
newspaper article
note
obituary
patent
personal communication
poem
poster
press release
recording
report
review
slides
spoof
stock list
T-shirt
thesis
transcript of broadcast
unpublished
video

The default type is a journal article or book chapter (i.e., paper).

B.12.3. Journals and multi-author works

Because we have collected data for the reference file from a number of different sources, a variety of abbreviations have often been used for the same journal or publication. FlyBase is totally consistent in how it refers to any particular journal, or to any other publication for which there is (at least potentially) more than one record in the bibliography. It does this by maintaining a file of reference abbreviations. This includes not only the abbreviations of journals, but also information on any work (e.g., edited book, symposium volume, conference proceedings, abstract book) that includes more than one independently authored contribution.

Journals
The abbreviations used are those of the World List of Scientific Periodicals Published in the Years 1900-1960 (4th edition, 1965) by P. Brown and G. B. Stratton (London, Butterworths Scientific Publications) for journals published before 1960, and of World List of Scientific Periodicals: New Periodical Titles 1960-1968 by K. J. Porter and C. J. Koster (London, 1970, Butterworths) for 1960 until 1968. For titles published after 1968 we have tried to follow the same conventions for abbreviation as the World List, except that we have tried to be less imperialistic. We have made much use of the List of Serials Indexed for Online Users (National Library of Medicine, Washington 1992), the British Union-Catalogue of Periodicals (London, 1955-1958) and its supplements, the Serial Publications in the British Museum (Natural History) Library (3rd ed. 1980), the Union List of Serials (3rd ed. 1965, H. W. Wilson, NY) and its successor, the Library of Congress New Serial Titles volumes (1950-), and Ulrich's International Periodicals Directory, 31st edition (1992-1993, R.R. Bowker, New Providence). We have used the Directory of Japanese Scientific Periodicals (National Diet Library, Tokyo, 1979) for the names of Japanese journals and Half a Century of Soviet Periodicals (R. Smits, Library of Congress, Washington 1968) for many of those of the Soviet Union (as was) (this publication includes US library holdings of these journals).
Edited works
Here FlyBase includes not only edited books, but also proceedings of conferences, abstract books and a variety of other publications that are not in journals yet include more than one contribution. Where known, we include not only the title, place and date of publication, but also the name(s) of the editor(s). If the publication is also a part of a journal series, or of some other series, then these data are also included in the record.
Medline, BIOSIS, Zoological Record, International Standard Serial and International Standard Book numbers
All references that we have found in BIOSIS, Zoological Record or Medline databases have the identifier numbers of these databases attached to them. Records of books have their ISBNs attached, if published since these were introduced. Records of journals have their ISSNs and CODENs attached.

The great majority of journal titles and titles of other publications have been verified by reference to the on-line catalogs of the Library of Congress, the University of California (Melvyl) or the University of Cambridge.

Many journals have titles in more than one language. In such cases the title in the second language is enclosed within square brackets.

The file references-abbreviations.csv lists the journal abbreviations used, and gives the full name(s) of the journals, place(s) of publication and, where possible, dates and volume numbers. [The information on volume numbers and dates of publication is useful in detecting obvious errors in citations.] This file also includes information on all other multi-author or edited works. These are referred to in the bibliography itself as if they were journals. Maintaining these references as abbreviations in this file ensures total consistency. Entries are sorted alphabetically by their abbreviation. The fields used are:

*U Unique FlyBase reference identifier. Never blank.
*a .. *r Authors. Each author's surname is on one line and initials on another. For historical reasons the first author gets surname before initials and the rest are the other way round, so the surnames are in fields *a, *d, *f, *h, *j, *l, *n, *p and *r. Papers with ten or more authors get fields *q and *r repeated as often as necessary.
*s Abbreviation used in *w of references.star files.
*u Full title.
*v Series, and/or volume (or part) number within a series.
*S Series abbreviation, appears in *S of references.star files.
*T Full name of series.
*t Date, or date range, of publication.
*V Volume number, or volume number range.
*z Number of pages.
*x Publisher.
*P Place(s) of publication.
*w Parent journal/series that *s is an issue of. Abbreviations appearing in this field will have a full entry (as *s) in their own right in this file.
*Q Series of *w in which *s appeared.
*y Volume number of *w in which *s appeared.
*Y Issue number of *w in which *s appeared.
*I ISBN (for books) or ISSN (for serials).
*D CODEN (for journals).

This file is also available in csv and rpt formats.

There remain a few edited publications and a few journals whose full details have so far proved impossible to find. These can be recognized by only having an abbreviated title, and (usually) no other information in references-abbreviations.csv. Any help in tracking these down will be appreciated.

B.12.4. Reference sources

See Reference sources for a list of the major sources that have been incorporated into the FlyBase Bibliography.

B.12.5. Copyright statements

The following statement is with respect to the copyright of bibliographic entries taken from BIOSIS:

"This database is copyrighted by Biological Abstracts Inc. (BIOSIS®). All rights reserved. No part of the information may be reproduced in hard copy, machine-readable form or other form without advance written permission from BIOSIS. Information has been obtained from public sources believed to be reliable. BIOSIS makes a diligent effort to provide complete and accurate representation of the bioscientific and other literature in its publications and services. However, BIOSIS does not guarantee the accuracy, adequacy, or completeness of any information and BIOSIS makes no warranties or representations of any kind, express or implied, including but not limited to warranties of merchantability or fitness for particular purpose. BIOSIS disclaims all liability for errors or omissions that may exist and shall not be liable for any incidental, consequential or other damages (whether resulting from negligence or otherwise) including, without limitation, exemplary damages or lost profits arising out of or in connection with the use of this database. Errors or omissions may be reported to Biological Abstracts Inc., 2100 Arch Street, Philadelphia, PA 19103-1399."

The following statements are with respect to the copyright of Parts 5 and 6 of Herskowitz's bibliography:

"Bibliography on the genetics of Drosophila: Part 5, by Irwin H. Herskowitz is reproduced with the permission of Macmillan Publishing Company. Copyright ©1969 by Macmillan Publishing Company."

"Bibliography on the genetics of Drosophila: Part 6, by Irwin H. Herskowitz is reproduced with the permission of Macmillan Publishing Company. Copyright ©1974 by Macmillan Publishing Company."

B.13. People

The People section of FlyBase provides address and e-mail contacts for Drosophila workers. The original list of contact information was compiled from five sources: an e-mail address list compiled and maintained by Dr. John Haynie, the records of the Bloomington Drosophila Stock Center, the distribution list of Drosophila Information Newsletter, a subset of the Genetics Society of America's mailing and membership list, and the mailing list for the European Drosophila Research Conference.

The People list is now user maintained via addition and correction forms available in the People section. The file of updates is searched along with the master file so new information is immediately available to FlyBase users. FlyBase encourages you to keep your FlyBase contact information up to date. Use the Add a New Address option if there is no listing for you in the People list. Use the Update Your Current Address option if you wish to make corrections to an existing record. Until the next update of the master files, any updates you provide through the correction form will appear in search results as additional, updated records, rather than modifying or replacing the existing record.

The fields in people.* are:

Last name
Given name
Department
Institution (e.g., University or Research Institute)
Address (e.g., Street #, Building, Box #, Lab/Office Room #)
City
State (or Province/Region)
Zip (or mail code)
Country
E-mail
Alternate E-mail
Office phone number
Lab phone number
Fax number
URL (e.g., your group's Web page)
PI (i.e., is or is not a group leader)
ID (FlyBase ID number)
Date of last update

The information contained in People is intended for the personal use of the Drosophila and scientific communities. These lists are the property of the FlyBase Consortium and they are not to be used for commercial purposes. Permission must be obtained from FlyBase if they are to be used for any purpose other than that intended by the Consortium.

B.14. Anatomy and Images

The Anatomy and Images section of FlyBase contains tools and data that provide access to genetic information based on anatomy and development. If you want to know when and where a gene is expressed (including reporter genes such as Ecol\lacZ and Scer\GAL4), or which genes can affect a given body part when mutant, this is the place to start. Controlled vocabularies for anatomical features and developmental stages link relevant gene, allele, transcript and protein records, through FlyBase vocabulary Term Reports, to a stage of development, a region of the body or a specific body part. Miscellaneous images and QuickTime movies are also accessible from this section.

TermLink - Use this tool to search or browse for any term, and its associated Term Report, in the Anatomy, Developmental Stage or Cellular Location controlled vocabularies. Term Reports provide links to Genes, Alleles, Transcripts, Polypeptides and Images that are associated with the term.
Anatomy Images Browser - Thumbnail images are organized by developmental stage and organ systems. Image reports include an annotated image and a listing of associated vocabulary terms.
Life cycle - Access Term Reports based on stages of the life cycle.
Glossary - Definitions for selected anatomy and development controlled vocabulary terms.
Miscellaneous images
- Drosophila Species - Drawings of Drosophilidae species
- Mutants - SEMs of bcd and ftz mutant embryos.
- Animation - Animations of embryogenesis in wild type and mutant embryos, using both photographic images and drawings. They should be viewable using standard movie player applications.
  The files are:
  embryogenesis.mpg is a cartoon version of embryogenesis using images from The Atlas of Drosophila Development by Volker Hartenstein, Cold Spring Harbor Laboratory Press (1993). The individual images used to make the movie are here.
  gastrulation-lateral.mpg, gastrulation-ventral.mpg, gastrulation-dorsal.mpg, and head-involution.mpg are animations of gastrulation in wild-type embryos generated from scanning electron micrographs. The images used to generate these animations are here.
  ftz-gastrulation.mpg and bcd-gastrulation.mpg are animations of gastrulation in ftz and bcd mutant embryos, respectively. The images used to generate these animations are here.
  credits-movies.doc explains how the animations were generated and by whom.
- Contributed images - Images dealing with Drosophila that have been contributed by users of FlyBase. Each directory has its own documentation, and only short descriptions of the contents of the directories are included here. The subdirectories are:
  brain-k-ito - a directory of scanned photographs of serial sections of wild-type adult fly brains in three directions (frontal, horizontal, and sagittal) annotated with names of major brain structures. It was deposited by Kei Ito (Mitsubishi Kasei Inst.).
  csomes-weeks-etal - a directory that includes images of figures from the paper by Weeks et al. (1993) Genes & Development 7:2329--2344 and an image of a portion of the X4m chromosome from the Bg9.61 strain of John Lis et al. (in the files HS-602A20.*). These images were deposited by John Weeks (Duke Univ. Medical Center).
  dissect-may-etal - includes images illustrating the dissection of the Drosophila brain and whole mount staining and mounting of adults. These were deposited by Sean May (Univ. of Warwick).
Your contributions of images and pictures dealing with Drosophila are welcome if they will be of interest to other fly biologists. We recommend jpeg or gif formats for photographic images. For line drawings, Postscript and Mac pict formats may be more suitable. Images at 300 dpi and 640 x 480 pixels are preferable, but other formats can be accommodated. If you have high quality images of Drosophila phenotypes, chromosomes, gene maps, or other objects of scientific interest please contact Thom Kaufman (kaufman at bio.indiana.edu; reformat to a standard e-mail address). Provide a description of the images you would like to contribute to FlyBase, including the format(s) of the images and a brief description of their scientific content. Please do not send the images themselves until you have heard from Thom.