FlyBase Reference Manual B. Detailed Descriptions of FlyBase Structure and Data
This section Last Updated: 10 November 2005

B.1. Genes

The Genes section of FlyBase contains information on Drosophila genes that has been curated from the literature and sequence databases. Data from all species of the family Drosophilidae are included. The initial data set was produced by merging the genes data in the text of Lindsley and Zimm (1992) with the old LOCI table of Ashburner, and Merriam's Genevent database. Information from all three sources has, however, been considerably revised and reformatted. New gene and allele records are added through FlyBase's curation of the literature and sequence databases. The curation of phenotypic data, a particularly complex class of Genes data, is discussed in Phenotypic Data in FlyBase, Drysdale (2001).

Some of the records in Genes will be transient. As more data become available some gene records will merge with others. Furthermore, some of these records are based on minimal data, for example, the annotation to an EMBL or GenBank sequence record. Our policy is to include data wherever we can. As records merge (or split) they will always be traceable by their secondary gene identifier numbers and by their synonyms.

One of the major differences between Lindsley and Zimm (1992) on the one hand, and Lindsley and Grell (1968) and Bridges and Brehme (1944), on the other, is that the 1944 and 1968 books were very much catalogs of mutations, rather than of genes. Bridges and Brehme (1944) and Lindsley and Grell (1968) were allele based, while Lindsley and Zimm (1992) is largely, although not entirely, gene based. FlyBase is a gene based database, and Genes reflects this change. Having said that, it will be apparent that the transition is by no means complete in genes. For the majority of genes, mutant phenotypes are described in the respective allele records. In many cases, where, as far as we know, all mutant alleles have a similar phenotype, then this description will be found in the record for the first allele in genes. Many genes in Lindsley and Zimm (1992) had no alleles specified, although it is clear that these genes were identified by one or more mutant alleles. In these cases we have arbitrarily designated an allele with the superscript 1. (Likewise, where an allele is referred to in text with a gene designation, we have regarded this as implying allele 1, where this seems reasonable, and made the change to state allele 1 explicitly). There remain, in Genes, many cases where phenotypic information is to be found within the gene record itself. This is especially so for genes for which there is a great amount of data.

Errors in Genes.

Genes data will not be free of errors, typographical, of fact, or of interpretation. Please inform FlyBase when you find any error in these data. It will then be corrected. E-mail to flybase-updates at morgan.harvard.edu (reformat to standard e-mail address) or contact a member of the FlyBase group, whose addresses and phone/fax numbers are given in Reference Manual I: The FlyBase Project.

B.1.1. General description of Genes data

The Genes file contains a set of Drosophila gene records, the data of each record being organized into many different fields. As far as possible, we have implemented controlled vocabularies for the descriptions. These are indicated by [cv]. The controlled vocabularies are to be found in controlled-vocabularies.txt. This process is by no means complete, except for some of the simpler fields, such as mutagen. For example, all X ray induced alleles are described as 'X ray' (without the quotes) in the allele origin field, never 'X rays', 'X-ray' or 'X-rays'.

The use of controlled vocabularies will increase in the future. This will allow users to more easily search the database and retrieve genes or alleles with particular properties.

Overall syntax: The maximum line length is 255 characters; there are no blank lines; all lines begin with either * or #; lines that begin with # have no other characters; lines that begin with * have a letter in column 2, a space in column 3 and at least one more character beginning in column 4. The character # appears nowhere else in the file. The character * does, unfortunately, but the string *[A-Z,a-z] does not.

Record structure: The lines that are just '#' identify the end of record for a gene. All other lines hold data for a gene; each field is one or more lines that have the same character in column 2. This character identifies the field and, sometimes, its position within a record (see below).
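
As a minimal sketch of the syntax and record rules above, the following Python checks individual lines and groups them into records. The function names check_line and split_records are hypothetical, and the sample lines are made up for illustration (only "*e white" and "*e white-apricot" appear in this manual); this is not FlyBase's own software.

```python
import string

def check_line(line):
    """Return None if the line obeys the stated syntax, else a short reason."""
    if len(line) > 255:
        return "longer than 255 characters"
    if line == "#":
        return None  # end-of-record marker
    if line.startswith("#"):
        return "'#' line carries other characters"
    if not line.startswith("*"):
        return "does not begin with * or #"
    # '*' in column 1, a letter in column 2, a space in column 3, data from column 4
    if len(line) < 4 or line[1] not in string.ascii_letters or line[2] != " ":
        return "expected '*', a letter, a space, then data"
    if "#" in line:
        return "'#' appears inside a data line"
    return None

def split_records(lines):
    """Group (field code, text) pairs into records; a bare '#' closes a record."""
    records, current = [], []
    for line in lines:
        problem = check_line(line)
        if problem:
            raise ValueError(f"{problem}: {line!r}")
        if line == "#":
            records.append(current)
            current = []
        else:
            current.append((line[1], line[3:]))
    return records

# Illustrative sample only, not real FlyBase data.
sample = ["*a w", "*e white", "*A w[a]", "*e white-apricot", "#", "*a y", "#"]
records = split_records(sample)
print(len(records))
print(records[0])
```

Keeping the field code and the text as a pair preserves file order, which matters because the position of a field relative to the *A lines determines whether it describes the gene or an allele.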

B.1.2. List of Genes field descriptions

These are the current field designations in alphabetical order:

*a gene symbol
*b genetic location
*c cytological location
*d biological role of gene product [cv]
*e full name of gene or allele
*f cellular compartment of which gene product is a component [cv]
*g nucleic acid sequence databank and other DNA accession number
*h polymorphism data
*i symbol synonym(s)
*j xenogenetic interaction information on alleles
*k phenotypic information on alleles
*l transposable element data
*m protein database accession number
*n aberrations causing position-effect variegation of gene [cv]
*o origin/mutagen [cv]
*p phenotypic information on genes
*q information concerning functional relationships between genes
*r information on wild-type biological role
*s molecular information for genes and alleles
*t class of gene [cv]
*u miscellaneous information on genes and alleles
*v information on availability
*w discoverer
*x reference(s)
*y secondary FlyBase identifier number(s)
*z primary FlyBase identifier number
*A allele symbol
*B alternative genetic location
*C comments on cytology associated with allele
*D comments on cytological location
*E a duplicate of a *x field, used to tie data to a reference
*F function of gene product [cv]
*G insertion chromosome associated with allele
*H date record entered or updated
*I transgene construct that carries allele
*J protein domain information
*K arguably most useful aneuploids for this gene
*L synonym for transgene construct symbol
*M probable ortholog in reference species of drosophilid
*N synonym for insertion symbol
*O progenitor allele or chromosome if relevant to allele
*P aberration causing the allele
*Q complementation information concerning alleles
*R comments on origin, including progenitor genotype if irrelevant to allele
*S genetic interaction information on alleles
*T recent review article that discusses this gene
*U nickname
*V name synonym

*Y name of gene product

Field structure: The first line of each record is the *a field. There is only one of these per record. Other fields may appear in any order, and most can appear more than once, not necessarily consecutively. All fields before the first *A field (if any *A) refer to the gene. All fields between two *A fields (or between an *A field and a #) refer to the immediately preceding allele. Thus, for example, *b fields always appear before any *A fields, but *e fields can appear anywhere (e.g., "*e white" and "*e white-apricot"). Fields before the first *A are in a defined order:

aHiezyCbcwBDdJUltrfvFghmnpqsuxE

In pretty outputs the *-codes are replaced by a text term describing the field.
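
The field structure described above can be sketched in Python as follows. The names split_gene_and_alleles and in_defined_order are hypothetical, the input is assumed to be a list of (field code, text) pairs for one record in file order, and ignoring codes that are absent from the order string is an assumption of this sketch rather than a documented rule.

```python
GENE_ORDER = "aHiezyCbcwBDdJUltrfvFghmnpqsuxE"  # order string quoted above

def split_gene_and_alleles(fields):
    """Split one record into gene-level fields and per-allele field lists."""
    if not fields or fields[0][0] != "a":
        raise ValueError("a record must start with its *a field")
    gene_fields, alleles = [], []
    for code, text in fields:
        if code == "A":
            alleles.append((text, []))          # start a new allele block
        elif alleles:
            alleles[-1][1].append((code, text))  # belongs to the preceding allele
        else:
            gene_fields.append((code, text))     # before the first *A: the gene
    return gene_fields, alleles

def in_defined_order(gene_fields):
    """True if gene-level fields follow GENE_ORDER (repeats allowed)."""
    ranks = [GENE_ORDER.index(c) for c, _ in gene_fields if c in GENE_ORDER]
    return ranks == sorted(ranks)

# Illustrative record only; the *b text is a placeholder.
record = [("a", "w"), ("e", "white"), ("b", "location text"),
          ("A", "w[a]"), ("e", "white-apricot")]
gene_fields, alleles = split_gene_and_alleles(record)
print(gene_fields)
print(alleles)
print(in_defined_order(gene_fields))
```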

Special characters: There are no special characters used in this file. Superscripts are enclosed between square brackets []; subscripts between double square brackets [[]]. Greek letters are written out, e.g. alpha, beta.
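
As an illustration of the bracket convention, this short Python sketch converts the plain-text notation into HTML superscript and subscript tags. The function name to_html and the choice of HTML as the output are assumptions made for the example.

```python
import re

def to_html(symbol):
    """Render [..] as superscript and [[..]] as subscript."""
    # Handle double brackets first so [[G]] is not read as a superscript.
    symbol = re.sub(r"\[\[(.+?)\]\]", r"<sub>\1</sub>", symbol)
    symbol = re.sub(r"\[(.+?)\]", r"<sup>\1</sup>", symbol)
    return symbol

print(to_html("cact[+]P482"))
print(to_html("UAS[[G]]"))
```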

B.1.3. Detailed description of the Genes fields

In this description the fields are grouped logically, rather than alphabetically. Links in the list of field designations in section B.1.2. above go to the relevant detailed field descriptions below.

B.1.4. Nontraditional alleles

In addition to 'alleles' in the traditional sense, FlyBase now names and curates further classes of allele so that phenotypic or expression pattern data can be captured for in vitro construct alleles and alleles of reporter (e.g., Ecol\lacZ), effector (e.g., Scer\FLP) or toxin (e.g., Rcom\DT-A) genes. Since these alleles have not historically been named by researchers, and have been named by FlyBase, their presentation in FlyBase requires some explanation:

B.1.4.1. Alleles of reporter genes

Alleles of reporter genes currently fall into two main classes, those resulting from enhancer trap experiments, and those resulting from promoter (or other regulatory region) analysis, where a fragment is used to drive the expression of a reporter gene. Ecol\lacZ will be used for illustration.

Enhancer trap results:

Promoter analysis results:

B.1.4.2. Alleles of ectopically expressed Drosophila gene products

Products of genes may be ectopically expressed due either to juxtaposition with different regulatory sequences in the genome (as a result of being inserted into different-than-wild-type locations by chromosome rearrangement or P element transposition) or due to in vitro construction creating a different constellation of regulatory sequences than in wild type.

By analogy with alleles of Ecol\lacZ for enhancer traps, P-element-borne insertions of genes, e.g., w or ve, that have a qualitatively distinct _position-dependent_ mutant phenotype will be curated as new alleles of, e.g., w or ve; for example, ve[Stg], caused by a particular insertion of P{HS-rho}, P{HS-rho}Stg.

The 'in vitro construct' ectopic expression alleles currently fall into two main classes, one component or two component systems:

One component systems:
Gene A is expressed from a promoter of gene B. The allele is typically generated by in vitro construction. In such cases the allele symbol is of the format 'gene-A[gene-B.PI]', e.g., phyl[sev.PC], or 'gene-A[gene-B.fragment descriptor]' where the author includes a promoter fragment descriptor, e.g., phyl[ninaE.GMR].

An occasional exception is made for promoter fusions that are widely used to provide essentially wild-type gene function; these alleles have the mini-gene '+m construct' designation (see below) prepended to a promoter designation such as heat shock, e.g., w[+mW.hs].

It is common that authors report a construct where e.g., ftz is expressed under a 'heat shock' or Hsp70 promoter, while providing no further details about the nature of the promoter. For these cases the allele symbol hs.PI is employed, e.g., Antp[hs.PZ] for 'Antp heat shock construct of Zeng'. An 'hs' designation should be reserved for when the heat inducible, not just the minimal, promoter fragment is used.

Where the allele is both altered in its coding region and being expressed from an ectopic promoter the sequence 'alteration.promoter' is used in the allele designation, e.g., tor[13D.hs.sev] to denote the coding sequence of tor[13D] expressed from a heat shock (undefined) promoter with a sev enhancer. An exception to this rule is made for Tags, which appear as the last component of the allele symbol (see below).

Two component systems:

B.1.4.3. Alleles of ectopically expressed non-Drosophila effector products

A note on ribozymes: FlyBase has a foreign ribozyme gene, symbol LTSV\RBZ. Alleles of LTSV\RBZ capture the different variants; e.g., a heat inducible ftz-targeted ribozyme will be named LTSV\RBZ[hs.ftz] (syntax 'promoter.target gene').

'+m' minigenes

The minigene allele designation is used in its narrow sense, i.e., where the only difference between the allele and the wild type is the removal of more or less non-essential sequences. Thus the minigene allele symbol designation is reserved for those cases where the gene's own promoter is driving its expression.

The minigene allele symbols begin with 'm', for minigene, and are followed by the construct symbol used in the publication. If no construct symbol has been used, the string 'mIa' where 'm' stands for minigene, 'I' for the first author's last name initial and 'a' for the first in the series is used. If the function of the minigene is stated to be indistinguishable from that of the wild type allele, the 'm' is preceded by a '+'.

Tags

Genes can be modified by the addition of a tag allowing the product to be identified, purified, or targeted to a particular subcellular distribution. Tagged alleles have the syntax 'gene-symbol[x.T:y]', where x is an identifier and y is the name of the tag (e.g., Hsap\MYC, T:Ivir\HA1, SV40\nls2); for example, CycB[B1.T:Hsap\Myc]. Where a tag is artificial, the species prefix Zzzz is used, e.g. T:Zzzz\His6.

B.1.4.4. Classical alleles engineered into transgene constructs, including rescue constructs

A class of alleles is named to capture fragments of genomic DNA used in rescue constructs. The symbol for the rescuing allele begins with '+t'. This is followed by the length as stated by the authors, or by the construct symbol if length is not given; if neither length nor construct symbol is stated, the symbol is '+tIa', where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series. When rescue is incomplete, the construct is considered as carrying a mutant allele. The allele designator is then the construct symbol, 'length of genomic insert.tIa' if no symbol is given, or 'tIa' where neither length nor construct symbol is stated.

When a classic allele, e.g., w[a], is put into a transgene construct it will get a new designation, e.g., w[a.tIa], to reflect its transgenic environment, where 't' stands for transgene, 'I' for the first author's last name initial and 'a' for the first in the series.

FlyBase is, of course, happy to discuss and advise on use of nomenclature of these non-traditional alleles.

B.1.5. Protein and transcript symbols and exon naming

FlyBase strives to link curated information to particular protein and transcript species. In order to maintain the data in this way, it is necessary to assign different symbols to each gene product. Proteins, transcripts and exons are symbolized as follows.

Protein symbols are of the form cact[+]P482 where the gene symbol and allele designation are followed by a capital P and the size of the protein in amino acids. When the size in amino acids is not known, the size in kiloDaltons is used, e.g. grh[+]P120kD. If no size is known, the symbol is followed by a capital letter to distinguish products that are known to be different, e.g. Sh[+]PA, Sh[+]PB. If multiple proteins of the same size and divergent sequence are characterized, the symbols are followed by different capital letters, e.g. abc[+]P345A, abc[+]P345B. A generic protein symbol, e.g. cact[+]P, is used to capture properties that cannot be specifically attributed to one protein product of a gene.

Transcripts are similarly named. The gene symbol and allele designation are followed by a capital R and the size in kb, e.g. cact[+]R2.2. Where possible the size as estimated by northern blot is used. If not, the size of the longest cDNA is used and this is indicated in the transcript table. For transcripts of unknown size, the symbol is followed by a capital letter, e.g. grh[+]RA, grh[+]RB. For multiple transcripts of similar size and divergent sequence, the symbols are followed by different capital letters, e.g. abc[+]R1.7A, abc[+]R1.7B. A generic transcript symbol, e.g. cact[+]R, is used to capture properties that cannot be specifically attributed to one particular transcript of a gene.
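
The protein and transcript symbol forms described above can be recognized with a single regular expression. The following Python is a sketch only: the names Product and parse_product are hypothetical, and the pattern covers just the forms shown in the examples in this section.

```python
import re
from typing import NamedTuple, Optional

class Product(NamedTuple):
    gene: str
    allele: str
    kind: str              # "protein" or "transcript"
    size: Optional[float]  # amino acids, kD or kb, depending on unit
    unit: Optional[str]    # "aa", "kD" or "kb"
    letter: Optional[str]  # distinguishing capital letter, if any
    generic: bool          # True for symbols such as cact[+]P

_SYMBOL = re.compile(
    r"^(?P<gene>.+?)\[(?P<allele>[^\]]+)\](?P<kind>[PR])"
    r"(?P<size>\d+(?:\.\d+)?)?(?P<kd>kD)?(?P<letter>[A-Z])?$"
)

def parse_product(symbol):
    m = _SYMBOL.match(symbol)
    if m is None:
        raise ValueError(f"not a recognized product symbol: {symbol!r}")
    kind = "protein" if m["kind"] == "P" else "transcript"
    size = float(m["size"]) if m["size"] else None
    if m["kd"] and (size is None or kind != "protein"):
        raise ValueError(f"'kD' needs a protein size: {symbol!r}")
    if size is None:
        unit = None
    elif kind == "transcript":
        unit = "kb"
    else:
        unit = "kD" if m["kd"] else "aa"
    generic = size is None and m["letter"] is None
    return Product(m["gene"], m["allele"], kind, size, unit, m["letter"], generic)

for s in ["cact[+]P482", "grh[+]P120kD", "Sh[+]PA", "cact[+]R2.2", "cact[+]R"]:
    print(parse_product(s))
```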

In general, all of the exons comprising a gene are numbered consecutively from 5' to 3'. Where exons partially overlap, they are given the same number with a suffix, e.g. 2a, 2b.

In some cases, it is not possible to attribute a characteristic to an individual gene product. For example, expression pattern data is often obtained with probes or antibodies that recognize more than one product of a gene. It is not rigorously known where each individual gene product is expressed. In addition, it is often not possible to determine which transcript observed on a northern blot corresponds to a particular cDNA. In these cases, the data is linked to a generic protein or transcript entity for that gene.

B.1.6. FlyBase Genes - Interactive Fly Cross Index

FlyBase has developed a hierarchical view of the Interactive Fly entitled "Interactive Fly Hierarchy: cross-index to FlyBase genes". This hierarchy is accessible from both Allied Data and Genes. The hierarchy provides an overview of the Interactive Fly with links to the specific Interactive Fly pages, as well as gene lists with links to the individual gene records in FlyBase and the Interactive Fly. This permits searches for genes grouped according to developmental and cellular pathways and functions.

B.1.7. Differences and omissions from Lindsley and Zimm (1992)

All errors found in Lindsley and Zimm (1992) have been corrected. A list of these errors, sorted by page number, is in the file errors.txt in the Redbook section of FlyBase Documents. The material in the DELETION MAP tables in the 'lethals' section of Lindsley and Zimm (1992) is not included; these tables are available in the Redbook section of Maps. The tables of Lindsley and Zimm (1992) have been broken down and the data incorporated into the text of the relevant gene record. All references within the body of a text entry of Lindsley and Zimm (1992), i.e., not in the references: field, have been duplicated into the references: field. With a very few exceptions all references are to be found in the FlyBase Bibliography and carry FlyBase reference ID numbers. The molecular map figures in Lindsley and Zimm (1992) are not included in genes, but are available in Redbook/Images sections of Documents. Lindsley and Zimm often used introductory sections for groups of genes that are, in some way or other, related (see e.g. the record for ASC, page 50). This structure is not suitable for FlyBase, and this information has, in general, been repeated in each of the relevant individual gene records.

B.2. Synonyms

FlyBase maintains a record of synonyms for gene, allele, aberration, transposon and transgene construct symbols that have appeared in the literature and stock center stock lists. Files with tables of synonyms and their corresponding "valid" symbols are found in the relevant sections of FlyBase.

Synonyms have several different causes. Sometimes two workers give the same symbol to two different genes, requiring one of these to be changed. Sometimes two workers, either by accident or design(1), give two different symbols to the same gene; in that case the symbol that has priority should be used. Many of the synonyms arise, however, as a consequence of minor variation in the way a gene's or aberration's or transposon's or transgene construct's symbol is written (e.g., with lower case or capital first letter), or by error, either in the literature or these tables. In some cases it has been difficult to decide whether a name is a gene synonym or just an allele name (this is especially so for lethals). We have taken a very liberal attitude to synonyms and, when in doubt, have included a name as a synonym even when it may more correctly be an allele name.

The files are:

1. "Scientists would rather use each other's toothbrushes than each other's nomenclature.", Keith Yamamoto.

B.3. Species other than D. melanogaster

FlyBase includes data on all species from the family Drosophilidae. The 'default' species is D. melanogaster and all symbols and names of genes, alleles, aberrations and clones from other species have a prefix of the form Nnnn\, where N is the initial letter of the genus (e.g. D for species in the genus Drosophila) and nnn is normally the first three letters of the specific epithet (e.g., sim for simulans). In formal terms all symbols and names from D. melanogaster have the prefix Dmel\, but this is usually omitted.

Species prefixes are also used for non-melanogaster genes introduced into D. melanogaster via a transgene construct, including Ecol\lacZ, Scer\GAL4 and Avic\GFP. In addition, genes carried by natural transposable elements have the transposon symbol as a 'species' prefix, for example, P\T, the gene for P-element transposase. To find genes such as these in a Genes search, change the 'Species' option from the default 'Dmel' to 'All'.
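
The prefix convention can be sketched in a few lines of Python. The function names make_prefix and split_species are hypothetical; the sketch follows the "normally the first three letters" rule literally, so it will not reproduce any exceptional abbreviations, and it does not attempt to handle fusion symbols or 'T:' tag symbols.

```python
def make_prefix(genus, epithet):
    """Build a prefix such as 'Dsim' from a genus and a specific epithet."""
    return genus[0].upper() + epithet[:3].lower()

def split_species(symbol, default="Dmel"):
    """Split 'Nnnn\\symbol' into (prefix, symbol); no prefix implies the default."""
    prefix, sep, rest = symbol.partition("\\")
    if not sep:
        return default, symbol
    return prefix, rest

print(make_prefix("Drosophila", "simulans"))  # Dsim
print(split_species(r"Ecol\lacZ"))            # ('Ecol', 'lacZ')
print(split_species("w"))                     # ('Dmel', 'w')
print(split_species(r"P\T"))                  # ('P', 'T')
```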

A list of all of the names and abbreviations used by FlyBase for species is included in the Nomenclature section of FlyBase. The species-abbreviations.txt file has the syntax:
taxgroup | abbreviation | genus | species name | common name | comment

At present, four different 'taxgroups' are recognized:

drosophilid (i.e., species in the family Drosophilidae), non-drosophilid eukaryote, prokaryote, transposable element and virus (including prokaryote viruses), and the file is sorted in this order.
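
A line in the species-abbreviations.txt syntax given above could be read as in this Python sketch. The names SpeciesEntry and parse_species_line are hypothetical, the sample line is constructed for illustration rather than copied from the file, and whether the real file puts spaces around the '|' separators is not stated here, so the sketch strips them.

```python
from typing import NamedTuple

class SpeciesEntry(NamedTuple):
    taxgroup: str
    abbreviation: str
    genus: str
    species_name: str
    common_name: str
    comment: str

def parse_species_line(line):
    """Split one 'a | b | c | d | e | f' line into its six fields."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 6:
        raise ValueError(f"expected 6 fields, found {len(parts)}: {line!r}")
    return SpeciesEntry(*parts)

# Constructed example line, not taken from the real file.
entry = parse_species_line("drosophilid | Dsim | Drosophila | simulans |  | ")
print(entry)
```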

We stress that identity of gene symbol between two species cannot be used to conclude 'homology' of genes. Where known, or strongly suspected, information concerning homologous genes within the family is present in a *M field of the genes file.

FlyBase has made only limited efforts to curate genes, alleles and aberrations from species other than D. melanogaster for the period before 1989. We have back curated from D.I.S. and some primary papers and reviews that have come to hand. For four species we have incorporated the efforts of others:

We would be happy to hear from colleagues who are able to review records from species other than D. melanogaster. We thank Jerry Coyne for reviewing the records for D. simulans, D. mauritiana and D. sechellia.

B.4. Genetic objects from non-Drosophila species that are included in Drosophila

Sequences from many other organisms are often included in artificial constructs introduced into the genome of Drosophila. FlyBase calls these 'foreign genes' and they have symbols that indicate both the species of origin and the nature of the element, e.g., Hsap\BMP4, the BMP4 gene from humans. A list of the species abbreviations used is to be found in the Nomenclature section.

Just as two or more different Drosophila genes can be engineered into a gene fusion so can two or more different foreign gene coding regions. These are called 'foreign fusion' genes, e.g., Avic\GFP::Ecol\lacZ, a coding fusion of Aequorea victoria GFP and the E. coli lacZ gene.

Structural and non-coding elements ('SAFE elements', see B.1.3.) from non-Drosophila species are called foreign SAFE elements. The most common group of foreign SAFE elements are short sequence tags used to mark genes or their products (including epitope tags). These have symbols that begin with 'T:', e.g., T:Hsap\MYC, the 'myc' epitope tag. Artificial sequences are also classed as SAFE elements, e.g., T:Zzzz\His6 for a DNA sequence encoding a run of six histidine residues.

A limited class of regulatory elements from foreign species are classified as foreign SIRE elements (synthetic and/or isolated regulatory elements). This class is restricted to regulatory elements widely used in an isolated context, for example as mobile activating elements. Examples are the synthetic multiple UAS[[G]] elements, restricted to cases in which they are used within transgene constructs designed to activate adjacent endogenous genes.

The class of element is indicated in a *t line, which, for the objects described in this section, can have the following values:

Each class, or any combination of classes, can be extracted from the database by using the complex query form in Genes with the "Class" option changed from the default "all" to one or more (ctrl+click to add terms) of these categories.

For each class the origin of the gene is described in star-coded format in a *u line with the following syntax:
*u Foreign sequence; species == <species_name>; gene|sequence|sequence tag|function tag|epitope tag == <gene symbol>; <database_abbreviation:database_id>.
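
A line in this syntax could be taken apart as in the Python sketch below. The function name parse_foreign_origin is hypothetical, the sample line is constructed for illustration, and 'XDB:0000' is a placeholder rather than a real database abbreviation or identifier.

```python
KINDS = ("gene", "sequence", "sequence tag", "function tag", "epitope tag")

def parse_foreign_origin(line):
    """Parse a '*u Foreign sequence; ...' line; return None for other lines."""
    prefix = "*u Foreign sequence;"
    if not line.startswith(prefix):
        return None
    body = line[len(prefix):].strip().rstrip(".")  # drop the closing full stop
    result = {"species": None, "kind": None, "symbol": None, "xref": None}
    for part in (p.strip() for p in body.split(";")):
        if not part:
            continue
        key, sep, value = part.partition("==")
        key, value = key.strip(), value.strip()
        if sep and key == "species":
            result["species"] = value
        elif sep and key in KINDS:
            result["kind"], result["symbol"] = key, value
        elif not sep and ":" in part:
            db, _, acc = part.partition(":")
            result["xref"] = (db.strip(), acc.strip())
    return result

# Constructed example; 'XDB:0000' is a placeholder cross-reference.
line = "*u Foreign sequence; species == Homo sapiens; gene == BMP4; XDB:0000."
print(parse_foreign_origin(line))
```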

Attempts are first made to cross-reference to another genetic database (e.g., OMIM, GDB, MGD). If such a link cannot be made then we attempt to establish a link with a protein or nucleic acid sequence database. The database abbreviations used will be found in Reference Manual F: Links To and from FlyBase. The gene name or symbol will be enclosed with single quotation marks if no cross-reference to another genetic database can be found. If no cross-reference can be established then a brief literature reference to the object will be included within the 'comment' field. In the case of epitope tags the comment field will normally include the 'name' of the antibody recognizing the epitope and a literature reference.

B.5. Maps

The Maps section of FlyBase contains map-based browsing and query tools and data. See Reference Manual C: Using FlyBase on the Web for further information on these tools.

FlyBase uses Bridges' revised maps for the banding patterns of the polytene chromosomes. See:

Bridges, 1938, J. Hered. 29: 11--13 (X chromosome), Bridges and Bridges, 1939, J. Hered. 30: 475--476 (2R), Bridges, 1941, J. Hered. 32: 64--65 (3L), Bridges, 1941, J. Hered. 32: 299--300 (3R), Bridges, 1942, J. Hered. 33: 403--408 (2L).

B.5.1. Sequence-based Maps

B.5.1.1. Genome Browser, GBrowse

GBrowse (a product of the Generic Model Organism Database Project) provides a Web-based view of a specified region of the genome; the location of that region along the chromosome arm is indicated graphically. The region of interest can be specified by gene symbol, CG identifier, a mapped feature (such as a Drosophila Gene Collection cDNA clone, BAC genomic clone, P element insertion, or protein sequence accession in the SPTR database with BLASTX similarity to the genomic sequence), or a coordinate extent on a scaffold accession or chromosome arm. One can also input a sequence string using the Fly BLAST server and from the BLAST results list link to the alignment in the GBrowse view. The extent of the region (from 100 bp to 5 Mbp) can be controlled by the user using the zoom option. Adjacent regions can be viewed using the scroll option. Annotated genes, supporting data, and other sequence-aligned data (e.g., P-element insertion sites and Affymetrix oligos) are shown as color-coded features flanking the central sequence axis. Features can be identified by mousing over the relevant graphic and viewing the feature name in the status bar; when the view is zoomed in sufficiently, or the gene labelling option is selected, the gene annotations are labelled. Included below the GBrowse view of the region are BAC in situ images. The "Display Settings" panel can be used to control the subset of features displayed, the width of the image, and other display options. For example, one can choose to have gene symbols displayed or can choose to have an expanded view of the aligned data. The data behind the GBrowse view, including cytological locations and GO gene function descriptions, can be downloaded in various flat-file formats: tabulated, FASTA, GAME-XML or GFF.

B.5.1.2. Drosophila Genome Overview

The FlyBase tool Drosophila Genome Overview is an extension of GBrowse that allows users to browse entire chromosome arms at once. The default view displays cytological numbered divisions, the tiling BAC genomic clones, and the annotated sequence scaffolds in GenBank. Clicking on the BAC or GenBank scaffolds takes users to the GBrowse view of the region. Users can also choose to display all of the genes along a chromosome arm, as well as cDNAs that align to the genomic sequence, P element insertions, transposable elements, and sequencing gaps. The width of the map can be adjusted, which is necessary when viewing these finer, optional features.

B.5.1.3. Apollo

A more flexible and interactive view of the same data provided in GBrowse is possible using the Apollo genome browser and annotator. Use of this tool requires that the Apollo software be downloaded and installed locally; data are then loaded via a Web connection from the annotation database. Data can be saved locally in the form of GAME-XML flat files and subsequently reloaded into Apollo. A detailed and comprehensive user guide for Apollo is available. This tool provides several options for viewing annotations and features down to the sequence level, and allows searches for specific genomic or amino acid sequence strings. Apollo also provides editing options, including sequence-level modifications of exon extents, addition of alternative transcripts, deletion of existing annotations, modifications involving merging or splitting existing annotations, and addition of comments associated with specific genes or transcripts. There are many options for customizing the format of the view and the data sets; these may be saved as user preferences.

B.5.2. Gene Order Maps

The Gene Order Maps section contains maps that communicate both gene order and cytological location. There are two formats: files whose names end '.ps' are suitable for downloading and printing on a PostScript printer, while those ending '.txt' are preferable for viewing in a web browser. Their format is documented in detail in the file geneorder.doc in the same folder.

Using the Gene Order Maps

The gene-order map communicates both gene order and cytological location. This is presentationally rather different on a genome-wide map than on a small, well-mapped region, and a novel format has been adopted, which is documented here.

1. Cytological range
Each gene whose cytological location is known with a range of uncertainty less than about two number divisions is written on a vertical line whose extent is the range of uncertainty. Overlapping lines are staggered. To this extent, in other words, the format is as in the EofD. A gene whose symbol exceeds nine characters may cross more than one line; the line it is attached to always goes through the second character of the symbol.

Bands are drawn with differing sizes, but this is not in any way related to amount of DNA per band, as it is on the EofD. It is only a function of how much data we need to place there.

2. "Limiting" genes
In addition, at either end of the line there is the symbol for a gene that is known to lie to the indicated side of the gene in the middle of the line. Two points must be emphasized about these "limiting" genes: they are not being stated to have the same cytological location as the "limited" gene, and they are not being stated definitely to be the neighboring gene. They are chosen by pragmatic criteria as being the most informative genes that are known to lie to the indicated side. These criteria include cytological location and size of range of uncertainty of that location. This means that it is common, especially in well-mapped regions, for a gene to appear more than once. A gene can appear as a limiter of any number of other genes, but it will only be a limited gene on at most one line.

Limiters are identified only by direct recombination, complementation or molecular map data; cytology (of genes or of breakpoints) is never used. If a gene has no limiter on one side (or both), that means that no gene can be placed to that side using direct genetic or molecular data.

3. Multiple "limited" genes on a single line
In the better-characterized regions, gene order is known to a degree that cannot be clearly represented by cytological range. This is alleviated by placing two or more genes "limited" on the same line. So as to maintain completeness of information, a set of genes is only ever limited on the same line if (a) their relative order is completely known, and (b) they all have identical cytological ranges. The limiters of a line with more than one gene are known to lie to the indicated side of all limited genes.

 |      y 
 |      | 
 |      | 
1B5     | 
 |     svr 
 |      | 
 |    elav 
        | 
 |      | 
 |      | 
1B6     | 
 |      | 
 |    Appl 

This says:

It does not say:

4. Nested or overlapping genes
The software that analyses map data understands the concept of genes within genes, but this is hard to depict graphically without a generally more confusing format. Sometimes, therefore, a gene will be shown as its own limiter, or as both limited by and limiting (to the same side) another gene.

We have incorporated some molecular data into this map, and will add much more over the coming year, but the bulk of the information is based on genetic data. Therefore, the definition of overlap of two genes is not necessarily that the transcription units overlap. For example, ftz is shown as embedded in Scr, because Scr[-] ftz[+] deficiencies exist that delete proximal material (including Antp).

5. Genes with cytological extent
A few dozen genes are stated to be deleted by deficiencies which (according to our data) do not quite overlap, thus implying that the gene occupies the whole region between the deficiencies (plus a bit on either side). In most cases the gap between the deficiencies is only one band, so we have fudged the issue by placing the gene at the interband, e.g. y in 1B1-2:

 | 
 | 
1B1 
 |         arth
 |          | 
            y 
 |    y     | 
 |    |     ac 
1B2   ac 
 |    | 
 |    sc 

Two files related to the correspondence of the genetic and cytogenetic maps are also in Maps:

B.5.3. Computed Aberration Breakpoints and Cytological Locations of Genes

If you see computed cytologies in FlyBase that you think are incorrect, please contact us at flybase-updates at morgan.harvard.edu (reformat to standard e-mail address).

Five categories of information regarding the polytene location of genes and aberration breakpoints are captured by FlyBase:

Recombination, complementation and molecular information do not reveal polytene locations directly, but can be combined with orcein and in situ data to derive inferred polytene locations. This type of analysis is non-trivial when conducted on a large dataset. FlyBase has produced software that does it automatically, with some provisos that are explained below (see 'Provisos').

The output of this software is a 'best guess' of the polytene location of each gene or aberration breakpoint for which any relevant data are known to FlyBase. The guess is presented as a range of uncertainty, whose ends are either polytene bands (such as 22F1) or lettered subdivisions (such as 22F). Heterochromatic bands (such as h41) are also used. This range appears as the polytene location of the gene or breakpoint in the header section of the gene or aberration report, and is also used as the underlying data for the various map-based user interfaces, such as the graphical maps and CytoSearch.

To the extent possible (see 'Provisos' below), the computed range of uncertainty of a gene or breakpoint is the range consistent with ALL the data known to FlyBase. Thus, if in one publication a gene has been reported to lie in 35B1-4, and in another publication it is reported to lie in 35B3-6, and there is no other relevant information in FlyBase, the computed location will be 35B3-4. More complex situations arise from complementation and recombination data. For example, if Df(1)xyz is stated to have its proximal breakpoint at 15A1-4, and Df(1)pqr is stated to have its distal breakpoint at 15A3-6, and the Df's are known to overlap (because there is a gene, abc, that they both delete), then both those breakpoints will be computed to lie in 15A3-4 -- as will the gene abc itself.
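
The single-subdivision case described above amounts to intersecting inclusive intervals. The following is a minimal sketch in Python, assuming each report has already been reduced to a pair of band numbers within one lettered subdivision. The names intersect, combine and format_range are illustrative only and are not part of the FlyBase software, which must also handle complementation and recombination data, ranges spanning several subdivisions, and heterochromatic bands.

```python
from typing import List, Optional, Tuple

# Inclusive band numbers within ONE lettered subdivision (an assumption
# made to keep the sketch short), e.g. (1, 4) stands for 35B1-4.
Range = Tuple[int, int]


def intersect(a: Range, b: Range) -> Optional[Range]:
    """Return the overlap of two inclusive ranges, or None if they are disjoint."""
    lo = max(a[0], b[0])
    hi = min(a[1], b[1])
    return (lo, hi) if lo <= hi else None


def combine(reports: List[Range]) -> Range:
    """Narrow the range of uncertainty using every report in turn."""
    computed = reports[0]
    for report in reports[1:]:
        overlap = intersect(computed, report)
        if overlap is None:
            raise ValueError("reports do not overlap")
        computed = overlap
    return computed


def format_range(subdivision: str, r: Range) -> str:
    """Write a range in the usual notation, e.g. 35B3-4."""
    lo, hi = r
    return f"{subdivision}{lo}" if lo == hi else f"{subdivision}{lo}-{hi}"


# The two published reports from the example: 35B1-4 and 35B3-6.
print(format_range("35B", combine([(1, 4), (3, 6)])))  # prints 35B3-4
```

The same intersection gives 15A3-4 for the two deficiency breakpoints in the second example, once the overlap of the deficiencies has established that the ranges must be reconciled.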

Because of the inherent complexity of these computations, the basis for the computed range is often far from obvious at first sight. FlyBase therefore includes, directly following the computed range in the Full and Abridged (but not Synopsis) gene and aberration reports, one-line descriptions of the primary data from which each end of the range was determined. Those for the deficiency example above (gene abc, Df(1)xyz and Df(1)pqr) would be as follows, with arbitrary data for the other ends of the deficiencies. Note that there is no requirement that any two data items derive from the same reference.

For gene abc:
Computed cytological location: 15A3-4
Left limit from inclusion in Df(1)pqr (FBrf0012345)
Right limit from inclusion in Df(1)xyz (FBrf0054321)
For Df(1)xyz:
Computed cytological location: 14D;15A3-4
Limits of break 1 from polytene analysis (FBrf0013579)
Left limit of break 2 from inclusion of abc (FBrf0056789)
Right limit of break 2 from polytene analysis (FBrf0098765)
For Df(1)pqr:
Computed cytological location: 15A3-4;15D
Left limit of break 1 from polytene analysis (FBrf0034567)
Limits of break 2 from polytene analysis (FBrf0097531)

Even this brief explanatory text is often somewhat opaque, however, so FlyBase is in the process of designing a 'Map Report', linked from the gene and aberration reports, which explains in more detail how the various relevant items of data were used in the computation.

B.5.3.1. Notation

Ranges are written as described elsewhere in the Nomenclature Guidelines, with two exceptions.

The first exception concerns ranges which are inferred from recombination data (for genes) or complementation (for breakpoints). These are enclosed in square brackets when no range (even a wider one) can be determined by other means. This is most commonly found for breakpoints of cytologically invisible deficiencies and for genes which were mapped by recombination but never cloned or mapped by complementation. Note that when an entity has been localized explicitly (such as by in situ hybridization), but a narrower range has been computed from other data, this narrower range is NOT bracketed: thus, brackets specifically denote the unavailability of any direct data.

The second exception concerns 'one-ended' limits. The commonest example of this is when a deficiency is stated to delete certain genes, thus giving it a minimum extent, but no flanking undeleted genes are specified so no 'maximum extent' can be computed. In such cases, if there is also no explicit cytology for the deficiency (and if it is also not stated to be cytologically invisible -- see below) the 'half-open' range is denoted by 'less than' and 'greater than' signs, as follows:

For a deficiency that deletes three genes, all localized to 28D-E:
Computed cytological location: <28E;>28D
Right limit of break 1 from inclusion of abc (FBrf0076543)
Left limit of break 2 from inclusion of abc (FBrf0056789)

Note that there is no 'limit line' for the left limit of break 1 or the right limit of break 2. Note also the superficially odd, but logically sound, mention of 28E for the left break and 28D for the right break.

B.5.3.2. Proximity rather than order

There are two cases in which locations are computed based on close proximity of a pair of objects, rather than on their chromosomal order. One is when two genes are reported to lie within 20kb or less on a molecular map. For example, if a gene xyz is stated to lie in 22F1-2 and a second gene, pqr, is stated to lie a few kilobases away from xyz (and there is no other relevant information in FlyBase), the computed location of pqr will be 22F1-2, even if there is no information on the chromosomal order of the two genes.

The other case concerns cytologically invisible deficiencies. If a deficiency is stated to be cytologically invisible, the computation makes the assumption that it is less than a band in extent, so that the ranges of uncertainty of the left and right breakpoint should be identical. For example: if the deficiency in the previous example, which deletes a gene in 28D-E, were said to be cytologically invisible then its computed data would appear as follows:

Computed cytological location: [28D-E];[28D-E]
Left limit of break 1 from cytological invisibility (FBrf0002468)
Right limit of break 1 from inclusion of abc (FBrf0076543)
Left limit of break 2 from inclusion of abc (FBrf0056789)
Right limit of break 2 from cytological invisibility (FBrf0002468)

Note the use of square brackets as described under "Notation", since this is a case where no explicit cytology is available. A statement that a deficiency is less than 20kb long is, for this purpose, treated as a statement that it is cytologically invisible.

B.5.3.3. Provisos

Though we believe that the presentation of computed map statements is of value to the community, providing an easily accessible synthesis of the primary data, such statements can -- by their very brevity -- be interpreted as more authoritative than is really justified. Certain precautions are advisable.

B.5.3.4. Genome-Derived Cytology

All the predicted genes have now been incorporated into FlyBase with inferred cytology. The inference system we have used is based on the estimates that Sorsa published a few years ago of the size in kb of each polytene band. These estimates can be summed to give the length (according to Sorsa) in kb of a region between two very well-mapped entities ('anchors') that are also identified on the genome. The genome sequence gives a different number for that length, of course. So we then apply a scaling factor, i.e. we calculate the cytology of each predicted gene in the region between the anchors by interpolation from its sequence coordinates. The anchors we use are a set of over 1200 P insertions that have been localised on the genome by sequencing flanking DNA and on polytenes by Todd Laverty of the BDGP. The scaling works out slightly different for each inter-anchor region, of course, but we estimate that even in the middle of a region the error in the computed location should never be more than a band or so. As the remaining gaps in the genome sequence are filled, some currently unmappable stretches of sequence (especially near centromeres) will be joined up with the main sequence, and that will shift all the coordinates. Smaller changes will occur as a result of other gap-filling in the middle of arms. These will be reflected in updates to map locations. If you have further questions do not hesitate to mail us at flybase-help at morgan.harvard.edu (reformat to standard e-mail address).
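
The interpolation described above can be sketched as follows. This is a hedged illustration in Python, not the FlyBase implementation: the function name band_for_coordinate, the band sizes and the anchor coordinates are all invented for the example, and a single scaling factor is applied across one inter-anchor region as the text describes.

```python
from bisect import bisect_right
from itertools import accumulate
from typing import List, Tuple


def band_for_coordinate(
    coord_kb: float,
    left_anchor_kb: float,
    right_anchor_kb: float,
    bands: List[Tuple[str, float]],
) -> str:
    """Infer the polytene band of a sequence coordinate lying between two anchors.

    bands lists, in chromosomal order, the bands between the anchors with
    their estimated sizes in kb (hypothetical numbers in this sketch).
    """
    if right_anchor_kb <= left_anchor_kb:
        raise ValueError("anchors must be in increasing order")
    if not left_anchor_kb <= coord_kb <= right_anchor_kb:
        raise ValueError("coordinate lies outside the anchored region")

    sizes = [size for _, size in bands]
    # Scaling factor: estimated length of the region / sequenced length.
    scale = sum(sizes) / (right_anchor_kb - left_anchor_kb)
    # Position of the gene on the estimated (band-size) scale.
    offset = (coord_kb - left_anchor_kb) * scale
    # Cumulative right-hand edge of each band on that scale.
    edges = list(accumulate(sizes))
    index = min(bisect_right(edges, offset), len(bands) - 1)
    return bands[index][0]


# Hypothetical region: three bands estimated at 30, 10 and 20 kb (60 kb in
# total), flanked by anchors that are 120 kb apart on the genome sequence.
region = [("22F1", 30.0), ("22F2", 10.0), ("22F3", 20.0)]
print(band_for_coordinate(1070.0, 1000.0, 1120.0, region))  # prints 22F2
```

Because the scaling factor is recomputed for each inter-anchor region, an error in one region does not propagate beyond its flanking anchors.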

B.6. Wild genotypes and Chromosomes

Information on wild-type genotypes and chromosomes is kept in the Wild Stocks section of Genes. The core of wild-stocks.txt is the information on wild-type stocks from Lindsley and Grell (1968) (itself derived from Bridges and Brehme, 1944), supplemented with more recent data. The file includes information not only on stocks but also on certain chromosomes, extracted from natural or laboratory populations, whose genetic properties have been studied - in particular chromosomes found to induce male recombination or other phenomena related to the activity of naturally-occurring transposable elements.

The fields in wild-stocks are:
*a Name or symbol of stock or chromosome
*c Description of cytological features
*d Date of origin as a laboratory stock or chromosome
*e Full name
*i Synonym(s)
*o Origin
*p Phenotypic characteristics and properties
*q Notes on how stock or chromosome is maintained
*s Molecular characteristics, including information on transposable elements
*w Collector
*x References
*C Class, e.g., wild-type stock; selected wild-type stock; extracted wild-type chromosome; laboratory stock
*E A duplicate of a *x field, used to tie data to a reference
*R Collection site

B.7. Function and Structure of Gene Products

'Function'

FlyBase uses the terms of the Gene Ontology database to describe 'functional' attributes of gene products. Three classes of attribute are used: function, process and cellular location. The information is provided in three formats:
     html tables sorted alphabetically by GO term
     text tables sorted alphabetically by GO term
     tab delimited tables with the following syntax:
          DB Gene_id Gene_symbol [NOT] GOid DB:ref evidence with aspect
               In the case where NOT is written in the '[NOT]' column then the GO term does not apply
               to the gene it is attached to. This field is used rarely for cases of conflicting/unexpected data.
               'with' can be used to qualify one of the following evidences:
                   IGI, IPI, ISS and is in the format:
                    database:gene_symbol (or protein_symbol or sequence_ID)
                    or species\gene_symbol (or protein_symbol)
               'aspect' is one of: P (process), F (function) or C (cellular compartment)
               'evidence' is one of:
                    IMP = inferred from mutant phenotype
                    IGI = inferred from genetic interaction
                    IPI = inferred from physical interaction
                    ISS = inferred from sequence similarity
                    IDA = inferred from direct assay
                    IEP = inferred from expression pattern
                    IEA = inferred from electronic annotation
                    TAS = traceable author statement
                    NAS = non-traceable author statement
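
As an illustration of the tab delimited syntax above, here is a minimal Python sketch that splits one line into its nine columns and checks the evidence and aspect codes against the lists given. The names GoAnnotation and parse_go_line are hypothetical, and the sample line is made up for the example (including its identifiers); it is not taken from a FlyBase file.

```python
from typing import NamedTuple

EVIDENCE_CODES = {"IMP", "IGI", "IPI", "ISS", "IDA", "IEP", "IEA", "TAS", "NAS"}
ASPECTS = {"P": "process", "F": "function", "C": "cellular compartment"}


class GoAnnotation(NamedTuple):
    db: str
    gene_id: str
    gene_symbol: str
    negated: bool      # True when the [NOT] column contains NOT
    go_id: str
    reference: str     # the DB:ref column
    evidence: str
    with_: str         # may be empty; qualifies IGI, IPI and ISS
    aspect: str


def parse_go_line(line: str) -> GoAnnotation:
    """Parse one tab delimited line in the column order given above."""
    # Strip only the newline: empty columns must keep their tabs.
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 9:
        raise ValueError(f"expected 9 columns, found {len(fields)}")
    db, gene_id, symbol, not_flag, go_id, ref, evidence, with_, aspect = fields
    if evidence not in EVIDENCE_CODES:
        raise ValueError(f"unknown evidence code: {evidence!r}")
    if aspect not in ASPECTS:
        raise ValueError(f"unknown aspect: {aspect!r}")
    return GoAnnotation(db, gene_id, symbol, not_flag == "NOT",
                        go_id, ref, evidence, with_, aspect)


# Made-up line: empty [NOT] and 'with' columns, evidence IMP, aspect P.
sample = "FB\tFBgn0000000\tabc\t\tGO:0000000\tFB:FBrf0012345\tIMP\t\tP\n"
annotation = parse_go_line(sample)
print(annotation.gene_symbol, annotation.evidence, ASPECTS[annotation.aspect])
```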

'Structure'

The "structure" tables include all genes from Drosophila known to encode a product with known protein features, for example a zinc finger domain. These data are from two different databases. The first of these is INTERPRO, a database of protein sequence domains and motifs. INTERPRO is, in effect, a union of six different protein domain/motif databases: PROSITE, ProDom, SMART, TIGRFAMs, Pfam and PRINTS. The second, SCOP, is a database of protein structures.

Syntax: domain <== INTERPRO_identifier>: gene_symbol<; gene_symbol>
Syntax: domain <== SCOP_identifier>: gene_symbol<; gene_symbol>

B.8. Aberrations

Information on chromosomal aberrations is found in the Aberrations section of FlyBase. The initial data set was produced by merging the data in the "Chromosomes" and "Special Chromosomes" sections of the Red Book (Lindsley and Zimm, 1992) with Ashburner's files (compiled between 1989 and 1992) and the "TE" transposable elements of Ising, which we feel are most naturally considered as aberrations. In the process of this merge, a great number of synonyms and typographical errors in aberration names were identified. New aberration records are added through FlyBase's curation of the literature.

The representation of aberrations from species other than D. melanogaster is the same as that for genes, that is to say the aberration symbol will have the syntax <Nnnn\>symbol, where Nnnn is an abbreviation of the species. The default species will always be D. melanogaster, in which case the species abbreviation will not be shown.
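
The species-prefix convention can be illustrated with a short Python sketch. The function name split_species is hypothetical, and the use of "Dmel" as the label for the default species is an assumption of this example; the text above states only that the abbreviation is omitted for D. melanogaster.

```python
from typing import Tuple


def split_species(symbol: str, default_species: str = "Dmel") -> Tuple[str, str]:
    """Split a symbol into (species abbreviation, bare symbol).

    A symbol without a backslash is taken to belong to the default species.
    """
    species, separator, rest = symbol.partition("\\")
    if separator:
        return species, rest
    return default_species, symbol


print(split_species("Scer\\GAL4"))  # prints ('Scer', 'GAL4')
print(split_species("Df(1)xyz"))    # prints ('Dmel', 'Df(1)xyz')
```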

B.8.1. List of Aberrations field descriptions

*a aberration symbol
*b genetic map position (for some small insertions and transposons/transgene constructs)
*c comments on cytology
*e full name
*g nucleic acid sequence accession numbers
*i symbol synonym(s)
*n position-effect variegation information
*o origin/mutagen [cv]
*p phenotypic data
*q genetic data with respect to genes
*s molecular data
*u other information
*v information on availability
*w discoverer(s)
*x reference(s)
*y secondary FlyBase aberration identifier number
*z FlyBase aberration identifier number
*A associated allele
*B breakpoints
*C class of aberration [cv]
*E a duplicate of a *x field, used to tie data to a reference
*F breakpoints inherited from progenitor(s)
*G formal description of genetic data
*H date record entered or updated
*I genotype variant symbol
*J revised cytological data
*N new cytological order
*O progenitor genotype if relevant to aberration
*P transposon/transgene construct insertion(s)
*Q name synonym
*R comments on origin, including progenitor genotype if irrelevant to aberration
*S alleles
*T genetic data with respect to other aberrations
*U aberration nickname or balancer short genotype
*V position effect variegation information
*W source of cytological description
*Y separable component

B.8.2. Detailed description of the Aberrations fields

B.9. Transgene constructs and insertions

The Transgene Constructs section of FlyBase contains information on engineered or synthetic transposons and insertions of natural and synthetic transposons, related cosmids and plasmids, and cell culture vectors. Data on transgenic constructs are almost exclusively derived from the literature. Sequence database entries and personal communications from investigators provide secondary sources of information. The data sets described below are not yet up to date, and will be expanding rapidly in the future.

Transgene constructs

Reports on transgene constructs, including transformation vectors, enhancer traps, and Scer\GAL4/Scer\UAS constructs, are available through the Transgene Construct Search page. See Reference Manual C: Using FlyBase on the Web for information on searching the Transgene Construct data.

The data categories in these reports include:

Transposon and Transgene Construct Insertions

Transposon and Transgene Construct Insertions data include insertions of natural and synthetic transposons. Insertion Reports can be accessed via the Insertions Search page using a symbol-based query or a browseable listing of insertions by cytological location.

The data categories in the Insertion Reports include:

Insertion Reports are extensively hyperlinked, including links to:

FlyBase is developing comprehensive Insertion Reports that will place all relevant data in one report.

B.10. Stocks

The Stocks section of FlyBase includes stock lists from both public and private collections of Drosophila. The Stocks directory contains search options, links to stock center web sites, stock order forms, and help files. Stocks should be requested from individual labs only if a comparable stock is not available from one of the public stock centers.

When the stock description provided by a public center is other than a genotype composed of valid symbols or the name of a wild-type strain, FlyBase creates a genotype where possible based on symbol synonyms. Laboratory stock lists in standardized formats are incorporated into FlyBase as is; FlyBase does not edit laboratory lists to create valid symbols. Laboratory stock lists in non-standard formats are simply posted and are available for browsing. The contents of individual laboratory stock lists are the responsibilities of the laboratories concerned and not of FlyBase. Contact Kathy Matthews (matthewk at indiana.edu, reformat to standard e-mail address) to contribute your own stock list to FlyBase.

Stock center stock information is available through Gene, Allele, Aberration and Transgene Insertion reports as well as directly from the Stocks data section. Laboratory stocks are linked to Gene, Allele, Aberration and Insertion reports when valid symbols are present in a genotype. Recently added stock center stocks may appear in the Stocks section before the links to Alleles, etc. have been updated. See Reference Manual C: Using FlyBase on the Web for help with stock list searches.

B.11. Genomic Clones and STSs

Genomic clone data are archived on FlyBase as a set of text files. The Drosophila Resources list includes information on how to request clones from the various projects included here. Questions about these data and materials should be directed to the genome projects themselves.

B.11.1. Cosmids and cosmid STSs

The cosmids are those from the European Drosophila Genome Project. The cosmid library was prepared from a Sau3A partial digest of Oregon-R adults and is in the Lorist 6 vector. The sequence of the Lorist 6 vector can be obtained by FTP from genome.wustl.edu; the file is /pub/gsc1/sequence/vector/lorist6.seq. A full description of the techniques, and of the project as a whole, can be found in the following references:

STS sequences of many cosmids have been determined from either (or both) the SP6 or T7 promoters flanking the cloning site. These sequences are available from the EMBL/GenBank/DDBJ nucleic acid sequence data libraries. These sequences are also available from dbSTS, the NCBI STS database. The dbSTS records may include information from more recent matches of the STS sequences against other sequences than are available from the EMBL/GenBank/DDBJ accessions.

See the file Drosophila Resources for information on obtaining cosmids.

The following fields are included in cosmids-sts.txt:

This file is an output from the European Cosmid mapping Consortium's working database, and for this reason includes internal notes.

B.11.2. P1 clones and P1 STSs

The P1 library of D. melanogaster is largely obsolete, and the Berkeley Drosophila Genome Project is discouraging the use of P1 clones. See the FlyBase file Drosophila Resources for additional information.

B.11.3. BAC clones and BAC STSs

Three libraries of BAC clones are now available. These were all made from DNA of the same y[1] ; cn[1] bw[1] sp[1] stock as was used for the Berkeley Drosophila Genome Project P1 clones.

The libraries are BACR, made for the BDGP by K. Osoegawa and P. de Jong (Roswell Park), and BACE and BACH, made for the EDGP by Alain Billaud at CEPH (Centre d'Etude du Polymorphisme Humain) with funding provided by an MRC project grant to D.M. Glover and M. Ashburner.

The BACR library consists of 18,432 clones in pBACe3.6, with an average clone size of 160 kb. The BACE and BACH libraries are in pBeloBAC11 and consist of 23,400 clones in the size range 75-150 kb.

Information about obtaining BAC clones is included in the FlyBase file Drosophila Resources. STS sequences of many BACs have been determined from either (or both) the TET3 or T7 promoters flanking the cloning site. These sequences are available from the EMBL/GenBank/DDBJ nucleic acid sequence data libraries. These sequences are also available from dbSTS, the NCBI STS database. The dbSTS records may include information from more recent matches of the STS sequences against other sequences than are available from the EMBL/GenBank/DDBJ accessions.

B.11.4. Drosophila virilis P1 Clones

The data on P1 clones from D. virilis were provided by D. Hartl. The clones are described in:

B.11.5. YACs

The YACs are those from the St. Louis and Harvard projects. References for the YACs:

A complete set of YAC clones is maintained by Ian Duncan and clones may be requested from him. See Drosophila Resources for contact information.

B.12. References - the Drosophila Bibliography

The References section of FlyBase holds as complete a bibliography of papers, books, etc., concerned with the biology and genetics of Drosophila as we can assemble. The sources of these references are given in section B.12.4. of the FlyBase Reference Manual. A variety of search options are available (see Reference Manual C: Using FlyBase on the Web for information on FlyBase searches) in References and in the All Searches section.

Reference reports include the bibliographic citation, the National Library of Medicine's PubMed abstract if available, and a linked list of genes, alleles and aberrations for which the paper includes data that have been curated by FlyBase. See for example the report of Yasuda et al., 1995. Users should be aware that not all papers in the FlyBase bibliography have been curated using current practice, thus a sparse list of FlyBase data items does not necessarily indicate a lack of content in the paper.

B.12.1. Reference formats

The bibliographic file is distributed in four different formats:

There are six groups of files for each format, sorted by decade (earlier than 1950, 1950-1959, 1960-1969, 1970-1979, 1980-1989, 1990-present). The archived files (rpt, refer and csv formats) are available by ftp from the Indiana server.

references-obsolete.txt is a list of deleted FlyBase FBrf identifier numbers, each with a note on whether the corresponding reference has been deleted from the files or merged with another record.

Files with the extension rpt are the report format files used for searches. Here is a typical entry:

Complete information for the journal abbreviation is available through the Journal/Book Abbreviations Search or the file references-abbreviations.rpt. The Also In field provides the FlyBase ID of any other appearances of this paper in the literature.

references.*.star are field delimited text files. Each record is terminated by a # character on a line of its own, and all other lines have an * as the first character, followed by a field-identifier letter, a space, and then the field value starting in column 4. There are no trailing spaces -- in particular there is no space in column 3 unless there is something in the field. # and * do not appear anywhere other than in column 1.

This is an example:

*U FBrf0030018
*a Karakin
*b E.I.
*c T.Y.
*d Lerner
*e V.A.
*f Kokoza
*g S.M.
*h Sviridov
*t 1977
*u Secretion antigens of salivary glands of larval Drosophila melanogaster.
*v
*w Dokl. Akad. Nauk SSSR
*x
*y 233
*z 698--701
*Y 1
*L Russian
#
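
A reader for this layout can be sketched in a few lines of Python. This is a minimal illustration based only on the rules stated above (a lone # ends a record, every other line starts with *, the field letter is in column 2 and the value starts in column 4). The function name parse_star is hypothetical, and fields are kept as an ordered list of pairs because this sketch makes no assumption about whether a field letter can repeat within a record.

```python
from typing import List, Tuple

Record = List[Tuple[str, str]]  # ordered (field letter, value) pairs


def parse_star(text: str) -> List[Record]:
    """Split field delimited text into records of (letter, value) pairs."""
    records: List[Record] = []
    current: Record = []
    for line in text.splitlines():
        if line == "#":
            # A # on a line of its own terminates the record.
            records.append(current)
            current = []
        elif line.startswith("*") and len(line) >= 2:
            letter = line[1]
            value = line[3:]  # empty string when the field has no value
            current.append((letter, value))
        elif line:
            raise ValueError(f"unexpected line: {line!r}")
    return records


sample = "*U FBrf0030018\n*a Karakin\n*t 1977\n*v\n*y 233\n*z 698--701\n#\n"
for record in parse_star(sample):
    print(dict(record)["U"], len(record))  # prints FBrf0030018 6
```

Slicing from column 4 onwards means that an empty field such as *v yields an empty string rather than an error.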

references.*.refer files are formatted in the Unix REFER format to allow direct import into Refer, EndNote, Pro-Cite and other reference handling software. This format is a text file with tags that each begin with the % symbol. Records are separated by a blank line. In this file we use the EndNote tags. Not all the tags are used. Note, also, that empty fields are absent from a record.

%A author(s)
%B secondary title
%C place published
%D year
%E secondary author
%F FlyBase reference ID
%G type of publication
%H ISBN (for books) or ISSN (for serials)
%I publisher
%J journal or book reference
%K keyword [not used]
%L journal CODEN
%N issue of journal
%O Medline identifier; BIOSIS identifier; language
%P pages
%Q author
%R title
%S tertiary title
%T title
%U series of journal
%V Volume
%W also published as
%X abstract
%Y tertiary author
%Z errata or reference ID(s) of relevant obsolete records
<blank line>

An example of a reference in REFER format is:

%A Karakin, E.I.
%A Lerner, T.Y.
%A Kokoza, V.A.
%A Sviridov, S.M.
%D 1977
%T Secretion antigens of salivary glands of larval Drosophila melanogaster.
%V 233
%P 698--701
%O Languages: Russian
%N 1
%J Dokl. Akad. Nauk SSSR
%F FBrf0030018
%W also in FBrf0030017
<blank line>

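For readers who want to process these files without reference-handling software, the following Python sketch collects the tagged lines of each record. It relies only on what is stated above (tags begin with %, records are separated by a blank line, empty fields are absent); the function name parse_refer is hypothetical, and repeated tags such as %A are gathered into lists.

```python
from collections import defaultdict
from typing import Dict, List


def parse_refer(text: str) -> List[Dict[str, List[str]]]:
    """Return one dict per record, mapping each tag to its list of values."""
    records: List[Dict[str, List[str]]] = []
    current: Dict[str, List[str]] = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the record in progress, if any.
            if current:
                records.append(dict(current))
                current = defaultdict(list)
        elif line.startswith("%"):
            tag, _, value = line.partition(" ")
            current[tag].append(value)
        else:
            raise ValueError(f"unexpected line: {line!r}")
    if current:  # final record without a trailing blank line
        records.append(dict(current))
    return records


sample = (
    "%A Lerner, T.Y.\n"
    "%A Kokoza, V.A.\n"
    "%D 1977\n"
    "%J Dokl. Akad. Nauk SSSR\n"
    "%F FBrf0030018\n"
    "\n"
)
record = parse_refer(sample)[0]
print(record["%F"][0], len(record["%A"]))  # prints FBrf0030018 2
```
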
references.*.csv files are in comma-separated-values format, which can be used by many spreadsheet and database programs. The format is:

primary_author :primary author
other_authors :subsequent authors, semicolon separated
pub_title :full title of the publication
year :year of publication
volume :volume number
publisher :publisher
pubplace :place of publication
pages :page range
volumetitle :title of part if one of a series
language :language that publication is written in
language2 :any alternate languages
series :series of journal
issue :issue of journal
type :type of publication, can be Book, Abstract, etc
med_uid :Medline identifier
biosis :Biosis identifier
ISBN or ISSN :ISBN (for books) or ISSN (for serials)
CODEN :CODEN (for periodicals)
errata :if this entry is an erratum (signified by a type of 'E'), this field provides the FlyBase identifier of the publication to be corrected
journal_abbrev :journal abbreviation or book reference
FlyBase_id :unique FlyBase identifier
also_published_in :papers which appear in more than one place will have FlyBase UIDs of the other publications given here

An example of a reference in csv format is:

"Karakin,E.I.","Lerner,T.Y.; Kokoza,V.A.; Sviridov,S.M.","Secretion antigens of salivary glands of larval Drosophila melanogaster.","1977","233","","","698--701","","Russian","","","1","","","","0","","Dokl. Akad. Nauk SSSR", "FBrf0030018","FBrf0030017"

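Rows like the one above can be read with Python's standard csv module. The sketch below is illustrative only: the function name read_references is hypothetical, and it names just the first five columns, because the sample row above appears to carry one value fewer than the field list names, so a full positional mapping would be a guess. The skipinitialspace option is used because the sample has a space after one of its commas.

```python
import csv
from typing import Dict, Iterable, Iterator

# Only the leading columns are named; the rest are kept in order, unnamed.
LEADING_COLUMNS = ["primary_author", "other_authors", "pub_title", "year", "volume"]


def read_references(lines: Iterable[str]) -> Iterator[Dict[str, object]]:
    """Yield one dict per csv row, splitting other_authors on semicolons."""
    for row in csv.reader(lines, skipinitialspace=True):
        if not row:
            continue  # skip blank lines
        record: Dict[str, object] = dict(zip(LEADING_COLUMNS, row))
        authors = str(record.get("other_authors", ""))
        record["other_authors"] = [a.strip() for a in authors.split(";") if a.strip()]
        record["remaining"] = row[len(LEADING_COLUMNS):]
        yield record


# Shortened row for illustration (title abbreviated, trailing columns dropped).
demo = ['"Karakin,E.I.","Lerner,T.Y.; Kokoza,V.A.","Secretion antigens","1977","233"\n']
for record in read_references(demo):
    print(record["primary_author"], record["other_authors"])
```

Note that the comma inside an author name is protected by the surrounding quotation marks, so "Karakin,E.I." is read as a single value.
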
B.12.2. Reference classes

The bibliographic records fall into several different classes. The great majority are papers in journals, but there are also papers in edited publications, theses, manuscripts, other electronic databases and, even, the odd film, archival material and newspaper article. The following classes are recognized by FlyBase and encoded in the *T field [cv]:

The default type is a journal article or book chapter (i.e., paper).

B.12.3. Journals and multi-author works

Because we have collected data for the reference file from a number of different sources, a variety of abbreviations have often been used for the same journal or publication. FlyBase is totally consistent in how it refers to any particular journal, or to any other publication for which there is (at least potentially) more than one record in the bibliography itself. It does this by maintaining a file of reference abbreviations. This includes not only the abbreviations of journals, but also information on any work (e.g., an edited book, symposium volume, conference proceedings or abstract book) that includes more than one independently authored contribution.

The great majority of journal titles and titles of other publications have been verified by reference to the on-line catalogs of the Library of Congress, University of California (Melville) or the University of Cambridge.

Many journals have titles in more than one language. In such cases the title in the second language is enclosed within square brackets.

The file references-abbreviations.csv lists alphabetically the journal abbreviations used, and gives the full name(s) of the journals, place(s) of publication and, where possible, dates and volume numbers. [The information on volume numbers and dates of publication is useful in detecting obvious errors in citations.] This file also includes information on all other multi-author or edited works. These are referred to in the bibliography itself as if they were journals. Maintaining these references as abbreviations in this file ensures total consistency. Entries are sorted alphabetically by their abbreviation. The fields used are:

This file is also available in csv and rpt formats.

There remain a few edited publications and a few journals whose full details have so far proved impossible to find. These can be recognized by only having an abbreviated title, and (usually) no other information in references-abbreviations.csv. Any help in tracking these down will be appreciated.

B.12.4. Reference sources

See Reference sources for a list of the major sources that have been incorporated into the FlyBase Bibliography.

B.12.5. Copyright statements

The following statement is with respect to the copyright of bibliographic entries taken from BIOSIS:

"This database is copyrighted by Biological Abstracts Inc. (BIOSIS®). All rights reserved. No part of the information may be reproduced in hard copy, machine-readable form or other form without advance written permission from BIOSIS. Information has been obtained from public sources believed to be reliable. BIOSIS makes a diligent effort to provide complete and accurate representation of the bioscientific and other literature in its publications and services. However, BIOSIS does not guarantee the accuracy, adequacy, or completeness of any information and BIOSIS makes no warranties or representations of any kind, express or implied, including but not limited to warranties of merchantability or fitness for particular purpose. BIOSIS disclaims all liability for errors or omissions that may exist and shall not be liable for any incidental, consequential or other damages (whether resulting from negligence or otherwise) including, without limitation, exemplary damages or lost profits arising out of or in connection with the use of this database. Errors or omissions may be reported to Biological Abstracts Inc., 2100 Arch Street, Philadelphia, PA 19103-1399."

The following statements are with respect to the copyright of Parts 5 and 6 of Herskowitz's bibliography:

"Bibliography on the genetics of Drosophila: Part 5, by Irwin H. Herskowitz is reproduced with the permission of Macmillan Publishing Company. Copyright ©1969 by Macmillan Publishing Company. "
"Bibliography on the genetics of Drosophila: Part 6, by Irwin H. Herskowitz is reproduced with the permission of Macmillan Publishing Company. Copyright ©1974 by Macmillan Publishing Company."

B.13. People

The People section of FlyBase provides address and e-mail contacts for Drosophila workers. The original list of contact information was compiled from five sources - an E-mail address list compiled and maintained by Dr. John Haynie, the records of the Bloomington Drosophila Stock Center, the distribution list of Drosophila Information Newsletter, a subset of the Genetics Society of America's mailing and membership list, and the mailing list for the European Drosophila Research Conference.

The People list is now user maintained via addition and correction forms available in the People section. The file of updates is searched along with the master file so new information is immediately available to FlyBase users. FlyBase encourages you to keep your FlyBase contact information up to date. Use the Add a New Address option if there is no listing for you in the People list. Use the Update Your Current Address option if you wish to make corrections to an existing record. Until the next update of the master files, any updates you provide through the correction form will appear in search results as additional, updated, records, rather than modifying or replacing the existing record.

The fields in people.* are:

The information contained in People is intended for the personal use of the Drosophila and scientific communities. These lists are the property of the FlyBase Consortium and they are not to be used for commercial purposes. Permission must be obtained from FlyBase if they are to be used for any purpose other than that intended by the Consortium.

B.14. Anatomy and Images

The Anatomy and Images section of FlyBase contains tools and data that provide access to genetic information based on anatomy and development. If you want to know when and where a gene is expressed (including reporter genes such as Ecol\lacZ and Scer\GAL4), or which genes can affect a given body part when mutant, this is the place to start. Controlled vocabularies for anatomical features and developmental stages link, through FlyBase vocabulary Term Reports, relevant gene, allele, transcript and protein records to stages of development, a region of the body or a specific body part. Miscellaneous images and QuickTime films are also accessible from this section.