FlyBase provides both general search tools, which allow searches of more than one data class, and specific search tools, which allow complex queries of a particular data class. Search input may be free text, gene symbol, nucleic acid or peptide sequence, cytological map location, or text based upon preselected terms (controlled vocabulary or CV). Specialized query tools include queries of expression pattern data and a sequence-based query for short sequence motifs. The table below, FlyBase Search Tools Summary, summarizes the search options currently available.
More detailed search help is available from most of the search pages and in the FlyBase Reference Manual (section C.3, FlyBase Search Tools). The search engine behind FlyBase text searches is SRS (http://srs.ebi.ac.uk/). The search engine for the BDGP database is written in object-oriented Perl and runs on top of an Informix database.
Search tool | Genes | Maps | Alleles | Annotation/Sequences | Aberrations | Transposons | Transgene Constructs | References | Stocks | People | Polypeptides | Anatomy/ | Transcripts |
Quick search of community data | X | X | X | X | X | X | X | X | X | X | X | ||
Search annotated sequences | X | X | |||||||||||
Search Clones | X | ||||||||||||
Search by cytology | X | X | X | X | X | X | X | ||||||
Browse cytological maps | X | X | X | X | |||||||||
Browse Graphical Maps (GBrowse) | X | X | |||||||||||
BLAST/Pattern Search | X | X | |||||||||||
Search by anatomy, CV term | X | X | X | X | X | ||||||||
Specific Data Search (complex query) | X | X | X | X | X | X | X | X | X | X | X |
In a simple (Quick) query with a single text entry box, you can specify multiple terms separated by the logical operators "and", "or", "but not", and "not". When multiple terms are entered without logical operators, "and" is assumed. Searches of genome project data are the exception: they currently do not allow multiple-term searches.
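The term-combination rules can be sketched in Python. This is a hypothetical illustration, not FlyBase's SRS implementation: it assumes each term is a single word, treats "and not" the same as "but not", and combines hit sets strictly left to right because no operator precedence is described. The `index` argument is a stand-in for whatever maps a term to the records that contain it.

```python
# Hypothetical sketch of Quick-query term combination; not FlyBase's SRS code.
OPERATORS = ("and", "or", "but not", "not")


def normalize_query(query: str) -> list[str]:
    """Split a query into terms and operators, inserting the implicit "and"
    between adjacent terms. Assumes each term is one whitespace-free word."""
    tokens = query.split()
    result: list[str] = []
    i = 0
    while i < len(tokens):
        word = tokens[i].lower()
        if word == "but" and i + 1 < len(tokens) and tokens[i + 1].lower() == "not":
            op, i = "but not", i + 2          # two-word operator
        elif word in ("and", "or", "not"):
            op, i = word, i + 1
        else:
            if result and result[-1] not in OPERATORS:
                result.append("and")          # no operator given: assume "and"
            result.append(tokens[i])
            i += 1
            continue
        if op == "not" and result and result[-1] == "and":
            result[-1] = "but not"            # treat "and not" like "but not"
        else:
            result.append(op)
    return result


def evaluate(tokens: list[str], index: dict[str, set[int]]) -> set[int]:
    """Combine hit sets strictly left to right. Assumes the tokens alternate
    term, operator, term, ...; `index` maps each term to record IDs."""
    if not tokens:
        return set()
    hits = set(index.get(tokens[0], set()))
    for op, term in zip(tokens[1::2], tokens[2::2]):
        matches = index.get(term, set())
        if op == "and":
            hits &= matches
        elif op == "or":
            hits |= matches
        else:                                 # "but not" and "not" exclude
            hits -= matches
    return hits


index = {"a": {1, 2, 3}, "b": {2}, "c": {4}}  # hypothetical term -> record IDs
print(normalize_query("a b"))                                  # -> ['a', 'and', 'b']
print(sorted(evaluate(normalize_query("a but not b or c"), index)))  # -> [1, 3, 4]
```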
Use the wild card * (asterisk) if you know only part of a term or want to find all terms with a shared root. The wild card may be used before, after, or between specified characters, and multiple wild cards may be used in one term.
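A minimal sketch of how * wild cards can be matched, assuming that a pattern must match a whole term and that matching ignores case (neither detail is stated here). Each literal piece is escaped and each * becomes the regular expression `.*`, so wild cards work before, after, or between characters. The gene symbols are only sample strings.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Translate a term containing * wild cards into a regex that must match
    a whole term; * stands for any run of characters, including none.
    Case-insensitive matching is an assumption of this sketch."""
    pieces = [re.escape(piece) for piece in pattern.split("*")]
    return re.compile(".*".join(pieces), re.IGNORECASE)


def matching_terms(pattern: str, terms: list[str]) -> list[str]:
    rx = wildcard_to_regex(pattern)
    return [term for term in terms if rx.fullmatch(term)]


terms = ["Adh", "Adhr", "Gpdh", "Pgd", "Pgk"]   # sample strings
print(matching_terms("Adh*", terms))   # trailing wild card  -> ['Adh', 'Adhr']
print(matching_terms("*dh", terms))    # leading wild card   -> ['Adh', 'Gpdh']
print(matching_terms("G*d*", terms))   # several wild cards  -> ['Gpdh']
```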
Some Quick queries let you search the specific field "symbol". If you know the valid symbol for the genetic entity of interest, searching this field is more efficient than searching the "all text" field.
Complex queries let you limit your search to a specific field, or combination of fields, within a particular data class, which allows much more precise and focused queries. Complex queries also use predefined terms (controlled vocabulary), usually presented in scrollable menu boxes. For example, Gene Search lets you select a "class of gene" from a list that includes "protein-coding", "untranslated RNA", "mitochondrial", and "engineered gene", and Allele Search lets you search the "Origin of mutant alleles" field for mutations induced by a specific mutagen. See the section on searching with controlled vocabulary terms below.
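The difference between an "all text" search and a field-restricted search can be illustrated with toy records. The field names `symbol` and `class_of_gene` and the records themselves are hypothetical. The point is that a substring search over all text also returns records that merely mention the term, while an exact match on a single field does not.

```python
# Toy records; the field names and values are hypothetical, not FlyBase's schema.
RECORDS = [
    {"symbol": "geneA", "class_of_gene": "protein-coding", "notes": "interacts with geneAB"},
    {"symbol": "geneAB", "class_of_gene": "untranslated RNA", "notes": ""},
]


def all_text_search(records: list[dict], text: str) -> list[dict]:
    """Case-insensitive substring match against every field ("all text")."""
    text = text.lower()
    return [r for r in records if any(text in str(v).lower() for v in r.values())]


def field_search(records: list[dict], **criteria: str) -> list[dict]:
    """Exact match on the named fields only; every criterion must hold."""
    return [r for r in records
            if all(r.get(name) == value for name, value in criteria.items())]


print([r["symbol"] for r in all_text_search(RECORDS, "geneA")])     # -> ['geneA', 'geneAB']
print([r["symbol"] for r in field_search(RECORDS, symbol="geneA")])  # -> ['geneA']
print([r["symbol"] for r in field_search(RECORDS, class_of_gene="untranslated RNA")])  # -> ['geneAB']
```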
When using a complex query, you may find all records with any type of entry within a specific field by entering "?*". To find all records for which the corresponding field is blank, enter "!*".
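One way to model the two special entries, assuming "?*" is read as "one character followed by anything" (that is, a non-empty field) and that a missing field counts as blank. The `origin` field and the allele records are hypothetical.

```python
def is_blank(value) -> bool:
    return value is None or str(value).strip() == ""


def field_query(records: list[dict], field: str, pattern: str) -> list[dict]:
    """'?*' = the field has some entry; '!*' = the field is blank or absent;
    anything else is treated as an exact value (a simplification)."""
    if pattern == "?*":
        return [r for r in records if not is_blank(r.get(field))]
    if pattern == "!*":
        return [r for r in records if is_blank(r.get(field))]
    return [r for r in records if r.get(field) == pattern]


alleles = [  # hypothetical allele records
    {"id": 1, "origin": "ethyl methanesulfonate"},
    {"id": 2, "origin": ""},
    {"id": 3},
]
print([r["id"] for r in field_query(alleles, "origin", "?*")])  # -> [1]
print([r["id"] for r in field_query(alleles, "origin", "!*")])  # -> [2, 3]
```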
Any query that results in more than one hit will present the hit list in tabulated form. For complex queries, the hit list is followed by a "Refine query" option (at the bottom of the page) that allows you to narrow down the number of hits by specifying further search criteria. This may be done repeatedly.
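The refine step can be pictured as repeatedly filtering the current hit list. In this hypothetical model, each refinement returns a new, smaller hit list and records the criterion used, leaving the earlier list intact; the class and its names are illustrative, not part of FlyBase.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class HitList:
    """A hit list plus the criteria used to narrow it (hypothetical model)."""
    hits: list
    criteria: list[str] = field(default_factory=list)

    def refine(self, description: str, keep: Callable[[object], bool]) -> "HitList":
        # Each refinement filters the current hits, so the list can only shrink;
        # the HitList being refined is left unchanged.
        return HitList([h for h in self.hits if keep(h)],
                       self.criteria + [description])


first = HitList(["geneA", "geneB", "geneC", "pseudoA"])
second = first.refine("symbol starts with gene", lambda s: s.startswith("gene"))
third = second.refine("not geneB", lambda s: s != "geneB")
print(third.hits)      # -> ['geneA', 'geneC']
print(third.criteria)  # -> ['symbol starts with gene', 'not geneB']
```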
Whenever a query returns "no matches," it is advisable to verify that this is a valid null result. Try the same term flanked by wild cards. If you know of a term that SHOULD work, try it. If the query tool appears to be malfunctioning, please contact FlyBase at flybase-help@morgan.harvard.edu.
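The checklist for a "no matches" result can be written as a small decision procedure. This sketch assumes whole-term, case-insensitive matching with * wild cards; `check_no_matches` and its messages are hypothetical, and a real check would run each query through the FlyBase search page rather than against a local list.

```python
import re


def find(pattern: str, terms: list[str]) -> list[str]:
    """Whole-term, case-insensitive match where * is a wild card."""
    rx = re.compile(".*".join(map(re.escape, pattern.split("*"))), re.IGNORECASE)
    return [t for t in terms if rx.fullmatch(t)]


def check_no_matches(term: str, terms: list[str], known_good: str) -> str:
    """Follow the checklist for a "no matches" result (hypothetical helper)."""
    if find(term, terms):
        return "matches found"
    if find(f"*{term}*", terms):
        return "matches found only with flanking wild cards"
    if not find(known_good, terms):
        return "known-good term also fails: tool may be malfunctioning"
    return "probably a valid null result"


terms = ["Adh", "Adhr", "Gpdh"]
print(check_no_matches("dh", terms, known_good="Adh"))   # flanking wild cards help
print(check_no_matches("xyz", terms, known_good="Adh"))  # probably a valid null result
print(check_no_matches("xyz", [], known_good="Adh"))     # control fails too
```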
The FlyBase BLAST server allows you to search D. melanogaster sequence databases. A BLAST help page describing the different BLAST programs and parameters can be accessed from the main BLAST query page. The BLAST program to be used is selected from a pull-down menu. For nucleotide queries, BLASTN, BLASTX, or TBLASTX should be used; for amino acid queries, BLASTP or TBLASTN should be used. You may also select from the various D. melanogaster sequence datasets (e.g. ESTs, genomic sequence, P insertion sites, STSs). The default is "All Drosophila", which includes all of the nucleotide databases except transposons and repeats.
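The program choice follows the standard pairing of query and database types: BLASTN compares nucleotides to nucleotides, BLASTX a translated nucleotide query to proteins, TBLASTX translated nucleotides to translated nucleotides, BLASTP proteins to proteins, and TBLASTN a protein query to translated nucleotides. The sketch below encodes that pairing. `query_kind` is a rough heuristic added for illustration (a short peptide written only in the letters A, C, G, T, U, and N would be misclassified), and neither function is part of the FlyBase BLAST server.

```python
NUCLEOTIDE_CODES = set("ACGTUN")


def query_kind(sequence: str) -> str:
    """Heuristic only: a query made solely of A, C, G, T, U and N is taken as
    nucleotide; anything else as amino acid."""
    residues = set("".join(sequence.split()).upper())
    return "nucleotide" if residues <= NUCLEOTIDE_CODES else "protein"


def blast_program(query: str, database: str, translate_both: bool = False) -> str:
    """Pick the standard BLAST program for a query/database pairing."""
    if query == "nucleotide" and database == "nucleotide":
        return "tblastx" if translate_both else "blastn"
    if query == "nucleotide" and database == "protein":
        return "blastx"
    if query == "protein" and database == "protein":
        return "blastp"
    if query == "protein" and database == "nucleotide":
        return "tblastn"
    raise ValueError("query and database must be 'nucleotide' or 'protein'")


print(blast_program(query_kind("ATGGCGTTAGC"), "nucleotide"))  # -> blastn
print(blast_program(query_kind("MKVLAAGIV"), "nucleotide"))    # -> tblastn
```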
A BLAST search returns a hyperlinked list of the top matches to entries in the BDGP sequence databases or GenBank; each score links to the corresponding sequence alignment.
To find exact matches of a short sequence pattern in the Drosophila databases, or in any of the GenBank databases from other species, use the Pattern Search tool. Enter a nucleotide or amino acid query sequence (which can include N or X wild card characters) and select the sequence dataset. The result will be a list with all database entries containing an exact match to the sequence pattern and the coordinates of the match.
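An exact-match search with single-residue wild cards can be sketched with regular expressions. Assumptions: N (nucleotide) or X (peptide) stands for exactly one residue, coordinates are reported 1-based and inclusive, overlapping matches are reported, and only the sequence as given is searched (no reverse complement). The entry names and sequences are made up.

```python
import re


def pattern_search(pattern: str, entries: dict[str, str],
                   wildcard: str = "N") -> list[tuple[str, int, int]]:
    """Find every exact match of `pattern` in each entry's sequence.
    `wildcard` (N for nucleotides, X for peptides) matches any one residue.
    Returns (entry, start, end) with 1-based inclusive coordinates."""
    w = wildcard.upper()
    body = "".join("." if c == w else re.escape(c) for c in pattern.upper())
    # A zero-width lookahead lets overlapping matches be reported too.
    rx = re.compile(f"(?=({body}))")
    hits = []
    for name, seq in entries.items():
        for m in rx.finditer(seq.upper()):
            start = m.start() + 1
            hits.append((name, start, start + len(pattern) - 1))
    return hits


entries = {"seq1": "ACGTACGAAC", "seq2": "TTTT"}  # made-up sequences
print(pattern_search("ACGN", entries))  # -> [('seq1', 1, 4), ('seq1', 5, 8)]
```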
"Cytological map position" refers to the classical polytene chromosome maps of Drosophila melanogaster. The standard map of the polytene chromosomes divides the genome into 102 numbered bands (1-20 is the X, 21-60 is the second, 61-100 the third, and 101-102 the fourth); each of these is divided into six letter bands (A-F), and these are further subdivided into numbered divisions.
The cytological range you wish to search may be specified at any level of precision (e.g., 34, 34E-34F, or 34E1-34E5). Some map-based queries provide two boxes for the cytological range; others provide only one. In the latter case, the entry must be a hyphen-separated range: 34E-34F or 34E-F, but not 34EF.
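The single-box format rules can be checked with a short parser. This is a hypothetical sketch: it accepts "34E-34F", the shorthand "34E-F" (where the end inherits the start's numbered division), and "34E1-34E5", and rejects "34EF" because it has no hyphen. It does not check that the start precedes the end.

```python
import re

END = re.compile(r"\d+(?:[A-F]\d*)?")


def parse_range(text: str) -> tuple[str, str]:
    """Parse a single-box range such as '34E-34F', '34E-F' or '34E1-34E5'.
    Does not check that the start precedes the end."""
    text = "".join(text.split()).upper()
    if "-" not in text:
        raise ValueError("enter a hyphen-separated range, e.g. 34E-34F")
    start, end = text.split("-", 1)
    if not END.fullmatch(start):
        raise ValueError(f"bad range start: {start!r}")
    if re.fullmatch(r"[A-F]\d*", end):
        # Shorthand like 34E-F: the end reuses the start's numbered division.
        end = re.match(r"\d+", start).group() + end
    if not END.fullmatch(end):
        raise ValueError(f"bad range end: {end!r}")
    return start, end


print(parse_range("34E-F"))      # -> ('34E', '34F')
print(parse_range("34E1-34E5"))  # -> ('34E1', '34E5')
```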
There are also several browsable map options that provide access to data pertaining to particular chromosomal regions using visual interfaces.
FlyBase is increasing its use of controlled vocabulary terms to capture data in a manner that allows rigorous and complex searches. In designated fields, only previously defined terms are used. A listing of the controlled vocabulary currently in use may be found at controlled-vocabularies.txt.
Many complex query tools explicitly list the controlled vocabulary terms allowed in a given field in a scrollable menu. Some query tools still under development do not; for these, the allowable terms can be found in controlled-vocabularies.txt. You can often identify useful terms by looking at the terms used in reports for the relevant data class. Wild cards (*) may be used in controlled vocabulary text query fields.
FlyBase has organized large controlled vocabulary listings as hierarchies; one example is the anatomical (body parts) CV developed for describing phenotypes and expression patterns. The expression pattern search pages display only a portion of the controlled vocabulary terms at a time and let you move deeper into the hierarchy wherever you choose.
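Moving deeper into a hierarchy amounts to following parent-to-child links. The toy hierarchy below is invented for illustration and is not the FlyBase anatomy CV; `visible_terms` mimics showing one level at a time, and `all_below` collects every term beneath a chosen node.

```python
# A toy anatomy hierarchy (parent -> children); not FlyBase's actual CV.
CHILDREN = {
    "body part": ["head", "thorax"],
    "head": ["eye", "antenna"],
    "eye": ["ommatidium"],
    "thorax": ["leg", "wing"],
}


def visible_terms(term: str) -> list[str]:
    """What a hierarchy page might show for `term`: its immediate children."""
    return CHILDREN.get(term, [])


def all_below(term: str) -> set[str]:
    """Every term beneath `term`, found by walking down the hierarchy."""
    found: set[str] = set()
    stack = list(CHILDREN.get(term, []))
    while stack:
        current = stack.pop()
        if current not in found:      # guards against repeated or cyclic links
            found.add(current)
            stack.extend(CHILDREN.get(current, []))
    return found


print(visible_terms("body part"))   # -> ['head', 'thorax']
print(sorted(all_below("head")))    # -> ['antenna', 'eye', 'ommatidium']
```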
The FlyBase Gene Expression summary tool lets you discover several data classes associated with anatomical expression patterns, including phenotypes of mutant alleles, wild-type expression patterns, and expression of reporter (GAL4/lacZ/GFP) lines. With this tool you can browse the extensive anatomical controlled vocabulary and stop to initiate a search at any level. Such a search returns a table from which you can: (1) retrieve the items of a given data class for a given CV term by clicking the number in the appropriate cell of the table; (2) descend the hierarchy to increasingly specific subcomponents of an anatomical term by clicking the hyperlinked term itself; (3) list multiple data classes for multiple anatomical terms by checking the boxes to the left of the terms, selecting the data classes and other options from the form at the bottom of the page, and clicking "List." To jump to the report for another anatomical term, type that controlled vocabulary term in the box at the top of the page and click "Find."
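The summary table can be modelled as counts of items per anatomy term and data class. This sketch uses made-up annotations and data class names; `row_counts` corresponds to the numbers shown in one row of the table, and `list_items` to the "List" option for several checked terms and classes. How FlyBase actually computes its counts is not described here, so treat this as an illustration only.

```python
from collections import defaultdict

# Hypothetical (anatomy term, data class, item) annotations; names are made up.
ANNOTATIONS = [
    ("eye", "allele phenotypes", "allele-1"),
    ("eye", "allele phenotypes", "allele-2"),
    ("eye", "reporter expression", "line-1"),
    ("wing", "allele phenotypes", "allele-3"),
]


def build_table(annotations):
    """term -> data class -> set of items, i.e. the cells of the summary table."""
    table = defaultdict(lambda: defaultdict(set))
    for term, data_class, item in annotations:
        table[term][data_class].add(item)
    return table


def row_counts(table, term):
    """The numbers shown in one row; each would link to the items behind it."""
    return {cls: len(items) for cls, items in table.get(term, {}).items()}


def list_items(table, terms, classes):
    """Items for several checked terms and data classes, as with "List"."""
    return sorted({item for t in terms for c in classes
                   for item in table.get(t, {}).get(c, ())})


table = build_table(ANNOTATIONS)
print(row_counts(table, "eye"))   # -> {'allele phenotypes': 2, 'reporter expression': 1}
print(list_items(table, ["eye", "wing"], ["allele phenotypes"]))  # -> ['allele-1', 'allele-2', 'allele-3']
```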