Kenneth,
The data you ask about are available, with much more, in the bulk
FlyBase gene data files at ftp://flybase.bio.indiana.edu/flybase-data/
(see section D.2.2, "Bulk data in ACODE and XML formats", of
http://flybase.bio.indiana.edu/.data/docs/refman/refman-D.html for basic
information on these).
The full flat files (~100 MB in size) containing these data are:
  key=value (ACODE) format:
    ftp://flybase.bio.indiana.edu/flybase-data/acode/data/FBgn.acode
  XML format:
    ftp://flybase.bio.indiana.edu/flybase-data/xml/FBgn.xml
There are also other ways to get what you asked for in your requests
(a), (b) and (c).
For (a) and (c), a list of all Dmel genes and a file of all their
homologues: use the gene search form at
http://flybase.bio.indiana.edu/genes/fbgquery.hform
with the Species field set to Dmel and no other entries. You will get
Query: [libs={FBgn PFgn}-org:Dmel] but not [libs-cla:uncertain], No. matches= 40626
If you want a summary table without the homolog fields, find the
"Page to item" box below the list, enter [1] and [999999] as the item
range, select Format: spreadsheet, tabbed, and you will get this:
# FlyBase Genes query results. Query: [libs={FBgn PFgn}-org:Dmel] but not [libs-cla:uncertain], No. matches= 40626
#      Symbol              Name                     Map      Alleles  Stocks  Refs  DNA acc.  Date       Rept. size  ID
1      alphagamma-element  alphagamma element       87C1     2        -       10    3         23 Aug 02  3300        FBgn0004084
2      beta'Cop            beta'-coatomer protein   34B4     1        -       9     15        23 Aug 02  3258        FBgn0025724
3      beta'Cop            -                        34B9     -        -       -     -         24 Oct 02  166         PFgn0025724
...
40623  zwilch              -                        100B3    -        -       -     -         24 Oct 02  162         PFgn0061476
40624  Zwim                Zwirbelmuetze            -        1        -       1     -         22 Aug 02  833         FBgn0062246
40625  Zyx102EF            -                        102E--F  1        -       10    6         22 Aug 02  2369        FBgn0011642
40626  Zyx102EF            -                        -        -        -       1     -         24 Oct 02  170         PFgn0011642
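If you save the tabbed spreadsheet output to a file, a few lines of Perl
are enough to summarize it. The sketch below tallies the IDs by prefix
(FBgn vs. PFgn, both of which appear in the sample above). It assumes,
from the sample only, that header lines start with "#", that columns are
tab-separated and that the ID is the last column; count_id_prefixes is a
hypothetical helper name.

```perl
use strict;
use warnings;

# Tally gene IDs by prefix (FBgn, PFgn, ...) in a saved, tab-separated
# FlyBase query result. Assumed layout (from the sample above): "#" lines
# are headers, columns are tab-separated, the ID is the last column.
sub count_id_prefixes {
    my ($fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        chomp $line;
        next if $line =~ /^#/ or $line !~ /\S/;   # skip headers and blank lines
        my @cols = split /\t/, $line;
        my ($prefix) = $cols[-1] =~ /^([A-Za-z]+)\d/;
        $count{$prefix}++ if defined $prefix;
    }
    return \%count;
}
```

Call it with a filehandle opened on the saved file (for example a
hypothetical dmel_genes.tsv) and print the keys and values of the
returned hash.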
To get the homolog field along with the others, find the Batch Download
section below the list and:
  - set "Report only Select fields" to ID,GSYM,NAM,HG (HG is the
    homology field; you can select other fields as well)
  - set Format to Spreadsheet, tabbed
  - set Fetch items to All (you may want to test first with a few
    hundred items to check that it is what you want)
You will get this kind of table:
FlyBase_ID Symbol Full_name Similar_genes
FBgn0010339 128up upstream of RpIII128 ; Caenorhabditis elegans C02F5.3 WP:CE00039; Homo sapiens 'neural precursor cell expressed, developmentally down-regulated 3' gi:4758796; Mus musculus Nedd3 MGI:97296; Saccharomyces cerevisiae 'HYPOTHETICAL 40.7 KD PROTEIN IN PYK1-SNC1 INTERGENIC REGION' SWP:P39729 gi:731276; Xenopus laevis 'DEVELOPMENTALLY REGULATED GTP-BINDING PROTEIN DRG (XDRG)' SWP:P43690 gi:1169421;
FBgn0005673 1360 1360 element
FBgn0020238 14-3-3epsilon
FBgn0004907 14-3-3zeta ; Caenorhabditis elegans F52D10.3 WP:CE03389; Homo sapiens 'tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide' gi:4507953; Mus musculus Ywhaz MGI:109484; Saccharomyces cerevisiae BMH2 SGDID:L0000186; rat '14-3-3 protein isoform zeta' PIR:JC5232 gi:2143553;
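The Similar_genes column packs several homologues into one field. A
minimal sketch for splitting it, assuming (from the rows above) that
entries are separated by ";" and that database cross-references look
like DB:ACCESSION (WP:CE00039, MGI:97296, gi:4758796). A quoted protein
name that itself contained ";" would break this simple split.
split_similar_genes is a hypothetical name.

```perl
use strict;
use warnings;

# Split one Similar_genes value into entries, each with its
# DB:ACCESSION cross-references. Separator and xref shape are assumptions
# based on the sample rows, not on a FlyBase format description.
sub split_similar_genes {
    my ($field) = @_;
    my @entries;
    for my $e (split /;/, $field) {
        $e =~ s/^\s+|\s+$//g;                 # trim surrounding whitespace
        next unless length $e;                # skip empty pieces (leading/trailing ";")
        my @xrefs = $e =~ /\b(\w+:\S+)/g;     # e.g. "WP:CE00039", "gi:4758796"
        push @entries, { text => $e, xrefs => \@xrefs };
    }
    return @entries;
}
```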
For (b), gene interaction data are complex, and we don't currently
offer a summary table option for them; it is something we are working
on. These data comprise 6 fields:
Interacts genetically with [WTI]
Genetic interaction (effect, class) [GIC]
Genetic interaction (class, effect) [GIC2]
Genetic interaction (effect, anatomy) [GIA]
Genetic interaction (anatomy, effect) [GIA2]
Genetic interaction info. [GII]
The bracketed codes (WTI, GIC, GIA, ...) are the keys for these fields
in the FlyBase data. Because the data are fairly complex, the FlyBase web
interface currently has no batch download option that selects just these
fields. Instead, you can obtain the full FlyBase data set, which includes
them, in XML or the simpler tag=value format, and parse out the gene
interaction fields with software.
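As a starting point for such parsing, here is a sketch of reading the
tag=value (ACODE) file record by record. The layout it assumes is
inferred from the script appended below, not from a format
specification: a record begins at a line starting with GENR and ends at
"# EOR", a field line reads KEY|value, and a line starting with "|" adds
another value to the previous field. read_acode_records and gi_fields
are hypothetical names.

```perl
use strict;
use warnings;

# The six gene interaction keys listed above.
my @GI_KEYS = qw(WTI GIC GIC2 GIA GIA2 GII);

# Split ACODE-style lines from $fh into records (layout assumed, see above).
# Returns a list of hashrefs: { KEY => [ value, value, ... ] }.
sub read_acode_records {
    my ($fh) = @_;
    my (@records, $rec, $key);
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^GENR/) {                  # start of a gene record
            ($rec, $key) = ({}, undef);
        }
        elsif ($line =~ /^# EOR/) {              # end of record
            push @records, $rec if $rec;
            ($rec, $key) = (undef, undef);
        }
        elsif ($rec && $line =~ /^(\w+)\|(.*)$/) {   # KEY|value
            $key = $1;
            push @{ $rec->{$key} }, $2;
        }
        elsif ($rec && defined $key && $line =~ /^\|(.*)$/) {  # |continuation
            push @{ $rec->{$key} }, $1;
        }
    }
    return @records;
}

# Keep only the gene interaction fields of one record.
sub gi_fields {
    my ($rec) = @_;
    return { map { $_ => $rec->{$_} } grep { exists $rec->{$_} } @GI_KEYS };
}
```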
Gene interaction fields are attributed to particular experimental
references for individual alleles of a given gene. Within a FlyBase
gene record they are nested like this:
Gene a
  allele a[1]
    reference R1
      GIC, GIC2, ... fields
    reference R2
      GIC, ...
  allele a[2]
    ...
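Because of this nesting, the interaction values are easiest to analyse
once flattened into one row per gene, allele, reference and field. The
sketch below does that for a hand-built Perl structure that mirrors the
outline above; the structure, the flatten_gene name and the placeholder
values are illustrative only, not the layout of the actual FlyBase files.

```perl
use strict;
use warnings;

# Illustrative gene record: gene -> allele -> reference -> field -> values.
# Symbols and values are placeholders copied from the outline above.
my %gene = (
    symbol  => 'a',
    alleles => {
        'a[1]' => {
            R1 => { GIC => ['...'], GIC2 => ['...'] },
            R2 => { GIC => ['...'] },
        },
        'a[2]' => {},
    },
);

# Flatten into rows of (gene, allele, reference, key, value).
sub flatten_gene {
    my ($g) = @_;
    my @rows;
    for my $allele (sort keys %{ $g->{alleles} }) {
        my $refs = $g->{alleles}{$allele};
        for my $ref (sort keys %$refs) {
            for my $key (sort keys %{ $refs->{$ref} }) {
                push @rows, [ $g->{symbol}, $allele, $ref, $key, $_ ]
                    for @{ $refs->{$ref}{$key} };
            }
        }
    }
    return @rows;
}

# Print one tab-separated line per row.
print join("\t", @$_), "\n" for flatten_gene(\%gene);
```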
I've appended at the end an experimental Perl script that parses the
gene interaction data out of the FlyBase FBgn.acode bulk data and
writes them as graph data (a Graphviz dot digraph).
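The core of the script is the classification in edge() and init(): each
interaction statement is tested against regular expressions for
"non-suppress/enhanc", "suppressor/suppresses", "suppressible",
"enhancer/enhances" and "enhanceable", and partner genes are taken from
@symbol@ markup. A condensed, self-contained sketch of that logic
follows. The patterns are copied from init(); the sub names are
hypothetical, and unlike the script it uses quoted gene symbols as node
names, omits edge weights, and draws non-directional edges with
dir=none, because dot accepts only "->" inside a digraph.

```perl
use strict;
use warnings;

# Patterns copied from init() in the script below.
my %RE = (
    nonint       => qr/non-(suppress|enhanc)/,
    inter        => qr/\W*(suppress|enhanc)/,
    suppressor   => qr/\W*suppress(or|es)\W/,
    suppressible => qr/\W*suppress(ible)\W/,
    enhancer     => qr/\W*enhanc(er|es)\W/,
    enhanceable  => qr/\W*enhanc(eable)\W/,
    genes        => qr/\@([^\@<]+)[^\@]*\@/,
);

# Edge direction per kind, following the script's by/er convention.
my %DIRECTION = (
    suppressor   => 'out',  nonint  => 'none',
    enhancer     => 'out',  unknown => 'none',
    suppressible => 'in',
    enhanceable  => 'in',
);

# Classify one interaction statement in the same order as edge().
sub classify {
    my ($text) = @_;
    return 'nonint'  if $text =~ $RE{nonint};
    return 'unknown' unless $text =~ $RE{inter};
    for my $kind (qw(suppressor suppressible enhancer enhanceable)) {
        return $kind if $text =~ $RE{$kind};
    }
    return 'other';   # mentions suppress/enhanc in an unrecognized form
}

# Gene symbols written as @symbol@, possibly with <...> markup inside.
sub gene_symbols {
    my ($text) = @_;
    my $re = $RE{genes};
    return $text =~ /$re/g;
}

# dot edge statements between $main and each partner gene in $text.
sub dot_edges {
    my ($main, $text) = @_;
    my $dir = $DIRECTION{ classify($text) } or return ();   # 'other': no edge
    my @edges;
    for my $other (gene_symbols($text)) {
        push @edges,
            $dir eq 'out' ? qq("$main" -> "$other";)
          : $dir eq 'in'  ? qq("$other" -> "$main";)
          :                 qq("$main" -> "$other" [dir=none];);
    }
    return @edges;
}
```

For example, dot_edges('a', 'enhancer of @dpp@') returns the single edge
"a" -> "dpp";.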
-- Don Gilbert
#!/usr/local/bin/perl
# fbgiparse.pl
=head1 NAME

fbgiparse - FlyBase gene interaction data parser

=head1 NOTES

Parse gene interaction fields from FlyBase gene data FBgn.acode
(ftp://flybase.bio.indiana.edu/flybase-data/acode/data/FBgn.acode)
and write them as a digraph or related graph data.

Gene interaction data comprise 6 fields:

 Interacts genetically with [WTI]
 Genetic interaction (effect, class) [GIC]
 Genetic interaction (class, effect) [GIC2]
 Genetic interaction (effect, anatomy) [GIA]
 Genetic interaction (anatomy, effect) [GIA2]
 Genetic interaction info. [GII]

The bracketed codes (WTI, GIC, GIA, ...) are the keys for these fields
in the FlyBase data. Because the data are fairly complex, the FlyBase web
interface currently has no batch download option that selects just these
fields. Instead, you can obtain the full FlyBase data set, which includes
them, in XML or the simpler tag=value format, and parse out the gene
interaction fields with software.

Gene interaction fields are attributed to particular experimental
references for individual alleles of a given gene. Within a FlyBase
gene record they are nested like this:

 Gene a
   allele a[1]
     reference R1
       GIC, GIC2, ... fields
     reference R2
       GIC, ...
   allele a[2]
     ...

This is experimental software; Don Gilbert, gilbertd@bio.indiana.edu, 2002
=cut
use lib('/Users/gilbertd/perl','/bio/perlib');
use strict;
use vars qw( $nodenums %didnode %nodenam %regex);
#use Graph;
#use Graph::Directed;
#use Graph::Writer::Dot;
#=head2 notes
#
# graphviz dot graphs:
# foreach f ( $gi )
# set w=`fgrep -c 'label=' $f`
# if ($w > 2) then
# $gr/bin/dot -Tgif -Nshape=box -Nfontsize=8 -o$f.gif $f >& /dev/null
# endif
# end
#
# $gr/bin/dot -Tgif -Nshape=box -Nfixedsize=true -Nwidth=0.1 -Nstyle=invis \
# -Gsize="15,7" -Grotate=90 -Gratio=compress -o$f.gif $f > & /dev/null
#
#=cut
my $DG;
my @keys= qw(GIA GIA2 GIC2 GIC WTI);
my %keys= map{ $_,1; }@keys;
my $path;
#$path= '/Volumes/MacHome/bio/flybase/geneinter/';
#$path= '/Users/gilbertd/bio/flybase/fbjava/fbrj/fbobs/';
$path= './';
my $file;
# $file= "dally.acode";
$file= "FBgn.acode";
chomp($path);
my $savep= 'gi';
my $f= (@ARGV) ? shift @ARGV : "$path/$file";
open(F, '<', $f) or die "Cannot open $f: $!";
init();
my $testline= 0; ##1;
my $dographviz= 1;
my $dograph= 0;
my $testlimit= 0;
my $showprogress= 0; #1;
my $dogene= ''; # 'dally';
my %syms= (); my %rete= (); my @rete= (); my %eg= ();
my $mainsym; my $mainid;
my $atkey;
$savep= $dogene if ($dogene && $dogene ne 'nogene');
graphtop();
my $nrec= 0;
my $indogene= 0;
while (<F>) {   # read the ACODE file line by line into $_
chomp;
if (/^GENR/) { startrec(); }
elsif (/^RETE\|(.+)/) {
@rete= split "\t", $1;
%rete= map{ split(/\s/,$_,2); } @rete;
$mainsym= $rete{GSYM}; $mainsym =~ s/^\s*\d\s+//;
$mainid= $rete{ID}; $mainid =~ s/^\s*\d\s+//;
if ($testline && $mainsym =~ /$dogene/) {
print STDERR "# at $mainsym: $_\n";
$indogene= 1;
}
else { $indogene= 0; }
# print "# node [name=\"$mainsym\" id=$mainid ];\n" if $testline;
}
elsif (/^# EOR/) {
endrec();
last if ($testlimit && $nrec>=$testlimit);
}
elsif (/^(\w+)\|(.+)/) {
my ($k,$v)= ($1,$2);
if ($keys{$k}) { $atkey= $k; edge($atkey,$v); }
else { $atkey= 0; }
}
elsif ($atkey && /^\|(.+)/) {
edge($atkey,$1);
}
}
graphend();
sub graphtop {
# if ($dograph){
# $DG= new Graph::Directed;
# }
if($dographviz){
print "digraph geneinter ";
## if ($title) { print "[label=\"$title\"]"; }
print "{\n";
## add graph styles node [fontname=helvetica,fontsize=10];
## page="8.5,100" ; ratio=???? ; ...
}
}
sub graphend {
print "\n}\n" if($dographviz);
# if ($dograph) {
# # @S = $DG->strongly_connected_components;
#
## my $outh= IO::Handle->new_from_fd( fileno(STDOUT),"w"); # *STDOUT;
# my $writer = Graph::Writer::Dot->new();
# my $outh;
#
## $outh = IO::File->new(">$path/gi2.dot");
## warn "Write to $path/*.dot\n";
## open(DOT,">gi2.dot") || die "'gi2.dot'";
## $outh= *DOT;
#
# my $fn;
# $fn= "$path/$savep-plain.dot";
# warn "$fn\n";
# my $T = $DG;
# $T->set_attribute('name','geneinter');
# $writer->write_graph($T, $fn);
#
# $fn= "$path/$savep-strong.dot";
# warn "$fn\n";
# $T = $DG->strongly_connected_graph;
# $T->set_attribute('name','strongly_connected_graph');
# $writer->write_graph($T, $fn);
#
# $fn= "$path/$savep-MST_Kruskal.dot";
# warn "$fn\n";
# $T = $DG->MST_Kruskal;
# $T->set_attribute('name','MST_Kruskal');
# $writer->write_graph($T, $fn);
#
# my $node= nodenum($dogene);
# if ($node) {
# $fn= "$path/$savep-MST_Prim.dot";
# warn "$fn\n";
# $T = $DG->MST_Prim($node);
# $T->set_attribute('name','MST_Prim');
# $writer->write_graph($T, $fn);
#
# ## no good, memory pig
## warn "$path/gi-Dijkstra.dot\n";
## $T = $DG->SSSP_Dijkstra($node);
## $T->set_attribute('name','Dijkstra');
## $writer->write_graph($T, "$path/gi-Dijkstra.dot");
# }
#
## warn "$path/gi-APSP.dot\n";
## $T = $DG->APSP_Floyd_Warshall;
## $T->set_attribute('name','APSP_Floyd_Warshall');
## $writer->write_graph($T, "$path/gi-APSP.dot");
#
## $T = $G->TransitiveClosure_Floyd_Warshall;
#
#
## $T = $DG->SSSP_DAG($s);
## $T->set_attribute('name','SSSP_DAG');
## $writer->write_graph($T, $outh);
#
# close($outh) if (ref $outh);
# }
}
sub edge {
my($k,$v)= @_;
my $kind= '?';
if ($k eq 'WTI') {
$v =~ s/\s.+$//;
$syms{$v}++;
}
else {
# print "$k: $v\n" if $testline;
my $showit= ($testline && (/$dogene/ || $testlimit));
my @m= m/$regex{regenes}/g;
print "# syms: ",join(",",@m)," = " if ($showit);
my $re;
$re= $regex{reNonint};
if ($v =~ m/$re/) {
$kind= 'reNonint';
showre('reNonint',$re,$v) if ($showit);
}
elsif ($v =~ m/$regex{reInter}/) {
foreach my $r (qw(reSupper reSuppby reEnher reEnhby)) {
$re= $regex{$r};
if ($v =~ m/$re/) {
$kind= $r;
showre($r,$re,$v) if ($showit);
last;
}
}
}
else {
$kind= 'unknown';
showre('unknown','.',$v) if ($showit);
}
for my $m (@m) {
next if ($m eq 'Scer\GAL4'); # ~= /Scer|GAL4/);
$eg{$m}{$kind}++;
}
print "\n" if ($showit);
}
}
sub showre {
my $r= shift;
my $re= shift;
my $vx= shift;
$vx =~ s/($re)/\U$1/; $vx =~ s/\@.+$//;
print " $r: $vx";
}
sub startrec {
%syms= (); %rete= (); @rete= (); %eg= ();
# print "\n";
}
sub nodenum {
my ($m,$dosave,$id)= @_;
my $nd= $didnode{$m};
unless($nd) {
++$nodenums;
$nd= 'n'.$nodenums;
$didnode{$m}= $nd;
$nodenam{$nd}= $m;
if ($dosave && $dographviz) {
print "$nd [label=\"$m\"";
print " id=$id" if ($id);
print "];\n";
}
# if ($dosave && $dograph) {
# $DG->add_vertex($nd);
# $DG->set_attribute('label', $nd, $m);
# $DG->set_attribute('id', $nd, $id) if ($id);
# }
}
return $nd;
}
sub endrec {
# dump %syms,
if ($indogene) {
print STDERR "# endrec $mainsym: ".join(",",keys %eg)."\n";
}
return unless(%eg);
$nrec++;
my $showit= ($testline && ($testlimit || ($mainsym =~ /$dogene/)));
unless ($showit) {
foreach my $m (keys %eg) { $showit=1 if ($m =~ /$dogene/); }
}
if ($showprogress) {
print STDERR '.';
print STDERR " $mainsym $nrec\n" if (($nrec % 50) == 0);
}
return unless(
$testline ? $showit : $dograph||$dographviz
);
my $nmain= nodenum( $mainsym,($testline ? $showit : $dographviz),$mainid);
print "# node $nmain [name=\"$mainsym\" id=$mainid ];\n" if $showit;
for my $m (sort keys %eg) {
my $nm= nodenum($m, ($testline ? $showit : $dographviz));
my $nam= $nodenam{$nm};
for my $k (sort keys %{$eg{$m}}) {
my $w= $eg{$m}{$k};
$w= -$w if ($k =~ /Sup/);
# if ($dograph) {
# if ($k =~ /by$/) { $DG->add_weighted_edge($nm, $w, $nmain); }
# elsif ($k =~ /er$/) { $DG->add_weighted_edge($nmain, $w, $nm); }
# # else ($k =~ /unknown|reNonint/) { $DG->add_weighted_edge($mainsym, 0, $nam); }
# }
if($dographviz && !$testline) {
if ($k =~ /by$/) { print "$nm -> $nmain [weight=$w];\n"; }
elsif ($k =~ /er$/) { print "$nmain -> $nm [weight=$w];\n"; }
elsif ($k =~ /unknown|reNonint/) { print "$nmain -> $nm [weight=$w dir=none];\n"; } # "--" is not valid inside a digraph
}
if($showit && !$dograph) {
if ($k =~ /by$/) { print "${nm} -> ${nmain} [weight=$w k=$k];\n"; }
if ($k =~ /er$/) { print "${nmain} -> ${nm} [weight=$w k=$k];\n"; }
if ($k =~ /reNonint/) { print "${nmain} -- ${nm} [weight=$w k=$k];\n"; }
if ($k =~ /unknown/) { print "${nmain} -- ${nm} [weight=$w k=$k];\n"; }
}
}
}
# print "\n# ------------------\n";
}
sub init {
%regex= (
reNonint => 'non-(suppress|enhanc)',
reInter => '\W*(suppress|enhanc)',
reSupper => '\W*suppress(or|es)\W',
reSuppby => '\W*suppress(ible)\W',
reEnher => '\W*enhanc(er|es)\W',
reEnhby => '\W*enhanc(eable)\W',
# regenes => '\@([^@<]+)[@<]',
regenes => '\@([^@<]+)[^@]*\@',
);
}
=head2 regex for GeneInter fields
reNonint = new RE("non-(suppress|enhanc)");
reInter = new RE("\\W(suppress|enhanc)");
reSupper = new RE("\\W*suppress(or|es)\\W"); //.+(of)? //? non-suppressor
reSuppby = new RE("\\W*suppress(ible)\\W"); //.+(by)?
reEnher = new RE("\\W*enhanc(er|es)\\W"); //.+(of)?
reEnhby = new RE("\\W*enhanc(eable)\\W"); //.+(by)?// what of non-enhanceable by
reGenes = new RE("@([^@<]+)[@<]");
int ikind;
if (reNonint.isMatch(s)) {
ikind= kOtherInter;
}
else if (reInter.isMatch(s)) {
if (reSupper.isMatch(s)) ikind= kSupper;
else if (reSuppby.isMatch(s)) ikind= kSupby;
else if (reEnher.isMatch(s)) ikind= kEnher;
else if (reEnhby.isMatch(s)) ikind= kEnhby;
else ikind= kOtherInter;
}
else {
ikind= kOtherInter;
}
REMatchEnumeration ren= reGenes.getMatchEnumeration( s);
while (ren.hasMoreMatches()) {
rem= ren.nextMatch();
String sym= rem.substituteInto("$1");
interset[ikind].put(sym,sym); // count ?
}
=cut