Kenneth,

The data you ask about are available, with much more, in the bulk
flybase gene data files, from
  ftp://flybase.bio.indiana.edu/flybase-data/
(see http://flybase.bio.indiana.edu/.data/docs/refman/refman-D.html
 D.2.2. Bulk data in ACODE and XML formats
 for some basic info on these).

The full flat files (~100 MB in size) with these data are at
  key=value format:
    ftp://flybase.bio.indiana.edu/flybase-data/acode/data/FBgn.acode
  xml format:
    ftp://flybase.bio.indiana.edu/flybase-data/xml/FBgn.xml

There are also some other ways to get your particular (a), (b) and (c)
requests.

For both (a) and (c), a list of all Dmel. genes and a file of all
homologues:

Use the gene search form
  http://flybase.bio.indiana.edu/genes/fbgquery.hform
with the Species field set to Dmel and no other entries, and you get

  Query: [libs={FBgn PFgn}-org:Dmel] but not [libs-cla:uncertain], No. matches= 40626

If you want a summary table without homolog fields, find the
"Page to item" box below the list, enter [ 1] and [999999] items,
select Format: spreadsheet, tabbed, and you get this:

# FlyBase Genes query results. Query: [libs={FBgn PFgn}-org:Dmel] but not [libs-cla:uncertain], No. matches= 40626
#
#      Symbol              Name                    Map      Alleles  Stocks  Refs  DNA acc.  Date       Rept. size  ID
1      alphagamma-element  alphagamma element      87C1     2        -       10    3         23 Aug 02  3300        FBgn0004084
2      beta'Cop            beta'-coatomer protein  34B4     1        -       9     15        23 Aug 02  3258        FBgn0025724
3      beta'Cop            -                       34B9     -        -       -     -         24 Oct 02  166         PFgn0025724
...
40623  zwilch              -                       100B3    -        -       -     -         24 Oct 02  162         PFgn0061476
40624  Zwim                Zwirbelmuetze           -        1        -       1     -         22 Aug 02  833         FBgn0062246
40625  Zyx102EF            -                       102E--F  1        -       10    6         22 Aug 02  2369        FBgn0011642
40626  Zyx102EF            -                       -        -        -       1     -         24 Oct 02  170         PFgn0011642

To get the Homolog field along with others, find the Batch Download
section below this and select these or other fields, where HG is the
homology field:

  Report only Select fields: ID,GSYM,NAM,HG
  Set Format to Spreadsheet, tabbed
  Set Fetch items to All (maybe test first with a few hundred to see
  if it is what you want).

You will get this kind of table

FlyBase_ID   Symbol         Full_name             Similar_genes
FBgn0010339  128up          upstream of RpIII128  ; Caenorhabditis elegans C02F5.3 WP:CE00039;
    Homo sapiens 'neural precursor cell expressed, developmentally down-regulated 3' gi:4758796;
    Mus musculus Nedd3 MGI:97296;
    Saccharomyces cerevisiae 'HYPOTHETICAL 40.7 KD PROTEIN IN PYK1-SNC1 INTERGENIC REGION' SWP:P39729 gi:731276;
    Xenopus laevis 'DEVELOPMENTALLY REGULATED GTP-BINDING PROTEIN DRG (XDRG)' SWP:P43690 gi:1169421;
FBgn0005673  1360           1360 element
FBgn0020238  14-3-3epsilon
FBgn0004907  14-3-3zeta                           ; Caenorhabditis elegans F52D10.3 WP:CE03389;
    Homo sapiens 'tyrosine 3-monooxygenase/tryptophan 5-monooxygenase activation protein, zeta polypeptide' gi:4507953;
    Mus musculus Ywhaz MGI:109484;
    Saccharomyces cerevisiae BMH2 SGDID:L0000186;
    rat '14-3-3 protein isoform zeta' PIR:JC5232 gi:2143553;

For (b), gene interaction data is complex and currently we don't have
a summarized table option. It is something we are working on.

This includes 6 data fields:

  Interacts genetically with             [WTI]
  Genetic interaction (effect, class)    [GIC]
  Genetic interaction (class, effect)    [GIC2]
  Genetic interaction (effect, anatomy)  [GIA]
  Genetic interaction (anatomy, effect)  [GIA2]
  Genetic interaction info.              [GII]

GIC,GIA.. here are the keys for these fields in flybase data.

This is fairly complex data, and currently the flybase web interface
doesn't have a batch download option to select just this data. You can
obtain the full flybase data set which includes this, in XML or simpler
tag=value formats, and parse out with software the gene interaction
fields.

Gene interaction data fields are attributed to particular experimental
references for individual alleles of a given gene record. Within the
flybase gene record they appear as

  Gene a
    allele a[1]
      reference R1
        GIC, GIC2, ... fields
      ref. R2
        GIC, ...
    allele a[2]
      ...

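
For the homolog table in (a)/(c), here is a rough sketch of how the
tabbed Batch Download output might be split up with Perl. It assumes one
gene per line, four tab-separated columns in the order ID, GSYM, NAM, HG,
and that no quoted description contains a semicolon. parse_similar and
the hash keys are names made up for this example, not anything FlyBase
provides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one Similar_genes (HG) value into its ';'-separated entries.
# Each entry keeps its full text plus any DB:accession tokens found
# outside the quoted descriptions (e.g. WP:CE00039, gi:4758796).
# Assumption: quoted descriptions never contain ';'.
sub parse_similar {
    my ($field) = @_;
    my @out;
    for my $entry (split /;/, $field) {
        $entry =~ s/^\s+|\s+$//g;               # trim both ends
        next unless length $entry;              # skip empty pieces
        (my $bare = $entry) =~ s/'[^']*'//g;    # drop quoted descriptions
        my @acc = $bare =~ /\b([A-Za-z]+:\S+)/g;
        push @out, { text => $entry, acc => \@acc };
    }
    return @out;
}

# One sample row, joined with tabs as in the "Spreadsheet, tabbed" format.
my $row = join "\t", 'FBgn0010339', '128up', 'upstream of RpIII128',
    "; Caenorhabditis elegans C02F5.3 WP:CE00039; Mus musculus Nedd3 MGI:97296; ";

my ($id, $sym, $name, $hg) = split /\t/, $row, 4;
for my $e (parse_similar(defined $hg ? $hg : '')) {
    print join("\t", $id, $sym, $e->{text}, join(',', @{ $e->{acc} })), "\n";
}
```

To run it over a real download, replace the sample row with a
while (<>) loop over the saved file and skip the header line.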
I've appended at end an experimental perl script for parsing out gene
interaction data from flybase FBgn.acode bulk data, producing graph
structure data.

-- Don Gilbert

#!/usr/local/bin/perl
# fbgiparse.pl

=head1 NAME

fbgiparse - flybase gene interaction data parser

=head1 NOTES

parse gene interaction fields from flybase gene data FBgn.acode
  ftp://flybase.bio.indiana.edu/flybase-data/acode/data/FBgn.acode
write as digraph or related graph data

This includes 6 data fields:

  Interacts genetically with             [WTI]
  Genetic interaction (effect, class)    [GIC]
  Genetic interaction (class, effect)    [GIC2]
  Genetic interaction (effect, anatomy)  [GIA]
  Genetic interaction (anatomy, effect)  [GIA2]
  Genetic interaction info.              [GII]

GIC,GIA.. here are the keys for these fields in flybase data.

This is fairly complex data, and currently the flybase web interface
doesn't have a batch download option to select just this data. You can
obtain the full flybase data set which includes this, in XML or simpler
tag=value formats, and parse out with software the gene interaction
fields.

Gene interaction data fields are attributed to particular experimental
references for individual alleles of a given gene record. Within the
flybase gene record they appear as

  Gene a
    allele a[1]
      reference R1
        GIC, GIC2, ... fields
      ref. R2
        GIC, ...
    allele a[2]
      ...

This is experimental software;
Don Gilbert, gilbertd@bio.indiana.edu, 2002

=cut

use lib('/Users/gilbertd/perl','/bio/perlib');
use strict;
use vars qw( $nodenums %didnode %nodenam %regex);

#use Graph;
#use Graph::Directed;
#use Graph::Writer::Dot;

#=head2 notes
#
# graphviz dot graphs:
# foreach f ( $gi )
#   set w=`fgrep -c 'label=' $f`
#   if ($w > 2) then
#     $gr/bin/dot -Tgif -Nshape=box -Nfontsize=8 -o$f.gif $f >& /dev/null
#   endif
# end
#
# $gr/bin/dot -Tgif -Nshape=box -Nfixedsize=true -Nwidth=0.1 -Nstyle=invis \
#   -Gsize="15,7" -Grotate=90 -Gratio=compress -o$f.gif $f >& /dev/null
#
#=cut

my $DG;
# data fields that are turned into graph edges
my @keys= qw(GIA GIA2 GIC2 GIC WTI);
my %keys= map{ $_,1; }@keys;

my $path;
#$path= '/Volumes/MacHome/bio/flybase/geneinter/';
#$path= '/Users/gilbertd/bio/flybase/fbjava/fbrj/fbobs/';
$path= './';
my $file;
# $file= "dally.acode";
$file= "FBgn.acode";
chomp($path);
my $savep= 'gi';

# input is the first command-line argument, else ./FBgn.acode
my $f= (@ARGV) ? shift @ARGV : "$path/$file";
open(F,$f) or die "cannot open $f: $!";

init();

my $testline= 0; ##1;
my $dographviz= 1;
my $dograph= 0;
my $testlimit= 0;
my $showprogress= 0; #1;
my $dogene= ''; # 'dally';

my %syms= ();
my %rete= ();
my @rete= ();
my %eg= ();
my $mainsym;
my $mainid;
my $atkey;

$savep= $dogene if ($dogene && $dogene ne 'nogene');

graphtop();
my $nrec= 0;
my $indogene= 0;

# main loop: one acode line at a time; GENR starts a record, '# EOR' ends it
while(<F>){
  chomp;
  if (/^GENR/) { startrec(); }
  elsif (/^RETE\|(.+)/) {
    @rete= split "\t", $1;
    %rete= map{ split(/\s/,$_,2); } @rete;
    $mainsym= $rete{GSYM}; $mainsym =~ s/^\s*\d\s+//;
    $mainid= $rete{ID}; $mainid =~ s/^\s*\d\s+//;
    if ($testline && $mainsym =~ /$dogene/) {
      print STDERR "# at $mainsym: $_\n";
      $indogene= 1;
    }
    else { $indogene= 0; }
    # print "# node [name=\"$mainsym\" id=$mainid ];\n" if $testline;
  }
  elsif (/^# EOR/) {
    endrec();
    last if ($testlimit && $nrec>=$testlimit);
  }
  elsif (/^(\w+)\|(.+)/) {
    my ($k,$v)= ($1,$2);
    if ($keys{$k}) { $atkey= $k; edge($atkey,$v); }
    else { $atkey= 0; }
  }
  elsif ($atkey && /^\|(.+)/) {
    # continuation line of the current field
    edge($atkey,$1);
  }
}

graphend();

# print the opening of the dot digraph
sub graphtop {
  # if ($dograph){
  #   $DG= new Graph::Directed;
  # }
  if($dographviz){
    print "digraph geneinter ";
    ## if ($title) { print "[label=\"$title\"]"; }
    print "{\n";
    ## add graph styles node [fontname=helvetica,fontsize=10];
    ## page="8.5,100" ; ratio=???? ; ...
  }
}

# close the dot digraph
sub graphend {
  print "\n}\n" if($dographviz);

#  if ($dograph) {
#  # @S = $DG->strongly_connected_components;
#  ## my $outh= IO::Handle->new_from_fd( fileno(STDOUT),"w"); # *STDOUT;
#  my $writer = Graph::Writer::Dot->new();
#  my $outh;
#  ## $outh = IO::File->new(">$path/gi2.dot");
## warn "Write to $path/*.dot\n";
## open(DOT,">gi2.dot") || die "'gi2.dot'";
## $outh= *DOT;
#
#  my $fn;
#  $fn= "$path/$savep-plain.dot";
#  warn "$fn\n";
#  my $T = $DG;
#  $T->set_attribute('name','geneinter');
#  $writer->write_graph($T, $fn);
#
#  $fn= "$path/$savep-strong.dot";
#  warn "$fn\n";
#  $T = $DG->strongly_connected_graph;
#  $T->set_attribute('name','strongly_connected_graph');
#  $writer->write_graph($T, $fn);
#
#  $fn= "$path/$savep-MST_Kruskal.dot";
#  warn "$fn\n";
#  $T = $DG->MST_Kruskal;
#  $T->set_attribute('name','MST_Kruskal');
#  $writer->write_graph($T, $fn);
#
#  my $node= nodenum($dogene);
#  if ($node) {
#    $fn= "$path/$savep-MST_Prim.dot";
#    warn "$fn\n";
#    $T = $DG->MST_Prim($node);
#    $T->set_attribute('name','MST_Prim');
#    $writer->write_graph($T, $fn);
#
#    ## no good, memory pig
## warn "$path/gi-Dijkstra.dot\n";
## $T = $DG->SSSP_Dijkstra($node);
## $T->set_attribute('name','Dijkstra');
## $writer->write_graph($T, "$path/gi-Dijkstra.dot");
#  }
#
## warn "$path/gi-APSP.dot\n";
## $T = $DG->APSP_Floyd_Warshall;
## $T->set_attribute('name','APSP_Floyd_Warshall');
## $writer->write_graph($T, "$path/gi-APSP.dot");
#
## $T = $G->TransitiveClosure_Floyd_Warshall;
#
#
## $T = $DG->SSSP_DAG($s);
## $T->set_attribute('name','SSSP_DAG');
## $writer->write_graph($T, $outh);
#
#  close($outh) if (ref $outh);
#  }
}

# classify one interaction field value and count the gene symbols in it
sub edge {
  my($k,$v)= @_;
  my $kind= '?';
  if ($k eq 'WTI') {
    $v =~ s/\s.+$//;
    $syms{$v}++;
  }
  else {
    # print "$k: $v\n" if $testline;
    my $showit= ($testline && (/$dogene/ || $testlimit));
    my @m= m/$regex{regenes}/g;
    print "# syms: ",join(",",@m)," = " if ($showit);
    my $re;
    $re= $regex{reNonint};
    if ($v =~ m/$re/) {
      $kind= 'reNonint';
      showre('reNonint',$re,$v) if ($showit);
    }
    elsif ($v =~ m/$regex{reInter}/) {
      foreach my $r (qw(reSupper reSuppby reEnher reEnhby)) {
        $re= $regex{$r};
        if ($v =~ m/$re/) {
          $kind= $r;
          showre($r,$re,$v) if ($showit);
          last;
        }
      }
    }
    else {
      $kind= 'unknown';
      showre('unknown','.',$v) if ($showit);
    }
    for my $m (@m) {
      next if ($m eq 'Scer\GAL4'); # ~= /Scer|GAL4/);
      $eg{$m}{$kind}++;
    }
    print "\n" if ($showit);
  }
}

# debugging aid: print the value with the matched text upper-cased
sub showre {
  my $r= shift;
  my $re= shift;
  my $vx= shift;
  $vx =~ s/($re)/\U$1/;
  $vx =~ s/\@.+$//;
  print " $r: $vx";
}

# reset per-record state
sub startrec {
  %syms= ();
  %rete= ();
  @rete= ();
  %eg= ();
  # print "\n";
}

# return the node id (n1, n2, ...) for a symbol, printing the node when new
sub nodenum {
  my ($m,$dosave,$id)= @_;
  my $nd= $didnode{$m};
  unless($nd) {
    ++$nodenums;
    $nd= 'n'.$nodenums;
    $didnode{$m}= $nd;
    $nodenam{$nd}= $m;
    if ($dosave && $dographviz) {
      print "$nd [label=\"$m\"";
      print " id=$id" if ($id);
      print "];\n";
    }
    # if ($dosave && $dograph) {
    #   $DG->add_vertex($nd);
    #   $DG->set_attribute('label', $nd, $m);
    #   $DG->set_attribute('id', $nd, $id) if ($id);
    # }
  }
  return $nd;
}

# end of record: write the edges collected for this gene
sub endrec {
  # dump %syms,
  if ($indogene) {
    print STDERR "# endrec $mainsym: ".join(",",keys %eg)."\n";
  }
  return unless(%eg);
  $nrec++;
  my $showit= ($testline && ($testlimit || ($mainsym =~ /$dogene/)));
  unless ($showit) {
    foreach my $m (keys %eg) { $showit=1 if ($m =~ /$dogene/); }
  }
  if ($showprogress) {
    print STDERR '.';
    print STDERR " $mainsym $nrec\n" if (($nrec % 50) == 0);
  }
  return unless( $testline ? $showit : $dograph||$dographviz );

  my $nmain= nodenum( $mainsym,($testline ? $showit : $dographviz),$mainid);
  print "# node $nmain [name=\"$mainsym\" id=$mainid ];\n" if $showit;
  for my $m (sort keys %eg) {
    my $nm= nodenum($m, ($testline ? $showit : $dographviz));
    my $nam= $nodenam{$nm};
    for my $k (sort keys %{$eg{$m}}) {
      my $w= $eg{$m}{$k};
      $w= -$w if ($k =~ /Sup/);  # suppression gets a negative weight
      # if ($dograph) {
      #   if ($k =~ /by$/) { $DG->add_weighted_edge($nm, $w, $nmain); }
      #   elsif ($k =~ /er$/) { $DG->add_weighted_edge($nmain, $w, $nm); }
      #   # else ($k =~ /unknown|reNonint/) { $DG->add_weighted_edge($mainsym, 0, $nam); }
      # }
      # undirected relations: dot rejects '--' inside a digraph,
      # so these are written as '->' with dir=none
      if($dographviz && !$testline) {
        if ($k =~ /by$/) { print "$nm -> $nmain [weight=$w];\n"; }
        elsif ($k =~ /er$/) { print "$nmain -> $nm [weight=$w];\n"; }
        elsif ($k =~ /unknown|reNonint/) { print "$nmain -> $nm [weight=$w dir=none];\n"; }
      }
      if($showit && !$dograph) {
        if ($k =~ /by$/) { print "${nm} -> ${nmain} [weight=$w k=$k];\n"; }
        if ($k =~ /er$/) { print "${nmain} -> ${nm} [weight=$w k=$k];\n"; }
        if ($k =~ /reNonint/) { print "${nmain} -> ${nm} [weight=$w k=$k dir=none];\n"; }
        if ($k =~ /unknown/) { print "${nmain} -> ${nm} [weight=$w k=$k dir=none];\n"; }
      }
    }
  }
  # print "\n# ------------------\n";
}

# regular expressions used to classify interaction text
sub init {
  %regex= (
    reNonint => 'non-(suppress|enhanc)',
    reInter  => '\W*(suppress|enhanc)',
    reSupper => '\W*suppress(or|es)\W',
    reSuppby => '\W*suppress(ible)\W',
    reEnher  => '\W*enhanc(er|es)\W',
    reEnhby  => '\W*enhanc(eable)\W',
    # regenes => '\@([^@<]+)[@<]',
    regenes  => '\@([^@<]+)[^@]*\@',
  );
}

=head2 regex for GeneInter fields

  reNonint = new RE("non-(suppress|enhanc)");
  reInter = new RE("\\W(suppress|enhanc)");
  reSupper = new RE("\\W*suppress(or|es)\\W"); //.+(of)? //? non-suppressor
  reSuppby = new RE("\\W*suppress(ible)\\W"); //.+(by)?
  reEnher = new RE("\\W*enhanc(er|es)\\W"); //.+(of)?
  reEnhby = new RE("\\W*enhanc(eable)\\W"); //.+(by)?// what of non-enhanceable by
  reGenes = new RE("@([^@<]+)[@<]");

  int ikind;
  if (reNonint.isMatch(s)) { ikind= kOtherInter; }
  else if (reInter.isMatch(s)) {
    if (reSupper.isMatch(s)) ikind= kSupper;
    else if (reSuppby.isMatch(s)) ikind= kSupby;
    else if (reEnher.isMatch(s)) ikind= kEnher;
    else if (reEnhby.isMatch(s)) ikind= kEnhby;
    else ikind= kOtherInter;
  }
  else { ikind= kOtherInter; }

  REMatchEnumeration ren= reGenes.getMatchEnumeration( s);
  while (ren.hasMoreMatches()) {
    rem= ren.nextMatch();
    String sym= rem.substituteInto("$1");
    interset[ikind].put(sym,sym); // count ?
  }

=cut
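
If you would rather have a plain edge list than a picture, the digraph
text that the script prints can be read back with a few lines of Perl.
This is only a sketch: read_dot is a made-up helper, the gene names and
id in the sample are placeholders rather than real FlyBase data, and it
only understands the node and edge lines the script above writes
(n1 [label="..."]; and n1 -> n2 [weight=N];), not dot syntax in general.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Read dot text of the kind fbgiparse.pl prints.
# Returns a hash ref (node id => label) and an array ref of edges,
# each edge being [from, to, weight, operator].
sub read_dot {
    my ($text) = @_;
    my (%label, @edges);
    for my $line (split /\n/, $text) {
        if ($line =~ /^\s*(n\d+)\s*\[label="([^"]*)"/) {
            $label{$1} = $2;                    # node line
        }
        elsif ($line =~ /^\s*(n\d+)\s*(->|--)\s*(n\d+)\s*\[weight=(-?\d+)/) {
            push @edges, [$1, $3, $4, $2];      # edge line
        }
    }
    return (\%label, \@edges);
}

# Placeholder sample in the script's output format (not real data).
my $sample = <<'DOT';
digraph geneinter {
n1 [label="geneA" id=FBgn0000000];
n2 [label="geneB"];
n2 -> n1 [weight=-2];
n3 [label="geneC"];
n1 -> n3 [weight=1];
}
DOT

my ($label, $edges) = read_dot($sample);
for my $e (@$edges) {
    my ($from, $to, $w, $op) = @$e;
    printf "%s %s %s (weight %d)\n", $label->{$from}, $op, $label->{$to}, $w;
}
```

In the script's output a negative weight marks suppression, and the
arrow runs from the suppressor or enhancer to the gene it acts on.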