[Gmod-help] DB loading

Wed Feb 20 12:03:31 EST 2008

Hi Ed,

I'm going to put my comments/answers/questions mixed in with your text
below.

Scott

On Tue, 2008-02-19 at 17:09 -0500, Ed Johnson wrote:
> Hello Helpdesk,
>
> I’ve gotten Chado and GBrowse installed and I’m trying to load some
> EST sequences and their Blast returns.  The first pass I used a
> mixture of the bp_* and gmod_ load scripts and couldn’t match the
> Blast data to the loaded sequences. 

Your goal isn't clear to me here.  I am guessing that you want to run
GBrowse directly off of Chado, but I'm not sure you need to.  Chado is a
data warehouse for organism data of various types.  If all you want is a
browser for your data, you probably don't need Chado.  A different
back-end database, such as Bio::DB::GFF or Bio::DB::SeqFeature::Store,
would be faster and easier to deal with.  See

  http://www.gmod.org/wiki/index.php/GBrowse_adaptors

for a little more information.

Anyway, to directly address the question in the paragraph: the bp_ and
gmod_ load scripts write to different databases, so mixing them will
split your data across databases, which I don't think you want.  For the
rest of this email, I'm going to assume that you want to use Chado.  If
not, you can ignore the rest.
>
> What I want to do is load a fasta file of EST sequences and a Blastx
> file run against the sequences.  What tools should I be using?

For Chado, you need features both for the ESTs themselves and for the
Blastx results.  To create a GFF3 file for the ESTs, you can use
gmod_fasta2gff3.pl in the chado/bin directory.  It takes a fasta file
and creates a GFF3 file (optionally with the sequence at the end).
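
As a rough illustration of what that conversion involves, here is a
minimal Python sketch.  It is not the gmod_fasta2gff3.pl code and makes
no claim to match its output; the function names, the "example" source
value, and the choice of EST as the feature type are all assumptions
made for the example.  Each sequence becomes a feature spanning its own
full length, with the sequences optionally appended after a ##FASTA
line.

```python
def parse_fasta(text):
    """Yield (identifier, sequence) pairs from FASTA-formatted text."""
    name, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            header = line[1:].split()
            # Use the first word of the header as the identifier.
            name, chunks = (header[0] if header else ""), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def fasta_to_gff3(text, source="example", so_type="EST", include_fasta=False):
    """Return GFF3 text with one feature per FASTA record.

    Hypothetical helper: 'source' and 'so_type' are illustrative defaults.
    Identifiers are written as-is; real data may need the reserved
    characters in column 9 (such as ';' and '=') percent-encoded.
    """
    records = list(parse_fasta(text))
    lines = ["##gff-version 3"]
    for name, seq in records:
        columns = [
            name,               # seqid: the EST is its own reference
            source,
            so_type,
            "1",                # start (GFF3 is 1-based, inclusive)
            str(len(seq)),      # end
            ".",                # score
            ".",                # strand
            ".",                # phase
            f"ID={name};Name={name}",
        ]
        lines.append("\t".join(columns))
    if include_fasta:
        lines.append("##FASTA")
        for name, seq in records:
            lines.append(f">{name}")
            lines.append(seq)
    return "\n".join(lines) + "\n"
```

Calling fasta_to_gff3(open("ests.fasta").read(), include_fasta=True)
(with "ests.fasta" standing in for your own file) would give text you
could inspect before trying the real script.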

Then you need to create a GFF3 file for the Blastx results.  The BioPerl
script bp_search2gff.pl should do the trick here (though it has honestly
been a little while since I used it).
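
To show roughly what the resulting GFF3 might look like, here is a
minimal Python sketch.  It does not reproduce bp_search2gff.pl; it
assumes the Blastx results are in the 12-column tabular format (query,
subject, percent identity, length, mismatches, gap opens, query start,
query end, subject start, subject end, E-value, bit score), and the
function name, the "blastx" source, and the protein_match feature type
are assumptions made for the example.  Each HSP becomes one feature
located on the EST, with a Target attribute pointing at the protein hit.

```python
def blast_tab_to_gff3(tab_text, source="blastx", so_type="protein_match"):
    """Convert 12-column tabular BLAST rows to GFF3 lines on the query.

    Hypothetical helper for illustration only.
    """
    lines = ["##gff-version 3"]
    hsp_number = 0
    for row in tab_text.splitlines():
        if not row.strip() or row.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = row.split("\t")
        query, subject = fields[0], fields[1]
        qstart, qend = int(fields[6]), int(fields[7])
        sstart, send = int(fields[8]), int(fields[9])
        evalue = fields[10]
        # Reverse-frame hits are reported with query start > query end;
        # GFF3 wants start <= end plus an explicit strand.
        strand = "+" if qstart <= qend else "-"
        start, end = sorted((qstart, qend))
        tstart, tend = sorted((sstart, send))
        hsp_number += 1
        attributes = (
            f"ID=hsp{hsp_number};Name={subject};"
            f"Target={subject} {tstart} {tend}"
        )
        lines.append("\t".join([
            query, source, so_type,
            str(start), str(end),
            evalue,   # E-value in the score column
            strand,
            ".",      # phase
            attributes,
        ]))
    return "\n".join(lines) + "\n"
```

The seqid in column 1 is the EST name, so these lines only line up with
the EST features if the identifiers match those used when the ESTs were
loaded.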

You'll also need features for the 'other half' of the analysis: that is,
what the ESTs were blasted against.  Presumably, that is what you want
the ace parser for below.
>
> I’d also like to load a TGICL assembly .ace file and display the
> contigs.  Has anyone written a parser? I’ve seen hints on various
> sites that such a thing might exist but I haven’t found anything firm.

I have no idea.  You could try asking on the bioperl mailing list:

  http://bioperl.org/mailman/listinfo/bioperl-l
>
> We have a database in-house and tools to accomplish the above, but
> we’re looking for more flexibility.  Any help would be appreciated.
>
> Thanks,
>
> Ed
>
> Ed Johnson
> Scientific Computing Professional Specialist
> IBL - Laboratory for Genomics and Bioinformatics
> University Of Georgia
> Room 154 - IBL
> 110 Riverbend Road
> Athens, Ga 30602
> Phone (706) 542-1039
>
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory