[Gmod-help] gmod_bulk_load_gff3.pl

Scott Cain cain.cshl at gmail.com
Fri Feb 29 10:09:41 EST 2008


Hi Todd,

When you created the database, did you use the 'make' based procedure
that is outlined in the install document?  That is, did you do this:

  make load_schema
  make prepdb
  make ontologies

and when you loaded ontologies, did you load all of relation, sequence,
gene and feature property?  My best guess for what is going wrong is
that the feature property controlled vocabulary is missing or that
something from the prepdb inserts is missing (though I doubt the latter
is the problem--I don't think you would have made it this far if that
were the case).  I'll try loading the SGD GFF file now to see if I run
into any problems.

Scott
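
The ontology check asked about above can be sketched in a few lines of
Python. This is only an illustration under stated assumptions: the cv
names in REQUIRED_CVS are guesses (the thread does not give the exact
names stored in the database), and missing_cvs is a hypothetical helper,
not part of the GMOD tools. The list of loaded names would come from a
query such as "SELECT name FROM cv;" run against the Chado database.

```python
# Assumed cv names; confirm against your own database with:
#   SELECT name FROM cv;
REQUIRED_CVS = {"relationship", "sequence", "feature_property"}


def missing_cvs(loaded_names, required=REQUIRED_CVS):
    """Return the required cv names that are absent from loaded_names, sorted."""
    return sorted(set(required) - set(loaded_names))


# Example: a database where the feature property vocabulary was never loaded.
loaded = ["relationship", "sequence"]
absent = missing_cvs(loaded)
```

If the returned list is non-empty, reloading those ontologies before
retrying the GFF3 load would be the first thing to try.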

On Fri, 2008-02-29 at 10:00 -0500, todd.moughamer at syngenta.com wrote:
> Scott,
> 
> No resolution yet. Just in case, I ran the gff file through dos2unix and
> that didn't help. I saw in a posting about a similar error that it might
> have something to do with auto-incrementing of IDs. My next step would
> be to wipe out the database and reload the ontology dump...unless you
> have other suggestions.
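
For readers who want to rule out line endings without dos2unix, the same
check can be sketched in Python. The helper names has_crlf and to_unix
are hypothetical, and this only approximates what dos2unix does (it
converts CRLF pairs and leaves lone carriage returns alone).

```python
def has_crlf(data: bytes) -> bool:
    """Report whether the data contains any Windows-style CRLF line endings."""
    return b"\r\n" in data


def to_unix(data: bytes) -> bytes:
    """Convert CRLF line endings to LF, leaving everything else untouched."""
    return data.replace(b"\r\n", b"\n")


# Example with an in-memory sample; a real file would be read in binary
# mode, e.g. open(path, "rb").read().
sample = b"##gff-version 3\r\nchrI\tSGD\tgene\t1\t10\t.\t+\t.\tID=a\r\n"
converted = to_unix(sample)
```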
> 
> Thanks,
> 
> Todd
> 
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com] 
> Sent: Friday, February 29, 2008 9:09 AM
> To: Moughamer Todd USRE
> Cc: help at gmod.org
> Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> 
> Hi Todd,
> 
> I'm trying to catch up on my email after being gone for a few days--did
> you get this resolved?
> 
> Scott
> 
> On Fri, 2008-02-22 at 13:01 -0500, todd.moughamer at syngenta.com wrote:
> > Scott,
> > 
> > The BioPerl live update fixed the "Can't locate object method
> > 'database'" problem (Thanks!). The loading progresses much further but
> > now errors out with the message below. Here I am using the first 100
> > lines of the sample yeast GFF file, which did not produce errors in the
> > GFF3 validator:
> > 
> > Preparing data for inserting into the chadotest database (This may take a while ...)
> > Loading data into feature table ...
> > Loading data into featureloc table ...
> > Skipping feature_relationship table since the load file is empty...
> > Loading data into featureprop table ...
> > DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
> > CONTEXT:  COPY featureprop, line 1, column type_id: ""
> > 
> > ------------- EXCEPTION: Bio::Root::Exception -------------
> > MSG: calling endcopy for featureprop failed:
> > STACK: Error::throw
> > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > STACK: Bio::GMOD::DB::Adapter::copy_from_stdin /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
> > STACK: Bio::GMOD::DB::Adapter::load_data /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
> > STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
> > -----------------------------------------------------------
> > Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> > 
> > Thanks,
> > 
> > Todd
> > 
> > -----Original Message-----
> > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > Sent: Wednesday, February 20, 2008 11:45 AM
> > To: Moughamer Todd USRE
> > Cc: hlapp at duke.edu; help at gmod.org
> > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > 
> > Hi Todd,
> > 
> > The ##gff-version error it reported won't be a problem; the loader is 
> > quite forgiving about that.  The invalid type problems could 
> > potentially be a problem though.  Here's the thing: Chado uses SO (so 
> > yes, you should be using so.obo) for feature types, while the current 
> > GFF3 spec requires SOFA (I've been advocating for changing the spec 
> > and it probably will change in the near future).
> > 
> > So, if the validator is complaining about those terms because they
> > aren't in SOFA, but they are in SO, that's no problem.  But if they
> > aren't in SO either (if, for instance, they've been obsoleted), then
> > you'll have to fix the file.  When I'm confronted with something like
> > that, I just do a global search and replace on the term to swap in the
> > nearest term in SO.
> > 
> > Scott
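
The search-and-replace step described above can be sketched in Python so
that only the type column (column 3) of each feature line is touched,
which is a bit safer than a blind global replace. This is a minimal
sketch, not the loader's own behaviour: replace_types is a hypothetical
helper, and the mapping from "old_term" to "new_term" is a placeholder
that you would fill in with the real obsolete term and its nearest SO
replacement.

```python
def replace_types(lines, mapping):
    """Yield GFF3 lines with the type column swapped according to mapping."""
    for line in lines:
        # Directives, comments, and blank lines pass through untouched.
        if line.startswith("#") or not line.strip():
            yield line
            continue
        body = line.rstrip("\r\n")
        ending = line[len(body):]  # preserve the original line ending
        cols = body.split("\t")
        # Only rewrite well-formed nine-column feature lines.
        if len(cols) == 9 and cols[2] in mapping:
            cols[2] = mapping[cols[2]]
            yield "\t".join(cols) + ending
        else:
            yield line


# Placeholder mapping; the term names here are purely illustrative.
mapping = {"old_term": "new_term"}
source = [
    "##gff-version 3\n",
    "chrI\tSGD\told_term\t1\t10\t.\t+\t.\tID=x\n",
    "chrI\tSGD\tgene\t1\t10\t.\t+\t.\tID=y\n",
]
fixed = list(replace_types(source, mapping))
```

Because the match is on the whole column value, a term that merely
appears inside an attribute or a longer type name is left alone.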
> > 
> > On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> > > Hi Scott,
> > > 
> > > We downloaded Chado from CVS. We are in the process of installing 
> > > the live BioPerl.
> > > 
> > > I am using the example yeast GFF3 files from the web site
> > > (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I ran them
> > > through the validator and sure enough they came back invalid (both
> > > the original and sorted forms). Here are some of the errors:
> > > 
> > > Line Number  Error/Warning
> > > -----------  -------------
> > > 1            [ERROR]   first line must be ##gff-version 3 (line: SGD)
> > > 350          [ERROR]   invalid type (type: gene_cassette)
> > > 352          [ERROR]   invalid type (type: gene_cassette)
> > > 369          [ERROR]   invalid type (type: gene_cassette)
> > > 1065         [ERROR]   invalid type (type: long_terminal_repeat)
> > > 1066         [ERROR]   invalid type (type: long_terminal_repeat)
> > > 1072         [ERROR]   invalid type (type: transposable_element_gene)
> > > ...
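
To see at a glance which feature types a GFF3 file uses, and whether it
starts with the version directive the validator is asking for, a small
Python sketch like the following may help. It is not the validator's
logic; summarize_gff3 is a hypothetical helper, and it assumes
tab-delimited nine-column feature lines with an optional ##FASTA section
at the end.

```python
from collections import Counter


def summarize_gff3(lines):
    """Return (has_version_directive, Counter of feature types) for GFF3 lines."""
    lines = list(lines)
    first = lines[0].split() if lines else []
    has_version = (
        len(first) >= 2
        and first[0] == "##gff-version"
        and first[1].split(".")[0] == "3"
    )
    types = Counter()
    for line in lines:
        if line.startswith("##FASTA"):
            break  # everything after this directive is sequence data
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) == 9:
            types[cols[2]] += 1
    return has_version, types


# In-memory example whose first line is not the version directive.
sample = [
    "SGD\n",
    "chrI\tSGD\tgene\t1\t10\t.\t+\t.\tID=a\n",
    "chrI\tSGD\tgene_cassette\t5\t9\t.\t+\t.\tID=b\n",
    "##FASTA\n",
    ">chrI\n",
    "ACGT\n",
]
has_version, type_counts = summarize_gff3(sample)
```

The resulting type names can then be compared by hand against so.obo to
decide which ones, if any, need replacing.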
> > > 
> > > I'm also wondering if I should be using the sofa.obo file rather 
> > > than the so.obo file I downloaded from sequenceontology.org?
> > > 
> > > Thanks,
> > > 
> > > Todd
> > > 
> > > -----Original Message-----
> > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > Sent: Tuesday, February 19, 2008 3:49 PM
> > > To: Hilmar Lapp
> > > Cc: Moughamer Todd USRE; help at gmod.org
> > > Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> > > 
> > > OK, after running some tests, I still think an out of date BioPerl 
> > > is probably at fault.  With a current checkout of both the schema 
> > > and bioperl repositories, I can load GFF3 data with Dbxref tags 
> > > successfully.
> > > 
> > > Todd, a few questions for you:
> > > 
> > > * are you using a cvs checkout of chado as well?  I should have 
> > > asked that before telling you to update bioperl.
> > > 
> > > * If you are using current checkouts of chado and bioperl and still 
> > > have the problem, could you please run your GFF3 through the GFF3 
> > > validator to see if it turns up any problems:
> > > 
> > >   http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> > > 
> > > * If you still get the same error message, could you please send me
> > > a sample of the offending GFF3?  There may be a case that I didn't
> > > think of.
> > > 
> > > Thanks,
> > > Scott
> > > 
> > > On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > > > Hi Hilmar,
> > > > 
> > > > Hah!  I was getting ready to write a response where I basically
> > > > said I didn't think that was what was going on, and I would have
> > > > justified it with some hand waving, since I didn't have the actual
> > > > error message, so I was just guessing and hopefully, updating
> > > > bioperl will fix the problem (and that still is a possibility).
> > > >
> > > > However, I do have the actual error message, so I went and looked
> > > > at the offending line, and it is in a method called 'handle_dbxref',
> > > > so it looks like your diagnosis is spot on.  Now I need to figure out
> > > > if this is still happening with bioperl-live and figure out why.
> > > > I've got a
> > > > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > > > which I think should return a list of DBLink features.  I guess I'll
> > > > go see.
> > > > 
> > > > Thanks for pointing that out!
> > > > Scott
> > > > 
> > > > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > > > Well, B::A::SimpleValue never had a method called database(). It
> > > > > is B::A::DBLink that has that (and always had).
> > > > >
> > > > > So my first diagnosis from afar would be that something is
> > > > > returning or creating a B::A::SimpleValue when it was expected to
> > > > > return or create a B::A::DBLink.
> > > > > 
> > > > > 	-hilmar
> > > > > 
> > > > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > > > 
> > > > > > Hi Todd,
> > > > > >
> > > > > > Yes, that is still the most likely solution for you.  A few
> > > > > > months ago, the BioPerl API changed and
> > > > > > Bio::Annotation::SimpleValue objects don't work the same way
> > > > > > that they used to, thus the error you are seeing.  It's not
> > > > > > really looking for a method named 'database'; that is an
> > > > > > artifact left over from the API change.
> > > > > >
> > > > > > Scott
> > > > > >
> > > > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > > > >> I run into the following error when trying to load Chado:
> > > > > >>
> > > > > > >> gmod_bulk_load_gff3.pl --organism yeast --gfffile ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > > > > >> Preparing data for inserting into the chadotest database (This may take a while ...)
> > > > > > >> Can't locate object method "database" via package "Bio::Annotation::SimpleValue" at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3061, <GEN0> line 1.
> > > > > > >> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> > > > > >>
> > > > > > >> I found a reference to this problem online
> > > > > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > > > > >> with the recommendation of downloading the 'live' version of
> > > > > > >> BioPerl. However, upon browsing the latest code in SVN, I do not
> > > > > > >> see inclusion of a "database" method. Is this still the
> > > > > > >> recommended solution to the problem?
> > > > > >>
> > > > > >> Thanks,
> > > > > >>
> > > > > >> Todd
> > > > > >>
> > > > > >>
> > > > > >> Todd Moughamer
> > > > > >>
> > > > > >> Bioinformatics Consultancy & Training Group
> > > > > >>
> > > > > >> Syngenta Biotechnology, Inc.
> > > > > >>
> > > > > >> 3054 Cornwallis Road, 1243.E
> > > > > >>
> > > > > >> Research Triangle Park, NC 27709-2257
> > > > > >>
> > > > > >> Tel: 919-597-3078
> > > > > >>
> > > > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > > > >>
> > > > > >>
> > > > > > --
> > > > > >
> > > --------------------------------------------------------------------
> > > --
> > > > > > --
> > > > > > Scott Cain, Ph. D.                                          
> > > > > > cain at cshl.edu
> > > > > > GMOD Coordinator (http://www.gmod.org/)                      
> > > > > > 216-392-3087
> > > > > > Cold Spring Harbor Laboratory
> > > > > >
> > > > > 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




