[Gmod-help] gmod_bulk_load_gff3.pl

Scott Cain cain.cshl at gmail.com
Fri Feb 29 09:09:19 EST 2008


Hi Todd,

I'm trying to catch up on my email after being gone for a few days. Did
you get this resolved?

Scott

On Fri, 2008-02-22 at 13:01 -0500, todd.moughamer at syngenta.com wrote:
> Scott,
> 
> The bioperl-live update fixed the 'Can't locate object method "database"'
> problem (thanks!). The loading progresses much further but now errors out
> with the message below. Here I am using the first 100 lines of the sample
> yeast GFF file, which did not produce errors in the GFF3 validator:
> 
> Preparing data for inserting into the chadotest database
> (This may take a while ...)
> Loading data into feature table ...
> Loading data into featureloc table ...
> Skipping feature_relationship table since the load file is empty...
> Loading data into featureprop table ...
> DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
> CONTEXT:  COPY featureprop, line 1, column type_id: ""
> 
> ------------- EXCEPTION: Bio::Root::Exception -------------
> MSG: calling endcopy for featureprop failed:
> STACK: Error::throw
> STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
> STACK: Bio::GMOD::DB::Adapter::load_data /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
> STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
> -----------------------------------------------------------
> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> 
> Thanks,
> 
> Todd
> 
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com] 
> Sent: Wednesday, February 20, 2008 11:45 AM
> To: Moughamer Todd USRE
> Cc: hlapp at duke.edu; help at gmod.org
> Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> 
> Hi Todd,
> 
> The ##gff-version error it reported won't be a problem; the loader is
> quite forgiving about that.  The invalid type problems could potentially
> be a problem though.  Here's the thing: Chado uses SO (so yes, you
> should be using so.obo) for feature types, while the current GFF3 spec
> requires SOFA (I've been advocating for changing the spec and it
> probably will change in the near future).
> 
> So, if the validator is complaining about those terms because they
> aren't in SOFA, but they are in SO, that's no problem.  But if they
> aren't in SO either (if, for instance, they've been obsoleted), then
> you'll have to fix the file.  When I am confronted with something like
> that, I just do a global search and replace on the term to swap in the
> nearest term in SO.
> 
> Scott
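A minimal sketch of the search-and-replace Scott describes, written in Python since the thread itself contains no code. It assumes a tab-delimited GFF3 file and rewrites only column 3 (the feature type) instead of doing a blind global replace, so the same string appearing inside attributes such as Note= is left alone. The `TYPE_REPLACEMENTS` entries and the `replace_types` helper are hypothetical placeholders; deciding which SO term is "nearest" to an obsolete one is still a manual step, and terms that are merely missing from SOFA but present in SO need no change at all.

```python
"""Swap obsolete feature types in a GFF3 file for hand-chosen SO terms."""
import sys
from typing import Dict, Iterable, Iterator

# Hypothetical mapping: the obsolete term on the left, the nearest current
# SO term (chosen by hand) on the right. These are placeholders only.
TYPE_REPLACEMENTS: Dict[str, str] = {
    "old_obsolete_type": "replacement_so_type",
}


def replace_types(lines: Iterable[str], mapping: Dict[str, str]) -> Iterator[str]:
    """Yield GFF3 lines with column 3 (type) remapped; all other lines pass through."""
    in_fasta = False
    for line in lines:
        # Once the ##FASTA section starts, the rest of the file is sequence data.
        if line.startswith("##FASTA"):
            in_fasta = True
        # Leave directives, comments, blank lines and FASTA content untouched.
        if in_fasta or line.startswith("#") or not line.strip():
            yield line
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] in mapping:
            cols[2] = mapping[cols[2]]
            yield "\t".join(cols) + "\n"
        else:
            yield line


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python fix_types.py input.gff3 > fixed.gff3
    with open(sys.argv[1]) as fh:
        sys.stdout.writelines(replace_types(fh, TYPE_REPLACEMENTS))
```

Restricting the edit to the type column is a deliberate choice: a plain text substitution would also rewrite IDs or notes that happen to contain the obsolete term.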
> 
> On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> > Hi Scott,
> > 
> > We downloaded Chado from CVS. We are in the process of installing the 
> > live BioPerl.
> > 
> > I am using the example yeast GFF3 files from the web site 
> > (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I ran them 
> > through the validator and sure enough they came back invalid (both the
> > original and sorted forms). Here are some of the errors:
> > 
> > Line Number  Error/Warning
> > -----------  -------------
> > 1            [ERROR]   first line must be ##gff-version 3 (line: SGD)
> > 350          [ERROR]   invalid type (type: gene_cassette)
> > 352          [ERROR]   invalid type (type: gene_cassette)
> > 369          [ERROR]   invalid type (type: gene_cassette)
> > 1065         [ERROR]   invalid type (type: long_terminal_repeat)
> > 1066         [ERROR]   invalid type (type: long_terminal_repeat)
> > 1072         [ERROR]   invalid type (type: transposable_element_gene)
> > ...
> > 
> > I'm also wondering if I should be using the sofa.obo file rather than 
> > the so.obo file I downloaded from sequenceontology.org?
> > 
> > Thanks,
> > 
> > Todd
> > 
> > -----Original Message-----
> > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > Sent: Tuesday, February 19, 2008 3:49 PM
> > To: Hilmar Lapp
> > Cc: Moughamer Todd USRE; help at gmod.org
> > Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> > 
> > OK, after running some tests, I still think an out of date BioPerl is 
> > probably at fault.  With a current checkout of both the schema and 
> > bioperl repositories, I can load GFF3 data with Dbxref tags 
> > successfully.
> > 
> > Todd, a few questions for you:
> > 
> > * are you using a cvs checkout of chado as well?  I should have asked 
> > that before telling you to update bioperl.
> > 
> > * If you are using current checkouts of chado and bioperl and still 
> > have the problem, could you please run your GFF3 through the GFF3 
> > validator to see if it turns up any problems:
> > 
> >   http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> > 
> > * If you still get the same error message, could you please send me a
> > sample of the offending GFF3?  There may be a case that I didn't think
> > of.
> > 
> > Thanks,
> > Scott
> > 
> > On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > > Hi Hilmar,
> > > 
> > > Hah!  I was getting ready to write a response where I basically said I
> > > didn't think that was what was going on, and I would have justified it
> > > with some hand waving, since I didn't have the actual error message,
> > > so I was just guessing.  Hopefully updating bioperl will fix the
> > > problem (and that is still a possibility).
> > > 
> > > However, I do have the actual error message, so I went and looked at
> > > the offending line, and it is in a method called 'handle_dbxref', so
> > > it looks like your diagnosis is spot on.  Now I need to figure out
> > > whether this is still happening with bioperl-live, and if so, why.
> > > The code calls
> > > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > > which I think should return a list of Bio::Annotation::DBLink objects.
> > > I guess I'll go see.
> > > 
> > > Thanks for pointing that out!
> > > Scott
> > > 
> > > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > > Well, B::A::SimpleValue never had a method called database(). It 
> > > > is B::A::DBLink that has that (and always had).
> > > > 
> > > > So my first diagnosis from afar would be that something is returning
> > > > or creating a B::A::SimpleValue when it was expected to return or
> > > > create a B::A::DBLink.
> > > > 
> > > > 	-hilmar
> > > > 
> > > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > > 
> > > > > Hi Todd,
> > > > >
> > > > > Yes, that is still the most likely solution for you.  A few months
> > > > > ago, the BioPerl API changed and Bio::Annotation::SimpleValue
> > > > > objects don't work the same way that they used to, thus the
> > > > > error you are seeing.
> > > > > It's not really looking for a method named 'database'; that is
> > > > > an artifact left over from the API change.
> > > > >
> > > > > Scott
> > > > >
> > > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > > >> I run into the following error when trying to load Chado:
> > > > >>
> > > > >> gmod_bulk_load_gff3.pl --organism yeast --gfffile ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > > >> Preparing data for inserting into the chadotest database
> > > > >> (This may take a while ...)
> > > > >> Can't locate object method "database" via package "Bio::Annotation::SimpleValue" at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3061, <GEN0> line 1.
> > > > >> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> > > > >>
> > > > >> I found a reference to this problem online
> > > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > > >> with the recommendation of downloading the 'live' version of BioPerl.
> > > > >> However, upon browsing the latest code in SVN I do not see a
> > > > >> "database" method. Is this still the recommended solution to the
> > > > >> problem?
> > > > >>
> > > > >> Thanks,
> > > > >>
> > > > >> Todd
> > > > >>
> > > > >>
> > > > >> Todd Moughamer
> > > > >> Bioinformatics Consultancy & Training Group
> > > > >> Syngenta Biotechnology, Inc.
> > > > >> 3054 Cornwallis Road, 1243.E
> > > > >> Research Triangle Park, NC 27709-2257
> > > > >> Tel: 919-597-3078
> > > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > > >>
> > > > >>
> > > > > --
> > > > > ------------------------------------------------------------------------
> > > > > Scott Cain, Ph. D.                                   cain at cshl.edu
> > > > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > > > Cold Spring Harbor Laboratory
> > > > >
> > > > 
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                   cain.cshl at gmail.com
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory




