[Gmod-help] gmod_bulk_load_gff3.pl

todd.moughamer at syngenta.com todd.moughamer at syngenta.com
Fri Feb 22 13:01:29 EST 2008


Scott,

The BioPerl live update fixed the "Can't locate object method
"database"" problem (Thanks!). The loading progresses much further but
now errors out with the message below. Here I am using the first 100
lines of the sample yeast GFF file, which did not produce errors in the
GFF3 validator:

Preparing data for inserting into the chadotest database
(This may take a while ...)
Loading data into feature table ...
Loading data into featureloc table ...
Skipping feature_relationship table since the load file is empty...
Loading data into featureprop table ...
DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
CONTEXT:  COPY featureprop, line 1, column type_id: ""

------------- EXCEPTION: Bio::Root::Exception -------------
MSG: calling endcopy for featureprop failed:
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
STACK: Bio::GMOD::DB::Adapter::copy_from_stdin /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
STACK: Bio::GMOD::DB::Adapter::load_data /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
-----------------------------------------------------------
Issuing rollback() for database handle being DESTROY'd without explicit
disconnect(). 
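
The COPY error above points at an empty type_id in featureprop. As a
rough diagnostic sketch (not part of the loader), the Python below
tallies the column 9 attribute tags in a GFF3 sample and separates out
the ones that are not reserved by the GFF3 spec. It assumes, without
confirmation from the output above, that the empty type_id comes from
an attribute tag the loader could not map to a property type; the
function names are made up for illustration.

```python
from collections import Counter
from urllib.parse import unquote

# Attribute tags reserved by the GFF3 spec; anything else is free-form.
RESERVED_TAGS = {
    "ID", "Name", "Alias", "Parent", "Target", "Gap",
    "Derives_from", "Note", "Dbxref", "Ontology_term", "Is_circular",
}


def count_attribute_tags(lines):
    """Count column 9 attribute tags across GFF3 feature lines."""
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # embedded sequence follows; no more features
        if not line or line.startswith("#"):
            continue  # blank line, comment, or directive
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature line
        for pair in cols[8].split(";"):
            tag, sep, _value = pair.partition("=")
            if sep:  # only count tag=value pairs
                counts[unquote(tag.strip())] += 1
    return counts


def unreserved_tags(counts):
    """Return only the tags that are not reserved by the GFF3 spec."""
    return {tag: n for tag, n in counts.items() if tag not in RESERVED_TAGS}


# Example use on an unpacked copy of the 100-line sample:
#   with open("saccharomyces_cerevisiae.gff.sorted.100") as fh:
#       print(unreserved_tags(count_attribute_tags(fh)))
```

An empty or unexpected tag in that output would be a reasonable place
to start looking, but it is only a guess at the cause.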

Thanks,

Todd

-----Original Message-----
From: Scott Cain [mailto:cain.cshl at gmail.com] 
Sent: Wednesday, February 20, 2008 11:45 AM
To: Moughamer Todd USRE
Cc: hlapp at duke.edu; help at gmod.org
Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl

Hi Todd,

The ##gff-version error it reported won't be a problem; the loader is
quite forgiving about that.  The invalid type problems could potentially
be a problem though.  Here's the thing: Chado uses SO (so yes, you
should be using so.obo) for feature types, while the current GFF3 spec
requires SOFA (I've been advocating for changing the spec and it
probably will change in the near future).

So, if the validator is complaining about those terms because they
aren't in SOFA, but they are in SO, that's no problem.  But if they
aren't in SO either (if, for instance, they've been obsoleted), then
you'll have to fix the file.  When I'm confronted with something like
that, I just do a global search and replace on the term to swap in the
nearest term in SO.
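
A minimal sketch of that check and the search and replace, in Python
rather than anything the loader ships with: it reads term names from an
OBO file such as so.obo, reports feature types in column 3 of the GFF3
that are missing or obsolete, and rewrites them from a mapping. The
function names and the example mapping are hypothetical; which SO term
is "nearest" is still a judgment call. Types given as SO accessions
rather than names are not handled here.

```python
def load_obo_term_names(obo_lines):
    """Return the names of non-obsolete [Term] stanzas in OBO lines."""
    names = set()
    in_term = False
    name = None
    obsolete = False
    # The trailing sentinel stanza header flushes the last real stanza.
    for raw in list(obo_lines) + ["[End]"]:
        line = raw.strip()
        if line.startswith("[") and line.endswith("]"):
            if in_term and name is not None and not obsolete:
                names.add(name)
            in_term = line == "[Term]"
            name, obsolete = None, False
        elif in_term and line.startswith("name:"):
            name = line[len("name:"):].strip()
        elif in_term and line.startswith("is_obsolete:"):
            obsolete = line.split(":", 1)[1].strip() == "true"
    return names


def types_missing_from(gff_lines, term_names):
    """Return feature types (column 3) that are not in term_names."""
    missing = set()
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] not in term_names:
            missing.add(cols[2])
    return missing


def replace_types(gff_lines, replacements):
    """Yield GFF3 lines with column 3 swapped according to replacements."""
    for line in gff_lines:
        cols = line.rstrip("\n").split("\t")
        if (not line.startswith("#") and len(cols) == 9
                and cols[2] in replacements):
            cols[2] = replacements[cols[2]]
            yield "\t".join(cols) + "\n"
        else:
            yield line  # directives, comments, and other lines pass through


# Example use (the mapping is a placeholder, not a recommendation):
#   with open("so.obo") as fh:
#       so_names = load_obo_term_names(fh)
#   with open("in.gff3") as fh:
#       print(types_missing_from(fh, so_names))
#   mapping = {"some_obsolete_type": "nearest_so_term"}
```

Replacing only column 3 avoids touching the same string where it
happens to appear in an ID or Note attribute.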

Scott

On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> Hi Scott,
> 
> We downloaded Chado from CVS. We are in the process of installing the 
> live BioPerl.
> 
> I am using the example yeast GFF3 files from the web site 
> (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I ran them 
> through the validator and sure enough they came back invalid (both the
> original and sorted forms). Here are some of the errors:
> 
> Line Number  Error/Warning
> -----------  -------------
> 1            [ERROR]   first line must be ##gff-version 3 (line: SGD)
> 350          [ERROR]   invalid type (type: gene_cassette)
> 352          [ERROR]   invalid type (type: gene_cassette)
> 369          [ERROR]   invalid type (type: gene_cassette)
> 1065         [ERROR]   invalid type (type: long_terminal_repeat)
> 1066         [ERROR]   invalid type (type: long_terminal_repeat)
> 1072         [ERROR]   invalid type (type: transposable_element_gene)
> ...
> 
> I'm also wondering if I should be using the sofa.obo file rather than 
> the so.obo file I downloaded from sequenceontology.org?
> 
> Thanks,
> 
> Todd
> 
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Tuesday, February 19, 2008 3:49 PM
> To: Hilmar Lapp
> Cc: Moughamer Todd USRE; help at gmod.org
> Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> 
> OK, after running some tests, I still think an out of date BioPerl is 
> probably at fault.  With a current checkout of both the schema and 
> bioperl repositories, I can load GFF3 data with Dbxref tags 
> successfully.
> 
> Todd, a few questions for you:
> 
> * are you using a cvs checkout of chado as well?  I should have asked 
> that before telling you to update bioperl.
> 
> * If you are using current checkouts of chado and bioperl and still 
> have the problem, could you please run your GFF3 through the GFF3 
> validator to see if it turns up any problems:
> 
>   http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> 
> * If you still get the same error message, could you please send me a 
> sample of the offending GFF3?  There may be a case that I didn't think
> of.
> 
> Thanks,
> Scott
> 
> On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > Hi Hilmar,
> > 
> > Hah!  I was getting ready to write a response where I basically said
> > I didn't think that was what was going on, and I would have justified
> > it with some hand waving, since I didn't have the actual error
> > message, so I was just guessing and hopefully, updating bioperl will
> > fix the problem (and that still is a possibility).
> > 
> > However, I do have the actual error message, so I went and looked at
> > the offending line, and it is in a method called 'handle_dbxref', so
> > it looks like your diagnosis is spot on.  Now I need to figure out
> > if this is still happening with bioperl-live and figure out why.
> > I've got a
> > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > which I think should return a list of DBLink features.  I guess I'll
> > go see.
> > 
> > Thanks for pointing that out!
> > Scott
> > 
> > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > Well, B::A::SimpleValue never had a method called database(). It 
> > > is B::A::DBLink that has that (and always had).
> > > 
> > > So my first diagnosis from afar would be that something is
> > > returning or creating a B::A::SimpleValue when it was expected to
> > > return or create a B::A::DBLink.
> > > 
> > > 	-hilmar
> > > 
> > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > 
> > > > Hi Todd,
> > > >
> > > > Yes, that is still the most likely solution for you.  A few
> > > > months ago, the BioPerl API changed and
> > > > Bio::Annotation::SimpleValue objects don't work the same way that
> > > > they used to, thus the error you are seeing.
> > > > It's not really looking for a method named 'database'; that is
> > > > an artifact left over from the API change.
> > > >
> > > > Scott
> > > >
> > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > >> I run into the following error when trying to load Chado:
> > > >>
> > > >> gmod_bulk_load_gff3.pl --organism yeast --gfffile ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > >> Preparing data for inserting into the chadotest database
> > > >> (This may take a while ...)
> > > >> Can't locate object method "database" via package "Bio::Annotation::SimpleValue"
> > > >> at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3061, <GEN0> line 1.
> > > >> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> > > >>
> > > >> I found reference to this problem online
> > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > >> with the recommendation of downloading the 'live' version of
> > > >> BioPerl. However, upon browsing the latest code in SVN I do not
> > > >> see inclusion of a "database" method. Is this still the
> > > >> recommended solution to the problem?
> > > >>
> > > >> Thanks,
> > > >>
> > > >> Todd
> > > >>
> > > >>
> > > >> Todd Moughamer
> > > >>
> > > >> Bioinformatics Consultancy & Training Group
> > > >>
> > > >> Syngenta Biotechnology, Inc.
> > > >>
> > > >> 3054 Cornwallis Road, 1243.E
> > > >>
> > > >> Research Triangle Park, NC 27709-2257
> > > >>
> > > >> Tel: 919-597-3078
> > > >>
> > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > >>
> > > >>
> > > > --
> > > > ------------------------------------------------------------------------
> > > > Scott Cain, Ph. D.                                     cain at cshl.edu
> > > > GMOD Coordinator (http://www.gmod.org/)                 216-392-3087
> > > > Cold Spring Harbor Laboratory
> > > >
> > > 
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


-------------- next part --------------
A non-text attachment was scrubbed...
Name: saccharomyces_cerevisiae.gff.sorted.100.gz
Type: application/x-gzip
Size: 2652 bytes
Desc: saccharomyces_cerevisiae.gff.sorted.100.gz
URL: <http://brie4.cshl.edu/pipermail/gmod-help/attachments/20080222/7a948c15/attachment.gz>

