[Gmod-help] gmod_bulk_load_gff3.pl

todd.moughamer at syngenta.com todd.moughamer at syngenta.com
Tue Mar 4 08:44:29 EST 2008


Hi Scott,

Thanks. Is the feature property vocabulary part of the Sequence
Ontology, or is it separate? As I said, we installed only the SO and
Relationship Ontology. If it is separate, would you recommend clearing
out the database before re-running make ontologies?

Best,

Todd 

-----Original Message-----
From: Scott Cain [mailto:cain.cshl at gmail.com] 
Sent: Friday, February 29, 2008 11:24 PM
To: Moughamer Todd USRE
Cc: help at gmod.org
Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl

Hi Todd,

I just reproduced the problem you are seeing by not loading the feature
property controlled vocabulary.  For most cvterms, the loader checks to
make sure it is present before writing back to the database.  The GFF
annotation 'Note' was being treated as a special case and I forgot to
add the check that it existed.  I've updated the loader to give a useful
message and stop when it finds that Note doesn't exist.

Sorry for the hassle.
Scott
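
A rough illustration of the kind of check described above. This is not the
loader's actual Perl code; it is a minimal Python sketch, assuming a cut-down
stand-in for Chado's cv and cvterm tables in an in-memory SQLite database.
The helper names require_cvterm and make_demo_db, and the cv name
'feature_property' used in the note below, are assumptions for illustration.

```python
import sqlite3


def require_cvterm(conn, cv_name, term_name):
    """Return the cvterm_id for term_name in cv_name, or raise if it is absent."""
    row = conn.execute(
        "SELECT cvterm.cvterm_id FROM cvterm "
        "JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cv.name = ? AND cvterm.name = ?",
        (cv_name, term_name),
    ).fetchone()
    if row is None:
        # Stop early with a readable message instead of writing an empty type_id.
        raise LookupError(
            f"cvterm '{term_name}' not found in cv '{cv_name}'; "
            "load that vocabulary before loading GFF3"
        )
    return row[0]


def make_demo_db():
    """Build a tiny stand-in for the cv/cvterm tables (not the real schema)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT UNIQUE);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id INTEGER NOT NULL REFERENCES cv (cv_id),
            name TEXT NOT NULL
        );
        INSERT INTO cv (cv_id, name) VALUES (1, 'sequence');
        INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (10, 1, 'gene');
        """
    )
    return conn
```

With only the sequence vocabulary present, as in the demo database,
require_cvterm(conn, 'feature_property', 'Note') raises LookupError rather
than returning nothing and letting an empty value reach the featureprop load.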


On Fri, 2008-02-29 at 11:50 -0500, todd.moughamer at syngenta.com wrote:
> Hi Scott,
> 
> I talked to the unix admin who did that part of the install. He ran 
> the makes and only had a problem with make ontologies. I believe it 
> was a missing library, and eventually he got it to run with no 
> reported problems. He installed the relationship and sequence 
> ontologies.
> 
> Todd
> 
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Friday, February 29, 2008 10:10 AM
> To: Moughamer Todd USRE
> Cc: help at gmod.org
> Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> 
> Hi Todd,
> 
> When you created the database, did you use the 'make' based procedure 
> that is outlined in the install document?  That is, did you do this:
> 
>   make load_schema
>   make prepdb
>   make ontologies
> 
> and when you loaded ontologies, did you load all of relation, 
> sequence, gene and feature property?  My best guess for what is going 
> wrong is that the feature property controlled vocabulary is missing or
> that something from the prepdb inserts is missing (though I doubt the
> latter is the problem--I don't think you would have made it this far 
> if that were the case).  I'll try loading the SGD GFF file now to see 
> if I run into any problems.
> 
> Scott
> 
> On Fri, 2008-02-29 at 10:00 -0500, todd.moughamer at syngenta.com wrote:
> > Scott,
> > 
> > No resolution yet. Just in case, I ran the gff file through dos2unix
> > and that didn't help. I saw in a posting about a similar error that 
> > it might have something to do with auto-incrementing of IDs. My next 
> > step would be to wipe out the database and reload the ontology 
> > dump...unless you have other suggestions.
> > 
> > Thanks,
> > 
> > Todd
> > 
> > -----Original Message-----
> > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > Sent: Friday, February 29, 2008 9:09 AM
> > To: Moughamer Todd USRE
> > Cc: help at gmod.org
> > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > 
> > Hi Todd,
> > 
> > I'm trying to catch up on my email after being gone for a few 
> > days--did you get this resolved?
> > 
> > Scott
> > 
> > On Fri, 2008-02-22 at 13:01 -0500, todd.moughamer at syngenta.com wrote:
> > > Scott,
> > > 
> > > The BioPerl live update fixed the 'Can't locate object method 
> > > "database"' problem (Thanks!). The loading progresses much further 
> > > but now errors out with the message below. Here I am using the 
> > > first 100 lines of the sample yeast GFF file, which did not produce 
> > > errors in the GFF3 validator:
> > > 
> > > Preparing data for inserting into the chadotest database (This may take a while ...)
> > > Loading data into feature table ...
> > > Loading data into featureloc table ...
> > > Skipping feature_relationship table since the load file is empty...
> > > Loading data into featureprop table ...
> > > DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
> > > CONTEXT:  COPY featureprop, line 1, column type_id: ""
> > > 
> > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > MSG: calling endcopy for featureprop failed:
> > > STACK: Error::throw
> > > STACK: Bio::Root::Root::throw
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
> > > STACK: Bio::GMOD::DB::Adapter::load_data
> > > /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
> > > STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
> > > -----------------------------------------------------------
> > > Issuing rollback() for database handle being DESTROY'd without 
> > > explicit disconnect().
> > > 
> > > Thanks,
> > > 
> > > Todd
> > > 
> > > -----Original Message-----
> > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > Sent: Wednesday, February 20, 2008 11:45 AM
> > > To: Moughamer Todd USRE
> > > Cc: hlapp at duke.edu; help at gmod.org
> > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > > 
> > > Hi Todd,
> > > 
> > > The ##gff-version error it reported won't be a problem; the loader
> > > is quite forgiving about that.  The invalid type problems could 
> > > potentially be a problem though.  Here's the thing: Chado uses SO 
> > > (so yes, you should be using so.obo) for feature types, while the 
> > > current GFF3 spec requires SOFA (I've been advocating for changing 
> > > the spec and it probably will change in the near future).
> > > 
> > > So, if the validator is complaining about those terms because they
> > > aren't in SOFA, but they are in SO, that's no problem.  But if 
> > > they aren't in SO either (if, for instance, they've been 
> > > obsoleted), then you'll have to fix the file.  When I'm confronted 
> > > with something like that, I just do a global search and replace on 
> > > the term to swap in the nearest term in SO.
> > > 
> > > Scott
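
The search-and-replace step described above can be sketched in Python. This
is only an illustration under assumptions: the function name
replace_feature_types is hypothetical, and the mapping from an old term to
its nearest SO term has to be supplied by whoever is fixing the file. The
sketch limits the swap to the type column (column 3) of feature lines.

```python
def replace_feature_types(lines, replacements):
    """Yield GFF3 lines with column 3 (type) swapped per the replacements dict."""
    in_fasta = False
    for line in lines:
        if line.startswith("##FASTA"):
            in_fasta = True
        # Directives, comments and trailing sequence data pass through untouched.
        if in_fasta or line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Only well-formed feature lines (9 tab-separated columns) are rewritten.
        if len(fields) == 9 and fields[2] in replacements:
            fields[2] = replacements[fields[2]]
            newline = "\n" if line.endswith("\n") else ""
            yield "\t".join(fields) + newline
        else:
            yield line
```

Restricting the change to column 3 is a design choice: a plain global
search and replace would also rewrite IDs, Names or Notes in column 9 that
happen to contain the same string.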
> > > 
> > > On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> > > > Hi Scott,
> > > > 
> > > > We downloaded Chado from CVS. We are in the process of 
> > > > installing the live BioPerl.
> > > > 
> > > > I am using the example yeast GFF3 files from the web site 
> > > > (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I ran 
> > > > them through the validator and sure enough they came back invalid 
> > > > (both the original and sorted forms). Here are some of the errors:
> > > > 
> > > > Line Number  Error/Warning
> > > > -----------  -------------
> > > > 1            [ERROR]   first line must be ##gff-version 3 (line: SGD)
> > > > 350          [ERROR]   invalid type (type: gene_cassette)
> > > > 352          [ERROR]   invalid type (type: gene_cassette)
> > > > 369          [ERROR]   invalid type (type: gene_cassette)
> > > > 1065         [ERROR]   invalid type (type: long_terminal_repeat)
> > > > 1066         [ERROR]   invalid type (type: long_terminal_repeat)
> > > > 1072         [ERROR]   invalid type (type: transposable_element_gene)
> > > > ...
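
The two kinds of error in the report above can be reproduced with a small
check. This is a minimal Python sketch, not the online validator's code: the
function name check_gff3 is hypothetical, and allowed_types is assumed to be
a set of term names taken from whichever ontology file (SOFA or SO) you
decide to validate against.

```python
def check_gff3(lines, allowed_types):
    """Return (line_number, message) pairs for two simple problems."""
    problems = []
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\r\n")
        if number == 1 and text != "##gff-version 3":
            problems.append((number, "first line must be ##gff-version 3"))
        if text.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not text or text.startswith("#"):
            continue
        fields = text.split("\t")
        # Column 3 of a 9-column feature line holds the feature type.
        if len(fields) == 9 and fields[2] not in allowed_types:
            problems.append((number, f"invalid type (type: {fields[2]})"))
    return problems
```

Running the same file against a SOFA-derived set and an SO-derived set would
show which types are missing from both and therefore need fixing.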
> > > > 
> > > > I'm also wondering if I should be using the sofa.obo file rather
> > > > than the so.obo file I downloaded from sequenceontology.org?
> > > > 
> > > > Thanks,
> > > > 
> > > > Todd
> > > > 
> > > > -----Original Message-----
> > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > Sent: Tuesday, February 19, 2008 3:49 PM
> > > > To: Hilmar Lapp
> > > > Cc: Moughamer Todd USRE; help at gmod.org
> > > > Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> > > > 
> > > > OK, after running some tests, I still think an out of date 
> > > > BioPerl is probably at fault.  With a current checkout of both the 
> > > > schema and bioperl repositories, I can load GFF3 data with 
> > > > Dbxref tags successfully.
> > > > 
> > > > Todd, a few questions for you:
> > > > 
> > > > * are you using a cvs checkout of chado as well?  I should have 
> > > > asked that before telling you to update bioperl.
> > > > 
> > > > * If you are using current checkouts of chado and bioperl and 
> > > > still have the problem, could you please run your GFF3 through 
> > > > the GFF3 validator to see if it turns up any problems:
> > > > 
> > > >   http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> > > > 
> > > > * If you still get the same error message, could you please send
> > > > me a sample of the offending GFF3?  There may be a case that I 
> > > > didn't think of.
> > > > 
> > > > Thanks,
> > > > Scott
> > > > 
> > > > On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > > > > Hi Hilmar,
> > > > > 
> > > > > Hah!  I was getting ready to write a response where I 
> > > > > basically said I didn't think that was what was going on, and I 
> > > > > would have justified it with some hand waving, since I didn't 
> > > > > have the actual error message, so I was just guessing and 
> > > > > hopefully, updating bioperl will fix the problem (and that still 
> > > > > is a possibility).
> > > > > 
> > > > > However, I do have the actual error message, so I went and 
> > > > > looked at the offending line, and it is in a method called 
> > > > > 'handle_dbxref', so it looks like your diagnosis is spot on.  Now 
> > > > > I need to figure out if this is still happening with bioperl-live 
> > > > > and figure out why.  I've got a
> > > > > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > > > > which I think should return a list of DBLink features.  I guess 
> > > > > I'll go see.
> > > > > 
> > > > > Thanks for pointing that out!
> > > > > Scott
> > > > > 
> > > > > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > > > > Well, B::A::SimpleValue never had a method called database(). 
> > > > > > It is B::A::DBLink that has that (and always had).
> > > > > > 
> > > > > > So my first diagnosis from afar would be that something is 
> > > > > > returning or creating a B::A::SimpleValue when it was expected 
> > > > > > to return or create a B::A::DBLink.
> > > > > > 
> > > > > > 	-hilmar
> > > > > > 
> > > > > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > > > > 
> > > > > > > Hi Todd,
> > > > > > >
> > > > > > > Yes, that is still the most likely solution for you.  A 
> > > > > > > few months ago, the BioPerl API changed and 
> > > > > > > Bio::Annotation::SimpleValue objects don't work the same 
> > > > > > > way that they used to, thus the error you are seeing.
> > > > > > > It's not really looking for a method named 'database'; 
> > > > > > > that is an artifact left over from the API change.
> > > > > > >
> > > > > > > Scott
> > > > > > >
> > > > > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > > > > >> I run into the following error when trying to load Chado:
> > > > > > >>
> > > > > > >> gmod_bulk_load_gff3.pl --organism yeast  --gfffile 
> > > > > > >> ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > > > > >> Preparing data for inserting into the chadotest database
> > > > > > >> (This may take a while ...)
> > > > > > >> Can't locate object method "database" via package
> > > > > > >> "Bio::Annotation::SimpleValue"
> > > > > > >> at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm
> > > > > > >> line 3061, <GEN0> line 1.
> > > > > > >> Issuing rollback() for database handle being DESTROY'd 
> > > > > > >> without explicit disconnect().
> > > > > > >>
> > > > > > >> I found reference to this problem online 
> > > > > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > > > > >> with the recommendation of downloading the 'live' 
> > > > > > >> version of BioPerl. However, upon browsing the latest 
> > > > > > >> code in SVN I do not see inclusion of a "database" method. 
> > > > > > >> Is this still the recommended solution to the problem?
> > > > > > >>
> > > > > > >> Thanks,
> > > > > > >>
> > > > > > >> Todd
> > > > > > >>
> > > > > > >>
> > > > > > >> Todd Moughamer
> > > > > > >>
> > > > > > >> Bioinformatics Consultancy & Training Group
> > > > > > >>
> > > > > > >> Syngenta Biotechnology, Inc.
> > > > > > >>
> > > > > > >> 3054 Cornwallis Road, 1243.E
> > > > > > >>
> > > > > > >> Research Triangle Park, NC 27709-2257
> > > > > > >>
> > > > > > >> Tel: 919-597-3078
> > > > > > >>
> > > > > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > > > > >>
> > > > > > >>
> > > > > > > --
> > > > > > > ------------------------------------------------------------------------
> > > > > > > Scott Cain, Ph. D.                                   cain at cshl.edu
> > > > > > > GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> > > > > > > Cold Spring Harbor Laboratory
> > > > > > >
> > > > > > 
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory





