[Gmod-help] gmod_bulk_load_gff3.pl

todd.moughamer at syngenta.com
Tue Mar 4 18:27:30 EST 2008


Scott,

Thanks, I was able to load successfully!

I have one remaining issue. I am trying to connect GBrowse and Chado. I
have set up my configuration and there are no errors that I can see.
However, I can't seem to get anything to render in GBrowse; nothing comes
up on a search. The sample GFF uses the 'chromosome' type, and that is
specified as the reference in the conf. One thing I noticed is that there
is no place in the conf file to specify which organism you are working
with, so I'm wondering if I am missing something.

Todd


[GENERAL]
description =  test implementation of chado5
db_adaptor    = Bio::DB::Das::Chado
database      = dbi:PgPP:dbname=chadotest;host=localhost;port=5432
user          = mccall 
pass          = <pwd> 
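
One way to narrow this down is to ask Chado directly which organisms own features of the reference type. The sketch below is not GBrowse or Bio::DB::Das::Chado code. It runs the query against a tiny in-memory SQLite stand-in that holds only the Chado columns involved (from the `organism`, `cvterm` and `feature` tables), and the sample rows are made up. It also assumes the type name is unique in `cvterm`, which a real Chado instance does not guarantee across vocabularies.

```python
import sqlite3

# Minimal stand-in for three Chado tables; only the columns the query needs.
SCHEMA = """
CREATE TABLE organism (organism_id INTEGER PRIMARY KEY, common_name TEXT);
CREATE TABLE cvterm   (cvterm_id   INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE feature  (feature_id  INTEGER PRIMARY KEY, organism_id INTEGER,
                       type_id INTEGER, uniquename TEXT);
"""

# Count features of one type, grouped by the organism that owns them.
QUERY = """
SELECT o.common_name, COUNT(*)
FROM feature f
JOIN cvterm   t ON t.cvterm_id   = f.type_id
JOIN organism o ON o.organism_id = f.organism_id
WHERE t.name = ?
GROUP BY o.common_name
ORDER BY o.common_name
"""


def reference_features_by_organism(conn, ref_type="chromosome"):
    """Return (common_name, count) pairs for features of type ref_type."""
    return conn.execute(QUERY, (ref_type,)).fetchall()


def demo_connection():
    """Build the in-memory stand-in with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO organism VALUES (?, ?)",
                     [(1, "yeast"), (2, "human")])
    conn.executemany("INSERT INTO cvterm VALUES (?, ?)",
                     [(10, "chromosome"), (11, "gene")])
    conn.executemany("INSERT INTO feature VALUES (?, ?, ?, ?)",
                     [(100, 1, 10, "chrI"), (101, 1, 10, "chrII"),
                      (102, 1, 11, "YAL069W")])
    return conn


if __name__ == "__main__":
    print(reference_features_by_organism(demo_connection()))
```

An empty result for the reference type would suggest the problem is in the loaded data rather than in the conf file; the same SELECT can be run in psql against the real database.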

-----Original Message-----
From: Scott Cain [mailto:cain.cshl at gmail.com] 
Sent: Tuesday, March 04, 2008 10:28 AM
To: Moughamer Todd USRE
Cc: help at gmod.org
Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl

Hi Todd,

The feature property vocabulary is separate.  It is a collection of
common annotation terms, like Note, non_canonical_start_codon, problem
and status.  It can be loaded by running 'make ontologies' and selecting
option 4.  After loading it, you should be good to go--at the worst,
you'll need to add '--recreate_cache' to the GFF load command line to
flush out incomplete information in a loader helping table.

Scott
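
Before reloading, it can help to confirm that the terms the loader needs are actually present. The following is a minimal sketch, not part of the GMOD tools: it checks a list of term names against one controlled vocabulary, using an in-memory SQLite stand-in for the Chado `cv` and `cvterm` tables. The vocabulary name `feature_property` is an assumption about how the feature property ontology is stored, and the sample rows are invented.

```python
import sqlite3

# Stand-in for the two Chado tables involved; only the needed columns.
SCHEMA = """
CREATE TABLE cv     (cv_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
"""


def missing_terms(conn, cv_name, wanted):
    """Return the entries of `wanted` that are absent from vocabulary cv_name."""
    rows = conn.execute(
        "SELECT t.name FROM cvterm t JOIN cv c ON c.cv_id = t.cv_id "
        "WHERE c.name = ?",
        (cv_name,),
    )
    present = {name for (name,) in rows}
    return [term for term in wanted if term not in present]


def demo_connection(load_feature_property):
    """Build a sample database, with or without the feature property terms."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO cv VALUES (1, 'sequence')")
    conn.execute("INSERT INTO cvterm VALUES (1, 1, 'gene')")
    if load_feature_property:
        # 'feature_property' as the cv name is an assumption (see lead-in).
        conn.execute("INSERT INTO cv VALUES (2, 'feature_property')")
        conn.executemany("INSERT INTO cvterm VALUES (?, 2, ?)",
                         [(2, "Note"), (3, "status")])
    return conn


if __name__ == "__main__":
    wanted = ["Note", "status"]
    print(missing_terms(demo_connection(False), "feature_property", wanted))
    print(missing_terms(demo_connection(True), "feature_property", wanted))
```

If the check reports missing terms, loading the vocabulary with 'make ontologies' as described above is the fix; the check itself changes nothing in the database.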

On Tue, 2008-03-04 at 08:44 -0500, todd.moughamer at syngenta.com wrote:
> Hi Scott,
> 
> Thanks. Is the feature property vocabulary part of the sequence
> ontology or is it separate? As I said, we installed only the SO and
> Relationship Ontology. If not, would you recommend clearing out the
> database before re-running make ontologies?
> 
> Best,
> 
> Todd
> 
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Friday, February 29, 2008 11:24 PM
> To: Moughamer Todd USRE
> Cc: help at gmod.org
> Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> 
> Hi Todd,
> 
> I just reproduced the problem you are seeing by not loading the
> feature property controlled vocabulary.  For most cvterms, the loader
> checks to make sure it is present before writing back to the database.
> The GFF annotation 'Note' was being treated as a special case and I
> forgot to add the check that it existed.  I've updated the loader to
> give a useful message and stop when it finds that Note doesn't exist.
> 
> Sorry for the hassle.
> Scott
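
The failure mode described here can be illustrated with a short sketch. This is not the loader's actual code: the function name, the exception class and the column order (feature_id, type_id, value, rank) are hypothetical. It shows how an unchecked term lookup writes an empty `type_id` into a tab-delimited row destined for COPY, and how a guard stops early with a readable message instead.

```python
class MissingCvterm(Exception):
    """Raised when a required controlled-vocabulary term is not loaded."""


def featureprop_row(feature_id, term, value, cvterm_ids, guard=True):
    """Build one tab-delimited row for a featureprop bulk load.

    cvterm_ids maps term names to integer ids. With guard=False a missing
    term silently becomes an empty field, which an integer column rejects
    when the row is loaded.
    """
    type_id = cvterm_ids.get(term)
    if type_id is None:
        if guard:
            raise MissingCvterm(
                f"cvterm '{term}' not found; load the vocabulary that "
                "defines it before loading the GFF"
            )
        type_id = ""  # what an unchecked lookup effectively writes
    return "\t".join(str(field) for field in (feature_id, type_id, value, 0))


if __name__ == "__main__":
    print(repr(featureprop_row(1, "Note", "example", {"Note": 5})))
    print(repr(featureprop_row(1, "Note", "example", {}, guard=False)))
```

The second call produces a row with an empty second field, the same shape of input that PostgreSQL rejects with "invalid input syntax for integer".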
> 
> 
> On Fri, 2008-02-29 at 11:50 -0500, todd.moughamer at syngenta.com wrote:
> > Hi Scott,
> > 
> > I talked to the unix admin who did that part of the install. He ran
> > the makes and only had a problem with make ontologies. I believe it
> > was a missing library, and eventually he got it to run with no
> > reported problems. He installed the relationship and sequence
> > ontologies.
> > 
> > Todd
> > 
> > -----Original Message-----
> > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > Sent: Friday, February 29, 2008 10:10 AM
> > To: Moughamer Todd USRE
> > Cc: help at gmod.org
> > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > 
> > Hi Todd,
> > 
> > When you created the database, did you use the 'make' based
> > procedure that is outlined in the install document?  That is, did
> > you do this:
> > 
> >   make load_schema
> >   make prepdb
> >   make ontologies
> > 
> > and when you loaded ontologies, did you load all of relation,
> > sequence, gene and feature property?  My best guess for what is
> > going wrong is that the feature property controlled vocabulary is
> > missing or that something from the prepdb inserts is missing (though
> > I doubt that the latter is the problem--I don't think you would have
> > made it this far if that were the case).  I'll try loading the SGD
> > GFF file now to see if I run into any problems.
> > 
> > Scott
> > 
> > On Fri, 2008-02-29 at 10:00 -0500, todd.moughamer at syngenta.com wrote:
> > > Scott,
> > > 
> > > No resolution yet. Just in case I ran the gff file through
> > > dos2unix and that didn't help. I saw in a posting about a similar
> > > error that it might have something to do with auto-incrementing of
> > > IDs. My next step would be to wipe out the database and reload the
> > > ontology dump...unless you have other suggestions.
> > > 
> > > Thanks,
> > > 
> > > Todd
> > > 
> > > -----Original Message-----
> > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > Sent: Friday, February 29, 2008 9:09 AM
> > > To: Moughamer Todd USRE
> > > Cc: help at gmod.org
> > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > > 
> > > Hi Todd,
> > > 
> > > I'm trying to catch up on my email after being gone for a few 
> > > days--did you get this resolved?
> > > 
> > > Scott
> > > 
> > > On Fri, 2008-02-22 at 13:01 -0500, todd.moughamer at syngenta.com wrote:
> > > > Scott,
> > > > 
> > > > The BioPerl live update fixed the "Can't locate object method
> > > > "database"" problem (Thanks!). The loading progresses much
> > > > further but now errors out with the message below. Here I am
> > > > using the first 100 lines of the sample yeast GFF file, which did
> > > > not produce errors in the GFF3 validator:
> > > > 
> > > > Preparing data for inserting into the chadotest database (This may take a while ...)
> > > > Loading data into feature table ...
> > > > Loading data into featureloc table ...
> > > > Skipping feature_relationship table since the load file is empty...
> > > > Loading data into featureprop table ...
> > > > DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
> > > > CONTEXT:  COPY featureprop, line 1, column type_id: ""
> > > > 
> > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > MSG: calling endcopy for featureprop failed:
> > > > STACK: Error::throw
> > > > STACK: Bio::Root::Root::throw
> > > > /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
> > > > /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
> > > > STACK: Bio::GMOD::DB::Adapter::load_data
> > > > /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
> > > > STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
> > > > -----------------------------------------------------------
> > > > Issuing rollback() for database handle being DESTROY'd without 
> > > > explicit disconnect().
> > > > 
> > > > Thanks,
> > > > 
> > > > Todd
> > > > 
> > > > -----Original Message-----
> > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > Sent: Wednesday, February 20, 2008 11:45 AM
> > > > To: Moughamer Todd USRE
> > > > Cc: hlapp at duke.edu; help at gmod.org
> > > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > > > 
> > > > Hi Todd,
> > > > 
> > > > The ##gff-version error it reported won't be a problem; the
> > > > loader is quite forgiving about that.  The invalid type problems
> > > > could potentially be a problem though.  Here's the thing: Chado
> > > > uses SO (so yes, you should be using so.obo) for feature types,
> > > > while the current GFF3 spec requires SOFA (I've been advocating
> > > > for changing the spec and it probably will change in the near
> > > > future).
> > > >
> > > > So, if the validator is complaining about those terms because
> > > > they aren't in SOFA, but they are in SO, that's no problem.  But
> > > > if they aren't in SO either (if, for instance, they've been
> > > > obsoleted), then you'll have to fix the file.  When I'm confronted
> > > > with something like that, I just do a global search and replace
> > > > on the term to swap in the nearest term in SO.
> > > > 
> > > > Scott
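
The search-and-replace step described above can be sketched as a small script. It is only an illustration: the set of known types and the replacement mapping are placeholders that would have to come from the SO release actually loaded into Chado, and `obsolete_repeat` and `mystery` are invented type names. The function rewrites column 3 of each feature line and reports any type that is still unknown afterwards.

```python
def remap_types(gff_lines, known_types, replacements):
    """Rewrite the type column of GFF3 feature lines.

    Returns (new_lines, unknown), where unknown is the sorted list of types
    that are not in known_types even after applying replacements. Comment
    and directive lines, and lines without nine columns, pass through as-is.
    """
    new_lines = []
    unknown = set()
    for line in gff_lines:
        text = line.rstrip("\n")
        cols = text.split("\t")
        if text.startswith("#") or len(cols) != 9:
            new_lines.append(text)
            continue
        cols[2] = replacements.get(cols[2], cols[2])
        if cols[2] not in known_types:
            unknown.add(cols[2])
        new_lines.append("\t".join(cols))
    return new_lines, sorted(unknown)


if __name__ == "__main__":
    sample = [
        "##gff-version 3",
        "chrI\tSGD\tgene\t335\t649\t.\t+\t.\tID=YAL069W",
        "chrI\tSGD\tobsolete_repeat\t700\t800\t.\t+\t.\tID=r1",
        "chrI\tSGD\tmystery\t900\t950\t.\t+\t.\tID=m1",
    ]
    known = {"gene", "repeat_region"}              # placeholder term set
    mapping = {"obsolete_repeat": "repeat_region"}  # placeholder mapping
    lines, unknown = remap_types(sample, known, mapping)
    print("\n".join(lines))
    print("still unknown:", unknown)
```

Listing the remaining unknown types first, before deciding on replacements, keeps the edit deliberate rather than a blind global substitution.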
> > > > 
> > > > On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> > > > > Hi Scott,
> > > > > 
> > > > > We downloaded Chado from CVS. We are in the process of
> > > > > installing the live BioPerl.
> > > > >
> > > > > I am using the example yeast GFF3 files from the web site
> > > > > (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I ran
> > > > > them through the validator and sure enough they came back
> > > > > invalid (both the original and sorted forms). Here are some of
> > > > > the errors:
> > > > > 
> > > > > Line Number  Error/Warning
> > > > > -----------  -------------
> > > > > 1            [ERROR]   first line must be ##gff-version 3 (line: SGD)
> > > > > 350          [ERROR]   invalid type (type: gene_cassette)
> > > > > 352          [ERROR]   invalid type (type: gene_cassette)
> > > > > 369          [ERROR]   invalid type (type: gene_cassette)
> > > > > 1065         [ERROR]   invalid type (type: long_terminal_repeat)
> > > > > 1066         [ERROR]   invalid type (type: long_terminal_repeat)
> > > > > 1072         [ERROR]   invalid type (type: transposable_element_gene)
> > > > > ...
> > > > > 
> > > > > I'm also wondering if I should be using the sofa.obo file
> > > > > rather than the so.obo file I downloaded from
> > > > > sequenceontology.org?
> > > > > 
> > > > > Thanks,
> > > > > 
> > > > > Todd
> > > > > 
> > > > > -----Original Message-----
> > > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > > Sent: Tuesday, February 19, 2008 3:49 PM
> > > > > To: Hilmar Lapp
> > > > > Cc: Moughamer Todd USRE; help at gmod.org
> > > > > Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> > > > > 
> > > > > OK, after running some tests, I still think an out of date
> > > > > BioPerl is probably at fault.  With a current checkout of both
> > > > > the schema and bioperl repositories, I can load GFF3 data with
> > > > > Dbxref tags successfully.
> > > > > 
> > > > > Todd, a few questions for you:
> > > > > 
> > > > > * are you using a cvs checkout of chado as well?  I should 
> > > > > have asked that before telling you to update bioperl.
> > > > > 
> > > > > * If you are using current checkouts of chado and bioperl and
> > > > > still have the problem, could you please run your GFF3 through
> > > > > the GFF3 validator to see if it turns up any problems:
> > > > >
> > > > >   http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> > > > >
> > > > > * If you still get the same error message, could you please
> > > > > send me a sample of the offending GFF3?  There may be a case
> > > > > that I didn't think of.
> > > > > 
> > > > > Thanks,
> > > > > Scott
> > > > > 
> > > > > On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > > > > > Hi Hilmar,
> > > > > > 
> > > > > > Hah!  I was getting ready to write a response where I
> > > > > > basically said I didn't think that was what was going on, and
> > > > > > I would have justified it with some hand waving, since I
> > > > > > didn't have the actual error message, so I was just guessing
> > > > > > and hopefully, updating bioperl will fix the problem (and
> > > > > > that still is a possibility).
> > > > > >
> > > > > > However, I do have the actual error message, so I went and
> > > > > > looked at the offending line, and it is in a method called
> > > > > > 'handle_dbxref', so it looks like your diagnosis is spot on.
> > > > > > Now I need to figure out if this is still happening with
> > > > > > bioperl-live and figure out why.  I've got a
> > > > > > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > > > > > which I think should return a list of DBLink features.  I
> > > > > > guess I'll go see.
> > > > > > 
> > > > > > Thanks for pointing that out!
> > > > > > Scott
> > > > > > 
> > > > > > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > > > > > Well, B::A::SimpleValue never had a method called
> > > > > > > database(). It is B::A::DBLink that has that (and always
> > > > > > > had).
> > > > > > >
> > > > > > > So my first diagnosis from afar would be that something is
> > > > > > > returning or creating a B::A::SimpleValue when it was
> > > > > > > expected to return or create a B::A::DBLink.
> > > > > > > 
> > > > > > > 	-hilmar
> > > > > > > 
> > > > > > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > > > > > 
> > > > > > > > Hi Todd,
> > > > > > > >
> > > > > > > > Yes, that is still the most likely solution for you.  A
> > > > > > > > few months ago, the BioPerl API changed and
> > > > > > > > Bio::Annotation::SimpleValue objects don't work the same
> > > > > > > > way that they used to, thus the error you are seeing.
> > > > > > > > It's not really looking for a method named 'database';
> > > > > > > > that is an artifact left over from the API change.
> > > > > > > >
> > > > > > > > Scott
> > > > > > > >
> > > > > > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > > > > > >> I run into the following error when trying to load Chado:
> > > > > > > >>
> > > > > > > >> gmod_bulk_load_gff3.pl --organism yeast  --gfffile ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > > > > > >> Preparing data for inserting into the chadotest database (This may take a while ...)
> > > > > > > >> Can't locate object method "database" via package "Bio::Annotation::SimpleValue"
> > > > > > > >> at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3061, <GEN0> line 1.
> > > > > > > >> Issuing rollback() for database handle being DESTROY'd without explicit disconnect().
> > > > > > > >>
> > > > > > > >> I found reference to this problem online
> > > > > > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > > > > > >> with the recommendation of downloading the 'live' version of
> > > > > > > >> BioPerl. However, upon browsing the latest code in SVN I do
> > > > > > > >> not see inclusion of a "database" method. Is this still the
> > > > > > > >> recommended solution to the problem?
> > > > > > > >>
> > > > > > > >> Thanks,
> > > > > > > >>
> > > > > > > >> Todd
> > > > > > > >>
> > > > > > > >>
> > > > > > > >> Todd Moughamer
> > > > > > > >>
> > > > > > > >> Bioinformatics Consultancy & Training Group
> > > > > > > >>
> > > > > > > >> Syngenta Biotechnology, Inc.
> > > > > > > >>
> > > > > > > >> 3054 Cornwallis Road, 1243.E
> > > > > > > >>
> > > > > > > >> Research Triangle Park, NC 27709-2257
> > > > > > > >>
> > > > > > > >> Tel: 919-597-3078
> > > > > > > >>
> > > > > > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > > > > > >>
> > > > > > > >>
> > > > > > > > --
> > > > > > > > ------------------------------------------------------------------------
> > > > > > > > Scott Cain, Ph. D.                              cain at cshl.edu
> > > > > > > > GMOD Coordinator (http://www.gmod.org/)         216-392-3087
> > > > > > > > Cold Spring Harbor Laboratory
> > > > > > > >
> > > > > > > 
--
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory





