[Gmod-help] gmod_bulk_load_gff3.pl
Scott Cain
cain.cshl at gmail.com
Wed Mar 5 09:25:11 EST 2008
Hi Todd,
By default, the Chado adaptor for GBrowse assumes that there is only one
organism. If there is more than one organism in the database, there is
a db_args option to specify which organism to use for GBrowse.
What version of GBrowse are you using? Are there any messages in your
apache error_log file?
Scott
On Tue, 2008-03-04 at 18:27 -0500, todd.moughamer at syngenta.com wrote:
> Scott,
>
> Thanks I was able to load successfully!!
>
> I have one remaining issue. I am trying to connect GBrowse and Chado. I
> set up my configuration and there are no errors that I can see. However,
> I can't seem to get anything to render in GBrowse. Nothing comes up on
> the search. The sample GFF uses the 'chromosome' type and that is
> specified as the reference in the conf. One thing that I noticed is that
> there is no place in the conf file to specify which organism you are
> working with. So I'm wondering if I am missing something.
>
> Todd
>
>
> [GENERAL]
> description = test implementation of chado5
> db_adaptor = Bio::DB::Das::Chado
> database = dbi:PgPP:dbname=chadotest;host=localhost;port=5432
> user = mccall
> pass = <pwd>
>
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Tuesday, March 04, 2008 10:28 AM
> To: Moughamer Todd USRE
> Cc: help at gmod.org
> Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
>
> Hi Todd,
>
> The feature property vocabulary is separate. It is a collection of
> common annotation terms, like Note, non_canonical_start_codon, problem
> and status. It can be loaded by running 'make ontologies' and selecting
> option 4. After loading it, you should be good to go--at the worst,
> you'll need to add '--recreate_cache' to the GFF load command line to
> flush out incomplete information in a loader helping table.
>
> Scott
>
> On Tue, 2008-03-04 at 08:44 -0500, todd.moughamer at syngenta.com wrote:
> > Hi Scott,
> >
> > Thanks. Is the feature property vocabulary part of the sequence
> > ontology or is it separate? As I said, we installed only the SO and
> > Relationship Ontology. If not, would you recommend clearing out the
> > database before re-running make ontologies?
> >
> > Best,
> >
> > Todd
> >
> > -----Original Message-----
> > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > Sent: Friday, February 29, 2008 11:24 PM
> > To: Moughamer Todd USRE
> > Cc: help at gmod.org
> > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> >
> > Hi Todd,
> >
> > I just reproduced the problem you are seeing by not loading the
> > feature property controlled vocabulary. For most cvterms, the loader
> > checks to make sure it is present before writing back to the database.
> > The GFF annotation 'Note' was being treated as a special case and I
> > forgot to add the check that it existed. I've updated the loader to
> > give a useful message and stop when it finds that Note doesn't exist.
> >
> > Sorry for the hassle.
> > Scott
> >
> >
> > On Fri, 2008-02-29 at 11:50 -0500, todd.moughamer at syngenta.com wrote:
> > > Hi Scott,
> > >
> > > I talked to the unix admin who did that part of the install. He ran
> > > the makes and only had a problem with make ontologies. I believe it
> > > was a missing library, and eventually he got it to run with no
> > > reported problems. He installed the relationship and sequence
> > > ontologies.
> > >
> > > Todd
> > >
> > > -----Original Message-----
> > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > Sent: Friday, February 29, 2008 10:10 AM
> > > To: Moughamer Todd USRE
> > > Cc: help at gmod.org
> > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > >
> > > Hi Todd,
> > >
> > > When you created the database, did you use the 'make' based
> > > procedure that is outlined in the install document? That is, did
> > > you do this:
> > >
> > > make load_schema
> > > make prepdb
> > > make ontologies
> > >
> > > and when you loaded ontologies, did you load all of relation,
> > > sequence, gene and feature property? My best guess for what is
> > > going wrong is that the feature property controlled vocabulary is
> > > missing or that something from the prepdb inserts is missing (though
> > > I doubt the latter is the problem--I don't think you would have made
> > > it this far if that were the case). I'll try loading the SGD GFF file
> > > now to see if I run into any problems.
> > >
> > > Scott
> > >
> > > On Fri, 2008-02-29 at 10:00 -0500, todd.moughamer at syngenta.com wrote:
> > > > Scott,
> > > >
> > > > No resolution yet. Just in case, I ran the gff file through
> > > > dos2unix and that didn't help. I saw in a posting about a similar
> > > > error that it might have something to do with auto-incrementing of
> > > > IDs. My next step would be to wipe out the database and reload the
> > > > ontology dump...unless you have other suggestions.
> > > >
> > > > Thanks,
> > > >
> > > > Todd
> > > >
> > > > -----Original Message-----
> > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > Sent: Friday, February 29, 2008 9:09 AM
> > > > To: Moughamer Todd USRE
> > > > Cc: help at gmod.org
> > > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > > >
> > > > Hi Todd,
> > > >
> > > > I'm trying to catch up on my email after being gone for a few
> > > > days--did you get this resolved?
> > > >
> > > > Scott
> > > >
> > > > On Fri, 2008-02-22 at 13:01 -0500, todd.moughamer at syngenta.com wrote:
> > > > > Scott,
> > > > >
> > > > > The BioPerl live update fixed the Can't locate object method
> > > > > "database" problem (Thanks!). The loading progresses much
> > > > > further but now errors out with the message below. Here I am
> > > > > using the first 100 lines of the sample yeast GFF file, which did
> > > > > not produce errors in the GFF3 validator:
> > > > >
> > > > > Preparing data for inserting into the chadotest database (This may take a while ...)
> > > > > Loading data into feature table ...
> > > > > Loading data into featureloc table ...
> > > > > Skipping feature_relationship table since the load file is empty...
> > > > > Loading data into featureprop table ...
> > > > > DBD::Pg::db pg_endcopy failed: ERROR: invalid input syntax for integer: ""
> > > > > CONTEXT: COPY featureprop, line 1, column type_id: ""
> > > > >
> > > > > ------------- EXCEPTION: Bio::Root::Exception -------------
> > > > > MSG: calling endcopy for featureprop failed:
> > > > > STACK: Error::throw
> > > > > STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:357
> > > > > STACK: Bio::GMOD::DB::Adapter::copy_from_stdin /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2723
> > > > > STACK: Bio::GMOD::DB::Adapter::load_data /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm:2644
> > > > > STACK: /usr/bin/gmod_bulk_load_gff3.pl:912
> > > > > -----------------------------------------------------------
> > > > > Issuing rollback() for database handle being DESTROY'd without
> > > > > explicit disconnect().
> > > > >
> > > > > Thanks,
> > > > >
> > > > > Todd
> > > > >
> > > > > -----Original Message-----
> > > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > > Sent: Wednesday, February 20, 2008 11:45 AM
> > > > > To: Moughamer Todd USRE
> > > > > Cc: hlapp at duke.edu; help at gmod.org
> > > > > Subject: RE: [Gmod-help] gmod_bulk_load_gff3.pl
> > > > >
> > > > > Hi Todd,
> > > > >
> > > > > The ##gff-version error it reported won't be a problem; the
> > > > > loader is quite forgiving about that. The invalid type problems
> > > > > could potentially be a problem though. Here's the thing: Chado
> > > > > uses SO (so yes, you should be using so.obo) for feature types,
> > > > > while the current GFF3 spec requires SOFA (I've been advocating
> > > > > for changing the spec and it probably will change in the near
> > > > > future).
> > > > >
> > > > > So, if the validator is complaining about those terms because
> > > > > they aren't in SOFA, but they are in SO, that's no problem. But
> > > > > if they aren't in SO either (if, for instance, they've been
> > > > > obsoleted), then you'll have to fix the file. When I am
> > > > > confronted with something like that, I just do a global search
> > > > > and replace on the term to swap in the nearest term in SO.
> > > > >
> > > > > Scott
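
A minimal sketch of the search and replace described above, restricted to the type column so that the same string inside an attribute is left untouched. It assumes a tab-delimited GFF3 file with nine columns and the type in column 3. The function name swap_types and the old_term/new_term mapping are made up for illustration; the real replacement term should be chosen after checking so.obo.

```python
def swap_types(lines, replacements):
    """Return GFF3 lines with column 3 (type) rewritten per `replacements`."""
    out = []
    for line in lines:
        # Directives/comments (#...) and blank lines pass through unchanged.
        if line.startswith("#") or not line.strip():
            out.append(line)
            continue
        body = line.rstrip("\n")
        ending = line[len(body):]  # keep the original line ending, if any
        cols = body.split("\t")
        # Only touch well-formed feature lines (9 tab-separated columns).
        if len(cols) == 9 and cols[2] in replacements:
            cols[2] = replacements[cols[2]]
            line = "\t".join(cols) + ending
        out.append(line)
    return out


# Hypothetical terms: "old_term" and "new_term" are placeholders only.
example = [
    "##gff-version 3\n",
    "chrI\tSGD\told_term\t1\t100\t.\t+\t.\tID=f1;Note=old_term\n",
]
fixed = swap_types(example, {"old_term": "new_term"})
```

Lines that are not nine-column feature lines (directives, comments, FASTA sequence) are copied through as they are, and the Note attribute in the example keeps its original text.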
> > > > >
> > > > > On Wed, 2008-02-20 at 11:34 -0500, todd.moughamer at syngenta.com wrote:
> > > > > > Hi Scott,
> > > > > >
> > > > > > We downloaded Chado from CVS. We are in the process of
> > > > > > installing the live BioPerl.
> > > > > >
> > > > > > I am using the example yeast GFF3 files from the web site
> > > > > > (http://www.gmod.org/wiki/index.php/Load_GFF_Into_Chado). I
> > > > > > ran them through the validator and sure enough they came back
> > > > > > invalid (both the original and sorted forms). Here are some of
> > > > > > the errors:
> > > > > >
> > > > > > Line Number  Error/Warning
> > > > > > -----------  -------------
> > > > > > 1            [ERROR] first line must be ##gff-version 3 (line: SGD)
> > > > > > 350          [ERROR] invalid type (type: gene_cassette)
> > > > > > 352          [ERROR] invalid type (type: gene_cassette)
> > > > > > 369          [ERROR] invalid type (type: gene_cassette)
> > > > > > 1065         [ERROR] invalid type (type: long_terminal_repeat)
> > > > > > 1066         [ERROR] invalid type (type: long_terminal_repeat)
> > > > > > 1072         [ERROR] invalid type (type: transposable_element_gene)
> > > > > > ...
> > > > > >
> > > > > > I'm also wondering if I should be using the sofa.obo file
> > > > > > rather than the so.obo file I downloaded from
> > > > > > sequenceontology.org?
> > > > > >
> > > > > > Thanks,
> > > > > >
> > > > > > Todd
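
The invalid type errors listed above can be checked against so.obo directly, by comparing the types used in the GFF3 file with the term names in the ontology file. A minimal Python sketch, assuming so.obo is a standard OBO flat file with [Term] stanzas, name: tags, and is_obsolete: true on retired terms. The function names and the toy data are illustrative only; EX:0000001 and retired_example are invented and are not real SO entries.

```python
def live_term_names(obo_lines):
    """Collect names of [Term] stanzas that are not marked obsolete."""
    names = set()
    in_term, name, obsolete = False, None, False
    # The "[END]" sentinel forces the last stanza to be flushed.
    for raw in list(obo_lines) + ["[END]"]:
        line = raw.strip()
        if line.startswith("["):
            if in_term and name and not obsolete:
                names.add(name)
            in_term, name, obsolete = (line == "[Term]"), None, False
        elif line.startswith("name:"):
            name = line[len("name:"):].strip()
        elif line.startswith("is_obsolete:"):
            obsolete = line.split(":", 1)[1].strip() == "true"
    return names


def gff3_types(gff_lines):
    """Collect the distinct values of column 3 (type) from GFF3 lines."""
    types = set()
    for line in gff_lines:
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9:
            types.add(cols[2])
    return types


# Toy inputs (the second term is invented for the example).
obo = """format-version: 1.2

[Term]
id: SO:0000704
name: gene

[Term]
id: EX:0000001
name: retired_example
is_obsolete: true
""".splitlines()

gff = [
    "##gff-version 3\n",
    "chrI\tSGD\tgene\t1\t100\t.\t+\t.\tID=g1\n",
    "chrI\tSGD\tretired_example\t5\t50\t.\t+\t.\tID=x1\n",
]

# Types used in the GFF3 that are absent from, or obsolete in, the ontology.
missing = sorted(gff3_types(gff) - live_term_names(obo))
```

With real files, the two functions would be fed the lines of so.obo and of the GFF3 file; any type left in missing is one that needs fixing in the file before loading.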
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: Scott Cain [mailto:cain.cshl at gmail.com]
> > > > > > Sent: Tuesday, February 19, 2008 3:49 PM
> > > > > > To: Hilmar Lapp
> > > > > > Cc: Moughamer Todd USRE; help at gmod.org
> > > > > > Subject: Re: [Gmod-help] gmod_bulk_load_gff3.pl
> > > > > >
> > > > > > OK, after running some tests, I still think an out of date
> > > > > > BioPerl is probably at fault. With a current checkout of both
> > > > > > the schema and bioperl repositories, I can load GFF3 data with
> > > > > > Dbxref tags successfully.
> > > > > >
> > > > > > Todd, a few questions for you:
> > > > > >
> > > > > > * are you using a cvs checkout of chado as well? I should
> > > > > > have asked that before telling you to update bioperl.
> > > > > >
> > > > > > * If you are using current checkouts of chado and bioperl and
> > > > > > still have the problem, could you please run your GFF3 through
> > > > > > the GFF3 validator to see if it turns up any problems:
> > > > > >
> > > > > > http://dev.wormbase.org/db/validate_gff3/validate_gff3_online
> > > > > >
> > > > > > * If you still get the same error message, could you please
> > > > > > send me a sample of the offending GFF3? There may be a case
> > > > > > that I didn't think of.
> > > > > >
> > > > > > Thanks,
> > > > > > Scott
> > > > > >
> > > > > > On Tue, 2008-02-19 at 14:13 -0500, Scott Cain wrote:
> > > > > > > Hi Hilmar,
> > > > > > >
> > > > > > > Hah! I was getting ready to write a response where I
> > > > > > > basically said I didn't think that was what was going on,
> > > > > > > and I would have justified it with some hand waving, since I
> > > > > > > didn't have the actual error message, so I was just guessing
> > > > > > > and hopefully, updating bioperl will fix the problem (and
> > > > > > > that still is a possibility).
> > > > > > >
> > > > > > > However, I do have the actual error message, so I went and
> > > > > > > looked at the offending line, and it is in a method called
> > > > > > > 'handle_dbxref', so it looks like your diagnosis is spot on.
> > > > > > > Now I need to figure out if this is still happening with
> > > > > > > bioperl-live and figure out why. I've got a
> > > > > > > Bio::SeqFeature::Annotated->annotation->get_Annotations('Dbxref'),
> > > > > > > which I think should return a list of DBLink features. I
> > > > > > > guess I'll go see.
> > > > > > >
> > > > > > > Thanks for pointing that out!
> > > > > > > Scott
> > > > > > >
> > > > > > > On Tue, 2008-02-19 at 13:55 -0500, Hilmar Lapp wrote:
> > > > > > > > Well, B::A::SimpleValue never had a method called
> > > > > > > > database(). It is B::A::DBLink that has that (and always
> > > > > > > > had).
> > > > > > > >
> > > > > > > > So my first diagnosis from afar would be that something is
> > > > > > > > returning or creating a B::A::SimpleValue when it was
> > > > > > > > expected to return or create a B::A::DBLink.
> > > > > > > >
> > > > > > > > -hilmar
> > > > > > > >
> > > > > > > > On Feb 19, 2008, at 1:07 PM, Scott Cain wrote:
> > > > > > > >
> > > > > > > > > Hi Todd,
> > > > > > > > >
> > > > > > > > > Yes, that is still the most likely solution for you. A
> > > > > > > > > few months ago, the BioPerl API changed and
> > > > > > > > > Bio::Annotation::SimpleValue objects don't work the same
> > > > > > > > > way that they used to, thus the error you are seeing.
> > > > > > > > > It's not really looking for a method named 'database';
> > > > > > > > > that is an artifact left over from the API change.
> > > > > > > > >
> > > > > > > > > Scott
> > > > > > > > >
> > > > > > > > > On Tue, 2008-02-19 at 11:59 -0500, todd.moughamer at syngenta.com wrote:
> > > > > > > > >> I run into the following error when trying to load Chado:
> > > > > > > > >>
> > > > > > > > >> gmod_bulk_load_gff3.pl --organism yeast --gfffile ~/tmp/saccharomyces_cerevisiae.gff.sorted --dbname chadotest
> > > > > > > > >> Preparing data for inserting into the chadotest database (This may take a while ...)
> > > > > > > > >> Can't locate object method "database" via package "Bio::Annotation::SimpleValue" at /usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 3061, <GEN0> line 1.
> > > > > > > > >> Issuing rollback() for database handle being DESTROY'd
> > > > > > > > >> without explicit disconnect().
> > > > > > > > >>
> > > > > > > > >> I found reference to this problem online
> > > > > > > > >> (http://www.nabble.com/question-about-gmod_bulk_load_gff3.pl-td15135949.html)
> > > > > > > > >> with the recommendation of downloading the 'live' version
> > > > > > > > >> of BioPerl. However, upon browsing the latest code in SVN
> > > > > > > > >> I do not see inclusion of a "database" method. Is this
> > > > > > > > >> still the recommended solution to the problem?
> > > > > > > > >>
> > > > > > > > >> Thanks,
> > > > > > > > >>
> > > > > > > > >> Todd
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > >> Todd Moughamer
> > > > > > > > >> Bioinformatics Consultancy & Training Group
> > > > > > > > >> Syngenta Biotechnology, Inc.
> > > > > > > > >> 3054 Cornwallis Road, 1243.E
> > > > > > > > >> Research Triangle Park, NC 27709-2257
> > > > > > > > >> Tel: 919-597-3078
> > > > > > > > >> Email: todd.moughamer at syngenta.com www.syngenta.com
> > > > > > > > >>
> > > > > > > > >>
> > > > > > > > > --
> > > > > > > > > ------------------------------------------------------------------------
> > > > > > > > > Scott Cain, Ph. D. cain at cshl.edu
> > > > > > > > > GMOD Coordinator (http://www.gmod.org/) 216-392-3087
> > > > > > > > > Cold Spring Harbor Laboratory
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory