[Gmod-help] chado loading problem
Scott Cain
cain.cshl at gmail.com
Tue Apr 8 14:58:03 EDT 2008
Hi Genevieve,
The "no cvterm for CDS" and "no cvterm for region" errors indicate that
there is a problem with the Sequence Ontology. Did you load it? If you
think you did, you can check with a query on the cvterm table:
SELECT cvterm.name,cvterm.cvterm_id,cv.name FROM cvterm,cv WHERE cv.cv_id=cvterm.cv_id AND cvterm.name = 'CDS';
and see what you get (you can try 'region' for the name too). If the
query returns no rows, the term is not in the database.
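If it helps to check both terms at once, here is a minimal sketch of that lookup in Python. It runs against an in-memory SQLite database with a cut-down cv/cvterm layout so that it is self-contained. The demo rows (a cv named 'sequence' holding a single 'CDS' term) and the helper names find_terms and demo_connection are made up for illustration. Against a real Chado database you would connect with a PostgreSQL driver instead, and the placeholder syntax may differ.

```python
import sqlite3

# Same join as the query above, written with an explicit JOIN and a placeholder.
QUERY = """
SELECT cvterm.name, cvterm.cvterm_id, cv.name
FROM cvterm
JOIN cv ON cv.cv_id = cvterm.cv_id
WHERE cvterm.name = ?
"""


def find_terms(conn, names):
    """Map each term name to the (name, cvterm_id, cv name) rows found for it."""
    return {name: conn.execute(QUERY, (name,)).fetchall() for name in names}


def demo_connection():
    """Build a tiny stand-in database (assumption: simplified columns, demo data)."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE cvterm (
            cvterm_id INTEGER PRIMARY KEY,
            cv_id INTEGER NOT NULL REFERENCES cv (cv_id),
            name TEXT NOT NULL
        );
        -- Demo data only: 'CDS' is present, 'region' is deliberately missing.
        INSERT INTO cv (cv_id, name) VALUES (1, 'sequence');
        INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (10, 1, 'CDS');
        """
    )
    return conn


if __name__ == "__main__":
    conn = demo_connection()
    for name, rows in find_terms(conn, ["CDS", "region"]).items():
        print(name, "->", rows if rows else "no cvterm found")
```

An empty result for a name corresponds to the "no cvterm for ..." message from the loader.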
A few other notes:
* When you rerun the loader, you may need to add the --recreate_cache
option to flush out temporary table entries that may have been created
during previous failed database loads.
* Depending on how you converted the GenBank file to GFF, you probably
don't need to preprocess it. For example, the BioPerl genbank2gff3.pl
script (I believe) creates GFF in a Chado-friendly way.
* When the loader runs successfully, it will print messages out to the
console indicating what tables it is populating. Certainly, feature and
featureloc will get entries.
* I edited the "Load GFF Into Chado" page to take out the explicit
insert of an organism_id, as I think it is a bad idea to override the
built-in sequences without a good reason (doing so results in database
errors later when the DB tries to reuse the ID you used). Given how high
a number you chose, though, I don't think you will realistically run
into this problem.
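For the last point, a rough sketch of inserting the organism without an explicit organism_id and reading back the ID the database assigned. It uses an in-memory SQLite table as a stand-in, with the columns shown in your select output. In PostgreSQL the ID would come from the table's sequence instead. The helper names here are hypothetical.

```python
import sqlite3


def organism_demo_connection():
    """Stand-in organism table (assumption: simplified column types)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE organism (
            organism_id INTEGER PRIMARY KEY,
            abbreviation TEXT,
            genus TEXT NOT NULL,
            species TEXT NOT NULL,
            common_name TEXT,
            comment TEXT
        )
        """
    )
    return conn


def add_organism(conn, abbreviation, genus, species, common_name):
    """Insert a row without organism_id so the database picks the ID."""
    cur = conn.execute(
        "INSERT INTO organism (abbreviation, genus, species, common_name) "
        "VALUES (?, ?, ?, ?)",
        (abbreviation, genus, species, common_name),
    )
    return cur.lastrowid


def organism_id_for(conn, common_name):
    """Look up the assigned ID by common name; None if there is no such row."""
    row = conn.execute(
        "SELECT organism_id FROM organism WHERE common_name = ?",
        (common_name,),
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = organism_demo_connection()
    new_id = add_organism(conn, "NC_004578", "Pseudomonas", "syringae", "DC3000")
    print("assigned organism_id:", new_id)
    print("looked up by common name:", organism_id_for(conn, "DC3000"))
```

Because the ID is never written by hand, the database's own counter stays in step with the rows in the table.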
Also, I'm leaving to attend a conference tomorrow and then on vacation
for a while, so I will be only intermittently answering email.
Scott
On Tue, 2008-04-08 at 11:52 -0400, Genevieve DeClerck wrote:
> Hello,
>
> I have the chado schema installed in a postgres database on an OS X
> 10.4.11 box. I am following the instructions at the gmod wiki in
> "Load GFF Into Chado" and am encountering a problem.
> I am trying to load Pseudomonas syringae pv DC3000 data, which is in
> refseq (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/
> Pseudomonas_syringae_tomato_DC3000). I inserted an entry in table
> 'organism' for DC3000:
>
> insert into organism (abbreviation, genus, species, common_name,
> organism_id) values
> ('NC_004578','Pseudomonas','syringae','DC3000','223283')
>
> and the data is now in the db:
>
> test=# select * from organism where organism_id='223283';
>  organism_id | abbreviation |    genus    | species  | common_name | comment
> -------------+--------------+-------------+----------+-------------+---------
>       223283 | NC_004578    | Pseudomonas | syringae | DC3000      |
> (1 row)
>
>
> Now, I preprocess the genbank gff with 'gmod_gff3_preprocessor.pl',
> which seems to go fine. Then I try to load the gff with
> 'gmod_bulk_load_gff3.pl' and I get an error:
>
> $ gmod_gff3_preprocessor.pl --gfffile NC_004578.gff
> Sorting the contents of NC_004578.gff ...
> Writing sorted contents to NC_004578.gff.sorted ...
>
> $ gmod_bulk_load_gff3.pl --organism DC3000 --gfffile NC_004578.gff.sorted
> Preparing data for inserting into the test database
> (This may take a while ...)
>
> There is a CDS feature with no parent (ID:) I think that is wrong!
>
> This GFF file has CDS and/or UTR features that do not belong to a
> 'central dogma' gene (ie, gene/transcript/CDS). The features of this
> type are being stored in the database as is.
>
> ------------- EXCEPTION -------------
> MSG: no cvterm for CDS
> STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
> STACK toplevel /sw/bin/gmod_bulk_load_gff3.pl:752
> --------------------------------------
> Issuing rollback() for database handle being DESTROY'd without
> explicit disconnect().
>
>
> I also tried starting with the gbk file from genbank, but still no
> success with loading (the gbk -> gff conversion seems to have gone ok):
>
> $ ../../../bin/gmod_bulk_load_gff3.pl --organism DC3000 -gfffile NC_004578.gbk.gff
> Preparing data for inserting into the test database(This may take a
> while ...)
> ------------- EXCEPTION -------------
> MSG: no cvterm for region
> STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
> STACK toplevel ../../../bin/gmod_bulk_load_gff3.pl:752
> --------------------------------------
> Issuing rollback() for database handle being DESTROY'd without
> explicit disconnect().
>
>
> Any ideas about what might be going wrong?
> Also, what tables would be populated if the load was successful?
>
> Thanks,
> Genevieve
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory