[Gmod-help] chado loading problem

Scott Cain cain.cshl at gmail.com
Tue Apr 8 14:58:03 EDT 2008


Hi Genevieve,

The warnings "no cvterm for CDS/region" indicate that there is a problem
with the Sequence Ontology.  Did you load it?  If you think you did, you
could try a query on the cvterm table:

  SELECT cvterm.name, cvterm.cvterm_id, cv.name
    FROM cvterm, cv
   WHERE cv.cv_id = cvterm.cv_id
     AND cvterm.name = 'CDS';

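to see what you get (you can try 'region' for the name too).  If the
query returns no rows, the term is not in the database, which is what
the loader is complaining about.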
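
As a rough sketch of what that check does, here is the same join run
from Python against a throwaway in-memory SQLite database.  The
cut-down two-table layout, the sample rows (including the cvterm_id
value and the cv name) and the helper name find_cvterm are all made up
for illustration; a real Chado instance lives in PostgreSQL and has
many more columns.

```python
import sqlite3


def find_cvterm(conn, term_name):
    """Return (term name, cvterm_id, cv name) rows matching a term name."""
    cur = conn.execute(
        "SELECT cvterm.name, cvterm.cvterm_id, cv.name "
        "FROM cvterm JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cvterm.name = ?",
        (term_name,),
    )
    return cur.fetchall()


# Toy stand-in for the Chado cv and cvterm tables (illustrative only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE cvterm (cvterm_id INTEGER PRIMARY KEY, cv_id INTEGER, name TEXT);
    INSERT INTO cv VALUES (1, 'sequence');
    INSERT INTO cvterm VALUES (10, 1, 'CDS');
""")

for name in ("CDS", "region"):
    rows = find_cvterm(conn, name)
    if rows:
        print(name, "found:", rows)
    else:
        print(name, "missing: no cvterm row for this name")
```

In this toy data only 'CDS' was inserted, so 'region' comes back
empty, which is the situation the "no cvterm for region" message
describes.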

A few other notes:

* When you rerun the loader, you may need to add the --recreate_cache
option to flush out temporary table entries that may have been created
during previous failed database loads.
* Depending on how you converted the GenBank file to GFF, you probably
don't need to preprocess it.  For example, the BioPerl genbank2gff3.pl
script (I believe) creates GFF in a Chado-friendly way.
* When the loader runs successfully, it will print messages out to the
console indicating what tables it is populating.  Certainly, feature and
featureloc will get entries.
* I edited the "Load GFF Into Chado" page to take out the explicit
insert of an organism_id, as I think it is a bad idea to override the
built-in sequences without a good reason (doing so can result in
database errors later, when the DB tries to reuse the ID you used).
Given how high a number you chose, though, I don't think you will
realistically run into this problem.
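
To illustrate the organism_id point, here is a toy model in plain
Python.  A counter stands in for the database sequence behind
organism_id and a dict stands in for the table.  None of this is Chado
or PostgreSQL code, and the class and organism names are hypothetical;
it only sketches how an explicitly chosen ID can collide with one the
sequence hands out later.

```python
class ToyTable:
    """Dict-backed table whose ids normally come from a counter ("sequence")."""

    def __init__(self):
        self.rows = {}
        self.next_id = 1  # stands in for the sequence's next value

    def insert(self, name, explicit_id=None):
        if explicit_id is None:
            # Normal path: take the next value from the sequence.
            new_id = self.next_id
            self.next_id += 1
        else:
            # Explicit id: the sequence is NOT advanced.
            new_id = explicit_id
        if new_id in self.rows:
            raise KeyError(f"duplicate key: organism_id {new_id} already exists")
        self.rows[new_id] = name
        return new_id


table = ToyTable()
table.insert("human")                  # sequence hands out 1
table.insert("DC3000", explicit_id=3)  # bypasses the sequence
table.insert("yeast")                  # sequence hands out 2
try:
    table.insert("fly")                # sequence hands out 3: collision
    collided = False
except KeyError as err:
    collided = True
    print("insert failed:", err)
```

With an ID as large as 223283, the counter would have to hand out a
couple hundred thousand values before reaching it, which is why the
collision is unlikely in practice.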

Also, I'm leaving to attend a conference tomorrow and will then be on
vacation for a while, so I will be answering email only intermittently.

Scott

On Tue, 2008-04-08 at 11:52 -0400, Genevieve DeClerck wrote:
> Hello,
> 
> I have the chado schema installed in a postgres database on an OS X  
> 10.4.11 box. I am following the instructions at the gmod wiki in  
> "Load GFF Into Chado" and am encountering a problem.
> I am trying to load Pseudomonas syringae pv DC3000 data, which is in
> RefSeq
> (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pseudomonas_syringae_tomato_DC3000).
> I inserted an entry for DC3000 in table 'organism':
> 
> insert into organism (abbreviation, genus, species, common_name,  
> organism_id) values  
> ('NC_004578','Pseudomonas','syringae','DC3000','223283')
> 
> and the data is now in the db:
> 
> test=# select * from organism where organism_id='223283';
>  organism_id | abbreviation |    genus    | species  | common_name | comment
> -------------+--------------+-------------+----------+-------------+---------
>       223283 | NC_004578    | Pseudomonas | syringae | DC3000      |
> (1 row)
> 
> 
> Now, I preprocess the GenBank GFF with 'gmod_gff3_preprocessor.pl',
> which seems to go fine. I then try to load the GFF with
> 'gmod_bulk_load_gff3.pl' and get an error:
> 
> $ gmod_gff3_preprocessor.pl --gfffile NC_004578.gff
> Sorting the contents of NC_004578.gff ...
> Writing sorted contents to NC_004578.gff.sorted ...
> 
> $ gmod_bulk_load_gff3.pl --organism DC3000 --gfffile NC_004578.gff.sorted
> Preparing data for inserting into the test database
> (This may take a while ...)
> 
> There is a CDS feature with no parent (ID:)  I think that is wrong!
> 
> This GFF file has CDS and/or UTR features that do not belong to a  
> 'central dogma' gene (ie, gene/transcript/CDS).  The features of this  
> type are being stored in the database as is.
> 
> ------------- EXCEPTION  -------------
> MSG: no cvterm for CDS
> STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
> STACK toplevel /sw/bin/gmod_bulk_load_gff3.pl:752
> --------------------------------------
> Issuing rollback() for database handle being DESTROY'd without  
> explicit disconnect().
> 
> 
> I also tried starting with the gbk file from genbank, but still no  
> success with loading (the gbk -> gff conversion seems to have gone ok):
> 
> $ ../../../bin/gmod_bulk_load_gff3.pl --organism DC3000 -gfffile NC_004578.gbk.gff
> Preparing data for inserting into the test database
> (This may take a while ...)
> ------------- EXCEPTION  -------------
> MSG: no cvterm for region
> STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
> STACK toplevel ../../../bin/gmod_bulk_load_gff3.pl:752
> --------------------------------------
> Issuing rollback() for database handle being DESTROY'd without  
> explicit disconnect().
> 
> 
> Any ideas about what might be going wrong?
> Also, what tables would be populated if the load was successful?
> 
> Thanks,
> Genevieve
> 
> 
> 
> 
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



