[Gmod-help] chado loading problem

Genevieve DeClerck gad14 at cornell.edu
Tue Apr 8 11:52:36 EDT 2008


Hello,

I have the chado schema installed in a postgres database on an OS X  
10.4.11 box. I am following the instructions at the gmod wiki in  
"Load GFF Into Chado" and am encountering a problem.
I am trying to load Pseudomonas syringae pv DC3000 data, which is in
refseq (ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Pseudomonas_syringae_tomato_DC3000).
I inserted an entry for DC3000 into the 'organism' table:

insert into organism (abbreviation, genus, species, common_name, organism_id)
values ('NC_004578','Pseudomonas','syringae','DC3000','223283')

and the data is now in the db:

test=# select * from organism where organism_id='223283';
 organism_id | abbreviation |    genus    | species  | common_name | comment
-------------+--------------+-------------+----------+-------------+---------
      223283 | NC_004578    | Pseudomonas | syringae | DC3000      |
(1 row)


Next, I preprocess the GenBank GFF with 'gmod_gff3_preprocessor.pl',
which seems to go fine. I then try to load the sorted GFF with
'gmod_bulk_load_gff3.pl' and get an error:

$ gmod_gff3_preprocessor.pl --gfffile NC_004578.gff
Sorting the contents of NC_004578.gff ...
Writing sorted contents to NC_004578.gff.sorted ...

$ gmod_bulk_load_gff3.pl --organism DC3000 --gfffile NC_004578.gff.sorted
Preparing data for inserting into the test database
(This may take a while ...)

There is a CDS feature with no parent (ID:)  I think that is wrong!

This GFF file has CDS and/or UTR features that do not belong to a  
'central dogma' gene (ie, gene/transcript/CDS).  The features of this  
type are being stored in the database as is.

------------- EXCEPTION  -------------
MSG: no cvterm for CDS
STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
STACK toplevel /sw/bin/gmod_bulk_load_gff3.pl:752
--------------------------------------
Issuing rollback() for database handle being DESTROY'd without  
explicit disconnect().


I also tried starting from the GenBank gbk file and converting it to
GFF. The gbk -> gff conversion seems to have gone ok, but loading
still fails:

$ ../../../bin/gmod_bulk_load_gff3.pl --organism DC3000 -gfffile NC_004578.gbk.gff
Preparing data for inserting into the test database
(This may take a while ...)
------------- EXCEPTION  -------------
MSG: no cvterm for region
STACK Bio::GMOD::DB::Adapter::get_type /my_packages/gmod/chado/schema/chado/lib/Bio/GMOD/DB/Adapter.pm:4050
STACK toplevel ../../../bin/gmod_bulk_load_gff3.pl:752
--------------------------------------
Issuing rollback() for database handle being DESTROY'd without  
explicit disconnect().
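
Both errors name a term ('CDS', 'region') for which no cvterm was found. One way to check whether those terms exist in the database at all is to query the 'cvterm' table directly. This is only a minimal sketch: it assumes the standard chado 'cv' and 'cvterm' tables joined on cv_id, and it does not reproduce whatever lookup the loader itself performs.

```sql
-- List any cvterm rows whose name matches the terms named in the two errors,
-- together with the cv (ontology) each one belongs to.
-- Assumes the standard chado cv/cvterm layout.
select cv.name as cv_name, cvterm.cvterm_id, cvterm.name
from cvterm
join cv on cv.cv_id = cvterm.cv_id
where cvterm.name in ('CDS', 'region');
```

If this returns no rows, the ontology terms the loader expects are probably not loaded in this database; if it does return rows, the cv_name column shows which ontology they were found in.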


Any ideas about what might be going wrong?
Also, what tables would be populated if the load was successful?

Thanks,
Genevieve








