[Gmod-help] Error with gmod_bulk_load_gff3.pl

Christian M. Probst cmacprobst at gmail.com
Thu Jul 17 17:26:24 EDT 2008


Hi,

I am trying to load my organism data into Chado and I am stuck on an error.
I downloaded a GenBank-formatted file and converted it to GFF3 with the
suggested command:

bp_genbank2gff3.pl -noCDS -s -o . temp.txt


Next, I ran gmod_bulk_load_gff3.pl with this syntax:


gmod_bulk_load_gff3.pl --dbname XXX --dbxref GeneID --organism XXX
--gff temp.gff


Preparing data for inserting into the CruziGeneDB database
(This may take a while ...)

no parent Tc00.1047053508153.20;
you probably need to rerun the loader with the --recreate_cache option

However, the ID Tc00.1047053508153.20 is in the GFF file, and it appears
before the entry that references it as Parent.
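
One way to double-check that ordering is a small script that walks the GFF3
file and reports every Parent value whose ID has not appeared on an earlier
line. This is only a minimal sketch in Python, not part of GMOD; the function
name find_unresolved_parents is made up for illustration, and it assumes a
tab-delimited, nine-column GFF3 file such as the temp.gff above.

```python
from urllib.parse import unquote


def find_unresolved_parents(lines):
    """Return (line_number, parent_id) pairs for every Parent value that
    was not defined as an ID on an earlier line of the GFF3 input."""
    seen_ids = set()
    unresolved = []
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # sequence data follows; no more feature lines
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature line
        # Column 9 holds the attributes as key=value pairs separated by ';'.
        attrs = {}
        for pair in columns[8].split(";"):
            if "=" in pair:
                key, value = pair.split("=", 1)
                attrs[key.strip()] = value.strip()
        # A feature may list several parents separated by commas.
        for parent in attrs.get("Parent", "").split(","):
            if parent and unquote(parent) not in seen_ids:
                unresolved.append((lineno, unquote(parent)))
        if "ID" in attrs:
            seen_ids.add(unquote(attrs["ID"]))
    return unresolved
```

Calling find_unresolved_parents(open("temp.gff")) should return an empty list
if every parent really is defined before it is referenced.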

I followed the suggestion and ran the same command line as above,
adding --recreate_cache.

The script ran for a long time and then failed with the following error:

DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
CONTEXT:  COPY feature_relationship, line 1, column type_id: "" at
/opt/coolstack/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line
2723, <$fh> line 64298.


Then I ran gmod_bulk_load_gff3.pl with --noload --inserts --save_tmpfiles.

When inspecting the chado-feature_relationshipXXX file, I found that
features with a Parent= attribute in the GFF file have an empty type_id
field in the INSERT statement. For example:

INSERT INTO feature_relationship
(feature_relationship_id,subject_id,object_id,type_id) VALUES
(15,190738,190737,);
INSERT INTO feature_relationship
(feature_relationship_id,subject_id,object_id,type_id) VALUES
(16,190739,190738,53);

The first statement comes from a feature with a Parent attribute. It has
an empty value for type_id.
The second statement comes from a feature with a Derives_from attribute.
It has the correct cvterm_id for type_id.
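
To find every affected row rather than spotting them by eye, the saved file
can be scanned for INSERT statements whose type_id is empty. The following is
a rough Python sketch, not anything shipped with GMOD; the function name
rows_missing_type_id is hypothetical, and it assumes the statements look like
the two above (integer values only, so splitting on commas is safe).

```python
import re

# Matches one INSERT statement, even if it is wrapped over several lines.
INSERT_RE = re.compile(
    r"INSERT INTO feature_relationship\s*\(([^)]*)\)\s*VALUES\s*\(([^)]*)\)\s*;",
    re.IGNORECASE,
)


def rows_missing_type_id(sql_text):
    """Return one dict per INSERT statement whose type_id value is empty."""
    missing = []
    for match in INSERT_RE.finditer(sql_text):
        columns = [c.strip() for c in match.group(1).split(",")]
        values = [v.strip() for v in match.group(2).split(",")]
        row = dict(zip(columns, values))
        if row.get("type_id", "") == "":
            missing.append(row)
    return missing
```

Each returned dict carries the subject_id and object_id, which can then be
looked up to see which GFF features they came from.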

So it seems the OBO relationship term for the Parent attribute is not
being correctly identified.

I tried to find 'part_of' in the cvterm table, and found only
entries belonging to the 'Gene Ontology' and 'Plant Ontology' cvs.
The 'derives_from' term in the cvterm table is mapped to the
'relationship' cv, but I have no 'part_of' mapped to the 'relationship'
cv. Is that a possible source of this error? In any case, I would be
very glad for any help you can offer.
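
For reference, the lookup described above can be sketched in Python as a join
between the Chado cvterm and cv tables. This is an illustration only: the
function name find_cvterms is made up, it assumes a DB-API cursor (for example
from psycopg2) connected to the Chado database, and the placeholder argument
exists only because parameter markers differ between drivers.

```python
def find_cvterms(cursor, term_name, placeholder="%s"):
    """Return (cvterm_id, term name, cv name) rows for a given term name.

    `cursor` is any DB-API cursor; `placeholder` is the driver's parameter
    marker ("%s" for psycopg2, "?" for sqlite3).
    """
    query = (
        "SELECT cvterm.cvterm_id, cvterm.name, cv.name "
        "FROM cvterm JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cvterm.name = " + placeholder + " "
        "ORDER BY cv.name"
    )
    cursor.execute(query, (term_name,))
    return cursor.fetchall()
```

If find_cvterms(cursor, "part_of") returns no row whose cv name is
'relationship', that would be consistent with the empty type_id shown above.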

Thanks in advance.

Christian M. Probst