[Gmod-help] Error with gmod_bulk_load_gff3.pl
Scott Cain
cain.cshl at gmail.com
Thu Jul 17 17:48:31 EDT 2008
Hi Christian,
Sorry for the hassle. The problem is caused by a bug in one of the
prerequisites for GMOD, so I can't fix it directly. Since this is about
the third time we have been asked this question, I created a FAQ entry
for it:
http://www.gmod.org/wiki/index.php/Chado_FAQ#Loading_data_into_Chado
I don't know offhand whether this is the best place for this
question/answer, so Dave may move it somewhere else. To make sure you
get the info, I'll paste it below. Please let us know if this fixes
the problem.
Thanks,
Scott
Loading data into Chado
When I try to load data into Chado using the GFF bulk loader
(gmod_bulk_load_gff3.pl), I get this error:
DBD::Pg::db pg_endcopy failed: ERROR: invalid input syntax for integer: ""
CONTEXT: COPY feature_relationship, line 1, column type_id: "" at
/usr/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 2723, <$fh>
line 64298.
Why is that and what do I do?
Unfortunately there is a bug in one of the prerequisites for the
Chado loader, a Perl module called DBIx::DBStag, which does the actual
writing of ontology data to the database. When it loads the Gene
Ontology (and possibly other ontologies), it takes the 'part_of'
cvterm that belongs to the relationship ontology and makes it part of
GO instead. The loader then finds no 'part_of' term in the relationship
ontology to use as the type_id for Parent relationships, hence the empty
value in the error. This is the wrong behavior, but at the moment, there
is nothing we can do about it.
Instead, you must run a SQL command to repair the database:
UPDATE cvterm SET cv_id = (SELECT cv_id FROM cv WHERE name = 'relationship')
WHERE name = 'part_of'
AND cv_id IN (SELECT cv_id FROM cv WHERE name = 'gene_ontology');
Then, rerunning the loader with the --recreate_cache option should
allow the data to load. Sorry for the hassle.
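
To see what the repair does before running it on a real database, here is a minimal sketch that replays it on a throwaway in-memory SQLite copy. The two-table layout is an assumption that keeps only the columns the UPDATE touches (the real Chado cv and cvterm tables have more columns and constraints), and SQLite stands in for PostgreSQL. The helper names are made up for illustration.

```python
import sqlite3

# Simplified stand-ins for Chado's cv and cvterm tables (assumption:
# only the columns used by the repair are modeled here).
SCHEMA = """
CREATE TABLE cv (cv_id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
CREATE TABLE cvterm (
    cvterm_id INTEGER PRIMARY KEY,
    cv_id INTEGER NOT NULL REFERENCES cv(cv_id),
    name TEXT NOT NULL
);
"""

# The repair statement from the FAQ entry.
REPAIR = """
UPDATE cvterm SET cv_id = (SELECT cv_id FROM cv WHERE name = 'relationship')
WHERE name = 'part_of'
AND cv_id IN (SELECT cv_id FROM cv WHERE name = 'gene_ontology')
"""


def make_broken_db():
    """Build a toy database in the broken state: part_of sits under GO."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO cv (cv_id, name) VALUES (?, ?)",
        [(1, "relationship"), (2, "gene_ontology")],
    )
    conn.executemany(
        "INSERT INTO cvterm (cvterm_id, cv_id, name) VALUES (?, ?, ?)",
        [(10, 1, "derives_from"), (11, 2, "part_of")],
    )
    conn.commit()
    return conn


def cvs_owning(conn, term):
    """Return the names of every cv that contains a cvterm called `term`."""
    rows = conn.execute(
        "SELECT cv.name FROM cvterm JOIN cv ON cv.cv_id = cvterm.cv_id "
        "WHERE cvterm.name = ? ORDER BY cv.name",
        (term,),
    )
    return [row[0] for row in rows]


def repair(conn):
    """Run the repair and return the number of rows it changed."""
    cur = conn.execute(REPAIR)
    conn.commit()
    return cur.rowcount


conn = make_broken_db()
before = cvs_owning(conn, "part_of")   # which cv owns part_of before the fix
changed = repair(conn)                 # how many cvterm rows were moved
after = cvs_owning(conn, "part_of")    # which cv owns part_of after the fix
print(before, changed, after)
```

On a real Chado database the UPDATE itself is what you run (for example in psql); the SQLite copy only shows which row moves. The same join used in cvs_owning is a quick way to check where 'part_of' currently lives.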
On Thu, Jul 17, 2008 at 5:26 PM, Christian M. Probst
<cmacprobst at gmail.com> wrote:
> Hi,
>
> I am trying to upload my organism data to CHADO and I am stuck on an error.
> I downloaded a GenBank-formatted file and used the suggested
> transformation to GFF3:
>
> bp_genbank2gff3.pl -noCDS -s -o . temp.txt
>
>
>
> Afterwards, I used this syntax for gmod_bulk_load:
>
>
> gmod_bulk_load_gff3.pl --dbname XXX --dbxref GeneID --organism XXX --gff temp.gff
>
>
> Preparing data for inserting into the CruziGeneDB database
>
> (This may take a while ...)
>
> no parent Tc00.1047053508153.20;
> you probably need to rerun the loader with the --recreate_cache option
>
> Well, the Tc00.1047053508153.20 ID is in the GFF file and is before the
> entry that references it as Parent.
>
>
> I followed the suggestion and ran the same command line as above, but
> including --recreate_cache.
>
> The script runs for a long time and then the following error appears.
>
> DBD::Pg::db pg_endcopy failed: ERROR: invalid input syntax for integer: ""
>
> CONTEXT: COPY feature_relationship, line 1, column type_id: "" at
> /opt/coolstack/lib/perl5/site_perl/5.8.8/Bio/GMOD/DB/Adapter.pm line 2723,
> <$fh> line 64298.
>
>
> Then I tried to run gmod_bulk_load with --noload --inserts --save_tmpfiles.
>
>
> When inspecting the chado-feature_relationshipXXX file, I found that
> features having a Parent= delimiter in the GFF file have an empty field in
> the INSERT statement for type_id. As an example:
>
> INSERT INTO feature_relationship
> (feature_relationship_id,subject_id,object_id,type_id) VALUES
> (15,190738,190737,);
>
> INSERT INTO feature_relationship
> (feature_relationship_id,subject_id,object_id,type_id) VALUES
> (16,190739,190738,53);
>
> The first line is from a feature containing a Parent delimiter. It has an
> empty value for type_id.
>
> The second line is from a feature containing a Derives_from delimiter. It
> has the correct cvterm_id for type_id.
>
> So, the OBO relationship of the Parent Delimiter is not being correctly
> identified.
>
> I tried to find 'part_of' in the cvterm table, and found only entries
> related to the cv 'Gene Ontology' and 'Plant Ontology'.
>
> The 'derives_from' term, in the cvterm table, is mapped to the
> 'relationship' cv, but I have no 'part_of' mapped to the 'relationship' cv. Is
> that a possible source for this error? Anyway, if you could help me in any
> way, I would be very glad.
>
> Thanks in advance.
>
> Christian M. Probst
>
>
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory