[Gmod-help] Re: [Gmod-schema] Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive
Jonathan Leto
jaleto at gmail.com
Tue Jul 27 19:39:27 EDT 2010
Howdy,
Could you explain what exactly Chado and BioPerl are disagreeing on?
If modifying BioPerl does not make any BioPerl tests fail and allows the loading
of sequence-region directives, I think it should be done.
If the part of BioPerl that needs to be modified has few or no tests, I can add
some and ask the BioPerl people what they think.
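In the meantime, the workaround Scott describes below (getting rid of the
sequence-region directives before loading) could be sketched roughly like
this. It is only an illustration in Python, not part of
gmod_bulk_load_gff3.pl. The names strip_sequence_regions and strip_file are
made up for the example, and it assumes a directive is any line starting
with the literal text "##sequence-region".

```python
def strip_sequence_regions(lines):
    """Yield GFF3 lines, skipping ##sequence-region directives.

    Assumption: a directive is any line that begins with the literal
    text '##sequence-region'. Every other line, including other
    directives such as ##gff-version, passes through unchanged.
    """
    for line in lines:
        if line.startswith("##sequence-region"):
            continue
        yield line


def strip_file(src_path, dst_path):
    """Copy src_path to dst_path without ##sequence-region lines.

    Both paths are hypothetical; the input file is left untouched.
    """
    with open(src_path) as src, open(dst_path, "w") as dst:
        dst.writelines(strip_sequence_regions(src))
```

The filtered copy is what would then be passed to --gfffile. If a region
itself needs to end up in the database, the comment in the script that Dave
quotes below says it should be represented with a full GFF line instead.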
Duke
On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <scott at scottcain.net> wrote:
> This is in fact a current bug; the easiest workaround is to get rid
> of sequence-region directives. Actually fixing the bug is a little
> trickier since it is due to the fact that Chado and BioPerl have
> different ideas of what should happen. While I could (probably)
> modify BioPerl to do the right thing (from my perspective), I am
> reluctant to do that at the moment since that section of BioPerl is
> slated to be refactored.
>
> Scott
>
>
> On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
> <help at gmod.org> wrote:
>> Hi Jonathan,
>> I've created a bug report on this:
>> http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
>> This is interesting because the code says:
>> This script does not use sequence-region directives for anything.
>> If it represents a feature that needs to be inserted into the database,
>> it should be represented with a full GFF line.
>> Dave C.
>> On Fri, Jul 16, 2010 at 1:31 PM, Jonathan Leto <jaleto at gmail.com> wrote:
>>>
>>> Howdy,
>>>
>>> I have been attempting to load the ITAG GFF3 [0] files, which contain
>>> ##sequence-region directives, but I run into errors like this:
>>>
>>> $ ./gmod_bulk_load_gff3.pl --gfffile
>>> ~/git/ITAG1_release/ITAG1_gene_models_sample.gff3 --organism tomato
>>> --noexon --recreate_cache --analysis --remove_lock --save_tmpfiles
>>> (Re)creating the uniquename cache in the database...
>>> Creating table...
>>> Populating table...
>>> Creating indexes...
>>> Adjusting the primary key sequences (if necessary)...Done.
>>>
>>> --------------------- WARNING ---------------------
>>> MSG: '##feature-ontology' directive handling not yet implemented
>>> ---------------------------------------------------
>>> Preparing data for inserting into the cxgn database
>>> (This may take a while ...)
>>> Loading data into feature table ...
>>> COPY feature
>>> (feature_id,organism_id,name,uniquename,type_id,is_analysis,seqlen,dbxref_id)
>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>> line 3210.
>>> Loading data into featureloc table ...
>>> COPY featureloc
>>> (featureloc_id,feature_id,srcfeature_id,fmin,fmax,strand,phase,rank,locgroup)
>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>> line 3210.
>>> DBD::Pg::db pg_endcopy failed: ERROR: invalid input syntax for integer: ""
>>> CONTEXT: COPY featureloc, line 1, column strand: "" at
>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3222, <$fh>
>>> line 3.
>>>
>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>> MSG: calling endcopy for featureloc failed:
>>> STACK: Error::throw
>>> STACK: Bio::Root::Root::throw
>>> /home/leto/local-lib/lib/perl5/Bio/Root/Root.pm:368
>>> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3222
>>> STACK: Bio::GMOD::DB::Adapter::load_data
>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3144
>>> STACK: ./gmod_bulk_load_gff3.pl:1060
>>> -----------------------------------------------------------
>>>
>>> The salient information is that the loader somehow attempts to insert
>>> a strand of "" into the database, which fails. Note that I have also
>>> uncommented a warning statement that shows the SQL query being executed.
>>>
>>> I have traced this issue to the sequence-region
>>> directive. When I remove the line, the file loads fine. As another
>>> test, I created a file with nothing but a sequence-region directive,
>>> and the same error occurs. I have attached that file and the temp
>>> data file that gmod_bulk_load_gff3.pl creates as well. The 6th column
>>> of that file is the strand, and it has a value of "\N", which is the
>>> text representation of NULL.
>>>
>>> It seems to me that something is stringifying the NULL into "" and
>>> then attempting to insert the empty string into strand, which has a
>>> type of smallint. This is what causes the failure.
>>>
>>> I would greatly appreciate any thoughts or comments on how to make the
>>> bulk loading script support the sequence-region directive.
>>>
>>> Thanks
>>>
>>> [0] ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/
>>>
>>> --
>>> Jonathan "Duke" Leto
>>> jonathan at leto.net
>>> http://leto.net
>>>
>>>
>>> ------------------------------------------------------------------------------
>>> This SF.net email is sponsored by Sprint
>>> What will you do first with EVO, the first 4G phone?
>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>> _______________________________________________
>>> Gmod-schema mailing list
>>> Gmod-schema at lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>
>>
>>
>>
>> --
>> ===> PLEASE KEEP RESPONSES ON THE LIST <===
>> http://gmod.org/wiki/GMOD_News
>> http://gmod.org/wiki/Calendar
>> http://gmod.org/wiki/Help_Desk_Feedback
>>
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>
--
Jonathan "Duke" Leto
jonathan at leto.net
http://leto.net