[Gmod-help] Re: [Gmod-schema] Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Scott Cain scott at scottcain.net
Wed Jul 28 15:20:09 EDT 2010


An additional (though probably somewhat easy to fix) issue is
Bio::FeatureIO's insistence that ##sequence-region directives get
turned into features.  A directive carries only a seqid, start, and
end, which is not enough to build the full-fledged feature that Chado
requires, so the loader should ignore them.  It can't, though, because
it defers to Bio::FeatureIO for file parsing.  If the Bio::FeatureIO
constructor had a flag to ignore those directives, that would make
life a little better.  Even
better than that would be if Bio::FeatureIO could return a message
stating that a ##sequence-region directive was found but was being
ignored, so that message could be relayed to the user.

On the other hand, I was unaware of Bio::FeatureIO dropping features;
that's somewhat unpleasant.  I recall an issue with skipping
sequences, but I thought that was fixed already.

Scott


On Wed, Jul 28, 2010 at 12:53 AM, Chris Fields <cjfields at illinois.edu> wrote:
> I think the part of BioPerl Scott is referring to for significant refactoring is Bio::FeatureIO.  Scott, is that correct?
>
> Having some tests would really help.  I can always sync them over to the Bio-FeatureIO repo, which is separate from core ATM.  I did uncover some pretty significant bugs during my first round of FeatureIO work which are now fixed (skipping features and/or sequences was one).  Now just waiting on tuits...
>
> chris
>
> On Jul 27, 2010, at 6:39 PM, Jonathan Leto wrote:
>
>> Howdy,
>>
>> Could you explain what exactly Chado and BioPerl are disagreeing on?
>> If modifying BioPerl does not make any BioPerl tests fail and allows the loading
>> of sequence-region directives, I think it should be done.
>>
>> If the part of BioPerl that needs to be modified has no or few tests, I can add
>> some and ask the BioPerl people what they think.
>>
>> Duke
>>
>>
>> On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <scott at scottcain.net> wrote:
>>> This is in fact a current bug; the easiest work around is to get rid
>>> of sequence-region directives.  Actually fixing the bug is a little
>>> trickier since it is due to the fact that Chado and BioPerl have
>>> different ideas of what should happen.  While I could (probably)
>>> modify BioPerl to do the right thing (from my perspective), I am
>>> reluctant to do that at the moment since that section of BioPerl is
>>> slated to be refactored.
>>>
>>> Scott
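A minimal sketch of the workaround described above, assuming a POSIX shell and awk: drop every ##sequence-region line before loading, and print each dropped directive to stderr so the user still sees what was ignored (the kind of message Scott would like Bio::FeatureIO itself to relay). The function name `strip_sequence_regions` is hypothetical and is not part of GMOD or BioPerl.

```shell
# Sketch only: strip_sequence_regions is a hypothetical helper, not part
# of GMOD or BioPerl. It copies a GFF3 file to stdout without its
# ##sequence-region directives and reports each dropped directive on stderr.
strip_sequence_regions() {
    # $1: path to the input GFF3 file
    awk '
        /^##sequence-region/ {
            print "ignoring directive: " $0 > "/dev/stderr"
            next
        }
        { print }
    ' "$1"
}
```

For example, run `strip_sequence_regions ITAG1_gene_models_sample.gff3 > cleaned.gff3` and pass `cleaned.gff3` to `--gfffile`. If a directive stands for a landmark that must exist in the database, it needs a full GFF line instead of being dropped.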
>>>
>>>
>>> On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
>>> <help at gmod.org> wrote:
>>>> Hi Jonathan,
>>>> I've created a bug report on this:
>>>>   http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
>>>> This is interesting because the code says:
>>>>   This script does not use sequence-region directives for anything.
>>>>   If it represents a feature that needs to be inserted into the database,
>>>>   it should be represented with a full GFF line.
>>>> Dave C.
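Following the documentation quoted above, a directive that stands for a real feature can be rewritten as a full GFF3 line. This is a rough sketch in shell and awk, assuming the directive has the standard `##sequence-region seqid start end` form. The function name is hypothetical, and the feature type is passed in because the right Sequence Ontology term (for example `chromosome` or `contig`) depends on the data. If the file already has a full line for that seqid, stripping the directive avoids a duplicate.

```shell
# Sketch only: sequence_regions_to_features is a hypothetical helper.
# Each "##sequence-region seqid start end" directive becomes a tab-separated
# GFF3 feature line; every other line passes through unchanged.
sequence_regions_to_features() {
    # $1: path to the input GFF3 file
    # $2: Sequence Ontology type for the new features (an assumption you supply)
    awk -v type="$2" '
        BEGIN { OFS = "\t" }
        /^##sequence-region/ && NF >= 4 {
            # seqid source type start end score strand phase attributes
            print $2, ".", type, $3, $4, ".", ".", ".", "ID=" $2 ";Name=" $2
            next
        }
        { print }
    ' "$1"
}
```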
>>>> On Fri, Jul 16, 2010 at 1:31 PM, Jonathan Leto <jaleto at gmail.com> wrote:
>>>>>
>>>>> Howdy,
>>>>>
>>>>> I have been attempting to load the ITAG GFF3 [0] files, which contain
>>>>> ##sequence-region directives, but I run into errors like this:
>>>>>
>>>>> $ ./gmod_bulk_load_gff3.pl --gfffile \
>>>>>     ~/git/ITAG1_release/ITAG1_gene_models_sample.gff3 --organism tomato \
>>>>>     --noexon --recreate_cache --analysis --remove_lock --save_tmpfiles
>>>>> (Re)creating the uniquename cache in the database...
>>>>> Creating table...
>>>>> Populating table...
>>>>> Creating indexes...
>>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>>>
>>>>> --------------------- WARNING ---------------------
>>>>> MSG: '##feature-ontology' directive handling not yet implemented
>>>>> ---------------------------------------------------
>>>>> Preparing data for inserting into the cxgn database
>>>>> (This may take a while ...)
>>>>> Loading data into feature table ...
>>>>>        COPY feature
>>>>> (feature_id,organism_id,name,uniquename,type_id,is_analysis,seqlen,dbxref_id)
>>>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>>>> line 3210.
>>>>> Loading data into featureloc table ...
>>>>>        COPY featureloc
>>>>> (featureloc_id,feature_id,srcfeature_id,fmin,fmax,strand,phase,rank,locgroup)
>>>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>>>> line 3210.
>>>>> DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer:
>>>>> ""
>>>>> CONTEXT:  COPY featureloc, line 1, column strand: "" at
>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3222, <$fh>
>>>>> line 3.
>>>>>
>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>> MSG: calling endcopy for featureloc failed:
>>>>> STACK: Error::throw
>>>>> STACK: Bio::Root::Root::throw
>>>>> /home/leto/local-lib/lib/perl5/Bio/Root/Root.pm:368
>>>>> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3222
>>>>> STACK: Bio::GMOD::DB::Adapter::load_data
>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3144
>>>>> STACK: ./gmod_bulk_load_gff3.pl:1060
>>>>> -----------------------------------------------------------
>>>>>
>>>>> The salient point is that an empty string ("") is somehow being
>>>>> inserted as the strand, which fails. Note that I have also
>>>>> uncommented a warning statement that shows the SQL query being
>>>>> executed.
>>>>>
>>>>> I have traced this issue to the sequence-region directive. When I
>>>>> remove the line, the file loads fine. As another test, I created a
>>>>> file with nothing but a sequence-region directive, and the same
>>>>> error occurs. I have attached that file and the temp data file that
>>>>> gmod_bulk_load_gff3.pl creates as well. The 6th column of that file
>>>>> is the strand, and it has a value of "\N", which is the text
>>>>> representation of NULL.
>>>>>
>>>>> It seems to me that something is stringifying the NULL into "" and
>>>>> then attempting to insert the empty string into the strand column,
>>>>> which is of type smallint. This is what causes the failure.
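A small diagnostic along these lines, assuming the temp file is in PostgreSQL's tab-separated COPY text format, where `\N` is NULL and an empty field is an empty string that a smallint column rejects. It counts both cases in column 6 (strand, per the featureloc column list above) and reports each empty one with its field count, since a short row also leaves column 6 empty. The function name is hypothetical. If it finds only `\N` values while COPY still fails, the empty string is presumably being introduced after the file is written.

```shell
# Sketch only: find_empty_strand_rows is a hypothetical helper. It reads a
# tab-separated PostgreSQL COPY text file and inspects column 6 (strand).
# "\N" is NULL in that format; an empty field is an empty string, which a
# smallint column rejects with "invalid input syntax for integer".
find_empty_strand_rows() {
    # $1: path to the COPY data file (e.g. one kept by --save_tmpfiles)
    awk -F '\t' '
        $6 == "\\N" { nulls++ }
        $6 == ""    { empties++; printf "line %d: empty strand field (%d fields)\n", NR, NF }
        END { printf "NULL strand fields: %d, empty strand fields: %d\n", nulls + 0, empties + 0 }
    ' "$1"
}
```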
>>>>>
>>>>> I would greatly appreciate any thoughts or comments on how to make the
>>>>> bulk loading script support the sequence-region directive.
>>>>>
>>>>> Thanks
>>>>>
>>>>> [0] ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/
>>>>>
>>>>> --
>>>>> Jonathan "Duke" Leto
>>>>> jonathan at leto.net
>>>>> http://leto.net
>>>>>
>>>>>
>>>>> ------------------------------------------------------------------------------
>>>>> This SF.net email is sponsored by Sprint
>>>>> What will you do first with EVO, the first 4G phone?
>>>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>>>> _______________________________________________
>>>>> Gmod-schema mailing list
>>>>> Gmod-schema at lists.sourceforge.net
>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ===> PLEASE KEEP RESPONSES ON THE LIST <===
>>>> http://gmod.org/wiki/GMOD_News
>>>> http://gmod.org/wiki/Calendar
>>>> http://gmod.org/wiki/Help_Desk_Feedback
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>
>>
>>
>
>






