[Gmod-help] Re: [Gmod-schema] Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Scott Cain scott at scottcain.net
Wed Aug 11 15:55:05 EDT 2010


Hi Duke,

I made the changes in bioperl-live and the schema repositories so that
##sequence-region directives are always ignored by the GFF3 bulk
loader.
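
For anyone still on an older checkout, the workaround from earlier in
this thread (getting rid of the sequence-region directives before
loading) could be sketched roughly as below. This is only an
illustration in Python, not code from the loader or from BioPerl; the
function name and the sample lines are made up for the example.

```python
def strip_sequence_region(lines):
    """Yield GFF3 lines unchanged, skipping ##sequence-region directives."""
    for line in lines:
        # Directive lines start with "##"; only sequence-region is dropped.
        if line.startswith("##sequence-region"):
            continue
        yield line


# Hypothetical sample input, not taken from the ITAG files.
sample = [
    "##gff-version 3\n",
    "##sequence-region chr1 1 1000\n",
    "chr1\t.\tgene\t10\t200\t.\t+\t.\tID=gene1\n",
]
cleaned = list(strip_sequence_region(sample))
```

The same function can be pointed at an open file handle, with the
result written to a new file that is then passed to the loader.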

Scott


On Mon, Aug 9, 2010 at 2:06 PM, Jonathan Leto <jaleto at gmail.com> wrote:
> Howdy,
>
> There is actually a flag called -ignore_seqregion in
> Bio::DB::SeqFeature::Store::GFF3Loader .
>
> It would be nice if gmod_bulk_load_gff3.pl could take that as a
> command-line argument and do the right thing with it.
>
> Duke
>
>
>
> On Wed, Jul 28, 2010 at 12:20 PM, Scott Cain <scott at scottcain.net> wrote:
>> An additional (though probably somewhat easy to fix) issue is
>> Bio::FeatureIO's insistence that ##sequence-region directives get
>> turned into features.  These bits of data are not sufficient to create
>> the full-fledged feature that Chado requires, which is why the loader
>> (should) ignore them.  Only it can't, because it defers to
>> Bio::FeatureIO for file parsing.  If the constructor had a flag to
>> ignore those directives, that would make life a little better.  Even
>> better than that would be if Bio::FeatureIO could return a message
>> stating that a ##sequence-region directive was found but was being
>> ignored, so that message could be relayed to the user.
>>
>> On the other hand, I was unaware of Bio::FeatureIO dropping features;
>> that's somewhat unpleasant.  I recall an issue with skipping
>> sequences, but I thought that was fixed already.
>>
>> Scott
>>
>>
>> On Wed, Jul 28, 2010 at 12:53 AM, Chris Fields <cjfields at illinois.edu> wrote:
>>> I think the part of BioPerl Scott is referring to for significant refactoring is Bio::FeatureIO.  Scott, is that correct?
>>>
>>> Having some tests would really help.  I can always sync them over to the Bio-FeatureIO repo, which is separate from core ATM.  I did uncover some pretty significant bugs during my first round of FeatureIO work which are now fixed (skipping features and/or sequences was one).  Now just waiting on tuits...
>>>
>>> chris
>>>
>>> On Jul 27, 2010, at 6:39 PM, Jonathan Leto wrote:
>>>
>>>> Howdy,
>>>>
>>>> Could you explain what exactly Chado and BioPerl are disagreeing on?
>>>> If modifying BioPerl does not make any BioPerl tests fail and allows the loading
>>>> of sequence-region directives, I think it should be done.
>>>>
>>>> If the part of BioPerl that needs to be modified has no or few tests, I can add
>>>> some and ask the BioPerl people what they think.
>>>>
>>>> Duke
>>>>
>>>>
>>>> On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <scott at scottcain.net> wrote:
>>>>> This is in fact a current bug; the easiest work around is to get rid
>>>>> of sequence-region directives.  Actually fixing the bug is a little
>>>>> trickier since it is due to the fact that Chado and BioPerl have
>>>>> different ideas of what should happen.  While I could (probably)
>>>>> modify BioPerl to do the right thing (from my perspective), I am
>>>>> reluctant to do that at the moment since that section of BioPerl is
>>>>> slated to be refactored.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
>>>>> <help at gmod.org> wrote:
>>>>>> Hi Jonathan,
>>>>>> I've created a bug report on this:
>>>>>>   http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
>>>>>> This is interesting because the code says:
>>>>>>   This script does not use sequence-region directives for anything.
>>>>>>   If it represents a feature that needs to be inserted into the database,
>>>>>>   it should be represented with a full GFF line.
>>>>>> Dave C.
>>>>>> On Fri, Jul 16, 2010 at 1:31 PM, Jonathan Leto <jaleto at gmail.com> wrote:
>>>>>>>
>>>>>>> Howdy,
>>>>>>>
>>>>>>> I have been attempting to load the ITAG GFF3 [0] files, which contain
>>>>>>> ##sequence-region directives, but I run into errors like this:
>>>>>>>
>>>>>>> $ ./gmod_bulk_load_gff3.pl \
>>>>>>>     --gfffile ~/git/ITAG1_release/ITAG1_gene_models_sample.gff3 \
>>>>>>>     --organism tomato --noexon --recreate_cache --analysis \
>>>>>>>     --remove_lock --save_tmpfiles
>>>>>>> (Re)creating the uniquename cache in the database...
>>>>>>> Creating table...
>>>>>>> Populating table...
>>>>>>> Creating indexes...
>>>>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>>>>>
>>>>>>> --------------------- WARNING ---------------------
>>>>>>> MSG: '##feature-ontology' directive handling not yet implemented
>>>>>>> ---------------------------------------------------
>>>>>>> Preparing data for inserting into the cxgn database
>>>>>>> (This may take a while ...)
>>>>>>> Loading data into feature table ...
>>>>>>>        COPY feature
>>>>>>> (feature_id,organism_id,name,uniquename,type_id,is_analysis,seqlen,dbxref_id)
>>>>>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>>>>>> line 3210.
>>>>>>> Loading data into featureloc table ...
>>>>>>>        COPY featureloc
>>>>>>> (featureloc_id,feature_id,srcfeature_id,fmin,fmax,strand,phase,rank,locgroup)
>>>>>>> FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm
>>>>>>> line 3210.
>>>>>>> DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
>>>>>>> CONTEXT:  COPY featureloc, line 1, column strand: "" at
>>>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3222, <$fh>
>>>>>>> line 3.
>>>>>>>
>>>>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>>>>> MSG: calling endcopy for featureloc failed:
>>>>>>> STACK: Error::throw
>>>>>>> STACK: Bio::Root::Root::throw
>>>>>>> /home/leto/local-lib/lib/perl5/Bio/Root/Root.pm:368
>>>>>>> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin
>>>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3222
>>>>>>> STACK: Bio::GMOD::DB::Adapter::load_data
>>>>>>> /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3144
>>>>>>> STACK: ./gmod_bulk_load_gff3.pl:1060
>>>>>>> -----------------------------------------------------------
>>>>>>>
>>>>>>> The salient information is that a strand of "" is somehow being
>>>>>>> inserted into the database, which fails. Note that I have also
>>>>>>> uncommented a warning statement that shows the SQL query being
>>>>>>> executed.
>>>>>>>
>>>>>>> I have traced this issue to the sequence-region directive. When I
>>>>>>> remove that line, the file loads fine. As another test, I created a
>>>>>>> file with nothing but a sequence-region directive, and the same error
>>>>>>> occurs. I have attached that file and the temp data file that
>>>>>>> gmod_bulk_load_gff3.pl creates as well. The 6th column of that file is
>>>>>>> the strand, and it has a value of "\N", which is the text
>>>>>>> representation of NULL.
>>>>>>>
>>>>>>> It seems to me that something is stringifying the NULL into "" and
>>>>>>> then attempting to insert the empty string into strand, which has a
>>>>>>> type of smallint. This is what causes the failure.
>>>>>>>
>>>>>>> I would greatly appreciate any thoughts or comments on how to make the
>>>>>>> bulk loading script support the sequence-region directive.
>>>>>>>
>>>>>>> Thanks
>>>>>>>
>>>>>>> [0] ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/
>>>>>>>
>>>>>>> --
>>>>>>> Jonathan "Duke" Leto
>>>>>>> jonathan at leto.net
>>>>>>> http://leto.net
>>>>>>>
>>>>>>>
>>>>>>> _______________________________________________
>>>>>>> Gmod-schema mailing list
>>>>>>> Gmod-schema at lists.sourceforge.net
>>>>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> ===> PLEASE KEEP RESPONSES ON THE LIST <===
>>>>>> http://gmod.org/wiki/GMOD_News
>>>>>> http://gmod.org/wiki/Calendar
>>>>>> http://gmod.org/wiki/Help_Desk_Feedback
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>>>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>>>> Ontario Institute for Cancer Research
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Jonathan "Duke" Leto
>>>> jonathan at leto.net
>>>> http://leto.net
>>>>
>>>
>>>
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>>
>
>
>
> --
> Jonathan "Duke" Leto
> jonathan at leto.net
> http://leto.net
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



