[Gmod-help] Re: [Gmod-schema] Error using gmod_bulk_load_gff3.pl with a ##sequence-region directive

Chris Fields cjfields at illinois.edu
Wed Jul 28 00:53:46 EDT 2010


I think the part of BioPerl that Scott says is slated for significant refactoring is Bio::FeatureIO.  Scott, is that correct?  

Having some tests would really help.  I can always sync them over to the Bio-FeatureIO repo, which is separate from core ATM.  During my first round of FeatureIO work I uncovered some pretty significant bugs, which are now fixed (skipping features and/or sequences was one).  Now just waiting on tuits...

chris
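Until the BioPerl side is sorted out, the workaround Scott describes further down the thread is to remove the ##sequence-region directives before loading. Below is a minimal Python 3 sketch of that preprocessing step. The function names are hypothetical and not part of GMOD or BioPerl. It drops only lines whose first token is exactly ##sequence-region and passes everything else (including ##gff-version and ##feature-ontology) through unchanged.

```python
from typing import Iterable, Iterator


def is_sequence_region(line: str) -> bool:
    """True if this GFF3 line is a ##sequence-region directive."""
    fields = line.split()
    # Compare the first whitespace-separated token exactly, so other
    # directives (##gff-version, ##feature-ontology, ...) are kept.
    return bool(fields) and fields[0] == "##sequence-region"


def strip_sequence_regions(lines: Iterable[str]) -> Iterator[str]:
    """Yield every input line except ##sequence-region directives."""
    for line in lines:
        if not is_sequence_region(line):
            yield line


def strip_file(src_path: str, dst_path: str) -> int:
    """Write src_path to dst_path without ##sequence-region lines.

    Returns the number of directive lines that were dropped.
    """
    dropped = 0
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        for line in src:
            if is_sequence_region(line):
                dropped += 1
            else:
                dst.write(line)
    return dropped
```

For example, strip_file("ITAG1_gene_models_sample.gff3", "sample.noseqreg.gff3") writes a filtered copy (the output name is only an example) that can be passed to gmod_bulk_load_gff3.pl via --gfffile as before. As the script's documentation quoted in the thread says, if the region itself needs to become a feature in the database, it should be written as a full GFF line rather than a directive.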

On Jul 27, 2010, at 6:39 PM, Jonathan Leto wrote:

> Howdy,
> 
> Could you explain what exactly Chado and BioPerl are disagreeing on?
> If modifying BioPerl does not make any BioPerl tests fail and allows the loading
> of sequence-region directives, I think it should be done.
> 
> If the part of BioPerl that needs to be modified has few or no tests, I can add
> some and ask the BioPerl people what they think.
> 
> Duke
> 
> 
> On Fri, Jul 23, 2010 at 10:52 AM, Scott Cain <scott at scottcain.net> wrote:
>> This is in fact a current bug; the easiest workaround is to get rid
>> of sequence-region directives.  Actually fixing the bug is a little
>> trickier, since it stems from the fact that Chado and BioPerl have
>> different ideas of what should happen.  While I could (probably)
>> modify BioPerl to do the right thing (from my perspective), I am
>> reluctant to do that at the moment since that section of BioPerl is
>> slated to be refactored.
>> 
>> Scott
>> 
>> 
>> On Tue, Jul 20, 2010 at 6:55 PM, Dave Clements, GMOD Help Desk
>> <help at gmod.org> wrote:
>>> Hi Jonathan,
>>> I've created a bug report on this:
>>>   http://sourceforge.net/tracker/?func=detail&aid=3032325&group_id=27707&atid=391291
>>> This is interesting because the code says:
>>>   This script does not use sequence-region directives for anything.
>>>   If it represents a feature that needs to be inserted into the database,
>>>   it should be represented with a full GFF line.
>>> Dave C.
>>> On Fri, Jul 16, 2010 at 1:31 PM, Jonathan Leto <jaleto at gmail.com> wrote:
>>>> 
>>>> Howdy,
>>>> 
>>>> I have been attempting to load the ITAG GFF3 [0] files, which contain
>>>> ##sequence-region directives, but I run into errors like this:
>>>> 
>>>> $ ./gmod_bulk_load_gff3.pl --gfffile \
>>>>     ~/git/ITAG1_release/ITAG1_gene_models_sample.gff3 --organism tomato \
>>>>     --noexon --recreate_cache --analysis --remove_lock --save_tmpfiles
>>>> (Re)creating the uniquename cache in the database...
>>>> Creating table...
>>>> Populating table...
>>>> Creating indexes...
>>>> Adjusting the primary key sequences (if necessary)...Done.
>>>> 
>>>> --------------------- WARNING ---------------------
>>>> MSG: '##feature-ontology' directive handling not yet implemented
>>>> ---------------------------------------------------
>>>> Preparing data for inserting into the cxgn database
>>>> (This may take a while ...)
>>>> Loading data into feature table ...
>>>>        COPY feature (feature_id,organism_id,name,uniquename,type_id,is_analysis,seqlen,dbxref_id) FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3210.
>>>> Loading data into featureloc table ...
>>>>        COPY featureloc (featureloc_id,feature_id,srcfeature_id,fmin,fmax,strand,phase,rank,locgroup) FROM STDIN; at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3210.
>>>> DBD::Pg::db pg_endcopy failed: ERROR:  invalid input syntax for integer: ""
>>>> CONTEXT:  COPY featureloc, line 1, column strand: "" at /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm line 3222, <$fh> line 3.
>>>> 
>>>> ------------- EXCEPTION: Bio::Root::Exception -------------
>>>> MSG: calling endcopy for featureloc failed:
>>>> STACK: Error::throw
>>>> STACK: Bio::Root::Root::throw /home/leto/local-lib/lib/perl5/Bio/Root/Root.pm:368
>>>> STACK: Bio::GMOD::DB::Adapter::copy_from_stdin /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3222
>>>> STACK: Bio::GMOD::DB::Adapter::load_data /home/leto/local-lib/lib/perl5/Bio/GMOD/DB/Adapter.pm:3144
>>>> STACK: ./gmod_bulk_load_gff3.pl:1060
>>>> -----------------------------------------------------------
>>>> 
>>>> The salient point is that, somehow, an empty string ("") is being
>>>> inserted into the strand column, which fails. Note that I have also
>>>> uncommented a warning statement that shows the SQL query being
>>>> executed.
>>>> 
>>>> I have traced this issue to the sequence-region directive: when I
>>>> remove that line, the file loads fine. As another test, I created a
>>>> file containing nothing but a sequence-region directive, and the same
>>>> error occurs. I have attached that file and the temp data file that
>>>> gmod_bulk_load_gff3.pl creates. The 6th column of that file is the
>>>> strand, and it has a value of "\N", which is the text representation
>>>> of NULL.
>>>> 
>>>> It seems to me that something is stringifying the NULL into "" and
>>>> then attempting to insert the empty string into strand, which has a
>>>> type of smallint. This is what causes the failure.
>>>> 
>>>> I would greatly appreciate any thoughts or comments on how to make the
>>>> bulk loading script support the sequence-region directive.
>>>> 
>>>> Thanks
>>>> 
>>>> [0] ftp://ftp.solgenomics.net/tomato_genome/annotation/ITAG1_release/
>>>> 
>>>> --
>>>> Jonathan "Duke" Leto
>>>> jonathan at leto.net
>>>> http://leto.net
>>>> 
>>>> 
>>>> ------------------------------------------------------------------------------
>>>> This SF.net email is sponsored by Sprint
>>>> What will you do first with EVO, the first 4G phone?
>>>> Visit sprint.com/first -- http://p.sf.net/sfu/sprint-com-first
>>>> _______________________________________________
>>>> Gmod-schema mailing list
>>>> Gmod-schema at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-schema
>>>> 
>>> 
>>> 
>>> 
>>> --
>>> ===> PLEASE KEEP RESPONSES ON THE LIST <===
>>> http://gmod.org/wiki/GMOD_News
>>> http://gmod.org/wiki/Calendar
>>> http://gmod.org/wiki/Help_Desk_Feedback
>>> 
>>> 
>>> 
>>> 
>> 
>> 
>> 
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                                   scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>> 
> 
> 
> 
> -- 
> Jonathan "Duke" Leto
> jonathan at leto.net
> http://leto.net
> 
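To make Duke's diagnosis concrete: in PostgreSQL's default COPY text format a NULL is written as \N, whereas an empty field is an empty string, which a smallint column such as featureloc.strand rejects. The Python 3 sketch below illustrates that difference. It is not Bio::GMOD::DB::Adapter's code, and the helper names and the sample row values are made up for illustration; only the column order is taken from the COPY statement in the log above.

```python
from typing import Optional, Sequence

# Default NULL marker in PostgreSQL's COPY text format.
NULL_MARKER = "\\N"


def copy_field(value: Optional[object]) -> str:
    """Render one column value for COPY ... FROM STDIN (text format, tab-delimited)."""
    if value is None:
        return NULL_MARKER
    text = str(value)
    # Escape backslash first, then the characters that would otherwise be
    # read as row or column separators.
    return (text.replace("\\", "\\\\")
                .replace("\t", "\\t")
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def copy_row(values: Sequence[Optional[object]]) -> str:
    """Join rendered fields with tabs and terminate the row with a newline."""
    return "\t".join(copy_field(v) for v in values) + "\n"


def buggy_copy_field(value: Optional[object]) -> str:
    """The suspected failure mode: NULL stringified to an empty string."""
    return "" if value is None else str(value)


def parse_smallint(field: str) -> Optional[int]:
    """Rough stand-in for the server reading a smallint column such as strand."""
    if field == NULL_MARKER:
        return None
    # int("") raises ValueError, much as PostgreSQL rejects "" with
    # 'invalid input syntax for integer'.
    return int(field)


# Hypothetical featureloc row in the column order from the COPY statement:
# featureloc_id, feature_id, srcfeature_id, fmin, fmax, strand, phase, rank, locgroup
row = copy_row((1, 1, 2, 0, 1000, None, None, 0, 0))
strand_field = row.rstrip("\n").split("\t")[5]
print(repr(strand_field))            # '\\N'
print(parse_smallint(strand_field))  # None

try:
    parse_smallint(buggy_copy_field(None))
except ValueError as exc:
    print("rejected:", exc)
```

A correctly serialized row carries \N in the strand position, matching what Duke saw in the temp file, so the empty string reported by PostgreSQL would have to be introduced somewhere between that file and the server.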





More information about the Gmod-help mailing list