[Gmod-help] Loading new organisms into Chado

Scott Cain scott at scottcain.net
Fri Apr 24 13:40:32 EDT 2009


Hi Barry,

If you specify "--organism fromdata" in your load command, then every
line in your GFF file must have "organism=xxx" in the ninth column,
where xxx is the name of the organism (I suspect that should be the full
Genus species, but I haven't looked at the code to be sure).  This
option was created so that a GFF file containing features from several
organisms can be loaded in one pass (for example, if you wanted to load
a whole clade's worth of ESTs at one go).
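
If you do go the "fromdata" route, it is worth checking the file before
loading.  The following is only a minimal Python sketch and is not part of
GMOD: the function name lines_missing_organism and the sample lines are
made up for illustration, and it assumes that the attribute key is
literally "organism" and that anything after a ##FASTA directive is
sequence rather than features.  It reports which feature lines lack an
organism= attribute in column nine:

```python
def lines_missing_organism(gff_lines):
    """Return 1-based numbers of feature lines with no organism=... in column 9."""
    missing = []
    for number, raw in enumerate(gff_lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # assumption: everything after this directive is sequence
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        columns = line.split("\t")
        attributes = columns[8] if len(columns) >= 9 else ""
        values = {}
        for pair in attributes.split(";"):
            key, _, value = pair.strip().partition("=")
            values[key] = value
        if not values.get("organism"):
            missing.append(number)
    return missing


# Hypothetical sample: the second feature line has no organism attribute.
sample = [
    "##gff-version 3\n",
    "EU852811\tGenBank\tgene\t1\t100\t.\t+\t.\tID=g1;organism=Saccharomyces pastorianus\n",
    "EU852811\tGenBank\tgene\t200\t300\t.\t+\t.\tID=g2\n",
]
print(lines_missing_organism(sample))
```

With the sample above the function returns [3], since only the third line
has no organism attribute.  It does not check that the name matches a row
in the organism table.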

What you really want to do is create an entry for your organism in the
database before you load your GFF file, and then either specify the
organism name on the command line or, if you are only going to be working
with one organism, add the name of the organism to the GMOD conf file.
So, you want to do something like this in the psql shell (from the perldoc
of gmod_bulk_load_gff3.pl):

  insert into organism (abbreviation, genus, species, common_name)
                     values ('H.sapiens', 'Homo','sapiens','Human');

Then when you do your data load, you specify "--organism Human" (the
common name) on the command line.  If you are working only with a
single organism and don't want to do the --organism thing every time
you load, you can edit the default.conf file at $GMOD_ROOT/conf, and
add the common name after the equals sign in DBORGANISM=.

Scott


On Fri, Apr 24, 2009 at 1:27 PM, Dave Clements, GMOD Help Desk
<gmodhelp at googlemail.com> wrote:
> Hi Barry,
> I'm cross-posting my response to Don Gilbert, creator of GMODTools, and the
> Chado list where others can see it.
> On Fri, Apr 24, 2009 at 7:50 AM, Barry Dancis <bdancis at digiconasp.com>
> wrote:
>>
>> Hi --
>>
>>  I am trying to learn the bulk file processing
>> methods (http://gmod.org/wiki/GMODTools_TestCase), but when I ran the upload
>> of the converted file into my existing Chado db
>>
>>  perl $lbin/gmod_bulk_load_gff3.pl -gfffile gff3/EU852811.gbk.gff
>> -organism fromdata -dbname Chado -debug >& gmod-load.log
>>
>>  I got the following errors in my log file:
>>
>> Preparing data for inserting into the Chado database
>> (This may take a while ...)
>> Organism Bio::Annotation::SimpleValue=HASH(0xa37a7d8) from data
>> DBD::Pg::st fetchrow_array failed: no statement executing at
>> /usr/local/share/perl/5.10.0/Bio/GMOD/DB/Adapter.pm line 1112, <GEN0> line 7.
>> Bio::Annotation::SimpleValue=HASH(0xa37a7d8) organism not found in the
>> database
>> at /usr/local/bin/gmod_bulk_load_gff3.pl line 711, <GEN0> line 7.
>> Issuing rollback() for database handle being DESTROY'd without explicit
>> disconnect() at /usr/local/bin/gmod_bulk_load_gff3.pl line 711.
>>
>>  The top of the original genbank file is
>> LOCUS       EU852811               70578 bp    DNA     circular PLN
>> 05-MAR-2009
>> DEFINITION  Saccharomyces pastorianus Weihenstephan 34/70 mitochondrion,
>>            complete genome.
>> ACCESSION   EU852811
>>
>> and the organism is not one of the 12 that got loaded in the organism
>> table during the installation of the Chado db.
>>
>> Is there a separate source for allowable organisms or should the bulk
>> loader be able to load organisms not in the organism table?
>
> You can define any organism by inserting it into the organism table
> (http://gmod.org/wiki/Chado_Organism_Module#Table:_organism).  What I don't
> know is how GMODTools maps the DEFINITION line to the organism table.  That
> is, I don't know what part of "Saccharomyces pastorianus Weihenstephan 34/70
> mitochondrion, complete genome." GMODTools looks for in the organism table.
> Don, any advice on this?
>
>>
>> Looking at the GBrowse Configuration HOWTO document
>> (http://gmod.org/wiki/GBrowse_Configuration_HOWTO) it describes the use of
>> bp_bulk_load_gff.pl for creating databases from scratch, but when I run
>
> bp_bulk_load_gff.pl creates a database specifically tuned to power GBrowse.
> This is an entirely different beast than Chado.  For loading Chado, stick
> with GMODTools.
> Dave C.
>
>>
>>  perl $lbin/bp_bulk_load_gff3.pl -gfffile gff3/EU852811.gbk.gff -organism
>> fromdata -dbname Chado -user postgres -passwd postgres -debug >&
>> gmod-load.log
>>
>> I get
>>
>> Can't open perl script "/usr/local/bin/bp_bulk_load_gff3.pl": No such file
>> or directory
>>
>> and I can't find the file in either the GMODTools tree or the GMOD
>> installation kit.
>>
>> What is the difference between gmod_bulk_load_gff3.pl and
>> bp_bulk_load_gff3.pl? Do they have different purposes?
>>
>> Thanks,
>>
>> Barry
>>
>> ps I also posted this on
>> http://osdir.com/ml/science.biology.gmod.devel/2008-02/msg00000.html but I
>> got no reply so I am posting it again here.
>>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



