[Gmod-help] Loading WormBase GFF3 into Chado

Scott Cain scott at scottcain.net
Wed Aug 25 22:39:27 EDT 2010


Hi Florian,

When using BioPerl 1.6.1, the sequence-region directive is not
supported in GFF3 files, and in fact causes errors exactly like that.
I've fixed that problem in bioperl-live on GitHub, but the easy
workaround is just to delete the ##sequence-region lines.
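For example, a small filter like the one below would drop those directives before loading. This is a minimal sketch, not part of the GMOD tools. The script name strip_seqregion.py is hypothetical, and the script assumes each directive starts at column 1, as in the WS217 header. It reads either a plain or a gzipped GFF3 file and writes everything else unchanged to stdout.

```python
import gzip
import sys
from typing import Iterable, Iterator


def strip_sequence_region(lines: Iterable[str]) -> Iterator[str]:
    """Yield GFF3 lines unchanged, skipping ##sequence-region directives.

    Assumption: directives begin at column 1, as the GFF3 header
    from WormBase does.
    """
    for line in lines:
        if line.startswith("##sequence-region"):
            continue  # drop the directive that BioPerl 1.6.1 chokes on
        yield line


if __name__ == "__main__" and len(sys.argv) == 2:
    # Hypothetical usage: python strip_seqregion.py IN.gff3[.gz] > OUT.gff3
    path = sys.argv[1]
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as fh:
        sys.stdout.writelines(strip_sequence_region(fh))
```

Run it as `python strip_seqregion.py c_elegans.WS217.gff3.gz > c_elegans.WS217.noseqregion.gff3` (the output name is only an illustration), then pass the new file to gmod_bulk_load_gff3.pl with --gfffile.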

Not that I think that will be the end of the issues with the WormBase
GFF... :-)  (I'm just guessing; I know that a few years ago it was
difficult, but maybe it's better now.)

Scott


On Wed, Aug 25, 2010 at 7:51 PM, Florian Wagner <email at florianwagner.eu> wrote:
> Hi,
>
> I could use some help here... I'm trying to load the GFF3 file of the latest
> C. elegans genome release from WormBase (WS217) into Chado, using
>
> gmod_bulk_load_gff3.pl --gfffile c_elegans.WS217.gff3 --fastafile
> c_elegans.WS217.dna.fa --organism CELE --dbname chado
>
> This gives the error message:
>
> Preparing data for inserting into the chado database
> (This may take a while...)
> Unable to find srcfeature I in the database.
> ... at /usr/local/share/perl/5.10.1/Bio/GMOD/DB/Adapter.pm line 4555
> ... called at /usr/local/bin/gmod_bulk_load_gff3.pl line 841
> Abnormal termination, trying to clean up...
>
> I've read Scott's answer to a similar problem here:
> http://osdir.com/ml/science.biology.gmod.gbrowse/2008-07/msg00014.html
>
> However, adding the chromosomes manually to the top of the GFF file, as
> suggested there, does not solve the problem. Actually, the original GFF file
> comes with these kinds of lines (somewhere in the file), e.g.:
>
> I       Reference       chromosome      1       15072423        .       +       .       ID=I;Name=I
>
> So this doesn't seem to be a problem with the GFF file, but with the loader.
> Do you have any ideas how to fix this?
>
> Best, Florian
>
> ps.
>
> I'm using chado 1.11 and bioperl 1.6.1.
>
> The GFF3 file starts like this:
> ##gff-version 3
> ##sequence-region I 1 15072423
> ##sequence-region II 1 15279345
> ##sequence-region III 1 13783700
> ##sequence-region IV 1 17493793
> ##sequence-region MtDNA 1 13794
> ##sequence-region V 1 20924149
> ##sequence-region X 1 17718866
> ...
>
> The original annotation file is available here:
> ftp://ftp.wormbase.org/pub/wormbase/genomes/c_elegans/genome_feature_tables/GFF3/c_elegans.WS217.gff3.gz
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



