[Gmod-help] Re: "Table 'meta' doesn't exist" error from bp_seqfeature_gff3.PLS
Scott Cain
cain.cshl at gmail.com
Wed Sep 24 11:53:13 EDT 2008
Hi Burcu,
Bio::DB::GFF and Bio::DB::SeqFeature::Store databases don't mix. I
wasn't aware of existing data, so now we have to come up with a
solution for what to do with your existing data: what is it? Can it
be converted to GFF3? Loading the GFF3 you created into a
Bio::DB::GFF database is possible but often problematic.
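
A quick way to see which of the two schemas a database already holds is to look at its table names. The following is a minimal Python sketch, not part of BioPerl. It assumes the Bio::DB::GFF tables are the ones listed in the quoted message below, and that a `meta` table (the one named in the error) marks a SeqFeature::Store database. The names `guess_schema`, `GFF_TABLES`, and `SEQFEATURE_MARKER` are hypothetical.

```python
# Assumption: these are the Bio::DB::GFF table names reported in this thread.
GFF_TABLES = {
    "fattribute", "fattribute_to_feature", "fdata",
    "fdna", "fgroup", "ftype", "fmeta",
}
# Assumption: the loader's error shows SeqFeature::Store expects a "meta" table.
SEQFEATURE_MARKER = "meta"


def guess_schema(table_names):
    """Return a rough guess at which schema a list of table names belongs to."""
    tables = {name.lower() for name in table_names}
    has_gff = bool(tables & GFF_TABLES)
    has_store = SEQFEATURE_MARKER in tables
    if has_gff and has_store:
        return "mixed"
    if has_gff:
        return "Bio::DB::GFF"
    if has_store:
        return "Bio::DB::SeqFeature::Store"
    return "unknown"


if __name__ == "__main__":
    # Table names as listed for the rgd_904_e database in this thread.
    print(guess_schema(["fattribute", "fattribute_to_feature", "fdata",
                        "fdna", "fgroup", "ftype", "fmeta"]))
```

The table list could come from `SHOW TABLES` in the mysql client; a result of "Bio::DB::GFF" would suggest loading the GFF3 into a separate, empty database rather than the existing one.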
Scott
On Wed, Sep 24, 2008 at 11:45 AM, Bakir, Burcu <BBakir at hmgc.mcw.edu> wrote:
>
> Hi Scott,
>
> I successfully downloaded bioperl-live from SVN and made the necessary
> settings. Then I ran genbank2gff3.PLS under
> bioperl-live/scripts/Bio-DB-GFF as:
>
> ./genbank2gff3.PLS -o /rgd_home/3.0/TOOLS/Gbrowse/test/chromosome10/ rn_ref_chr10.gbk.gz
>
> This time I got a gff3 file more similar to what you had. I'll
> paste a few lines of it at the end of this email. Then I tried to load
> it into the database using the bp_seqfeature_gff3.PLS script under
> bioperl-live/scripts/Bio-SeqFeature-Store as:
>
> -bash-3.00$ ./bp_seqfeature_gff3.PLS -dsn "dbi:mysql:database=rgd_904_e;host=forte.hmgc.mcw.edu" -user ZZZ -pass MMM ../../../../3.0/TOOLS/Gbrowse/test/chromosome10/rn_ref_chr10.gbk.gz.gff3
>
> Here is the error I get:
>
> DBD::mysql::st execute failed: Table 'rgd_904_e.meta' doesn't exist at /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm line 1217.
> -------------------- EXCEPTION --------------------
> MSG: Table 'rgd_904_e.meta' doesn't exist
> STACK Bio::DB::SeqFeature::Store::DBI::mysql::setting /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:1217
> STACK Bio::DB::SeqFeature::Store::serializer /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:1507
> STACK Bio::DB::SeqFeature::Store::default_settings /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:2066
> STACK Bio::DB::SeqFeature::Store::DBI::mysql::default_settings /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:327
> STACK Bio::DB::SeqFeature::Store::DBI::mysql::init /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:219
> STACK Bio::DB::SeqFeature::Store::new /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:358
> STACK toplevel ./bp_seqfeature_gff3.PLS:45
>
> It complains about a non-existent 'rgd_904_e.meta' table. My database has
> fattribute, fattribute_to_feature, fdata, fdna, fgroup, ftype, and fmeta
> tables.
>
> I also have another question: does running the bp_seqfeature_gff3.PLS
> script on a database that has already been filled by the bp_load_gff.pl
> script cause any trouble? The data for my previous tracks were loaded
> via bp_load_gff.pl.
>
> Thanks,
>
> Burcu
>
> ##gff-version 3
> # sequence-region NW_047337 1 1380475
> # conversion-by bp_genbank2gff3.pl
> # organism Rattus norvegicus
> # date 22-JUN-2006
> # Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
> Chr10 GenBank chromosome 83193107 84573581 . + . ID=NW_047337;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x94e94c);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
> Chr10 GenBank STS 83194370 83194502 . + . ID=GenBank:STS:NW_047337:1264:1396;Dbxref=UniSTS:250658;standard_name=AI010027
> Chr10 GenBank gene 83194326 83238284 . - . ID=LOC619561;Dbxref=GeneID:619561,RGD:1562656;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=LOC619561
> Chr10 GenBank mRNA 83194326 83238284 . - . ID=LOC619561.t01;Parent=LOC619561;Dbxref=GI:77993367,GeneID:619561,RGD:1562656;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=LOC619561;product=hypothetical protein LOC619561;transcript_id=NM_001034951.1
> Chr10 GenBank CDS 83195432 83195482 . - . ID=LOC619561.p01;Parent=LOC619561.t01;Dbxref=GI:77993368,GeneID:619561,RGD:1562656;codon_start=1;gene=LOC619561;product=hypothetical protein LOC619561;protein_id=NP_001030123.1
>
>
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Monday, September 22, 2008 12:14 PM
> To: Bakir, Burcu
> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
> power, but connecting lines between segments disappear when zoomed in
>
> Hi Burcu,
>
> Certainly you can install bioperl-live in a local folder; I'm
> reasonably sure that the bioperl website has directions for doing that.
> The 1.69 release of GBrowse requires it though, so if you are planning
> on installing the new GBrowse, you'll want to address that as well (it
> is possible to have a bioperl in place just for GBrowse to use, but it
> requires a little forethought and planning).
>
> Scott
>
>
> On Mon, Sep 22, 2008 at 1:11 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu> wrote:
>> Hi Scott,
>>
>> Or how about installing bioperl-live to a local folder (not as root)
>> and making sure it comes first in the @INC variable? That way I can
>> keep the previously installed BioPerls. I don't know what other
>> programs are using them, and I don't want to get rid of them suddenly.
>>
>> Thanks,
>>
>> Burcu
>>
>> -----Original Message-----
>> From: Scott Cain [mailto:cain.cshl at gmail.com]
>> Sent: Monday, September 22, 2008 11:19 AM
>> To: Bakir, Burcu
>> Cc: Don Gilbert; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
>> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
>> power, but connecting lines between segments disappear when zoomed in
>>
>> Hi Burcu,
>>
>> What version of BioPerl are you using? If you think you are using
>> bioperl-live, is it possible that there is more than one BioPerl
>> installed?
>>
>> Scott
>>
>> On Mon, Sep 22, 2008 at 12:08 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu> wrote:
>>> Hi Scott,
>>>
>>> Thanks for your explanations. I think something is still wrong with my
>>> gff3 file. I ran the GenBank record for rn_ref_chr1.gbk through the
>>> BioPerl bp_genbank2gff3.pl script as follows:
>>>
>>> bp_genbank2gff3.pl -y rn_ref_chr1.gbk
>>>
>>> where the --split (-y) option is documented to split the output into
>>> separate GFF and fasta files for each GenBank record.
>>>
>>> My gff3 file differs from what you pasted here in the following ways:
>>> your first uncommented line is for chromosome; mine is region, and the
>>> next is contig. Your features have "ID", whereas mine have "iD". Your
>>> CDS also has "ID", whereas mine has no ID or iD but has Parent. I don't
>>> know why those differ from your gff3 output. Did you run the
>>> script just for NW_047331? Is the difference that I'm running the
>>> script for the whole chromosome (rn_ref_chr1.gbk)?
>>>
>>> Using rn_ref_chr1.gbk should be fine. The bp_genbank2gff3.pl
>>> documentation states the following:
>>> The input files are assumed to be gzipped GenBank flatfiles for refseq
>>> contigs. The files may contain multiple GenBank records. Either a
>>> single file or an entire directory can be processed. By default, the
>>> DNA sequence is embedded in the GFF but it can be saved into separate
>>> fasta file with the --split(-y) option.
>>>
>>> Thanks,
>>>
>>> Burcu
>>>
>>> I'm pasting here first few lines of my NW_047331.gff3 file.
>>>
>>> ##gff-version 3
>>> ##sequence-region NW_047331 1 1994762
>>> ##source bp_genbank2gff3.pl
>>> NW_047331 GenBank region 1 1994762 . . . ID=NW_047331
>>> NW_047331 GenBank contig 1 1994762 . + . iD=GenBank:contig:NW_047331:1:1994762;mol_type=genomic DNA;db_xref=taxon:10116;strain=BN/SsNHsdMCW;chromosome=10;organism=Rattus norvegicus
>>> NW_047331 GenBank gap 2014 3020 . + . iD=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
>>> NW_047331 GenBank gene 49 13315 . + . iD=Bfar;db_xref=GeneID:304709,RGD:1304791;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA
>>> NW_047331 GenBank mRNA 49 13315 . + . iD=Bfar.t01;Parent=Bfar;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;db_xref=GI:61557020,GeneID:304709,RGD:1304791;exception=unclassified transcription discrepancy;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
>>> NW_047331 GenBank CDS 49 216 . + . Parent=Bfar.t01;go_function=zinc ion binding [goid 0008270] [evidence IEA],structural molecule activity [goid 0005198] [evidence IEA],ubiquitin-protein ligase activity [goid 0004842] [evidence IEA];protein_id=NP_001013143.1;gene=Bfar;go_process=anti-apoptosis [goid 0006916] [evidence IEA],protein ubiquitination [goid 0016567] [evidence IEA];db_xref=GI:61557021,GeneID:304709,RGD:1304791;go_component=membrane fraction [goid 0005624] [evidence IEA],ubiquitin ligase complex [goid 0000151] [evidence IEA],integral to plasma membrane [goid 0005887] [evidence IEA];codon_start=1;exception=unclassified translation discrepancy;product=bifunctional apoptosis regulator (predicted)
>>> NW_047331 GenBank exon 49 216 . + . Parent=Bfar.t01;gene=Bfar
>>>
>>>
>>> -----Original Message-----
>>> From: Scott Cain [mailto:cain.cshl at gmail.com]
>>> Sent: Thursday, September 18, 2008 4:54 PM
>>> To: Don Gilbert
>>> Cc: Bakir, Burcu; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
>>> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
>>> power, but connecting lines between segments disappear when zoomed in
>>>
>>> Hi Burcu,
>>>
>>> I just ran the GenBank record for NW_047331 through the BioPerl
>>> bp_genbank2gff3.pl script and got perfectly acceptable GFF3 (I'll
>>> paste a few lines below); did you get your GFF3 from running a BioPerl
>>> script or somewhere else?
>>>
>>> Thanks,
>>> Scott
>>>
>>> Here's what the first several lines of the GFF3 looked like:
>>> ##gff-version 3
>>> # sequence-region NW_047331 1 1994762
>>> # conversion-by bp_genbank2gff3.pl
>>> # organism Rattus norvegicus
>>> # date 22-JUN-2006
>>> # Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
>>> NW_047331 GenBank chromosome 1 1994762 . + . ID=NW_047331;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x8a564e4);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
>>> NW_047331 GenBank gap 2014 3020 . + . ID=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
>>> NW_047331 GenBank gap 5534 5583 . + . ID=GenBank:gap:NW_047331:5534:5583;estimated_length=50
>>> NW_047331 GenBank gene 49 13315 . + . ID=Bfar;Dbxref=GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=Bfar
>>> NW_047331 GenBank mRNA 49 13315 . + . ID=Bfar.t01;Parent=Bfar;Dbxref=GI:61557020,GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=Bfar;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
>>> NW_047331 GenBank CDS 49 216 . + . ID=Bfar.p01;Parent=Bfar.t01;Dbxref=GI:61557021,GeneID:304709,RGD:1304791;gO_component=integral to plasma membrane%3B membrane fraction%3B ubiquitin ligase complex;gO_function=structural molecule activity%3B ubiquitin-protein ligase activity%3B zinc ion binding;gO_process=anti-apoptosis%3B protein ubiquitination;codon_start=1;exception=unclassified translation discrepancy;gene=Bfar;product=bifunctional apoptosis regulator (predicted);protein_id=NP_001013143.1
>>>
>>>
>>> On Thu, Sep 18, 2008 at 5:37 PM, Scott Cain <cain.cshl at gmail.com> wrote:
>>>> Hi Burcu,
>>>>
>>>> I finally got around to looking at the sample data and config file you
>>>> sent me a few days ago. There were a few problems I had to fix:
>>>>
>>>> 1. GFF3 and the Bio::DB::GFF adaptor don't always get along, and I
>>>> think your GFF3 is one example of that happening. I switched to the
>>>> Bio::DB::SeqFeature::Store adaptor.
>>>>
>>>> 2. With the switch to SeqFeature::Store, you don't need aggregators
>>>> any more, so I switched the [EntrezGene] track to use gene:GenBank
>>>> features and the gene glyph.
>>>>
>>>> 3. I changed all occurrences of 'iD' to 'ID' (I'm off to BioPerl next
>>>> to see what caused this so I can make it stop).
>>>>
>>>> 4. I added a reference sequence line; you had a line like this:
>>>>
>>>> Chr10 GenBank region 1 1994762 . . . ID=NW_047331
>>>>
>>>> I changed it to this:
>>>>
>>>> Chr10 GenBank region 1 1994762 . . . ID=Chr10;Name=Chr10
>>>>
>>>> (of course, I suspect that rat chromosome 10 is bigger than that; the
>>>> alternative would be to change column 1 to NW_047331 throughout the
>>>> file.)
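
The 'iD' to 'ID' change in item 3 can be scripted. The following is a minimal Python sketch, not the fix that went into BioPerl. It assumes a tab-delimited GFF3 file with nine columns, and it only rewrites an `iD=` key that starts column 9 or follows a `;`. The function name `fix_id_tag` is hypothetical.

```python
import re

# Matches the attribute key "iD" at the start of column 9 or right after ";".
_BAD_ID = re.compile(r"(^|;)iD=")


def fix_id_tag(line):
    """Return the GFF3 line with a miscased 'iD=' attribute key set to 'ID='."""
    if line.startswith("#"):
        return line  # directives and comments are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line (blank line, FASTA section, etc.)
    cols[8] = _BAD_ID.sub(r"\1ID=", cols[8])
    return "\t".join(cols) + newline


if __name__ == "__main__":
    import sys
    # Usage (file names are placeholders): python fix_id.py < in.gff3 > out.gff3
    for raw in sys.stdin:
        sys.stdout.write(fix_id_tag(raw))
```

Restricting the substitution to column 9 keeps it from touching sequence names or notes that happen to contain the same characters.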
>>>>
>>>> Switching to SeqFeature::Store will require a few changes, but not too
>>>> many; basically, the only things that would be affected are the tracks
>>>> that currently use aggregators. Please let me know if you need any
>>>> help with the transition.
>>>>
>>>> Scott
>>>>
>>>>
>>>> On Mon, Sep 15, 2008 at 4:29 PM, Don Gilbert
>>>> <gilbertd at cricket.bio.indiana.edu> wrote:
>>>>>
>>>>>
>>>>> Burcu,
>>>>>
>>>>> It may be that you have to work through a few changes. The 'iD' problem
>>>>> was likely part of it; your aggregator also needs to be updated with
>>>>> corrections for ID/Parent tags.
>>>>>
>>>>>>> EntrezGene{CDS,exon/mRNA}
>>>>>
>>>>> This one should work when each CDS and exon has Parent=<the mRNA's ID>
>>>>> and the mRNA has an ID= tag.
>>>>> This is equivalent to the processed_transcript aggregator
>>>>> Bio/DB/GFF/Aggregator/processed_transcript.pm
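
The ID/Parent condition above can be checked before loading. The following is a minimal Python sketch that assumes tab-delimited GFF3 with nine columns and reports every `Parent=` value with no matching `ID=` in the file. The helper names `parse_attributes` and `orphan_parents` are hypothetical and not part of BioPerl.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key] = value
    return attrs


def orphan_parents(lines):
    """Return (line number, feature type, parent id) for unresolved Parents."""
    ids = set()
    parents = []
    for lineno, line in enumerate(lines, start=1):
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:  # keys are case-sensitive, so 'iD' does not count
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                parents.append((lineno, cols[2], parent))
    # Compare only after the whole file is read, so feature order is irrelevant.
    return [entry for entry in parents if entry[2] not in ids]
```

With the 'iD' tags from this thread, a CDS whose Parent is Bfar.t01 would be reported, because the mRNA line carries iD=Bfar.t01 rather than ID=Bfar.t01.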
>>>>>
>>>>> Aggregators are good when using Bio/DB/GFF databases; the
>>>>> Bio/DB/SeqFeature/Store databases do not use aggregators.
>>>>>
>>>>> PS, one of the tools, likely bp_genbank2gff3, created those funky
>>>>> 'iD' tags, for reasons of its own.
>>>>>
>>>>> - Don Gilbert
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> ------------------------------------------------------------------------
>>>> Scott Cain, Ph. D. cain.cshl at gmail.com
>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>> Cold Spring Harbor Laboratory
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory