[Gmod-help] RE: "Table 'meta' doesn't exist" error from bp_seqfeature_gff3.PLS

Wed Sep 24 12:08:06 EDT 2008

Hi Burcu,

Using the processed_transcript glyph should work with the GFF3 you
created.  It works with mRNA and CDS features, and it looks like the
bioperl script to produce the GFF3 worked correctly.
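
As a quick sanity check before loading, the ID/Parent relationships can be verified with a short script. The following is only a minimal sketch in Python, not part of BioPerl or GBrowse; the function names `parse_attributes` and `check_transcripts` are made up for illustration. It assumes the file is real tab-separated GFF3 (the lines pasted in this thread show spaces only because of the email client) and that a CDS or exon is acceptable when at least one of its Parent values matches the ID of an mRNA row.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of key -> list of values."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.split(",")
    return attrs


def check_transcripts(gff3_lines):
    """Return a list of human-readable problems found in mRNA/CDS/exon rows."""
    problems = []
    mrna_ids = set()
    children = []  # (line number, feature type, list of Parent values)
    for number, line in enumerate(gff3_lines, start=1):
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            problems.append(f"line {number}: expected 9 tab-separated columns")
            continue
        feature_type, attrs = columns[2], parse_attributes(columns[8])
        # Reserved GFF3 tags are case-sensitive, so 'iD' is not 'ID'.
        for key in attrs:
            if key.lower() in ("id", "parent") and key not in ("ID", "Parent"):
                problems.append(f"line {number}: tag '{key}' has the wrong case")
        if feature_type == "mRNA":
            mrna_ids.update(attrs.get("ID", []))
        elif feature_type in ("CDS", "exon"):
            children.append((number, feature_type, attrs.get("Parent", [])))
    # Children are checked last so that row order in the file does not matter.
    for number, feature_type, parents in children:
        if not any(parent in mrna_ids for parent in parents):
            problems.append(f"line {number}: {feature_type} has no mRNA parent")
    return problems
```

Calling `check_transcripts(open("your_file.gff3"))` (the file name is a placeholder) returns an empty list when every CDS and exon points at an mRNA ID and no 'iD'-style tags are present.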

Scott

On Wed, Sep 24, 2008 at 12:03 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu> wrote:
> Hi Scott,
>
> For some tracks (QTLs, miRNAs, SNPs, etc.) I created the GFF3 files
> myself and loaded them into the database with the bp_load_gff.pl
> script. But other features (Ensembl genes, Genscan predictions, ESTs,
> mRNAs) were loaded into the database before I started this project. So
> if I decide to use Bio::DB::SeqFeature::Store databases, I can make the
> transition for the tracks I created, but I am not sure how far I can go
> with the others, because I don't have their GFF3 files.
>
> So maybe continuing to use the Bio::DB::GFF database and loading via
> the load_gff.PL script is the best option for me. If I do that, I want
> to make sure I won't hit the problem where a multi-segmented feature
> looks fine at low power but the connecting lines between segments
> disappear when zoomed in, now that I have recreated the GFF3 files with
> bioperl-live. If I go in this direction, which aggregator and glyph
> should I use? processed_transcript?
>
> What would you suggest?
>
> Thanks,
>
> Burcu
>
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Wednesday, September 24, 2008 10:53 AM
> To: Bakir, Burcu
> Cc: Don Gilbert; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
> Subject: Re: "Table 'meta' doesn't exist" error from
> bp_seqfeature_gff3.PLS
>
> Hi Burcu,
>
> Bio::DB::GFF and Bio::DB::SeqFeature::Store databases don't mix.  I
> wasn't aware of existing data, so now we have to come up with a
> solution for what to do with your existing data: what is it?  Can it
> be converted to GFF3?  Loading the GFF3 you created into a
> Bio::DB::GFF database is possible but often problematic.
>
> Scott
>
>
> On Wed, Sep 24, 2008 at 11:45 AM, Bakir, Burcu <BBakir at hmgc.mcw.edu>
> wrote:
>>
>> Hi Scott,
>>
>> I successfully downloaded bioperl-live from SVN and made the necessary
>> settings. Then I ran genbank2gff3.PLS under
>> bioperl-live/scripts/Bio-DB-GFF as:
>> ./genbank2gff3.PLS -o /rgd_home/3.0/TOOLS/Gbrowse/test/chromosome10/
>> rn_ref_chr10.gbk.gz
>>
>> This time I got a GFF3 file more similar to yours. I'm pasting a few
>> lines of it at the end of this email. Then I tried to load it into the
>> database using the bp_seqfeature_gff3.PLS script under
>> bioperl-live/scripts/Bio-SeqFeature-Store as:
>>
>> -bash-3.00$ ./bp_seqfeature_gff3.PLS -dsn
>> "dbi:mysql:database=rgd_904_e;host=forte.hmgc.mcw.edu" -user ZZZ -pass
>> MMM
>> ../../../../3.0/TOOLS/Gbrowse/test/chromosome10/rn_ref_chr10.gbk.gz.gff3
>>
>> Here is the error I get:
>>
>> DBD::mysql::st execute failed: Table 'rgd_904_e.meta' doesn't exist at
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm
>> line 1217.
>> -------------------- EXCEPTION --------------------
>> MSG: Table 'rgd_904_e.meta' doesn't exist
>> STACK Bio::DB::SeqFeature::Store::DBI::mysql::setting
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:1217
>> STACK Bio::DB::SeqFeature::Store::serializer
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:1507
>> STACK Bio::DB::SeqFeature::Store::default_settings
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:2066
>> STACK Bio::DB::SeqFeature::Store::DBI::mysql::default_settings
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:327
>> STACK Bio::DB::SeqFeature::Store::DBI::mysql::init
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store/DBI/mysql.pm:219
>> STACK Bio::DB::SeqFeature::Store::new
>> /rgd_home/bioperl-live/bioperl-live/Bio/DB/SeqFeature/Store.pm:358
>> STACK toplevel ./bp_seqfeature_gff3.PLS:45
>>
>> It complains about a non-existent 'rgd_904_e.meta' table. My database
>> has the fattribute, fattribute_to_feature, fdata, fdna, fgroup, ftype,
>> and fmeta tables.
>>
>> I also have another question: does running the bp_seqfeature_gff3.PLS
>> script on a database that has already been filled by the
>> bp_load_gff.pl script cause any trouble? The data for my previous
>> tracks were loaded via bp_load_gff.pl.
>>
>> Thanks,
>>
>> Burcu
>>
>> ##gff-version 3
>> # sequence-region NW_047337 1 1380475
>> # conversion-by bp_genbank2gff3.pl
>> # organism Rattus norvegicus
>> # date 22-JUN-2006
>> # Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
>> Chr10   GenBank chromosome      83193107        84573581        .       +       .       ID=NW_047337;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x94e94c);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
>> Chr10   GenBank STS     83194370        83194502        .       +       .       ID=GenBank:STS:NW_047337:1264:1396;Dbxref=UniSTS:250658;standard_name=AI010027
>> Chr10   GenBank gene    83194326        83238284        .       -       .       ID=LOC619561;Dbxref=GeneID:619561,RGD:1562656;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=LOC619561
>> Chr10   GenBank mRNA    83194326        83238284        .       -       .       ID=LOC619561.t01;Parent=LOC619561;Dbxref=GI:77993367,GeneID:619561,RGD:1562656;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=LOC619561;product=hypothetical protein LOC619561;transcript_id=NM_001034951.1
>> Chr10   GenBank CDS     83195432        83195482        .       -       .       ID=LOC619561.p01;Parent=LOC619561.t01;Dbxref=GI:77993368,GeneID:619561,RGD:1562656;codon_start=1;gene=LOC619561;product=hypothetical protein LOC619561;protein_id=NP_001030123.1
>>
>>
>> -----Original Message-----
>> From: Scott Cain [mailto:cain.cshl at gmail.com]
>> Sent: Monday, September 22, 2008 12:14 PM
>> To: Bakir, Burcu
>> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
>> power, but connecting lines between segments disappear when zoomed in
>>
>> Hi Burcu,
>>
>> Certainly you can install bioperl-live in a local folder; I'm
>> reasonably sure that the bioperl website has directions for doing that.
>> The 1.69 release of GBrowse requires it though, so if you are planning
>> on installing the new GBrowse, you'll want to address that as well (it
>> is possible to have a bioperl in place just for GBrowse to use, but it
>> requires a little forethought and planning).
>>
>> Scott
>>
>>
>> On Mon, Sep 22, 2008 at 1:11 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu>
>> wrote:
>>> Hi Scott,
>>>
>>> Or how about installing bioperl-live in a local folder (not as root)
>>> and making sure it comes first in the @INC variable? That way I can
>>> keep the other previously installed BioPerls. I don't know which other
>>> programs are using them, and I don't want to get rid of them suddenly.
>>>
>>> Thanks,
>>>
>>> Burcu
>>>
>>> -----Original Message-----
>>> From: Scott Cain [mailto:cain.cshl at gmail.com]
>>> Sent: Monday, September 22, 2008 11:19 AM
>>> To: Bakir, Burcu
>>> Cc: Don Gilbert; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
>>> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
>>> power, but connecting lines between segments disappear when zoomed in
>>>
>>> Hi Burcu,
>>>
>>> What version of BioPerl are you using?  If you think you are using
>>> bioperl-live, is it possible that there is more than one BioPerl
>>> installed?
>>>
>>> Scott
>>>
>>> On Mon, Sep 22, 2008 at 12:08 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu>
>>> wrote:
>>>> Hi Scott,
>>>>
>>>> Thanks for your explanations. I think something is still wrong with
>>>> my GFF3 file. I ran the GenBank record in rn_ref_chr1.gbk through the
>>>> BioPerl bp_genbank2gff3.pl script as follows:
>>>>
>>>> bp_genbank2gff3.pl -y rn_ref_chr1.gbk
>>>>
>>>> where the --split (-y) option is documented to split the output into
>>>> separate GFF and fasta files for each GenBank record.
>>>>
>>>> My GFF3 file differs from what you pasted here in the following ways:
>>>> your first uncommented line is for chromosome, whereas mine is region,
>>>> followed by contig. Your features have "ID", whereas mine have "iD".
>>>> Your CDS also has "ID", whereas mine has no ID or iD but does have
>>>> Parent. I don't know why these differ from your GFF3 output. Did you
>>>> run the script just for NW_047331? Is the difference that I'm running
>>>> the script for the whole chromosome (rn_ref_chr1.gbk)?
>>>>
>>>> Using rn_ref_chr1.gbk should be fine. The bp_genbank2gff3.pl
>>>> documentation states the following:
>>>> The input files are assumed to be gzipped GenBank flatfiles for refseq
>>>> contigs.  The files may contain multiple GenBank records.  Either a
>>>> single file or an entire directory can be processed.  By default, the
>>>> DNA sequence is embedded in the GFF but it can be saved into separate
>>>> fasta file with the --split(-y) option.
>>>>
>>>> Thanks,
>>>>
>>>> Burcu
>>>>
>>>> I'm pasting here first few lines of my NW_047331.gff3 file.
>>>>
>>>> ##gff-version 3
>>>> ##sequence-region NW_047331 1 1994762
>>>> ##source bp_genbank2gff3.pl
>>>> NW_047331       GenBank region  1       1994762 .       .       .       ID=NW_047331
>>>> NW_047331       GenBank contig  1       1994762 .       +       .       iD=GenBank:contig:NW_047331:1:1994762;mol_type=genomic DNA;db_xref=taxon:10116;strain=BN/SsNHsdMCW;chromosome=10;organism=Rattus norvegicus
>>>> NW_047331       GenBank gap     2014    3020    .       +       .       iD=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
>>>> NW_047331       GenBank gene    49      13315   .       +       .       iD=Bfar;db_xref=GeneID:304709,RGD:1304791;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA
>>>> NW_047331       GenBank mRNA    49      13315   .       +       .       iD=Bfar.t01;Parent=Bfar;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;db_xref=GI:61557020,GeneID:304709,RGD:1304791;exception=unclassified transcription discrepancy;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
>>>> NW_047331       GenBank CDS     49      216     .       +       .       Parent=Bfar.t01;go_function=zinc ion binding [goid 0008270] [evidence IEA],structural molecule activity [goid 0005198] [evidence IEA],ubiquitin-protein ligase activity [goid 0004842] [evidence IEA];protein_id=NP_001013143.1;gene=Bfar;go_process=anti-apoptosis [goid 0006916] [evidence IEA],protein ubiquitination [goid 0016567] [evidence IEA];db_xref=GI:61557021,GeneID:304709,RGD:1304791;go_component=membrane fraction [goid 0005624] [evidence IEA],ubiquitin ligase complex [goid 0000151] [evidence IEA],integral to plasma membrane [goid 0005887] [evidence IEA];codon_start=1;exception=unclassified translation discrepancy;product=bifunctional apoptosis regulator (predicted)
>>>> NW_047331       GenBank exon    49      216     .       +       .       Parent=Bfar.t01;gene=Bfar
>>>>
>>>>
>>>> -----Original Message-----
>>>> From: Scott Cain [mailto:cain.cshl at gmail.com]
>>>> Sent: Thursday, September 18, 2008 4:54 PM
>>>> To: Don Gilbert
>>>> Cc: Bakir, Burcu; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
>>>> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
>>>> power, but connecting lines between segments disappear when zoomed in
>>>>
>>>> Hi Burcu,
>>>>
>>>> I just ran the GenBank record for NW_047331 through the BioPerl
>>>> bp_genbank2gff3.pl script and got perfectly acceptable GFF3 (I'll
>>>> paste a few lines below); did you get your GFF3 from running a
>>>> BioPerl script or from somewhere else?
>>>>
>>>> Thanks,
>>>> Scott
>>>>
>>>> Here's what the first several lines of the GFF3 looked like:
>>>> ##gff-version 3
>>>> # sequence-region NW_047331 1 1994762
>>>> # conversion-by bp_genbank2gff3.pl
>>>> # organism Rattus norvegicus
>>>> # date 22-JUN-2006
>>>> # Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
>>>> NW_047331       GenBank chromosome      1       1994762 .       +       .       ID=NW_047331;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x8a564e4);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
>>>> NW_047331       GenBank gap     2014    3020    .       +       .       ID=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
>>>> NW_047331       GenBank gap     5534    5583    .       +       .       ID=GenBank:gap:NW_047331:5534:5583;estimated_length=50
>>>> NW_047331       GenBank gene    49      13315   .       +       .       ID=Bfar;Dbxref=GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=Bfar
>>>> NW_047331       GenBank mRNA    49      13315   .       +       .       ID=Bfar.t01;Parent=Bfar;Dbxref=GI:61557020,GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=Bfar;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
>>>> NW_047331       GenBank CDS     49      216     .       +       .       ID=Bfar.p01;Parent=Bfar.t01;Dbxref=GI:61557021,GeneID:304709,RGD:1304791;gO_component=integral to plasma membrane%3B membrane fraction%3B ubiquitin ligase complex;gO_function=structural molecule activity%3B ubiquitin-protein ligase activity%3B zinc ion binding;gO_process=anti-apoptosis%3B protein ubiquitination;codon_start=1;exception=unclassified translation discrepancy;gene=Bfar;product=bifunctional apoptosis regulator (predicted);protein_id=NP_001013143.1
>>>>
>>>>
>>>> On Thu, Sep 18, 2008 at 5:37 PM, Scott Cain <cain.cshl at gmail.com>
>>>> wrote:
>>>>> Hi Burcu,
>>>>>
>>>>> I finally got around to looking at the sample data and config file
>>>>> you sent me a few days ago.  There were a few problems I had to fix:
>>>>>
>>>>> 1. GFF3 and the Bio::DB::GFF adaptor don't always get along, and I
>>>>> think your GFF3 is one example of that happening.  I switched to the
>>>>> Bio::DB::SeqFeature::Store adaptor.
>>>>>
>>>>> 2. With the switch to SeqFeature::Store, you don't need aggregators
>>>>> any more, so I switched the [EntrezGene] track to use gene:GenBank
>>>>> features and the gene glyph.
>>>>>
>>>>> 3. I changed all occurrences of 'iD' to 'ID' (I'm off to BioPerl
>>>>> next to see what caused this so I can make it stop).
>>>>>
>>>>> 4. I added a reference sequence line; you had a line like this:
>>>>>
>>>>> Chr10  GenBank region  1   1994762 .    .    .   ID=NW_047331
>>>>>
>>>>> I changed it to this:
>>>>>
>>>>> Chr10  GenBank region  1   1994762 .    .    .   ID=Chr10;Name=Chr10
>>>>>
>>>>> (of course, I suspect that rat chromosome 10 is bigger than that;
>>>>> the alternative would be to change column 1 to NW_047331 throughout
>>>>> the file.)
>>>>>
>>>>> Switching to SeqFeature::Store will require a few changes, but not
>>>>> too many; basically, the only things that would be affected are the
>>>>> tracks that currently use aggregators.  Please let me know if you
>>>>> need any help with the transition.
>>>>>
>>>>> Scott
>>>>>
>>>>>
>>>>> On Mon, Sep 15, 2008 at 4:29 PM, Don Gilbert
>>>>> <gilbertd at cricket.bio.indiana.edu> wrote:
>>>>>>
>>>>>>
>>>>>> Burcu,
>>>>>>
>>>>>> It may be you have to work through a few changes.  The 'iD' problem
>>>>>> likely was part of it; your aggregator also needs to be updated with
>>>>>> corrections for ID/Parent tags.
>>>>>>
>>>>>>>> EntrezGene{CDS,exon/mRNA}
>>>>>>
>>>>>> This one should work when CDS,exon have Parent=mRNA.ID and mRNA has
>>>>>> ID=
>>>>>> This is equivalent to the processed_transcript aggregator
>>>>>> Bio/DB/GFF/Aggregator/processed_transcript.pm
>>>>>>
>>>>>> Aggregators are good when using Bio/DB/GFF databases; the
>>>>>> Bio/DB/SeqFeature/Store databases do not use aggregators.
>>>>>>
>>>>>> PS, one of the tools, likely bp_genbank2gff3, created those funky
>>>>>> 'iD' tags, for reasons of its own.
>>>>>>
>>>>>> - Don Gilbert
>>>>>>
>>>>>
>>>>> --
>>>>> ------------------------------------------------------------------------
>>>>> Scott Cain, Ph. D. cain.cshl at gmail.com
>>>>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>>>>> Cold Spring Harbor Laboratory

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory