[Gmod-help] Re: [Gmod-gbrowse] multi-segmented feature looks fine at low power, but connecting lines between segments disappear when zoomed in

Scott Cain cain.cshl at gmail.com
Mon Sep 22 12:18:34 EDT 2008


Hi Burcu,

What version of BioPerl are you using?  If you think you are using
bioperl-live, is it possible that there is more than one BioPerl
installed?
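
A rough way to check for a second install is to list the directories on Perl's @INC path (for example with perl -e 'print join("\n", @INC)') and see which of them contain a BioPerl tree. The sketch below does only the second half. The function name and the choice of Bio/Root/Version.pm as the marker file are assumptions made for illustration, not something taken from this thread.

```python
import os

# Assumed marker: a file that a BioPerl install is expected to provide.
DEFAULT_MARKER = os.path.join("Bio", "Root", "Version.pm")


def find_bioperl_installs(inc_dirs, marker=DEFAULT_MARKER):
    """Return the directories from inc_dirs that contain the marker file.

    inc_dirs is expected to be the list of directories from Perl's @INC.
    Order is preserved and duplicate directories are reported once.
    """
    found = []
    for directory in inc_dirs:
        if directory in found:
            continue
        if os.path.isfile(os.path.join(directory, marker)):
            found.append(directory)
    return found
```

If more than one directory comes back, the first one in @INC order is the copy Perl would normally load.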

Scott

On Mon, Sep 22, 2008 at 12:08 PM, Bakir, Burcu <BBakir at hmgc.mcw.edu> wrote:
> Hi Scott,
>
> Thanks for your explanations. I think something is still wrong with my
> GFF3 file. I ran the GenBank record file rn_ref_chr1.gbk through the
> BioPerl bp_genbank2gff3.pl script as follows:
>
> bp_genbank2gff3.pl -y rn_ref_chr1.gbk
>
> where the --split (-y) option is documented to split the output into
> separate GFF and FASTA files for each GenBank record.
>
> My GFF3 file differs from what you pasted here in the following ways:
> Your first uncommented line is a chromosome feature; mine is a region,
> followed by a contig. Your features have "ID", whereas mine have "iD".
> Your CDS also has "ID", whereas mine has neither ID nor iD, only Parent.
> I don't know why these differ from your GFF3 output. Did you run the
> script just for NW_047331? Could the difference be that I'm running the
> script on the whole chromosome (rn_ref_chr1.gbk)?
>
> Using rn_ref_chr1.gbk should be fine. The bp_genbank2gff3.pl
> documentation states the following:
> The input files are assumed to be gzipped GenBank flatfiles for refseq
> contigs.  The files may contain multiple GenBank records.  Either a
> single file or an entire directory can be processed.  By default, the
> DNA sequence is embedded in the GFF but it can be saved into separate
> fasta file with the --split(-y) option.
>
> Thanks,
>
> Burcu
>
> Here are the first few lines of my NW_047331.gff3 file.
>
> ##gff-version 3
> ##sequence-region NW_047331 1 1994762
> ##source bp_genbank2gff3.pl
> NW_047331       GenBank region  1       1994762 .       .       .       ID=NW_047331
> NW_047331       GenBank contig  1       1994762 .       +       .       iD=GenBank:contig:NW_047331:1:1994762;mol_type=genomic DNA;db_xref=taxon:10116;strain=BN/SsNHsdMCW;chromosome=10;organism=Rattus norvegicus
> NW_047331       GenBank gap     2014    3020    .       +       .       iD=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
> NW_047331       GenBank gene    49      13315   .       +       .       iD=Bfar;db_xref=GeneID:304709,RGD:1304791;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA
> NW_047331       GenBank mRNA    49      13315   .       +       .       iD=Bfar.t01;Parent=Bfar;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;db_xref=GI:61557020,GeneID:304709,RGD:1304791;exception=unclassified transcription discrepancy;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
> NW_047331       GenBank CDS     49      216     .       +       .       Parent=Bfar.t01;go_function=zinc ion binding [goid 0008270] [evidence IEA],structural molecule activity [goid 0005198] [evidence IEA],ubiquitin-protein ligase activity [goid 0004842] [evidence IEA];protein_id=NP_001013143.1;gene=Bfar;go_process=anti-apoptosis [goid 0006916] [evidence IEA],protein ubiquitination [goid 0016567] [evidence IEA];db_xref=GI:61557021,GeneID:304709,RGD:1304791;go_component=membrane fraction [goid 0005624] [evidence IEA],ubiquitin ligase complex [goid 0000151] [evidence IEA],integral to plasma membrane [goid 0005887] [evidence IEA];codon_start=1;exception=unclassified translation discrepancy;product=bifunctional apoptosis regulator (predicted)
> NW_047331       GenBank exon    49      216     .       +       .       Parent=Bfar.t01;gene=Bfar
>
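
The differences described above (ID versus iD, and a CDS with only Parent) can be spotted mechanically by collecting the attribute keys used by each feature type. This is a minimal sketch that assumes the file on disk is tab-delimited with nine columns, unlike the mail-wrapped excerpt here. The function names are made up for illustration.

```python
from collections import defaultdict


def attribute_keys_by_type(lines):
    """Map each feature type (column 3) to the attribute keys seen in column 9."""
    keys = defaultdict(set)
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments and directives
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # skip FASTA or otherwise non-feature lines
        for pair in cols[8].split(";"):
            if "=" in pair:
                keys[cols[2]].add(pair.split("=", 1)[0].strip())
    return keys


def types_without_id(keys):
    """Return the feature types that never carry an exact 'ID' key, sorted."""
    return sorted(ftype for ftype, seen in keys.items() if "ID" not in seen)
```

Run on a file like the one pasted above, `types_without_id(attribute_keys_by_type(open(path)))` would be expected to list every type except region. A missing ID on CDS or exon lines is not by itself an error, since the ID attribute is optional for features that nothing else refers to.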
>
> -----Original Message-----
> From: Scott Cain [mailto:cain.cshl at gmail.com]
> Sent: Thursday, September 18, 2008 4:54 PM
> To: Don Gilbert
> Cc: Bakir, Burcu; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
> Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
> power, but connecting lines between segments disappear when zoomed in
>
> Hi Burcu,
>
> I just ran the GenBank record for NW_047331 through the BioPerl
> bp_genbank2gff3.pl script and got perfectly acceptable GFF3 (I'll
> paste a few lines below); did you get your GFF3 from running a BioPerl
> script or somewhere else?
>
> Thanks,
> Scott
>
> Here's what the first several lines of the GFF3 looked like:
> ##gff-version 3
> # sequence-region NW_047331 1 1994762
> # conversion-by bp_genbank2gff3.pl
> # organism Rattus norvegicus
> # date 22-JUN-2006
> # Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
> NW_047331       GenBank chromosome      1       1994762 .       +       .       ID=NW_047331;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x8a564e4);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
> NW_047331       GenBank gap     2014    3020    .       +       .       ID=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
> NW_047331       GenBank gap     5534    5583    .       +       .       ID=GenBank:gap:NW_047331:5534:5583;estimated_length=50
> NW_047331       GenBank gene    49      13315   .       +       .       ID=Bfar;Dbxref=GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=Bfar
> NW_047331       GenBank mRNA    49      13315   .       +       .       ID=Bfar.t01;Parent=Bfar;Dbxref=GI:61557020,GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=Bfar;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
> NW_047331       GenBank CDS     49      216     .       +       .       ID=Bfar.p01;Parent=Bfar.t01;Dbxref=GI:61557021,GeneID:304709,RGD:1304791;gO_component=integral to plasma membrane%3B membrane fraction%3B ubiquitin ligase complex;gO_function=structural molecule activity%3B ubiquitin-protein ligase activity%3B zinc ion binding;gO_process=anti-apoptosis%3B protein ubiquitination;codon_start=1;exception=unclassified translation discrepancy;gene=Bfar;product=bifunctional apoptosis regulator (predicted);protein_id=NP_001013143.1
>
>
> On Thu, Sep 18, 2008 at 5:37 PM, Scott Cain <cain.cshl at gmail.com> wrote:
>> Hi Burcu,
>>
>> I finally got around to looking at the sample data and config file you
>> sent me a few days ago.  There were a few problems I had to fix:
>>
>> 1. GFF3 and the Bio::DB::GFF adaptor don't always get along, and I
>> think your GFF3 is one example of that happening.  I switched to the
>> Bio::DB::SeqFeature::Store adaptor.
>>
>> 2. With the switch to SeqFeature::Store, you don't need aggregators
>> any more, so I switched the [EntrezGene] track to use gene:GenBank
>> features and the gene glyph.
>>
>> 3. I changed all occurrences of 'iD' to 'ID' (I'm off to BioPerl next
>> to see what caused this so I can make it stop).
>>
>> 4. I added a reference sequence line; you had a line like this:
>>
>> Chr10  GenBank region  1   1994762 .    .    .   ID=NW_047331
>>
>> I changed it to this:
>>
>> Chr10  GenBank region  1   1994762 .    .    .   ID=Chr10;Name=Chr10
>>
>> (of course, I suspect that rat chromosome 10 is bigger than that; the
>> alternative would be to change column 1 to NW_047331 throughout the
>> file.)
>>
>> Switching to SeqFeature::Store will require a few changes, but not too
>> many; basically, the only things that would be affected are the tracks
>> that currently use aggregators.  Please let me know if you need any
>> help with the transition.
>>
>> Scott
>>
>>
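
The 'iD' to 'ID' change in step 3 above can be scripted. What follows is a minimal sketch, not the method actually used in the thread: it assumes a tab-delimited nine-column file and rewrites only an attribute key spelled exactly iD in column 9, leaving comments and any other text alone. The function name is hypothetical.

```python
import re

# Match 'iD=' only where an attribute key can start: at the beginning of
# column 9 or right after a ';' separator.
_ID_KEY = re.compile(r"(^|;)iD=")


def fix_id_case(line):
    """Return the GFF3 line with an 'iD' attribute key rewritten to 'ID'."""
    if line.startswith("#"):
        return line  # comments and directives are passed through unchanged
    newline = "\n" if line.endswith("\n") else ""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 9:
        return line  # not a feature line (e.g. FASTA), leave it alone
    cols[8] = _ID_KEY.sub(r"\1ID=", cols[8])
    return "\t".join(cols) + newline
```

A whole file could be converted with something like `out.writelines(fix_id_case(l) for l in src)`, where src and out are open file handles.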
>> On Mon, Sep 15, 2008 at 4:29 PM, Don Gilbert
>> <gilbertd at cricket.bio.indiana.edu> wrote:
>>>
>>>
>>> Burcu,
>>>
>>> It may be you have to work thru a few changes.  The 'iD' problem
>>> likely was part of it; your aggregator also needs to be updated with
>>> corrections for ID/Parent tags.
>>>
>>>>> EntrezGene{CDS,exon/mRNA}
>>>
>>> This one should work when CDS,exon have Parent=mRNA.ID and mRNA has ID=
>>> This is equivalent to the processed_transcript aggregator
>>> Bio/DB/GFF/Aggregator/processed_transcript.pm
>>>
>>> Aggregators are good when using Bio/DB/GFF databases; the
>>> Bio/DB/SeqFeature/Store databases do not use aggregators.
>>>
>>> PS, one of the tools, likely bp_genbank2gff3, created those funky
>>> 'iD' tags, for reasons of its own.
>>>
>>> - Don Gilbert
>>>
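
The condition described above, that CDS and exon lines point via Parent at an mRNA that has a proper ID, can be checked with a short script. This is a sketch under the assumption of a tab-delimited nine-column GFF3 file. The helper names are invented for illustration.

```python
def parse_attributes(col9):
    """Split a GFF3 attribute column into a dict of key -> raw value."""
    attrs = {}
    for pair in col9.split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def dangling_parents(lines):
    """Return (line_number, feature_type, parent) for unresolved Parent values.

    A Parent value is unresolved when no feature in the input declares it
    with an exact 'ID' key, which is what happens when the key is 'iD'.
    """
    ids = set()
    refs = []
    for number, line in enumerate(lines, 1):
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = parse_attributes(cols[8])
        if "ID" in attrs:
            ids.add(attrs["ID"])
        for parent in attrs.get("Parent", "").split(","):
            if parent:
                refs.append((number, cols[2], parent))
    return [ref for ref in refs if ref[2] not in ids]
```

An empty result means every Parent resolves. With the 'iD' spelling in place, each mRNA, CDS and exon line would be expected to show up in the list.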
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. cain.cshl at gmail.com
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Cold Spring Harbor Laboratory
>>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory


