[Gmod-help] RE: [Gmod-gbrowse] multi-segmented feature looks fine at low power, but connecting lines between segments disappear when zoomed in
Bakir, Burcu
BBakir at hmgc.mcw.edu
Mon Sep 22 12:08:21 EDT 2008
Hi Scott,
Thanks for your explanations. I think something is still wrong with my GFF3
file. I ran the GenBank file rn_ref_chr1.gbk through the
BioPerl bp_genbank2gff3.pl script as follows:
bp_genbank2gff3.pl -y rn_ref_chr1.gbk
where the --split (-y) option is documented to split the output into
separate GFF and FASTA files for each GenBank record.
My GFF3 file differs from what you pasted here in the following ways:
your first uncommented line is for a chromosome, whereas mine is a region
followed by a contig. Your features have "ID", whereas mine have "iD". Your
CDS also has "ID", whereas mine has neither ID nor iD but does have Parent.
I don't know why my output differs from yours. Did you run the
script just for NW_047331? Could the difference be that I'm running the
script on the whole chromosome (rn_ref_chr1.gbk)?
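As a stopgap for the "iD" versus "ID" difference described above, the tag can be renamed in the attributes column before the file is loaded. The following is only a minimal Python sketch, not part of BioPerl or GBrowse. The function name fix_id_tag is made up for illustration, and it assumes the real file is tab-separated with nine columns, as GFF3 requires, rather than re-wrapped the way lines pasted into email are.

```python
import re

# Match an "iD" tag only at the start of column 9 or right after a ';',
# so attribute values that merely contain the text "iD=" are left alone.
_BAD_ID_TAG = re.compile(r"(^|;)iD=")


def fix_id_tag(line: str) -> str:
    """Return a GFF3 line with a miscapitalised 'iD' attribute tag renamed to 'ID'."""
    if line.startswith("#"):
        return line  # directives and comments pass through unchanged
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        return line  # not a feature line (e.g. embedded FASTA); leave it alone
    columns[8] = _BAD_ID_TAG.sub(r"\1ID=", columns[8])
    newline = "\n" if line.endswith("\n") else ""
    return "\t".join(columns) + newline
```

Applied line by line to the converter's output, this would leave everything except the tag name untouched; it does not address why the converter writes "iD" in the first place.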
Using rn_ref_chr1.gbk should be fine. The bp_genbank2gff3.pl
documentation states the following:
The input files are assumed to be gzipped GenBank flatfiles for refseq
contigs. The files may contain multiple GenBank records. Either a
single file or an entire directory can be processed. By default, the
DNA sequence is embedded in the GFF but it can be saved into separate
fasta file with the --split(-y) option.
Thanks,
Burcu
I'm pasting the first few lines of my NW_047331.gff3 file here.
##gff-version 3
##sequence-region NW_047331 1 1994762
##source bp_genbank2gff3.pl
NW_047331 GenBank region 1 1994762 . . . ID=NW_047331
NW_047331 GenBank contig 1 1994762 . + . iD=GenBank:contig:NW_047331:1:1994762;mol_type=genomic DNA;db_xref=taxon:10116;strain=BN/SsNHsdMCW;chromosome=10;organism=Rattus norvegicus
NW_047331 GenBank gap 2014 3020 . + . iD=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
NW_047331 GenBank gene 49 13315 . + . iD=Bfar;db_xref=GeneID:304709,RGD:1304791;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA
NW_047331 GenBank mRNA 49 13315 . + . iD=Bfar.t01;Parent=Bfar;gene=Bfar;note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;db_xref=GI:61557020,GeneID:304709,RGD:1304791;exception=unclassified transcription discrepancy;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
NW_047331 GenBank CDS 49 216 . + . Parent=Bfar.t01;go_function=zinc ion binding [goid 0008270] [evidence IEA],structural molecule activity [goid 0005198] [evidence IEA],ubiquitin-protein ligase activity [goid 0004842] [evidence IEA];protein_id=NP_001013143.1;gene=Bfar;go_process=anti-apoptosis [goid 0006916] [evidence IEA],protein ubiquitination [goid 0016567] [evidence IEA];db_xref=GI:61557021,GeneID:304709,RGD:1304791;go_component=membrane fraction [goid 0005624] [evidence IEA],ubiquitin ligase complex [goid 0000151] [evidence IEA],integral to plasma membrane [goid 0005887] [evidence IEA];codon_start=1;exception=unclassified translation discrepancy;product=bifunctional apoptosis regulator (predicted)
NW_047331 GenBank exon 49 216 . + . Parent=Bfar.t01;gene=Bfar
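One way to see why the "iD" tags matter: GFF3 links a child to its parent by matching the child's Parent value against another feature's ID, and tag names are case-sensitive. In the lines above, Parent=Bfar.t01 therefore has no feature with ID=Bfar.t01 to attach to. A rough Python sketch of that check follows; unresolved_parents is a hypothetical helper, not a BioPerl or GBrowse function, and it assumes tab-separated, nine-column lines read from the actual file.

```python
def unresolved_parents(lines):
    """Return the sorted Parent values that match no ID in the given GFF3 lines."""
    ids, parents = set(), set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip directives, comments and blank lines
        columns = line.rstrip("\n").split("\t")
        if len(columns) != 9:
            continue  # skip anything that is not a feature line
        for pair in columns[8].split(";"):
            tag, sep, value = pair.partition("=")
            if not sep:
                continue
            if tag == "ID":  # case-sensitive: "iD" does not count
                ids.add(value)
            elif tag == "Parent":
                parents.update(value.split(","))  # Parent may list several IDs
    return sorted(parents - ids)
```

An empty result would mean every Parent points at a feature that declares an ID; any names returned are children that cannot be connected to their parent.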
-----Original Message-----
From: Scott Cain [mailto:cain.cshl at gmail.com]
Sent: Thursday, September 18, 2008 4:54 PM
To: Don Gilbert
Cc: Bakir, Burcu; help at gmod.org; gmod-gbrowse at lists.sourceforge.net
Subject: Re: [Gmod-gbrowse] multi-segmented feature looks fine at low
power, but connecting lines between segments disappear when zoomed in
Hi Burcu,
I just ran the GenBank record for NW_047331 through the BioPerl
bp_genbank2gff3.pl script and got perfectly acceptable GFF3 (I'll
paste a few lines below); did you get your GFF3 from running a BioPerl
script or somewhere else?
Thanks,
Scott
Here's what the first several lines of the GFF3 looked like:
##gff-version 3
# sequence-region NW_047331 1 1994762
# conversion-by bp_genbank2gff3.pl
# organism Rattus norvegicus
# date 22-JUN-2006
# Note Rattus norvegicus chromosome 10 genomic contig, reference assembly (based on RGSC v3.4).
NW_047331 GenBank chromosome 1 1994762 . + . ID=NW_047331;Alias=10;Dbxref=taxon:10116;Note=Rattus norvegicus chromosome 10 genomic contig%2C reference assembly (based on RGSC v3.4).;chromosome=10;comment1=Bio::Annotation::Comment%3DHASH(0x8a564e4);date=22-JUN-2006;mol_type=genomic DNA;organism=Rattus norvegicus;strain=BN/SsNHsdMCW
NW_047331 GenBank gap 2014 3020 . + . ID=GenBank:gap:NW_047331:2014:3020;estimated_length=1007
NW_047331 GenBank gap 5534 5583 . + . ID=GenBank:gap:NW_047331:5534:5583;estimated_length=50
NW_047331 GenBank gene 49 13315 . + . ID=Bfar;Dbxref=GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;gene=Bfar
NW_047331 GenBank mRNA 49 13315 . + . ID=Bfar.t01;Parent=Bfar;Dbxref=GI:61557020,GeneID:304709,RGD:1304791;Note=Derived by automated computational analysis using gene prediction method: BestRefseq. Supporting evidence includes similarity to: 1 mRNA;exception=unclassified transcription discrepancy;gene=Bfar;product=bifunctional apoptosis regulator;transcript_id=NM_001013125.1
NW_047331 GenBank CDS 49 216 . + . ID=Bfar.p01;Parent=Bfar.t01;Dbxref=GI:61557021,GeneID:304709,RGD:1304791;gO_component=integral to plasma membrane%3B membrane fraction%3B ubiquitin ligase complex;gO_function=structural molecule activity%3B ubiquitin-protein ligase activity%3B zinc ion binding;gO_process=anti-apoptosis%3B protein ubiquitination;codon_start=1;exception=unclassified translation discrepancy;gene=Bfar;product=bifunctional apoptosis regulator (predicted);protein_id=NP_001013143.1
On Thu, Sep 18, 2008 at 5:37 PM, Scott Cain <cain.cshl at gmail.com> wrote:
> Hi Burcu,
>
> I finally got around to looking at the sample data and config file you
> sent me a few days ago. There were a few problems I had to fix:
>
> 1. GFF3 and the Bio::DB::GFF adaptor don't always get along, and I
> think your GFF3 is one example of that happening. I switched to the
> Bio::DB::SeqFeature::Store adaptor.
>
> 2. With the switch to SeqFeature::Store, you don't need aggregators
> any more, so I switched the [EntrezGene] track to use gene:GenBank
> features and the gene glyph.
>
> 3. I changed all occurrences of 'iD' to 'ID' (I'm off to BioPerl next
> to see what caused this so I can make it stop).
>
> 4. I added a reference sequence line; you had a line like this:
>
> Chr10 GenBank region 1 1994762 . . . ID=NW_047331
>
> I changed it to this:
>
> Chr10 GenBank region 1 1994762 . . . ID=Chr10;Name=Chr10
>
> (of course, I suspect that rat chromosome 10 is bigger than that; the
> alternative would be to change column 1 to NW_047331 throughout the
> file.)
>
> Switching to SeqFeature::Store will require a few changes, but not too
> many; basically, the only things that would be affected are the tracks
> that currently use aggregators. Please let me know if you need any
> help with the transition.
>
> Scott
>
>
> On Mon, Sep 15, 2008 at 4:29 PM, Don Gilbert
> <gilbertd at cricket.bio.indiana.edu> wrote:
>>
>>
>> Burcu,
>>
>> It may be that you have to work through a few changes. The 'iD'
>> problem was likely part of it; your aggregator also needs to be
>> updated with corrections for ID/Parent tags.
>>
>>>> EntrezGene{CDS,exon/mRNA}
>>
>> This one should work when CDS,exon have Parent=mRNA.ID and mRNA has ID=
>> This is equivalent to the processed_transcript aggregator
>> Bio/DB/GFF/Aggregator/processed_transcript.pm
>>
>> Aggregators are good when using Bio/DB/GFF databases; the
>> Bio/DB/SeqFeature/Store databases do not use aggregators.
>>
>> PS, one of the tools, likely bp_genbank2gff3, created those funky
>> 'iD' tags, for reasons of its own.
>>
>> - Don Gilbert
>>
>
>
>
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D. cain.cshl at gmail.com
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Cold Spring Harbor Laboratory
>