[Gmod-help] Re: [Gmod-gbrowse] more records that differ only tag name into attributes

Alessandra alessandra.bilardi at gmail.com
Mon Mar 16 06:59:34 EDT 2009


Dear Dave,
thanks for your reply.

> I'm not aware of a particular best practice for this. However, I'm
> going to be facing this shortly too.  Some thoughts:
> * Summarize total coverage for all reads in a wiggle file.
> * What are you trying to communicate to the user?

Yes, that is true. We think the GFF3 output could be used to:
* visualize data in GBrowse
when we want to visualize all reads in GBrowse, we use the wiggle format,
or we collapse all reads with the same name (and/or alignment position)
into one region whose score equals the number of reads, and we create a
track with glyph = graded_segments.
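A minimal Python sketch of the collapsing step described above. It assumes each read's alignment is already available as a (seqid, start, end, strand) tuple; the function name, the tuple shape and the ID scheme are hypothetical. Identical alignments are counted and written as one GFF3 line whose score is the read count, which a track with glyph = graded_segments can then shade according to the score.

```python
from collections import Counter


def collapse_reads(alignments, source="pass"):
    """Collapse reads sharing the same alignment into one GFF3 match line.

    `alignments` is an iterable of (seqid, start, end, strand) tuples, one per
    read (a hypothetical input shape; 1-based inclusive coordinates, as in GFF3).
    The score column of each output line holds the number of collapsed reads.
    """
    counts = Counter(alignments)
    lines = []
    for i, ((seqid, start, end, strand), n_reads) in enumerate(sorted(counts.items()), start=1):
        # hypothetical ID scheme: one ID per distinct alignment
        feature_id = f"cluster{i}"
        columns = [seqid, source, "match", str(start), str(end),
                   str(n_reads), strand, ".", f"ID={feature_id}"]
        lines.append("\t".join(columns))
    return lines


# made-up reads for illustration: two identical alignments and one other
reads = [
    ("chr10", 2790867, 2790899, "+"),
    ("chr10", 2790867, 2790899, "+"),
    ("chr10", 2790950, 2790982, "-"),
]
for line in collapse_reads(reads):
    print(line)
```

Collapsing on the alignment tuple alone matches the "same alignment position" case; collapsing on read name instead would only change the key passed to Counter.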

* parse data for secondary analysis
when somebody uses this GFF3 output for secondary analysis, they need all
the data about each read condensed in one record (to avoid looking up
each read's data elsewhere). Our attributes are: read quality, read
sequence, alignment position, mismatch and gap positions, read name, and
hits (the total number of alignments of that read).
So if read 1_13_261_F3 aligns to the genome on chr10 at positions
2790867..2790899, with only read bases 2..34 aligned, 3 mismatches and
zero gaps, we currently write:

chr10   pass    match   2790867 2790899 30      +       .       P="2-34";Note="M:3 -> -1/-1 0/1 17/17 2/0 27/27 2/0 31/31 1/2,G:0";ID=1e3705:94:0;Name=1_13_261_F3;Q=11 2 16 14 3 21 7 31 12 9 24 19 16 11 10 18 11 17 4 5 8 5 13 4 4 11 5 8 4 4 16 5 5 8 ;Hits=1;Seq=agctagctgatcgatcgatcgatcgatcgatcgat
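For the secondary-analysis use, here is a minimal Python sketch of reading the attributes back out of the record above. It assumes the mail client turned the tabs into spaces, so the record is rebuilt with tabs; parse_attributes is a hypothetical helper, and the only library call is the standard urllib.parse.unquote, used for GFF3 percent-decoding.

```python
from urllib.parse import unquote


def parse_attributes(column9):
    """Parse a GFF3 column 9 string into a dict.

    Values are percent-decoded (GFF3 escaping); double quotes around a value,
    as in P="2-34", are stripped. Empty fields from a trailing ';' are skipped.
    """
    attrs = {}
    for field in column9.split(";"):
        field = field.strip()
        if not field:
            continue
        key, _, value = field.partition("=")
        attrs[key.strip()] = unquote(value.strip().strip('"'))
    return attrs


# the current record from the message, rebuilt with tab-separated columns
record = "\t".join([
    "chr10", "pass", "match", "2790867", "2790899", "30", "+", ".",
    'P="2-34";Note="M:3 -> -1/-1 0/1 17/17 2/0 27/27 2/0 31/31 1/2,G:0";'
    "ID=1e3705:94:0;Name=1_13_261_F3;"
    "Q=11 2 16 14 3 21 7 31 12 9 24 19 16 11 10 18 11 17 4 5 8 5 13 4 4 "
    "11 5 8 4 4 16 5 5 8 ;Hits=1;Seq=agctagctgatcgatcgatcgatcgatcgatcgat",
])

columns = record.split("\t")
attrs = parse_attributes(columns[8])
qualities = [int(q) for q in attrs["Q"].split()]
print(attrs["Name"], attrs["Hits"], len(qualities))
```

Note that the comma inside the Note value would be read as a multi-value separator by a strict GFF3 parser; this sketch keeps the value as one string.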

I use http://www.sequenceontology.org/gff3.shtml and
http://gmod.org/wiki/GBrowse_Configuration_HOWTO as my references,
so I think we could refine it like this:

chr10   pass    match   2790867 2790899 30      +       .       ID=1e3705:94:0;Target=1_13_261_F3 2 34 +;Hits=1;Q=11 2 16 14 3 21 7 31 12 9 24 19 16 11 10 18 11 17 4 5 8 5 13 4 4 11 5 8 4 4 16 5 5 8;Seq=agctagctgatcgatcgatcgatcgatcgatcgat
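A minimal Python sketch of writing the refined record, assuming the read name, alignment coordinates and extra attributes are already known; match_line and escape are hypothetical helpers. It percent-escapes the characters GFF3 reserves in column 9 (semicolon, =, &, comma, tab, newline and % itself) and builds Target as "read_name start end strand".

```python
# Characters GFF3 reserves in column 9, mapped to their percent escapes.
RESERVED = {"%": "%25", ";": "%3B", "=": "%3D", "&": "%26",
            ",": "%2C", "\t": "%09", "\n": "%0A"}


def escape(value):
    """Percent-escape the characters GFF3 reserves in attribute values."""
    return "".join(RESERVED.get(ch, ch) for ch in str(value))


def match_line(seqid, source, start, end, score, strand, feature_id,
               read_name, read_start, read_end, extra=(), target_strand="+"):
    """Build one GFF3 match line whose Target points back into the read.

    Target takes the form "target_id start end [strand]"; spaces inside
    target_id itself must be escaped as %20.
    """
    target_id = escape(read_name).replace(" ", "%20")
    attrs = [f"ID={escape(feature_id)}",
             f"Target={target_id} {read_start} {read_end} {target_strand}"]
    attrs += [f"{key}={escape(value)}" for key, value in extra]
    columns = [seqid, source, "match", str(start), str(end), str(score),
               strand, ".", ";".join(attrs)]
    return "\t".join(columns)


line = match_line(
    "chr10", "pass", 2790867, 2790899, 30, "+", "1e3705:94:0",
    "1_13_261_F3", 2, 34,
    extra=[("Hits", 1),
           ("Q", "11 2 16 14 3 21 7 31 12 9 24 19 16 11 10 18 11 17 4 5 "
                 "8 5 13 4 4 11 5 8 4 4 16 5 5 8"),
           ("Seq", "agctagctgatcgatcgatcgatcgatcgatcgat")],
)
print(line)
```

Escaping is done one character at a time, so a literal % in a value is never escaped twice.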

If there are gaps, we could use the CIGAR format, but what could we
use for mismatches?
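For gaps, the GFF3 specification defines a Gap attribute in a CIGAR-like syntax (for example Gap=M8 D3 M6). GFF3 has no predefined attribute for mismatches, so one option is a custom multi-valued attribute listing mismatch positions in read coordinates. The sketch below assumes an ungapped alignment whose reference bases are available; the attribute name mismatch and all function names are hypothetical. The name is lowercase because GFF3 reserves attribute names that begin with an uppercase letter.

```python
def gap_attribute(operations):
    """Format (code, length) pairs as a GFF3 Gap attribute, e.g. Gap=M8 D3 M6.

    Codes follow the GFF3 Gap syntax: M (match), I and D (gaps inserted into
    the reference or the target); F and R (frameshifts) are not used here.
    """
    return "Gap=" + " ".join(f"{code}{length}" for code, length in operations)


def mismatch_positions(read_seq, ref_seq, offset=1):
    """Read-coordinate positions where an ungapped alignment differs.

    `offset` is the read position of the first aligned base (2 for a read
    aligned over 2..34, as in the example record).
    """
    if len(read_seq) != len(ref_seq):
        raise ValueError("expected an ungapped alignment of equal lengths")
    return [offset + i
            for i, (r, g) in enumerate(zip(read_seq.upper(), ref_seq.upper()))
            if r != g]


def mismatch_attribute(positions):
    # hypothetical, lowercase attribute name; commas separate multiple values
    return "mismatch=" + ",".join(str(p) for p in positions)


# made-up sequences for illustration
read = "ACGTACGT"
reference = "ACGAACGA"
print(gap_attribute([("M", len(read))]))
print(mismatch_attribute(mismatch_positions(read, reference, offset=2)))
```

For the 33-base ungapped alignment in the example record, the Gap attribute would simply be Gap=M33.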

If one read aligns to five different regions of the genome, then we
have to write the hits, read quality and read sequence five times. Or
we could create a FASTA file for the read sequences and qualities. But
if we want to use standard GFF3 output, what should we write? For my
own understanding, I would like to know whether the data about each
read should be condensed into each record or not.
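To avoid repeating the sequence for every hit, one option is to write one match line per hit, linked to its read through Target, and to store each read sequence once in the ##FASTA section that GFF3 allows at the end of a file. A minimal Python sketch, assuming the hits for each read are already grouped in a dict; the input shape and the ID scheme are hypothetical, and quality values would still need a separate file (for example a .qual file next to the FASTA), which is not shown.

```python
import io


def write_gff3_with_fasta(reads, out):
    """Write one GFF3 match line per hit and each read sequence once.

    `reads` maps read name -> {"seq": str, "hits": [(seqid, start, end,
    strand, read_start, read_end), ...]} (a hypothetical input shape).
    """
    out.write("##gff-version 3\n")
    for name, read in reads.items():
        n_hits = len(read["hits"])
        for i, (seqid, start, end, strand, rstart, rend) in enumerate(read["hits"], start=1):
            attrs = f"ID={name}.hit{i};Target={name} {rstart} {rend} +;Hits={n_hits}"
            out.write("\t".join([seqid, "pass", "match", str(start), str(end),
                                 ".", strand, ".", attrs]) + "\n")
    # everything after ##FASTA is plain FASTA; each read sequence appears once
    out.write("##FASTA\n")
    for name, read in reads.items():
        out.write(f">{name}\n{read['seq']}\n")


buffer = io.StringIO()
write_gff3_with_fasta({
    "1_13_261_F3": {
        "seq": "agctagctgatcgatcgatcgatcgatcgatcgat",
        # the second hit is made up for illustration
        "hits": [("chr10", 2790867, 2790899, "+", 2, 34),
                 ("chr2", 1500, 1532, "-", 2, 34)],
    },
}, buffer)
print(buffer.getvalue())
```

With this layout the per-hit lines stay short, and a parser joins them back to the read data through the Target name.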

Thanks in advance.
Best,

-- 
 Alessandra Bilardi, Ph. D.
----
 CRIBI, University of Padova, Italy
 http://www.linkedin.com/in/bilardi
----


