[Gmod-gbrowse] [Gmod-help] Re: wiggle_xyplot smoothing

Dan Bolser dan.bolser at gmail.com
Fri Oct 8 05:20:13 EDT 2010


Thanks Tim! It now works perfectly!

Comments inline below.


On 7 October 2010 22:12, Timothy Parnell <Timothy.Parnell at hci.utah.edu> wrote:
> Hi Dan,
> It sounds like each feature is getting its own individual plot, complete with scale. This is because xyplot doesn't know that they should all be strung together. Add the xyplot option
> group_on = type
> and it should be able to group them all together (I tried it this morning on my instance, and it worked). Normally you group_on some other attribute, such as display_name, but BigWig features are not fully fleshed out SeqFeature objects, so they don't have the full complement of attributes. Fortunately, grouping on type seems to work.
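To make the grouping behaviour concrete, here is a hypothetical Python model. It is not Bio::Graphics code; `group_features` and the dict-based features are invented for illustration. Without a grouping key, every feature becomes its own plot with its own scale, which matches the one-scale-per-kb symptom. Grouping on an attribute the features share, such as `type`, strings them all into one plot.

```python
from collections import defaultdict


def group_features(features, group_on=None):
    """Hypothetical model of xyplot grouping.

    Returns a list of plots; each plot is a list of features that would
    share one scale. This is an illustration, not the glyph's real logic.
    """
    if group_on is None:
        # No grouping attribute: every feature is drawn as its own plot.
        return [[f] for f in features]
    groups = defaultdict(list)
    for f in features:
        groups[f[group_on]].append(f)
    # Within a plot, features are strung together in coordinate order.
    return [sorted(g, key=lambda f: f["start"]) for g in groups.values()]


# Three 1 kb features of the same type, like the BigWig 'region' features.
feats = [{"type": "region", "start": s, "end": s + 999, "score": v}
         for s, v in [(1, 0.0), (1001, 1.0), (2001, 1.0)]]

print(len(group_features(feats)))                   # 3 separate plots
print(len(group_features(feats, group_on="type")))  # 1 combined plot
```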

Here is the 'old' config that loads the quantitative data from a
series of GFF scores:

[qdat_gff]
feature        = qfeature
glyph          = xyplot
graph_type     = histogram
group_on       = display_name
height         = 50
min_score      = 0
max_score      = 10
key            = qfeature from GFF


and here is the config for the much faster and now identical version
that uses the Bio::DB::BigWig adaptor:

[qdat_bigwig]
database       = bigwig
feature        = region
glyph          = xyplot
graph_type     = histogram
group_on       = type
height         = 50
min_score      = 0
max_score      = 10
key            = qfeature from BigWig


I'll add this info somewhere like here (when I get a chance):
http://gmod.org/wiki/GBrowse/Uploading_Wiggle_Tracks


> I've worked a little bit with Bio::DB::BigWig objects outside of the context of GBrowse in assembling a suite of scripts for data analysis. You can read more about the module in its POD documentation (Lincoln is quite thorough in documentation!).

I guess I've fallen into some documentation habits that I need to
break. I didn't see what I wanted in the TOC here
http://search.cpan.org/~lds/Bio-BigFile-1.03/lib/Bio/DB/BigWig.pm
(i.e. a subheading 'feature types' or similar) and I didn't even
scroll down beyond the synopsis.

I don't know if it's just me, but I really like reference documents
rather than use case documents. For example, I love the Apache manual.
The GBrowse 2 configuration manual on the wiki is really great because
it's more reference-orientated.


> This isn't the normal way of collecting features for displaying graphs in GBrowse (summary feature and wiggle_xyplot was, I think, the original intent), but it works. Though as you've found out from your exhausted laptop, collecting all those features is taxing! The summary feature really should be the way to go, as it is much more efficient.

It seems that now the features are being grouped correctly, the
excessive network and CPU burden has been resolved!


> As it sounds like you've discovered, tweaking the span in the source wig data should improve the appearance of the track. I'm guessing the rounded/blocky nature of the graphs is due to inherent differences between the xyplot and wiggle_xyplot glyphs and in how the data is collected for the graph.

Right. I'm happy now because I'm encoding the data as it is (one value
per kb of genome) and it's displaying as I want it (and fast!).

Thanks for your help!
Dan.


# insane fun!
irc://irc.perl.org/#gmod

#not so bad!
irc://irc.freenode.net/#bioperl

> Tim
>
>
>
>
> On 10/7/10 2:44 PM, "Dan Bolser" <dan.bolser at gmail.com> wrote:
>
> Hi Tim,
>
> Sorry for the confusion: I'm comparing BigWig vs. SeqStore (scores),
> not BigWig vs. Wig.
>
> Thanks for the tip! I didn't know you could use regular glyph types
> when using the bigwig adaptor... I guess I got confused by the wig
> instructions.
>
> Oh... I just noticed you changed the feature from 'summary' to
> 'region'... where can I read about those options?
>
> Now that I try:
> database       = bigwig
> feature        = region
> glyph          = xyplot
> graph_type     = line
>
>
> Things are getting pretty freaky... (see attached img of a 2kb
> region)... ahh... it's drawing a scale for each 'feature', one every kb
> in my data... I tried scale = none, which removes the scales, but also
> the data! Also, loading the track suddenly seems to make my laptop die
> of CPU and network overload... switching back to summary (and
> wiggle_xyplot) seems to fix that problem.
>
> Well, the problem is no longer significant, and I can live with the
> slight rounding that the bigwig track shows. The smoothing on the wig
> track was initially obscuring the important peaks in my data, but I
> found that by tweaking the span setting on the wig, I could get
> results similar to what I'm now seeing with bigwig.
>
> Thanks for your help,
> Dan.
>
>
> On 7 October 2010 20:06, Timothy Parnell <Timothy.Parnell at hci.utah.edu> wrote:
>> Hi Dan,
>>
>> I'm not sure I understand whether you're trying to compare the bigwig data to data stored in a wig file (referenced in a B:D:SF:Store database) or stored directly in the B:D:SF:Store database (as GFF features with scores).
>>
>> If you're trying to get the bigwig data to look identical to B:D:SF:Store source data, then you'll need to convince the bigwig db adaptor to return features (used by the xyplot glyph) rather than the statistical summary that the wiggle_xyplot glyph expects. Set the following:
>> database     = bigwig_source_database
>> feature      = region
>> glyph        = xyplot
>> graph_type   = line
>> group_on     = type
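The difference between the two kinds of data Tim mentions can be sketched in Python. Scored `region` features are individual intervals, while a summary collapses everything in a bin into a few statistics. BigWig zoom levels store per-bin count, minimum, maximum, sum and sum of squares, but the dict keys and the `summarize` function below are this sketch's own naming, not the Bio::DB::BigWig interface.

```python
def summarize(features, start, end, n_bins):
    """Collapse (start, end, score) intervals over [start, end] into
    n_bins per-bin statistics, counting each covered base once."""
    width = (end - start + 1) / n_bins
    bins = [{"count": 0, "min": None, "max": None, "sum": 0.0, "sum_sq": 0.0}
            for _ in range(n_bins)]
    for f_start, f_end, score in features:
        # Walk the bases of the feature that fall inside the window.
        for pos in range(max(f_start, start), min(f_end, end) + 1):
            b = bins[int((pos - start) / width)]
            b["count"] += 1
            b["min"] = score if b["min"] is None else min(b["min"], score)
            b["max"] = score if b["max"] is None else max(b["max"], score)
            b["sum"] += score
            b["sum_sq"] += score * score
    return bins


# Three 1 kb scored intervals (0, 1, 1), summarized into two bins.
feats = [(1, 1000, 0.0), (1001, 2000, 1.0), (2001, 3000, 1.0)]
for b in summarize(feats, 1, 3000, 2):
    print(b["count"], b["min"], b["max"], b["sum"] / b["count"])
```

The first bin blends a 0-scored and a 1-scored interval into a single mean, which is one way a summary-based plot can look smoother than a per-feature plot.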
>>
>> If you're instead trying to get the bigwig source data to look like wig source data, then I'm stumped as to why they look different. You're using the wiggle_xyplot glyph for both, then, right? The arguments should be identical for both, with the exception of database and feature.
>>
>>
>> Tim
>>
>>
>>
>>
>> On 10/7/10 10:04 AM, "Dan Bolser" <dan.bolser at gmail.com> wrote:
>>
>> On 2 June 2010 21:31, Lincoln Stein <lincoln.stein at gmail.com> wrote:
>>> Sorry you've had such a miserable time of it. I am hoping to throw out
>>> Bio::Graphics::Wiggle entirely and replace it with a pure-perl
>>> implementation of Bio::DB::BigWig. The naive storage system used in BGW just
>>> doesn't work well across large regions. I recommend Bio::DB::BigWig if at
>>> all possible.
>>
>> I didn't immediately find information on how to configure GBrowse to
>> use Bio::DB::BigWig, so here is my config (I came here because I was
>> seeing a similar smoothing problem, which is not yet fully resolved):
>>
>> Under the [GENERAL] section:
>>
>> database      = basic
>>
>> # see '[basic:database]' below
>>
>> #db_adaptor    = Bio::DB::SeqFeature::Store
>> #
>> #db_args       = -adaptor DBI::mysql
>> #                -dsn www-potato:my.mysql.host
>> ##                -namespace gb_pot_agp
>> #                -user me
>> #                -pass secret
>>
>>
>> Just before the [TRACK DEFAULTS] section:
>>
>> [basic:database]
>> db_adaptor    = Bio::DB::SeqFeature::Store
>>
>> db_args       = -adaptor DBI::mysql
>>                -dsn www-potato:my.mysql.host
>> #                -namespace gb_pot_agp
>>                -user me
>>                -pass secret
>>
>> [bigwig:database]
>> db_adaptor    = Bio::DB::BigWig
>>
>> db_args       = -bigwig '/path/to/my/bigwig_file.bw'
>>
>> [clonesLink_bigwig]
>> database       = bigwig
>> feature        = summary
>> glyph          = wiggle_xyplot
>> graph_type     = line
>> bgcolor        = black
>> fgcolor        = black
>> height         = 50
>> min_score      = 0
>> max_score      = 10
>> scale          = right
>> category       = Link
>> key            = finally
>>
>>
>> I think the feature has to be 'summary', so I'm not sure if you can
>> have more than one track per database.
>>
>>
>> I'm still seeing differences between the track produced by the BigWig
>> database adaptor and the track produced using the SeqFeature::Store
>> database adaptor, unlike the situation reported above. For this
>> reason, I'm not sure if I set the step and span options correctly...
>> here is a sample of the GFF / WIG:
>>
>>
>> chr04   dundee  link intensity  1       1000    0       .       .       Name=Exhaust
>> chr04   dundee  link intensity  1001    2000    1       .       .       Name=Exhaust
>> chr04   dundee  link intensity  2001    3000    1       .       .       Name=Exhaust
>>
>>
>> track type=wiggle_0 name='link intensity' description='Exhaust'
>> fixedStep chrom=chr04 start=1 step=1000 span=1000
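One way to check that step and span match the GFF is to derive the fixedStep header from the GFF rows. This Python sketch uses a hypothetical helper, `gff_to_fixedstep`. It assumes tab-separated GFF with start, end and score in columns 4 to 6, a single chromosome, and contiguous equal-length features. The track line is left out. For 1 kb rows starting at position 1, like the ones above, it produces `start=1 step=1000 span=1000`.

```python
def gff_to_fixedstep(gff_lines):
    """Turn contiguous, equal-length GFF score features on one chromosome
    into fixedStep WIG text (header plus one score per line)."""
    rows = [line.rstrip("\n").split("\t") for line in gff_lines if line.strip()]
    chrom = rows[0][0]
    starts = [int(r[3]) for r in rows]  # column 4: 1-based start
    ends = [int(r[4]) for r in rows]    # column 5: 1-based inclusive end
    span = ends[0] - starts[0] + 1      # bases covered by each value
    step = starts[1] - starts[0] if len(rows) > 1 else span
    # fixedStep needs a constant step and span, so reject anything else.
    for i, (s, e) in enumerate(zip(starts, ends)):
        if rows[i][0] != chrom or e - s + 1 != span or s != starts[0] + i * step:
            raise ValueError(f"row {i + 1} does not fit a fixedStep layout")
    header = f"fixedStep chrom={chrom} start={starts[0]} step={step} span={span}"
    return "\n".join([header] + [r[5] for r in rows])  # column 6: score


gff = ["\t".join(["chr04", "dundee", "link intensity", str(s), str(s + 999),
                  score, ".", ".", "Name=Exhaust"])
       for s, score in [(1, "0"), (1001, "1"), (2001, "1")]]
print(gff_to_fixedstep(gff))
```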
>>
>>
>> Attached is an image to confirm the (slight) difference in smoothing
>> behaviour of the two resulting tracks.
>>
>> Cheers,
>> Dan.
>>
>>
>>> Lincoln
>>>
>>> On Wed, Jun 2, 2010 at 3:26 PM, Timothy Parnell
>>> <Timothy.Parnell at hci.utah.edu> wrote:
>>>>
>>>> Hi Dave,
>>>>
>>>> I identified the source of the problem and possible workarounds/fixes.
>>>>
>>>> The problem stems from the formatting of my wig file. The wig file I am
>>>> using is a variable step wig file that contains the microarray data value
>>>> recorded at just the probe's midpoint. However, when it is converted to the
>>>> binary file, it apparently is converted to a 1 bp fixed step, where all the
>>>> intervening positions are undef. If I use a custom script to pull data out
>>>> of the wib file, it is in 1 bp increments, with undef values at all
>>>> positions intervening the real values.
>>>>
>>>> The problem comes when the zoom level (in bp) exceeds the pixel resolution
>>>> of the track png. The image renderer apparently pulls out a data value at
>>>> every n'th position from the wig file across the region. If the value at
>>>> that position is undef, it simply extends the value from the previously
>>>> found position. This essentially gives the appearance of smoothed data, as
>>>> it may only come across a real value every so often across the region.
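The mechanism described here can be reproduced with a toy Python model. Both functions are hypothetical and are not Bio::Graphics::Wiggle code. The model uses a 100 bp region with a real measurement every 7 bp, alternating between +1 and -1. The values are stored as a 1 bp array with None elsewhere, then drawn into 10 pixels by reading one position per pixel and carrying the last value forward whenever that position is None.

```python
def expand_to_1bp(points, length):
    """Store sparse (1-based position, value) pairs in a 1 bp array,
    with None at every position that has no measurement."""
    track = [None] * length
    for pos, value in points:
        track[pos - 1] = value
    return track


def sample_carry_forward(track, n_pixels):
    """Read one position per pixel; if it is None, reuse the last value seen."""
    stride = len(track) / n_pixels
    last = None
    pixels = []
    for px in range(n_pixels):
        value = track[int(px * stride)]
        if value is not None:
            last = value
        pixels.append(last)
    return pixels


# Noisy data: a probe every 7 bp, alternating +1 / -1.
points = [(7 * (i + 1), 1.0 if i % 2 == 0 else -1.0) for i in range(14)]
track = expand_to_1bp(points, 100)
print(sample_carry_forward(track, 10))
```

This prints `[None, None, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]`. Only 2 of the 14 measurements are ever read, and the alternating signal comes out as a flat line, which is the "smoothed" look.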
>>>>
>>>> There are two workarounds, both of which I have confirmed work for me. The first
>>>> is to generate a new wig file at 1 bp fixed step by interpolating the values
>>>> between the real number positions. The second is to use Lincoln's new
>>>> adaptor, Bio::DB::BigWig, which fundamentally works differently by pulling
>>>> out features with scores, rather than assuming a fixed 1 bp step interval.
>>>> The BigWig adaptor produces a track indistinguishable from a track produced
>>>> using the SeqFeature::Store database adaptor.
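The first workaround amounts to linear interpolation between neighbouring probe values. Here is a minimal Python sketch. It assumes the probe values are (1-based position, value) pairs on one chromosome, sorted by position. The names `interpolate_1bp` and `to_fixedstep` and the `chr1` chromosome name are placeholders.

```python
def interpolate_1bp(points):
    """Linearly interpolate sorted (position, value) pairs into one value
    per bp, from the first to the last measured position."""
    values = []
    for (p0, v0), (p1, v1) in zip(points, points[1:]):
        for pos in range(p0, p1):
            values.append(v0 + (v1 - v0) * (pos - p0) / (p1 - p0))
    values.append(points[-1][1])  # include the final measured position
    return points[0][0], values


def to_fixedstep(chrom, start, values):
    """Format 1 bp values as a fixedStep WIG block."""
    header = f"fixedStep chrom={chrom} start={start} step=1 span=1"
    return "\n".join([header] + [f"{v:g}" for v in values])


print(to_fixedstep("chr1", *interpolate_1bp([(10, 0.0), (14, 2.0), (16, -2.0)])))
```

Every base between two probes now carries a real number, so a renderer that samples arbitrary positions never lands on an empty one. The cost is a file with one value per base.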
>>>>
>>>> The fix would be in the appropriate Bio::Graphics glyph
>>>> (wiggle_xyplot.pm?): it should first toss out all the undef values before
>>>> downsampling the data for rendering the track.
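The proposed fix, dropping the undef positions before downsampling, can be sketched in Python on toy data: a 100 bp region with an alternating +1/-1 measurement every 7 bp and None everywhere else. Each pixel reduces whatever real measurements fall into its bin. The sketch uses `max`, which is an arbitrary choice; `min` or a mean would also work. `downsample_measured` is a hypothetical name, not a Bio::Graphics function.

```python
def downsample_measured(track, n_pixels, reduce=max):
    """Skip None (unmeasured) positions, bin the real values into
    n_pixels equal-width bins and reduce each bin to one value."""
    bins = [[] for _ in range(n_pixels)]
    for i, value in enumerate(track):
        if value is None:
            continue  # ignore undef positions instead of carrying values forward
        bins[i * n_pixels // len(track)].append(value)
    return [reduce(b) if b else None for b in bins]


# Toy 100 bp track: a probe every 7 bp, alternating +1 / -1, None elsewhere.
track = [None] * 100
for i in range(14):
    track[7 * (i + 1) - 1] = 1.0 if i % 2 == 0 else -1.0

print(downsample_measured(track, 10))
print(downsample_measured(track, 10, reduce=min))
```

Every measurement lands in some pixel bin, so no pixel depends on a stale value read from somewhere else in the region.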
>>>>
>>>> I hope this helps.
>>>> Tim
>>>>
>>>>
>>>> On 6/1/10 5:35 PM, "Dave Clements, GMOD Help Desk" <help at gmod.org> wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> I submitted a bug on this.  See
>>>>
>>>>
>>>> https://sourceforge.net/tracker/?func=detail&aid=3010143&group_id=27707&atid=391291
>>>>
>>>> Dave C.
>>>>
>>>> On Tue, May 25, 2010 at 3:10 PM, Timothy Parnell
>>>> <Timothy.Parnell at hci.utah.edu> wrote:
>>>> Hi Dave,
>>>> Yes, I forgot to mention this in my first email. I'm using the latest
>>>> relevant distributions available from CPAN.
>>>>
>>>> GBrowse-2.08
>>>> Bio-Graphics-2.09
>>>> BioPerl-1.6.1
>>>>
>>>> I have not tried checking anything out from the live developer versions.
>>>> Should I try those?
>>>>
>>>> Tim
>>>>
>>>>
>>>>
>>>>
>>>> On 5/25/10 3:54 PM, "Dave Clements, GMOD Help Desk" <help at gmod.org> wrote:
>>>>
>>>> Hi Timothy,
>>>>
>>>> I'm pretty sure this is a bug.  I know the wig rendering code has been
>>>> touched in the last year or so.  What version of GBrowse is this happening
>>>> in?  I'm hopeful that upgrading will solve this problem.
>>>>
>>>> Dave C.
>>>>
>>>> On Mon, May 24, 2010 at 9:02 AM, Timothy Parnell
>>>> <Timothy.Parnell at hci.utah.edu> wrote:
>>>> Hello,
>>>>
>>>> I'm having issues with the smoothing of wiggle data when rendering a track
>>>> in GBrowse. Specifically, I can't turn smoothing off.
>>>>
>>>> I have the same high resolution microarray data loaded both into a
>>>> SeqFeature::Store database as well as binary wiggle files (via the
>>>> wiggle2gff3.pl script).
>>>> Viewing tracks from each data source gives vastly different graphs (see
>>>> attached image). It's most apparent with the histogram plot. It appears that
>>>> the wiggle track is automatically binning the data to smooth it. However, I
>>>> anticipate the "noisy" data as seen in the database track, and it would be
>>>> nice to see it in the wiggle track.
>>>>
>>>> I have set the smoothing option to "none" in the wiggle track
>>>> configuration, but GBrowse appears to be ignoring this line. The smoothing
>>>> occurs at zoom levels above 2 kb, whereas at 1 kb or below the tracks are
>>>> the same. I can pull the data back out of the database and wiggle files
>>>> (using either custom scripts or downloading track data from GBrowse) and the
>>>> data is identical (within expectations of converting the data to 8-bit
>>>> dynamic range).
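The size of the 8-bit error can be estimated with a simple model. This Python sketch assumes a linear mapping of [lo, hi] onto codes 1 to 255, with 0 reserved for missing data. That mapping is an assumption for illustration, so check the Bio::Graphics::Wiggle documentation for the scheme it actually uses. Under this model the round-trip error is at most half a code step, (hi - lo) / 508. That is about 0.008 for the -2 to 2 range used in the database track below.

```python
def quantize(value, lo, hi, levels=255):
    """Map a float onto integer codes 1..levels (0 = no data, by assumption)."""
    if value is None:
        return 0
    clipped = min(max(value, lo), hi)  # out-of-range values saturate
    return 1 + round((clipped - lo) / (hi - lo) * (levels - 1))


def dequantize(code, lo, hi, levels=255):
    """Inverse of quantize(); code 0 comes back as None."""
    if code == 0:
        return None
    return lo + (code - 1) / (levels - 1) * (hi - lo)


lo, hi = -2.0, 2.0
worst = max(abs(dequantize(quantize(v, lo, hi), lo, hi) - v)
            for v in [x / 1000 for x in range(-2000, 2001)])
print(worst, (hi - lo) / 508)
```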
>>>>
>>>> Is there a way to control the semantic automatic smoothing of wiggle data?
>>>> Is my configuration set incorrectly? Or is this a bug?
>>>>
>>>> My track conf is below:
>>>>
>>>> [sample_chip_wig1]
>>>> feature      = deg_con_ratio_mono1_wig
>>>> glyph        = wiggle_xyplot
>>>> graph_type   = histogram
>>>> height       = 50
>>>> scale        = left
>>>> label        = 0
>>>> autoscale    = global
>>>> bicolor_pivot = zero
>>>> pos_color    = blue
>>>> neg_color    = red
>>>> smoothing    = none
>>>> smoothing_window = 1
>>>> key          = sample_chip_wig1
>>>>
>>>> [sample_chip_db1]
>>>> feature      = deg_con_ratio_mono1_244k
>>>> glyph        = xyplot
>>>> graph_type   = histogram
>>>> height       = 50
>>>> scale        = left
>>>> fgcolor      = black
>>>> min_score    = -2
>>>> max_score    = 2
>>>> label        = 0
>>>> group_on     = display_name
>>>> key          = sample_chip_db1
>>>>
>>>> Thanks for any assistance
>>>>
>>>> --
>>>>
>>>> Timothy J Parnell, PhD.
>>>> Research Associate
>>>> Howard Hughes Medical Institute
>>>> Department of Oncology
>>>> Huntsman Cancer Institute
>>>> University of Utah
>>>> Salt Lake City, UT 84112
>>>>
>>>>
>>>>
>>>> ------------------------------------------------------------------------------
>>>>
>>>>
>>>> _______________________________________________
>>>> Gmod-gbrowse mailing list
>>>> Gmod-gbrowse at lists.sourceforge.net
>>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Lincoln D. Stein
>>> Director, Informatics and Biocomputing Platform
>>> Ontario Institute for Cancer Research
>>> 101 College St., Suite 800
>>> Toronto, ON, Canada M5G0A3
>>> 416 673-8514
>>> Assistant: Renata Musa <Renata.Musa at oicr.on.ca>
>>>
>>>
>>>
>>
>>
>
>



