[Gmod-help] Re: [Gmod-gbrowse] NGS in GGB

Tue Mar 17 20:55:50 EDT 2009

Dave,
Here is one for your talk:
http://www.nabble.com/Wiggle-density-with-feature-masking:-Dros.-mel.-transcriptome-expression-map-td19363162.html

or http://insects.eugenes.org/species/data/dmel5/modencode/

I've more gbrowse pix, but mostly the data owners won't share yet.

My suggestion for teaching people to handle big data from the next-gen machines:
folks should learn to use R, along with Perl, to summarize and quantify
these data sets.  That also means learning some basic data manipulations,
like partitioning large data sets for analysis and assessing their contents
with summary statistics (e.g., you need to know the min, max, and quantiles for
xyplot or wiggle displays, as well as whether you have outliers).
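
As a rough illustration of that kind of summary, here is a minimal sketch in
Python (R's quantile() or a few lines of Perl would do the same job).  The
scores are made up, the function name is hypothetical, and the 1.5 x IQR
outlier rule is just one common convention, not something prescribed here.

```python
import statistics

def summarize(values):
    """Return min, max, quartiles, and outliers for a list of scores."""
    data = sorted(values)
    # Quartiles by linear interpolation between data points.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    # Assumed convention: flag values beyond 1.5 * IQR from the quartiles.
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    outliers = [v for v in data if v < low_fence or v > high_fence]
    return {
        "min": data[0],
        "max": data[-1],
        "q1": q1,
        "median": median,
        "q3": q3,
        "outliers": outliers,
    }

# Made-up read-density scores with one obvious spike.
scores = [2, 3, 3, 4, 5, 5, 6, 7, 8, 250]
summary = summarize(scores)
print(summary)
```

A spike like the last value would flatten everything else in a wiggle track
scaled to the raw max, which is why it helps to look at quantiles before
choosing the display range.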

I don't think data transport is the main issue, since cheap disks can still
hold all the data sets.  The bigger problem is enabling biologists/genomicists
with weak informatics skills to view and process large data sets on their
personal / lab computers.  Many of these data sets are as large as the
genome sequences but have the greater complexity of microarray data, as
experimenters throw in many treatments and manipulations.  So the lab
scientists are the ones who best know the contents and likely analyses, more
than an informatician used only to processing standard sequence data.

A GMOD role, perhaps more a general bioinformatics role, in this may be to
document practices that work for others, maybe including simple code for
data processing.  I don't think these data sets are necessarily database-ready;
pushing them into a Chado db seems a waste of time to me when file processing
with Perl and R can do all you need (but of course genome-centric summary results
should make their way into a long-term database).
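
To give an idea of how little code plain file processing can take, here is a
hedged sketch, again in Python as a stand-in for Perl or R.  It assumes a
whitespace-delimited file with chrom, start, end, score columns (a
bedGraph-like layout); the function name and the sample lines are invented for
the example.  It reads one line at a time and keeps only running totals per
chromosome, so file size is not limited by memory.

```python
import io
from collections import defaultdict

def summarize_by_chrom(lines):
    """Stream over score lines and keep running stats per chromosome."""
    stats = defaultdict(
        lambda: {"n": 0, "sum": 0.0, "min": float("inf"), "max": float("-inf")}
    )
    for line in lines:
        line = line.strip()
        # Skip blank lines, comments, and track/browser header lines.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, _start, _end, score = line.split()[:4]
        value = float(score)
        s = stats[chrom]
        s["n"] += 1
        s["sum"] += value
        s["min"] = min(s["min"], value)
        s["max"] = max(s["max"], value)
    # Turn running totals into a small per-chromosome summary.
    return {
        chrom: {"n": s["n"], "min": s["min"], "max": s["max"],
                "mean": s["sum"] / s["n"]}
        for chrom, s in stats.items()
    }

# Invented sample data; a real run would pass an open file handle instead.
sample = io.StringIO(
    "track type=bedGraph name=example\n"
    "2L\t0\t100\t4.0\n"
    "2L\t100\t200\t6.0\n"
    "X\t0\t100\t1.5\n"
)
result = summarize_by_chrom(sample)
for chrom, summary in sorted(result.items()):
    print(chrom, summary)
```

The small per-chromosome table that comes out is the sort of genome-centric
summary that could reasonably go into a database, while the raw scores stay
in files.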

- Don Gilbert
-- d.gilbert--bioinformatics--indiana-u--bloomington-in-47405
-- gilbertd at indiana.edu--http://marmot.bio.indiana.edu/