[Gmod-help] massively parallel sequencing question

Thu May 29 11:53:37 EDT 2008

Hi Jennifer,

In my lab we've tested the combination of GBrowse and Bio::DB::GFF on 5X
coverage of C. elegans from Illumina sequencing (about 2.5 million reads).
Each read is stored as an individual feature, alongside a FASTA file holding
the read sequences. Under these conditions, both the display and the
responsiveness were fine. However, this is pretty small compared to typical
"SNP calling" experiments on vertebrates. For those, I think you will want
to store the coverage information using a "wiggle" track, plus a list of the
called SNPs and associated data such as allele frequencies. I am working on
a specialized display for this type of data, but no release date is
anticipated.
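
To make the coverage idea concrete, here is a minimal sketch (not code from
GBrowse or from this thread) that collapses aligned reads into per-base depth
and writes it as UCSC-style fixedStep wiggle text. The function names, the
track name "coverage", and the assumption that reads arrive as 1-based,
inclusive (start, end) pairs on one reference sequence are all illustrative;
how the resulting file is loaded into GBrowse is outside the sketch.

```python
from typing import Iterable, List, Tuple


def coverage_depth(reads: Iterable[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth for 1-based, inclusive (start, end) alignments.

    Uses a difference array: +1 where a read starts, -1 just past its end,
    then a running sum gives the depth at every position.
    """
    diff = [0] * (length + 2)
    for start, end in reads:
        start, end = max(start, 1), min(end, length)  # clip to the reference
        if start > end:
            continue  # read lies entirely outside the reference
        diff[start] += 1
        diff[end + 1] -= 1
    depth, running = [], 0
    for pos in range(1, length + 1):
        running += diff[pos]
        depth.append(running)
    return depth


def to_fixed_step_wig(chrom: str, depth: List[int], name: str = "coverage") -> str:
    """Render depth values as fixedStep wiggle text, one value per base."""
    lines = [
        f'track type=wiggle_0 name="{name}"',
        f"fixedStep chrom={chrom} start=1 step=1",
    ]
    lines.extend(str(d) for d in depth)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Two hypothetical reads on a 6 bp reference called chrI.
    print(to_fixed_step_wig("chrI", coverage_depth([(1, 3), (2, 5)], 6)), end="")
```

The point of the design is that the output grows with the length of the
reference rather than with the number of reads, which is why a density track
stays fast at depths where one-feature-per-read does not.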

Lincoln

On Thu, May 29, 2008 at 11:41 AM, Scott Cain <cain.cshl at gmail.com> wrote:

> Hi Jennifer,
>
> I'm cc'ing the GMOD schema mailing list, because there have been other
> people wondering the same thing.
>
> First I should say that I don't really know, because no one has tried
> it.  That said, I can tell you that the FlyBase Chado schema has several
> million rows in its feature table and it works for them.  What you no
> doubt would need is a database server with enough horsepower and memory
> to do the job, as well as proper tuning of the database server for
> performance.
>
> For use with GBrowse, I don't think I would advocate using Chado
> directly, as the Chado adaptor for GBrowse is significantly slower than
> the Bio::DB::SeqFeature::Store database, which is designed specifically
> for giving speedy query results for use with GBrowse.  You could set up a
> system where you use Chado as your working/annotation database and then
> set up a periodic dump of your features to GFF3 which would get loaded
> into a SeqFeature::Store database for use with GBrowse.
>
> Also, in the upcoming release of GBrowse there will be support for
> wiggle tracks like in the UCSC browser, which will be well suited for
> displaying things like coverage density in a fast-rendering way.
>
> Scott
>
>
> On Thu, 2008-05-29 at 08:38 -0600, Jennifer Beane wrote:
> > Hi,
> >
> > I'm a post-doctoral fellow in bioinformatics and my lab is about to
> > receive data generated from a massively parallel sequencing platform,
> > Illumina's Genome Analyzer.  The data will contain several million
> > short sequence reads from mRNA and microRNA.  There are several
> > software packages to align the reads to the human genome, but I will
> > need to create a way to store, filter, and efficiently annotate these
> > reads.  I'm thinking of loading the data into a Chado database, and
> > using applications such as GBrowse to view the data.  I'm wondering if
> > you have any experience with using GMOD software/applications for this
> > type of data?  I'm wondering if the data will be too extensive to be
> > queried in a database?  If you have any advice/suggestions I would
> > really appreciate it.
> >
> > Thank you very much,
> > Jennifer Beane, Ph.D
> > Post-doctoral Fellow
> > Boston University School of Medicine
> --
> ------------------------------------------------------------------------
> Scott Cain, Ph. D.                                         cain at cshl.edu
> GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
> Cold Spring Harbor Laboratory
>
>

-- 
Lincoln D. Stein

Ontario Institute for Cancer Research
101 College St., Suite 800
Toronto, ON, Canada M5G0A3
416 673-8514
Assistant: Stacey Fairfield <Stacey.Fairfield at oicr.on.ca>

Cold Spring Harbor Laboratory
1 Bungtown Road
Cold Spring Harbor, NY 11724 USA
(516) 367-8380
Assistant: Sandra Michelsen <michelse at cshl.edu>