[Gmod-help] Re: [Gmod-schema] Pipelines for 'short read processing' + chado back-end?

Wed Aug 26 17:27:52 EDT 2009

My offhand thoughts: one could store the short read data in a
Chado genome database, but that is a bit like also storing all the traces
from your genome sequencing, when most folks only want the finished
assembly for long-term management in a genome database.  Handling the
preliminary, unassembled reads this way is probably a big effort.

Short-read processing tools built around the SAM/BAM formats (SAMtools,
for example) would let you store and fetch this raw data; you could then
save metadata and features derived from it in a Chado genome db.
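
As a rough illustration of deriving features from the raw reads, here is a
minimal Python sketch that bins mapped reads from SAM text records into
fixed windows. The function name, the 1000-base window, and the example
records are all made up for illustration; the only thing taken from the SAM
format is the column order (FLAG in column 2, RNAME in column 3, POS in
column 4) and the 0x4 "unmapped" flag bit.

```python
from collections import Counter

def count_reads_per_window(sam_lines, window=1000):
    """Count mapped reads per (reference, 1-based window start)."""
    counts = Counter()
    for line in sam_lines:
        # Skip blank lines and SAM header lines, which start with '@'.
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        flag, rname, pos = int(fields[1]), fields[2], int(fields[3])
        # Flag bit 0x4 marks an unmapped read; '*' means no reference.
        if flag & 0x4 or rname == "*":
            continue
        start = ((pos - 1) // window) * window + 1
        counts[(rname, start)] += 1
    return counts

# Hypothetical records, only to show the shape of the input.
example = [
    "@SQ\tSN:chr1\tLN:5000",
    "r1\t0\tchr1\t100\t30\t36M\t*\t0\t0\t*\t*",
    "r2\t16\tchr1\t950\t30\t36M\t*\t0\t0\t*\t*",
    "r3\t0\tchr1\t1001\t30\t36M\t*\t0\t0\t*\t*",
    "r4\t4\t*\t0\t0\t*\t*\t0\t0\t*\t*",
]
window_counts = count_reads_per_window(example)
```

Per-window counts like these are the kind of derived result that is small
enough to keep in the database, while the reads themselves stay in files.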

I like using a Unix file system instead of an RDBMS where suited;
bulk raw data from sequencers is perhaps best left in its raw files,
with indexing as needed to pull out individual records. But usually you
want to process the whole experiment as a batch, with say Perl or compiled
C programs, then store the results in the database as genome features.
Your database can store the metadata for each experiment, including a
file path to the raw reads.
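
To make the "metadata plus file path" idea concrete, here is a minimal
Python sketch using an in-memory SQLite database as a stand-in. This is
not the Chado schema: the tables (experiment, read_summary), their columns,
the file path, and the counts are all hypothetical, chosen only to show
the raw reads staying on disk while the database holds a pointer to them
and the derived results.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE experiment (
    experiment_id INTEGER PRIMARY KEY,
    name          TEXT NOT NULL,
    reads_path    TEXT NOT NULL   -- raw reads stay in this file
);
CREATE TABLE read_summary (
    experiment_id INTEGER REFERENCES experiment(experiment_id),
    seqid         TEXT,
    start_pos     INTEGER,
    end_pos       INTEGER,
    read_count    INTEGER
);
""")

# Register the experiment with a (hypothetical) path to its raw reads.
cur = conn.execute(
    "INSERT INTO experiment (name, reads_path) VALUES (?, ?)",
    ("example_run", "/data/reads/example_run.bam"),
)
exp_id = cur.lastrowid

# Store batch-derived results (made-up numbers) as rows tied to the experiment.
conn.executemany(
    "INSERT INTO read_summary VALUES (?, ?, ?, ?, ?)",
    [(exp_id, "chr1", 1, 1000, 2), (exp_id, "chr1", 1001, 2000, 1)],
)
conn.commit()

# Look up where the raw reads live, plus the total of the stored counts.
row = conn.execute(
    "SELECT e.reads_path, SUM(s.read_count) "
    "FROM experiment e JOIN read_summary s USING (experiment_id) "
    "WHERE e.name = ? GROUP BY e.experiment_id",
    ("example_run",),
).fetchone()
```

In a real Chado setup the derived rows would go into its own feature
tables; the point here is only the split between files and database.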

- Don