<p>Hi Margie,</p><p>I remember speaking with you in Toronto. I hope that you are still enjoying working in biology!</p><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div lang="EN-CA" link="blue" vlink="purple"><div>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>I am creating two new database schemas that will
contain mostly genomic variation data as well as some phenotype data. These
data will also include information on a study, methods, platforms, subjects,
samples, etc.</p>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>I would like to create a schema that suits the needs of
our organization. I have reviewed Chado in some detail and it does not suit the
needs of our organization. Ideally, our own schema should be used and I would
like to continue with this approach.</p></div></div></blockquote><div>Can you describe what you found lacking in Chado? Knowing that will help us improve it in the near future. Chado is extensible, and NESCent (<a href="http://nescent.org">nescent.org</a>) has developed a natural diversity module for it. The module is still in beta (and is likely to change before it is released). It is based on the GDPDM, which is used at Gramene and MaizeGenetics for this purpose. One of my deliverables for 2009 is to get the natural diversity module out of beta and into production Chado.</div>
<div>Two things should help this along. First, a NESCent working group needs this to be done. Second, we are trying to schedule a GMOD natural diversity hackathon for 2009 that will move this work forward. </div>
<div></div><div>If you are interested, the natural diversity module and GDPDM are described at:</div><div> <a href="http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/">http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/</a></div>
<div> <a href="http://www.maizegenetics.net/gdpdm/">http://www.maizegenetics.net/gdpdm/</a></div><div></div><div>I think all this work may come too late for your needs. However, I encourage you to look at the current beta release as a possible solution. When I actually get to work on this (probably starting in February) I may ask you for any insights you have and for a copy of your schema. If you are really lucky (!) I might even ask if you are interested in attending the hackathon. :-)</div>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div lang="EN-CA" link="blue" vlink="purple"><div>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>We will most likely employ GBrowse as the genome
browser for display of data in the above databases.</p>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>My highest level questions that I have yet to find
appropriate answers to are these:</p>
<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:"Courier New""><span>o<span style="font:7.0pt "Times New Roman"">
</span></span></span>Can I use my own schema to build the database
which underlies Gbrowse? If so, will a separate 'Bio::DB::GFF'
database need to be created to act as a bridge between my database and Gbrowse?</p>
<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:"Courier New""><span>o<span style="font:7.0pt "Times New Roman"">
</span></span></span>What components would I most likely need from
GMOD to get my database and GBrowse to work together?</p>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>From what I can determine based on the documentation, I
should be able to use my own database schema to underlie GBrowse. It looks like
my database would require a GBrowse adaptor (Bio::DB::GFF??) and GBrowse. It
also looks like I might need an annotation pipeline, too.</p>
<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt "Times New Roman"">
</span></span>Other questions that arise are:</p>
<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:"Courier New""><span>o<span style="font:7.0pt "Times New Roman"">
</span></span></span>What is "Bio::DB::GFF"? Is it a
database? Schema? Adaptor?</p>
<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:"Courier New""><span>o<span style="font:7.0pt "Times New Roman"">
</span></span></span>Where does annotation data come from? What is
the annotation pipeline?</p></div></div></blockquote><div>GBrowse uses adaptors to read different data sources. The data source can be flat files (GFF3 + FASTA if you want the sequence), or databases, or any other data source you can imagine. I believe that all adaptors are written in Perl. Each adaptor has an expected input format. The database adaptors expect a specific schema to talk to. </div>
<div></div><div>So Bio::DB::GFF is a Perl module that is a GBrowse adaptor. It expects to read from a database with a specific schema. (Bio::DB::GFF also assumes GFF2, a now-deprecated format.)</div><div></div><div>However, writing an adaptor is not a small undertaking. A much easier way to tackle this is probably to write a program that exports GFF3 and FASTA files from your database, and then load those files into a Bio::DB::SeqFeature::Store MySQL database. This will likely be faster than running directly off of your source database. GFF3 is a flat file format for specifying genomic features (genes, exons, SNPs, ...) and the relationships between them. FASTA is a flat file format for specifying sequence.</div>
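<div></div><div>To make the GFF3 description a bit more concrete, here is a minimal sketch in Python (my choice for illustration only; any language will do). It formats features as nine tab-separated columns and uses the ID and Parent attributes to express relationships between features. The function names, the "mydb" source label, and the example features are all made up for this sketch.</div>

```python
from urllib.parse import quote


def gff3_attributes(pairs):
    """Build column 9 from (tag, value) pairs, percent-encoding reserved characters."""
    return ";".join(f"{tag}={quote(str(value), safe=' :.')}" for tag, value in pairs)


def gff3_line(seqid, source, ftype, start, end, strand, attributes, score=".", phase="."):
    """Return one nine-column GFF3 record; coordinates are 1-based and inclusive."""
    if start > end:
        raise ValueError("start must not exceed end")
    columns = [seqid, source, ftype, str(start), str(end),
               str(score), strand, str(phase), gff3_attributes(attributes)]
    return "\t".join(columns)


# Hypothetical features: a gene, its transcript, one exon, and a SNP.
lines = [
    "##gff-version 3",
    gff3_line("chr1", "mydb", "gene", 1000, 5000, "+", [("ID", "gene0001"), ("Name", "demo gene")]),
    gff3_line("chr1", "mydb", "mRNA", 1000, 5000, "+", [("ID", "mrna0001"), ("Parent", "gene0001")]),
    gff3_line("chr1", "mydb", "exon", 1000, 1500, "+", [("ID", "exon0001"), ("Parent", "mrna0001")]),
    gff3_line("chr1", "mydb", "SNP", 1234, 1234, "+", [("ID", "snp0001"), ("Name", "A;G")]),
]
print("\n".join(lines))
```

<div>The Parent attribute is what ties the exon to its mRNA and the mRNA to its gene. Values containing characters that GFF3 reserves (such as the semicolon in the SNP name) are percent-encoded.</div>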
<div></div><div>Since you have a custom database, there is not going to be any program that will create GFF3 or FASTA for you. FASTA should be trivial to create (if you have the sequence). GFF3 will require more work. Some code you could look at for inspiration is the GMODTools suite (<a href="http://gmod.org/wiki/GMODTools">http://gmod.org/wiki/GMODTools</a>). It does conversion from several formats to GFF3.</div>
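<div></div><div>As a rough illustration of the FASTA half of that export, here is a small Python sketch. It uses an in-memory SQLite database as a stand-in for your own database; the table name, column names, and sequences are hypothetical, so substitute whatever your schema actually uses.</div>

```python
import sqlite3


def fasta_record(name, residues, width=60):
    """Format one sequence as FASTA: a '>' header line followed by wrapped residues."""
    body = [residues[i:i + width] for i in range(0, len(residues), width)]
    return "\n".join([f">{name}"] + body)


# Stand-in for the source database; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE reference_sequence (name TEXT, residues TEXT)")
conn.executemany(
    "INSERT INTO reference_sequence VALUES (?, ?)",
    [("chr1", "ACGT" * 20), ("chr2", "GGCCTTAA")],
)

# One FASTA record per row, in a stable order.
records = [
    fasta_record(name, residues)
    for name, residues in conn.execute(
        "SELECT name, residues FROM reference_sequence ORDER BY name")
]
fasta_text = "\n".join(records) + "\n"
print(fasta_text, end="")
```

<div>One thing to watch: the sequence names in the FASTA headers should match the names used in the first column of your GFF3 file, so that features can be placed on the right sequence.</div>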
<div></div><div>Where does annotation data come from? From an annotation pipeline!</div><div></div><div>Wait. That answer isn't helpful, darnit. A pipeline is usually a series (thus a pipeline) of programs that performs some analysis on sequence. For example, you might have an already annotated reference genome and a slew of short sequence reads from ESTs* from the latest high-throughput sequencer, and you want to annotate the reference genome with the new data. Your pipeline might be:</div>
<div></div><div>1. Assemble the short reads into a series of contigs (put the short reads together into longer chunks, hopefully each as long as the complete EST). </div><div>2. Align the contigs to the reference genome (figure out where they came from)</div>
<div>3. Create a GFF3 file and a FASTA file (not sure on the FASTA) describing where each EST aligns to and load it into GBrowse.</div><div></div><div>All of these steps may involve heavy magic. Fortunately, most of that magic is already done by the people who have written the programs to do the steps.</div>
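<div></div><div>Step 3 is the part you would most likely write yourself, so here is a hedged Python sketch of it. It assumes the aligner's output has already been parsed into simple records; the Hit record, its field names, the "est_pipeline" source label, and the example coordinates are all hypothetical. Each alignment becomes a GFF3 EST_match feature whose Target attribute records which part of the EST aligned.</div>

```python
from collections import namedtuple

# Hypothetical record for one aligned block; real aligners each have their own output format.
Hit = namedtuple("Hit", "est_id ref_id ref_start ref_end strand est_start est_end")


def hit_to_gff3(hit, index, source="est_pipeline"):
    """Describe one alignment as a GFF3 EST_match feature with a Target attribute."""
    attributes = (f"ID=match{index:05d};"
                  f"Target={hit.est_id} {hit.est_start} {hit.est_end}")
    columns = [hit.ref_id, source, "EST_match", str(hit.ref_start), str(hit.ref_end),
               ".", hit.strand, ".", attributes]
    return "\t".join(columns)


# Made-up alignments of two ESTs against a reference sequence named chr1.
hits = [
    Hit("EST23", "chr1", 1050, 1500, "+", 1, 451),
    Hit("EST42", "chr1", 7000, 7200, "-", 1, 201),
]
gff3_lines = ["##gff-version 3"] + [hit_to_gff3(h, i) for i, h in enumerate(hits, start=1)]
print("\n".join(gff3_lines))
```

<div>The resulting file can then be loaded alongside the reference annotation so the EST alignments show up as their own track.</div>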
<div></div><div>* ESTs (expressed sequence tags) are a relatively easy way to find out which parts of the genome are being transcribed (what the active genes are).</div><div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">
<div lang="EN-CA" link="blue" vlink="purple"><div>
<p>As I said, I am relatively new to GMOD and I find the online
documentation is plentiful, but not easily navigated by the newbie. After two
weeks of reading the documentation I find I am now going in circles looking for
answers to my questions – and information on how to design an information
system employing components of GMOD. </p>
<p>Ideally a diagram that displays a database and how it
interacts with the components of GMOD would be great to see. I haven't
yet found anything like this in the documentation. At the very least, if
someone could steer me in the right direction as far as what components I should
focus on and what specific documentation I can read, it would be appreciated.</p></div></div></blockquote><div>It is possible to use GBrowse as a standalone tool, without any other GMOD tools. A lot of people actually do this. It sounds like this might work fine for you. </div>
<div></div><div>Thanks for the documentation suggestions. We just did a community survey and one of the top priorities for the help desk was improving the documentation. Look for progress in 2009.</div><div></div><div>Finally, although you didn't ask for it, I can think of two GBrowse instances that might show datatypes that are sort of similar to what you are doing:</div>
<div> <a href="http://hapmap.org">http://hapmap.org</a></div><div> <a href="http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/">http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/</a> or</div><div>
<a href="http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/cvsequence/">http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/cvsequence/</a></div><div></div><div>Please let me know if you have any questions or comments.</div>
<div></div><div>Dave C</div><div>541 914 6324</div><div>AIM or Skype user: tnabtaf </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div lang="EN-CA" link="blue" vlink="purple">
<div><p></p>
<p> </p>
<p>Any assistance you can provide on these questions would be
tremendously appreciated. And if I can, in turn, provide some input on how to
create some "newbie" documentation, I will do so – to help
others in my situation.</p>
<p> </p>
<p>Also…I have 15 years' experience working with
relational databases…but not genomic databases…so you can assume a
level of technical understanding, but with the caveat that genomic databases
are new territory for me.</p>
<p> </p>
<p>Thanks so much for your time.</p>
<p> </p>
<p>Kind regards,</p>
<p> </p><font color="#888888">
<p>Margie Manker</p>
</font></div>
</div>
</blockquote></div>Was this helpful? Let us know at <a href="http://gmod.org/wiki/Help_Desk_Feedback">http://gmod.org/wiki/Help_Desk_Feedback</a><br>
<br><br>