<p>Hi Margie,</p><p>I remember speaking with you in Toronto. I hope that you are still enjoying working in biology!</p><div class="gmail_quote"><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div lang="EN-CA" link="blue" vlink="purple"><div>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>I am creating two new database schemas that will

contain mostly genomic variation data as well as some phenotype data. These

data will also include information on a study, methods, platforms, subjects,

samples, etc.</p>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>I would like to create a schema that suits the needs of

our organization. I have reviewed Chado in some detail and it does not suit the

needs of our organization. Ideally, our own schema should be used and I would

like to continue with this approach.</p></div></div></blockquote><div>Can you describe what you found lacking in Chado? &nbsp;This will help us improve it in the near future. &nbsp;Chado is extensible, and NESCent (<a href="http://nescent.org">nescent.org</a>) has developed a natural diversity module for Chado. This is still in Beta (and is likely to change before it is released). &nbsp;It is based on the GDPDM, which is used at Gramene and MaizeGenetics for this purpose. &nbsp;One of my deliverables for 2009 is to get the natural diversity module out of Beta and into production Chado.</div>

<div>Two things should help this along. &nbsp;One is a NESCent working group that needs this to be done; the other is a GMOD natural diversity hackathon that we are trying to schedule for 2009, which will move this work forward. &nbsp;</div>

<div></div><div>If you are interested, the natural diversity module and GDPDM are described at:</div><div>&nbsp; <a href="http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/">http://heliconiusdb.svn.sourceforge.net/viewvc/heliconiusdb/trunk/schema/doc/</a></div>

<div>&nbsp; <a href="http://www.maizegenetics.net/gdpdm/">http://www.maizegenetics.net/gdpdm/</a></div><div></div><div>I think all this work may come too late for your needs. &nbsp;However, I encourage you to look at the current beta release as a possible solution. &nbsp;When I actually get to work on this (probably starting in February) I may ask you for any insights you have and for a copy of your schema. &nbsp;If you are really lucky (!) I might even ask if you are interested in attending the hackathon. &nbsp;:-)</div>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div lang="EN-CA" link="blue" vlink="purple"><div>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>We will most likely employ GBrowse as the genome

browser for display of data in the above databases.</p>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>My highest level questions that I have yet to find

appropriate answers to are these:</p>


<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:&quot;Courier New&quot;"><span>o<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;

</span></span></span>Can I use my own schema to build the database

which underlies Gbrowse? If so, will a separate 'Bio::DB::GFF'

database need to be created to act as a bridge between my database and Gbrowse?</p>


<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:&quot;Courier New&quot;"><span>o<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;

</span></span></span>What components would I most likely need from

GMOD to get my database and GBrowse to work together?</p>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>From what I can determine based on the documentation, I

should be able to use my own database schema to underlie GBrowse. It looks like

my database would require a GBrowse adaptor (Bio::DB::GFF??) and GBrowse. It

also looks like I might need an annotation pipeline, too.</p>


<p style="text-indent:-18.0pt"><span>-<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;

</span></span>Other questions that arise are:</p>


<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:&quot;Courier New&quot;"><span>o<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;

</span></span></span>What is "Bio::DB::GFF"? Is it a

database? Schema? Adaptor?</p>


<p style="margin-left:72.0pt;text-indent:-18.0pt"><span style="font-family:&quot;Courier New&quot;"><span>o<span style="font:7.0pt &quot;Times New Roman&quot;">&nbsp;&nbsp;

</span></span></span>Where does annotation data come from? What is

the annotation pipeline?</p></div></div></blockquote><div>GBrowse uses adaptors to read different data sources. &nbsp;The data source can be flat files (GFF3 + FASTA if you want the sequence), or databases, or any other data source you can imagine. &nbsp;I believe that all adaptors are written in Perl. &nbsp;Each adaptor has an expected input format. &nbsp;The database adaptors expect a specific schema to talk to. &nbsp;</div>

<div></div><div>So Bio::DB::GFF is a Perl module that is a GBrowse adaptor. &nbsp;It expects to read from a database with a specific schema. &nbsp;(Bio::DB::GFF also assumes GFF2, a now deprecated format.)</div><div></div><div>However, writing an adaptor is not a small undertaking. &nbsp;Probably a much easier way to tackle this is to write a program to export GFF3- and FASTA-formatted files from your database and then load them into a Bio::DB::SeqFeature::Store MySQL database. &nbsp;This will likely be faster than running directly off your source database. &nbsp;GFF3 is a flat file format for specifying genomic features (genes, exons, SNPs, ...) and relationships between them. &nbsp;FASTA is a flat file format for specifying sequence.</div>

<div></div><div>Since you have a custom database, there is not going to be any program that will create GFF3 or FASTA for you. FASTA should be trivial to create (if you have the sequence). &nbsp;GFF3 will require more work. &nbsp;Some code you could look at for inspiration is the GMODTools suite (<a href="http://gmod.org/wiki/GMODTools">http://gmod.org/wiki/GMODTools</a>). &nbsp;It does conversion from several formats to GFF3.</div>
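<div>To make the export idea concrete, here is a minimal sketch in Python (any language would do). &nbsp;Everything specific in it is an assumption made for illustration: the <code>variant</code> table and its columns, the <code>mydb</code> source label, and the choice of <code>SNP</code> as the feature type. &nbsp;It also assumes your coordinates are already 1-based and inclusive, as GFF3 expects. &nbsp;A real export would map your own schema onto the nine GFF3 columns in the same way.</div>

```python
import sqlite3
from urllib.parse import quote


def gff3_line(seqid, source, ftype, start, end, strand, attrs):
    """Format one GFF3 feature line (nine tab-separated columns)."""
    # Percent-encode attribute values so that ';', '=', ',' and tabs cannot
    # break the column 9 syntax; spaces are left readable.
    attr_text = ";".join(
        f"{key}={quote(str(value), safe=' ')}" for key, value in attrs.items()
    )
    # Score (column 6) and phase (column 8) are unknown here, so both are ".".
    columns = [seqid, source, ftype, str(start), str(end), ".", strand, ".", attr_text]
    return "\t".join(columns)


def fasta_record(name, sequence, width=60):
    """Format one FASTA record, wrapping the sequence at `width` characters."""
    chunks = [sequence[i:i + width] for i in range(0, len(sequence), width)]
    return ">" + name + "\n" + "\n".join(chunks) + "\n"


def export_gff3(conn):
    """Read the hypothetical `variant` table and return GFF3 text."""
    lines = ["##gff-version 3"]
    query = (
        "SELECT id, chrom, start_pos, end_pos, name "
        "FROM variant ORDER BY chrom, start_pos"
    )
    for var_id, chrom, start, end, name in conn.execute(query):
        attrs = {"ID": f"var{var_id}", "Name": name}
        lines.append(gff3_line(chrom, "mydb", "SNP", start, end, "+", attrs))
    return "\n".join(lines) + "\n"


# Demo with an in-memory database standing in for the custom schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE variant (id INTEGER PRIMARY KEY, chrom TEXT, "
    "start_pos INTEGER, end_pos INTEGER, name TEXT)"
)
conn.executemany(
    "INSERT INTO variant (chrom, start_pos, end_pos, name) VALUES (?, ?, ?, ?)",
    [("chr1", 1200, 1200, "rs_example_1"), ("chr1", 350, 350, "rs_example_2")],
)

gff3_text = export_gff3(conn)
fasta_text = fasta_record("chr1", "ACGT" * 20)
print(gff3_text, end="")
print(fasta_text, end="")
```

<div>The two strings would be written to a <code>.gff3</code> file and a FASTA file, which are then loaded into the Bio::DB::SeqFeature::Store database. &nbsp;The sequence names in the FASTA file need to match the values in the first GFF3 column.</div>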

<div></div><div>Where does annotation data come from? &nbsp;From an annotation pipeline!</div><div></div><div>Wait. &nbsp;That answer isn&#39;t helpful, darnit. &nbsp;A pipeline is usually a series (thus a pipeline) of programs that perform some analysis on sequence. &nbsp;For example, you might have an already annotated reference genome, and a slew of short sequence reads from ESTs* from the latest high-throughput sequencer and you want to annotate the reference genome with the new data. &nbsp;Your pipeline might be:</div>

<div></div><div>1. Assemble the short reads into a series of contigs (put the short reads together into longer chunks, hopefully each as long as the complete EST). &nbsp;</div><div>2. Align the contigs to the reference genome (figure out where they came from)</div>

<div>3. Create a GFF3 file and a FASTA file (not sure on the FASTA) describing where each EST aligns to and load it into GBrowse.</div><div></div><div>All of these steps may involve heavy magic. &nbsp;Fortunately, most of that magic is already done by the people who have written the programs to do the steps.</div>
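<div>Step 3 above can be sketched as follows, again in Python. &nbsp;This is only an illustration: the <code>Hit</code> record is a hypothetical stand-in for whatever your alignment program reports, and the <code>est_pipeline</code> source label is made up. &nbsp;It assumes contig names contain no characters that are reserved in GFF3 attributes. &nbsp;Each aligned block becomes an <code>EST_match</code> line whose <code>Target</code> attribute records which part of the contig aligned there; blocks from the same contig share an <code>ID</code>, which is how GFF3 represents one alignment split across several lines.</div>

```python
from collections import namedtuple

# Hypothetical alignment result: one aligned block of an EST contig on the
# reference genome, with 1-based inclusive coordinates on both sequences.
Hit = namedtuple(
    "Hit", "contig ref ref_start ref_end strand contig_start contig_end"
)


def hits_to_gff3(hits, source="est_pipeline"):
    """Turn alignment blocks into GFF3 EST_match lines, sorted by position."""
    lines = ["##gff-version 3"]
    for hit in sorted(hits, key=lambda h: (h.ref, h.ref_start)):
        # Target says which stretch of the contig aligned to this block.
        attrs = (
            f"ID={hit.contig};"
            f"Target={hit.contig} {hit.contig_start} {hit.contig_end}"
        )
        columns = [
            hit.ref, source, "EST_match",
            str(hit.ref_start), str(hit.ref_end),
            ".", hit.strand, ".", attrs,
        ]
        lines.append("\t".join(columns))
    return "\n".join(lines) + "\n"


# Made-up hits: contig7 aligns in two blocks (a gapped alignment),
# contig3 aligns in one block on the minus strand.
hits = [
    Hit("contig7", "chr2", 5000, 5199, "+", 201, 400),
    Hit("contig7", "chr2", 4000, 4199, "+", 1, 200),
    Hit("contig3", "chr1", 100, 349, "-", 1, 250),
]
est_gff3 = hits_to_gff3(hits)
print(est_gff3, end="")
```
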

<div></div><div>* ESTs = a relatively easy way to find out what part of the genome is being transcribed (what the active genes are).</div><div></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;">

<div lang="EN-CA" link="blue" vlink="purple"><div>


<p>As I said, I am relatively new to GMOD and I find the online

documentation is plentiful, but not easily navigated by the newbie. After two

weeks of reading the documentation I find I am now going in circles looking for

answers to my questions – and information on how to design an information

system employing components of GMOD.&nbsp;</p>


<p>Ideally a diagram that displays a database and how it

interacts with the components of GMOD would be great to see. I haven't

yet found anything like this in the documentation. At the very least, if

someone could steer me in the right direction as far as what components I should

focus on and what specific documentation I can read, it would be appreciated.</p></div></div></blockquote><div>It is possible to use GBrowse as a standalone tool, without any other GMOD tools. &nbsp;A lot of people actually do this. &nbsp;It sounds like this might work fine for you.&nbsp;</div>

<div></div><div>Thanks for the documentation suggestions. &nbsp;We just did a community survey and one of the top priorities for the help desk was improving the documentation. &nbsp;Look for progress in 2009.</div><div></div><div>Finally, although you didn&#39;t ask for it, I can think of two GBrowse instances that might show datatypes that are sort of similar to what you are doing:</div>

<div>&nbsp; <a href="http://hapmap.org">http://hapmap.org</a></div><div>&nbsp; <a href="http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/">http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/jwsequence/</a> or</div><div>

&nbsp; <a href="http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/cvsequence/">http://jimwatsonsequence.cshl.edu/cgi-perl/gbrowse/cvsequence/</a></div><div></div><div>Please let me know if you have any questions or comments.</div>

<div></div><div>Dave C</div><div>541 914 6324</div><div>AIM or Skype user: tnabtaf&nbsp;</div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex;"><div lang="EN-CA" link="blue" vlink="purple">

<div><p></p>


<p>&nbsp;</p>


<p>Any assistance you can provide on these questions would be

tremendously appreciated. And if I can, in turn, provide some input on how to

create some "newbie" documentation, I will do so – to help

others in my situation.</p>


<p>&nbsp;</p>


<p>Also…I have 15 years' experience working with

relational databases…but not genomic databases…so you can assume a

level of technical understanding, but with the caveat that genomic databases

are new territory for me.</p>


<p>&nbsp;</p>


<p>Thanks so much for your time.</p>


<p>&nbsp;</p>


<p>Kind regards,</p>


<p>&nbsp;</p><font color="#888888">


<p>Margie Manker</p>


</font></div>


</div>


</blockquote></div>Was this helpful? &nbsp;Let us know at <a href="http://gmod.org/wiki/Help_Desk_Feedback">http://gmod.org/wiki/Help_Desk_Feedback</a><br>

<br><br>