Hi Tom,<br><br>I will answer some of your questions below, and I'll try not to repeat what Scott has already covered.<br><br>Dave C<br><br><div class="gmail_quote">On Wed, Oct 20, 2010 at 3:56 PM, Walk, Tom <span dir="ltr">&lt;<a href="mailto:Tom.Walk@ars.usda.gov">Tom.Walk@ars.usda.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US">
<div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Chris,</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">I am a new postdoc working for Scott Geib. Please pardon
any redundancy or confusion on my part.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">FYI, I am including GMOD help in case they can more readily
answer my questions.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Your input is very much appreciated here, where we are sequencing
the oriental fruit fly genome along with transcripts, and are in the initial
stages of web and database development. I have worked at the Broad
Institute and TAIR, so I am familiar with using these things, but they were
already well constructed before my arrival. Here at the USDA we are
starting from scratch.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Right now, there are a few things I want to ask about.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">One thing I am unclear about is the distinction between a
GBrowse db and a drupal db. </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Would the drupal db be part of web user queries? For
example, if a web user wants a list of all kinases, would the search only use
the drupal db? If so, it seems redundant to me. We will have all of
the information in a genomics db, perhaps with a Chado schema. I can see
that using this for web queries might pose integrity risks. Are you
suggesting that we use two somewhat overlapping databases, one for internal use and
another for public use? If so, I see the public db being mostly a subset
of the internal one, with us choosing which fields to make public, and possibly
adding tables for web specific info, such as links followed or user info. </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> Alternatively, should we use the GBrowse db for internal
use as well as the backend for genome related web use, and limit the drupal db
use to nongenome related web info?</span></p></div></div></blockquote><div><br>I am far from a Drupal expert, but I'll take a stab at this. I concur with Chris that you want a separate database for GBrowse. Something like a Bio::DB::SeqFeature::Store database is purpose-built to power an application like GBrowse. As Scott said, Chado can do it for small to medium datasets, but it won't be quite as fast. For large datasets, performance will become a serious issue.<br>
<br>It's worth treating a Chado instance as your "system of record": it is where the definitive copy of your data lives, and it can store many more data types than a GBrowse-centric db. So, yes, I would use one db to power your genome browser and Chado to power your non-genome website. Drupal would therefore not come into consideration for your genome browser, only for your non-genome web data.<br>
<br>Drupal complicates things further. Drupal 6 (as I understand it) does not integrate well with pre-existing databases that weren't created by Drupal, e.g., Chado. Tools like Tripal and GMOD-DBSF (see <a href="http://gmod.org/wiki/Tripal">http://gmod.org/wiki/Tripal</a> and <a href="http://gmod.org/wiki/Gmod-dbsf">http://gmod.org/wiki/Gmod-dbsf</a>) jump through some hoops to make Drupal talk to Chado. Tripal uses synchronization to keep the Drupal side and the Chado side in agreement. I'm not sure what GMOD-DBSF does.<br>
<br>This will become easier in Drupal 7, which will be able to talk directly to external databases. See <a href="http://drupal7releasedate.com/">http://drupal7releasedate.com/</a> for a statistics-based estimate of when Drupal 7 will be released. However, that won't immediately help Tripal or GMOD-DBSF, as both will need to be upgraded, and that can't happen until all the Drupal modules they use are ported to Drupal 7, which will likely take a while.<br>
<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div link="blue" vlink="purple" lang="EN-US"><div><p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Should we use the same RDBMS for both? It would seem to be
simpler, but I may be missing some reasons why we need both MySQL and
PostgreSQL.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">While I am on the subject, does PostgreSQL have problems with
the size of sequence objects? Should that factor into our decisions?</span> </p></div></div></blockquote><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US"><div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">As for which tools we will use, I think that a lot of decisions
remain. We may use GBrowse and Apollo, but I am also experimenting with
Argo. I was leaning toward using Chado, but from your email, and from
looking at the tables, it seems that we may want to use another schema,
probably simpler than Chado for our purposes.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Is GBrowse limited to the schemas outlined under adaptors in the
following?</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"><a href="http://gmod.org/wiki/GBrowse#About_Databases" target="_blank">http://gmod.org/wiki/GBrowse#About_Databases</a></span></p></div></div></blockquote>
<div><br>There are probably one or two other ones floating around out there, but these are the ones that are well supported. I would say that everything from Bio::DB::SeqFeature::Store through Bio::DB::Das::Chado is both well supported and widely enough used to consider. <br>
<br> <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div link="blue" vlink="purple" lang="EN-US"><div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Do you know which ones are most supported or least buggy?
Do you have a recommendation?</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">In a related inquiry, are the GMOD tools flexible? </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">I see a lot of tables in Chado, which is great, but for whatever
db we use, can we add or delete tables? How difficult is it to
incorporate new tools or info?</span></p></div></div></blockquote><div><br>Flexibility is a key goal at GMOD. We strive to have software that is useful in a wide variety of environments. This means the software is usually both very configurable and extensible. However, I would say that almost every large project pushes the software in new directions. Sometimes this new functionality is implemented by the tool developers, and sometimes by users.<br>
<br>You can add, modify, or drop tables and columns in Chado. Many organizations do this. I would be very careful about deleting things from the core modules: General, CV, Sequence, Pub, and Organism. Tools should continue to work after new tables or columns are added, provided the additions don't cause constraint violations for existing tools.<br>
<br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div link="blue" vlink="purple" lang="EN-US"><div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> For example, we are dealing with multiple strains/species.
If SNP analysis is not in the db, can we add a table for it, or are we
constrained to existing tables and fields?</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">In addition, how well do the GMOD tools and db's handle existing
functional or structural analysis or adapt to new analysis tools?</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Finally, at USDA, we are interested in biotic
interactions. Are there tables to link organisms like there are for
linking protein interactions? If not, are the tools extensible for that?
</span></p></div></div></blockquote><div><br>This is a poorly documented area of Chado. There is a phylogeny module for that type of data. There is also a natural diversity module under development that will be able to support arbitrary crosses and breeding. This will all likely become clearer during the upcoming GMOD Evo Hackathon next month.<br>
</div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;"><div link="blue" vlink="purple" lang="EN-US"><div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">It looks like anything that can be mapped out as a feature with
genome coordinates can be handled. So if we use new tools, or find pathogenicity
genes or markers, then we can use GMOD. Perhaps you can correct that if
it is wrong or too simplistic.</span></p></div></div></blockquote><div><br>I think this is correct. <br></div><blockquote class="gmail_quote" style="margin: 0pt 0pt 0pt 0.8ex; border-left: 1px solid rgb(204, 204, 204); padding-left: 1ex;">
<div link="blue" vlink="purple" lang="EN-US"><div>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">I think that is enough for today. It seems that I still
have a lot to figure out. Sorry if this is too long. As I learn
more and we progress, I will likely seek out more advice. If you would
prefer to talk to me on the phone, please call the number below.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Thanks for all of your help to date and for any feedback you can
provide to this inquiry.</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">Tom Walk</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"><a href="mailto:tom.walk@ars.usda.gov" target="_blank">tom.walk@ars.usda.gov</a></span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);">808 932 2176</span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<p class="MsoNormal"><span style="font-size: 11pt; color: rgb(31, 73, 125);"> </span></p>
<div style="border-width: 1pt medium medium; border-style: solid none none; border-color: rgb(181, 196, 223) -moz-use-text-color -moz-use-text-color; padding: 3pt 0in 0in;">
<p class="MsoNormal"><b><span style="font-size: 10pt;">From:</span></b><span style="font-size: 10pt;"> Chris Childers
[mailto:<a href="mailto:genetics.guy@gmail.com" target="_blank">genetics.guy@gmail.com</a>] <br>
<b>Sent:</b> Wednesday, October 20, 2010 4:22 AM<br>
<b>To:</b> Natasha Sostrom<br>
<b>Subject:</b> Re: Bee Base site - databases</span></p>
</div>
<p class="MsoNormal"> </p>
<p class="MsoNormal" style="margin-bottom: 12pt;">Hi Natasha,<br>
<br>
The short answer is that I would recommend keeping your GBrowse database
separate from the drupal database, for the simple reason that this allows you
to have more flexibility in the future. You might want to run both
databases on MySQL or Postgres, or have one on each. I'll talk a little
more about this below. I apologize if this is something you already know
about, but I wanted to try clarifying my earlier response. The long
answer is below.<br>
<br>
There is an important distinction that many people get mixed up about
when talking about databases, and this can be confusing to folks who are just
getting into it. There are actually two distinct things people mean
when they mention databases. One is an "RDBMS", or "Relational Database
Management System", and the other is the databases that live in that
system. <br>
<br>
The RDBMS is something like MySQL, Postgres, or Oracle, and it
includes all the software for storing and managing information. Many of
the RDBMSs out there use SQL, and there is a lot of overlap in how you interact
with the data, regardless of whether it is a Postgres or MySQL database.
There are some differences though, and that's why people use different systems
for different uses. Each RDBMS can hold many databases, and each database
can have lots of data. <br>
<br>
The GMOD tool Chado is the main relational database for housing all the
information you might have, but it has historically had problems when used as a
back end for GBrowse. GBrowse has several different database schemas (a
schema is like a blueprint for how to store the data) that it can use, as
long as you specify which one you use. <br>
<br>
That was why I was asking if you were still planning to only use GBrowse, or if
you had decided to also start using Chado. If you are going to use Chado,
I have heard that the new version of GBrowse runs a lot better with it, but I
haven't tested it myself. If you guys are only planning to use GBrowse,
you might just want to use one of the basic MySQL databases. Those are
much smaller and run really fast. <br>
<br>
Sorry about the long winded answer. I hope this helps you guys with your
planning. <br>
<br>
Thanks,<br>
Chris</p>
<div>
<p class="MsoNormal">On Tue, Oct 19, 2010 at 7:44 PM, Natasha Sostrom &lt;<a href="mailto:sostrom@hawaii.edu" target="_blank">sostrom@hawaii.edu</a>&gt;
wrote:</p>
<p class="MsoNormal">Chris,<br>
<br>
<br>
I apologize for not being clear about what the situation was. Right
now we are still in the development stage. Nothing has gone live, and we are
trying to make some decisions about where we want our site to go and
such. <br>
<br>
<br>
MySQL is what we were using for the general functionality of the
Drupal site. As we speak we have not set up anything on the website to display
data. Is it best to JUST use postgres?<br>
<br>
<br>
I did see the iFrame module, which seems very useful, which is why I'm
wondering whether we should use two separate databases or just one. If we were to choose
just ONE database for the entire website, which would be best?<br>
<br>
<br>
Thank you<br>
<span style="color: rgb(136, 136, 136);">Natasha Sostrom</span></p>
<div>
<div>
<p class="MsoNormal"><br>
<br>
----- Original Message -----<br>
From: Chris Childers &lt;<a href="mailto:genetics.guy@gmail.com" target="_blank">genetics.guy@gmail.com</a>&gt;<br>
Date: Tuesday, October 19, 2010 3:33 am<br>
Subject: Re: Bee Base site - databases<br>
To: Natasha Sostrom &lt;<a href="mailto:sostrom@hawaii.edu" target="_blank">sostrom@hawaii.edu</a>&gt;<br>
<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Hi Natasha,<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Are you still
planning to run GBrowse only, or are you using Chado? In our lab, we have
instances of Chado to store our community annotation data and mysql databases
to house the GBrowse data. <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>When you say a
mysql database site, are you referring to a GBrowse page? Or are you
using some other software to display the data in the postgres database? <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>In terms of
showing a GBrowse page using an iframe, this is not a problem as long as you
are not planning to send extra information via the address bar. Drupal has
an iframe plugin that simplifies the syntax for making an iframe, and it can
auto set the frame height to the length of the page, which is great for
dynamically generated pages. <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>I hope this
helps,<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Chris<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span></p>
<div>
<p class="MsoNormal"><span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>On
Mon, Oct 18, 2010 at 8:17 PM, Natasha Sostrom &lt;<a href="mailto:sostrom@hawaii.edu" target="_blank">sostrom@hawaii.edu</a>&gt; wrote:</p>
<p class="MsoNormal"><span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Chris,<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>I emailed you a
while back about Gbrowse and Drupal. Now we have come to find that we need to
use PostgreSQL for GMOD, while the Drupal site is currently using MySQL. In the
last email you mentioned using iFrames which is a good way to display a
postgresql database site within a mysql site. Is this what you did? <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>A fellow employee
mentioned that it may be best to just use one database (migrating to
PostgreSQL).<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Do you have any
insight about this? <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Thanks in
advance,<br>
<span style="font-size: 10.5pt; color: rgb(136, 136, 136); background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><span style="color: rgb(136, 136, 136);">Natasha Sostrom </span></p>
</div>
<p class="MsoNormal"><span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br clear="all">
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span><br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>-- <br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Chris Childers<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Postdoctoral
Fellow<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Elsik
Computational Genomics Laboratory<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Georgetown
University<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Department of
Biology<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>406 Reiss Bldg<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Washington, DC
20057<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Phone
202-687-5855<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span>Fax 202-687-5662<br>
<span style="font-size: 10.5pt; background: none repeat scroll 0% 0% rgb(245, 248, 240);">> </span></p>
</div>
</div>
</div>
</div>
</div>
</blockquote></div><br><br clear="all"><br>-- <br><a href="http://gmod.org/wiki/GMOD_News" target="_blank">http://gmod.org/wiki/GMOD_News</a><br><a href="http://gmod.org/wiki/Calendar" target="_blank">http://gmod.org/wiki/Calendar</a><br>
<a href="http://gmod.org/wiki/Help_Desk_Feedback" target="_blank">http://gmod.org/wiki/Help_Desk_Feedback</a><br>