[Gmod-help] RE: Bee Base site - databases
Walk, Tom
Tom.Walk at ARS.USDA.GOV
Wed Oct 20 18:56:57 EDT 2010
Chris,
I am a new postdoc working for Scott Geib. Please pardon any redundancy
or confusion on my part.
FYI, I am including gmod help in case they can more readily answer my
questions.
Your input is very much appreciated here, where we are sequencing the
oriental fruit fly genome along with transcripts, and are in the initial
stages of web and database development. I have worked at the Broad
Institute and TAIR, so I am familiar with using these things, but they
were already well constructed before my arrival. Here at the USDA we
are starting from scratch.
Right now, there are a few things I want to ask about.
One thing I am unclear about is the distinction between a GBrowse db and
a drupal db.
Would the drupal db be part of web user queries? For example, if a web
user wants a list of all kinases, would the search only use the drupal
db? If so, it seems redundant to me. We will have all of the
information in a genomics db, perhaps with a Chado schema. I can see
that using this for web queries might pose integrity risks. Are you
suggesting that we use 2 somewhat overlapping dbs, one for internal use
and another for public use? If so, I see the public db being mostly a
subset of the internal one, with us choosing which fields to make
public, and possibly adding tables for web specific info, such as links
followed or user info.
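As a rough sketch of the "public subset" idea, assuming a single hypothetical internal table named gene (the table, column, and view names are all made up for illustration, and SQLite from the Python standard library stands in for whichever RDBMS is chosen), a view can expose only the fields picked for release:

```python
import sqlite3

# Hypothetical internal table; the names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (gene_id TEXT PRIMARY KEY, name TEXT, "
    "product TEXT, curator_note TEXT)"
)
conn.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("g0001", "kinA", "protein kinase", "needs manual review"),
        ("g0002", "trnB", "transporter", "unpublished"),
    ],
)

# The public side sees only the columns chosen for release;
# curator_note stays internal.
conn.execute(
    "CREATE VIEW public_gene AS SELECT gene_id, name, product FROM gene"
)


def find_public_genes(connection, keyword):
    """Return public rows whose product description contains the keyword."""
    cursor = connection.execute(
        "SELECT gene_id, name, product FROM public_gene "
        "WHERE product LIKE ? ORDER BY gene_id",
        (f"%{keyword}%",),
    )
    return cursor.fetchall()


kinases = find_public_genes(conn, "kinase")
public_columns = [
    col[0] for col in conn.execute("SELECT * FROM public_gene").description
]
```

A view keeps the public data in step with the internal tables, whereas a physically separate public database would need a periodic export. Which of the two suits a site depends on the integrity concerns raised above.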
Alternatively, should we use the GBrowse db for internal use as well as
the backend for genome related web use, and limit the drupal db use to
nongenome related web info?
Should we use the same RDBMS for both? It would seem to be simpler, but
I may be missing some reasons why we need both MySQL and PostgreSQL.
While I am on the subject, does PostgreSQL have problems with the size
of sequence objects? Should that factor into our decisions?
As for which tools we will use, I think that a lot of decisions remain.
We may use GBrowse and Apollo, but I am also experimenting with Argo. I
was leaning toward using Chado, but from your email, and from looking at
the tables, it seems that we may want to use another schema, probably
simpler than Chado for our purposes.
Is GBrowse limited to the schemas outlined under adaptors in the
following?
http://gmod.org/wiki/GBrowse#About_Databases
Do you know which ones are most supported or least buggy? Do you have a
recommendation?
In a related inquiry, are the GMOD tools flexible?
I see a lot of tables in Chado, which is great, but for whatever db we
use, can we add or delete tables? How difficult is it to incorporate
new tools or info?
For example, we are dealing with multiple strains/species. If SNP
analysis is not in the db, can we add a table for it, or are we
constrained to existing tables and fields?
In addition, how well do the GMOD tools and dbs handle existing
functional or structural analysis or adapt to new analysis tools?
Finally, at USDA, we are interested in biotic interactions. Are there
tables to link organisms like there are for linking protein
interactions? If not, are the tools extensible for that?
It looks like anything that can be mapped out as a feature with genome
coordinates can be handled. So if we use new tools, or find
pathogenicity genes or markers, then we can use GMOD. Perhaps you can
correct that if it is wrong or too simplistic.
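To make the "feature with genome coordinates" idea concrete, here is a minimal sketch that writes one SNP as a GFF3 line, the nine-column tab-separated format that GBrowse can load. The scaffold name, source label, and IDs are hypothetical, and the helper gff3_line is only an illustration, not part of any GMOD tool:

```python
def gff3_line(seqid, source, feature_type, start, end, strand, attributes):
    """Format one feature as a nine-column GFF3 line.

    Coordinates are 1-based and inclusive. Score and phase are written
    as "." (not applicable). Attribute values are assumed to contain no
    characters that GFF3 requires to be escaped.
    """
    if start > end:
        raise ValueError("start must not exceed end")
    attr_text = ";".join(f"{key}={value}" for key, value in attributes.items())
    columns = [
        seqid, source, feature_type,
        str(start), str(end),
        ".", strand, ".",
        attr_text,
    ]
    return "\t".join(columns)


# Hypothetical SNP between two strains, placed on a made-up scaffold.
snp = gff3_line(
    "scaffold_1", "hypothetical_snp_caller", "SNP",
    1042, 1042, "+",
    {"ID": "snp0001", "Note": "strain_A_vs_strain_B"},
)
```

A new analysis tool whose output can be reduced to lines like this could, under that assumption, be displayed as one more track.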
I think that is enough for today. It seems that I still have a lot to
figure out. Sorry if this is too long. As I learn more and we
progress, I will likely seek out more advice. If you would prefer to
talk to me on the phone, please call the number below.
Thanks for all of your help to date and for any feedback you can provide
to this inquiry.
Tom Walk
tom.walk at ars.usda.gov
808 932 2176
From: Chris Childers [mailto:genetics.guy at gmail.com]
Sent: Wednesday, October 20, 2010 4:22 AM
To: Natasha Sostrom
Subject: Re: Bee Base site - databases
Hi Natasha,
The short answer is that I would recommend keeping your GBrowse database
separate from the drupal database, for the simple reason that this
allows you to have more flexibility in the future. You might want to
run both databases on MySQL or Postgres, or have one on each. I'll talk
a little more about this below. I apologize if this is something you
already know about, but I wanted to try clarifying my earlier response.
The long answer is below.
There is an important distinction that a lot of people can get mixed up
over when talking about databases, and this can be confusing to folks
that are just getting into it. There are actually two distinct things
people talk about when they mention databases. One is an "RDBMS", or
"Relational Database Management System", and the other is the databases
that live in that system.
The RDBMS is something like MySQL, or Postgres, or Oracle, and it
includes all the software for storing and managing information. Many of
the RDBMS out there use SQL, and there is a lot of overlap in how you
interact with the data, regardless of whether it is a Postgres or MySQL
database. There are some differences though, and that's why people use
different systems for different uses. Each RDBMS can hold many
databases, and each database can have lots of data.
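A small sketch of that distinction, with the caveat that SQLite (from the Python standard library) is standing in for a server RDBMS such as MySQL or Postgres, and that the database and table names are invented for illustration: one engine connection holds two separately named databases, each with its own tables.

```python
import sqlite3

# One engine connection; SQLite stands in for MySQL or Postgres here.
engine = sqlite3.connect(":memory:")

# Two separate databases managed by the same engine (hypothetical names).
engine.execute("ATTACH DATABASE ':memory:' AS drupal")
engine.execute("ATTACH DATABASE ':memory:' AS gbrowse")

# Each database has its own tables, laid out however its software expects.
engine.execute(
    "CREATE TABLE drupal.page (page_id INTEGER PRIMARY KEY, title TEXT)"
)
engine.execute(
    "CREATE TABLE gbrowse.feature (feature_id INTEGER PRIMARY KEY, "
    "seqid TEXT, start_pos INTEGER, end_pos INTEGER)"
)

engine.execute("INSERT INTO drupal.page (title) VALUES ('About the project')")
engine.execute(
    "INSERT INTO gbrowse.feature (seqid, start_pos, end_pos) "
    "VALUES ('scaffold_1', 100, 250)"
)

# List the databases this engine currently manages.
database_names = [row[1] for row in engine.execute("PRAGMA database_list")]
page_count = engine.execute("SELECT COUNT(*) FROM drupal.page").fetchone()[0]
feature_count = engine.execute(
    "SELECT COUNT(*) FROM gbrowse.feature"
).fetchone()[0]
```

The website tables and the genome tables never touch each other here, even though one system manages both.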
The GMOD tool Chado is the main relational database for housing all the
information you might have, but it has historically had problems when
used as a back end for GBrowse. GBrowse has several different database
schemas (a schema is like a blueprint for how to store the data) that
it can use, as long as you specify which one you use.
That was why I was asking if you were still planning to only use
GBrowse, or if you had decided to also start using Chado. If you are
going to use Chado, I have heard that the new version of GBrowse runs a
lot better with it, but I haven't tested it myself. If you guys are
only planning to use GBrowse, you might just want to use one of the
basic MySQL databases. Those are much smaller and run really fast.
Sorry about the long-winded answer. I hope this helps you guys with
your planning.
Thanks,
Chris
On Tue, Oct 19, 2010 at 7:44 PM, Natasha Sostrom <sostrom at hawaii.edu>
wrote:
Chris,
I apologize for not being clear about what the situation was. Right now
we are still in the development stage. Nothing has gone live, and we are
trying to make some decisions about where we want our site to go and
such.
MySQL is what we were using for the general functionality of the Drupal
site. As we speak we have not set up anything on the website to display
data. Is it best to JUST use postgres?
I did see the iFrame module, which seems very useful. That is why I'm
wondering whether we should use two separate databases or just one. If we
choose just ONE database for the entire website, which would be best?
Thank you
Natasha Sostrom
----- Original Message -----
From: Chris Childers <genetics.guy at gmail.com>
Date: Tuesday, October 19, 2010 3:33 am
Subject: Re: Bee Base site - databases
To: Natasha Sostrom <sostrom at hawaii.edu>
> Hi Natasha,
>
> Are you still planning to run GBrowse only, or are you using Chado?
In our lab, we have instances of Chado to store our community annotation
data and mysql databases to house the GBrowse data.
>
> When you say a mysql database site, are you referring to a GBrowse
page? Or are you using some other software to display the data in the
postgres database?
>
> In terms of showing a GBrowse page using an iframe, this is not a
problem as long as you are not planning to send extra information via the
address bar. Drupal has an iframe plugin that simplifies the syntax for
making an iframe, and it can auto set the frame height to the length of
the page, which is great for dynamically generated pages.
>
> I hope this helps,
> Chris
>
> On Mon, Oct 18, 2010 at 8:17 PM, Natasha Sostrom <sostrom at hawaii.edu>
wrote:
> Chris,
>
>
> I emailed you a while back about Gbrowse and Drupal. Now we have come
to find that we need to use PostgreSQL for GMOD, while the Drupal site
is currently using MySQL. In the last email you mentioned using iFrames
which is a good way to display a postgresql database site within a mysql
site. Is this what you did?
>
>
> A fellow employee mentioned that it may be best to just use one
database (migrating to PostgreSQL).
>
>
> Do you have any insight about this?
>
>
> Thanks in advance,
> Natasha Sostrom
>
>
>
> --
> Chris Childers
> Postdoctoral Fellow
> Elsik Computational Genomics Laboratory
> Georgetown University
> Department of Biology
> 406 Reiss Bldg
> Washington, DC 20057
> Phone 202-687-5855
> Fax 202-687-5662
>
--
Chris Childers
Postdoctoral Fellow
Elsik Computational Genomics Laboratory
Georgetown University
Department of Biology
406 Reiss Bldg
Washington, DC 20057
Phone 202-687-5855
Fax 202-687-5662