[Gmod-schema] [Gmod-help] gmod bulk upload

Scott Cain cain.cshl at gmail.com
Thu May 8 17:30:09 EDT 2008


Hi Josh,

The stuff about Pg 8.3 is good to know.  I am concerned, though, that
tools written for Chado and GBrowse will fail with Pg 8.3 because of
changes in the way casting is done.  Of course, I haven't done any
testing of it yet :-)

Scott

On Thu, 2008-05-08 at 17:09 -0400, Josh Goodman wrote:
> I just wanted to add that since that discussion about multibyte encodings, PostgreSQL has improved 
> its LIKE/ILIKE query performance when operating on multibyte encodings in the 8.3 series 
> (http://www.postgresql.org/docs/8.3/interactive/release-8-3.html).  I have not yet run any 
> benchmarks to compare performance in Chado.
> 
> The cumulative slowdown you describe as more features are loaded does not sound like an encoding 
> issue to me, based on my experience.
> 
> I would instead start looking at hardware bottlenecks or PostgreSQL config options that might be set 
> too low.  I don't know enough about gmod_bulk_load_gff3.pl to say if there are potential problems there.
> 
> Josh
> 
> 
> Dave Clements, GMOD Help Desk wrote:
> > Dear Stephen,
> > 
> > I don't think this (lack of) performance is typical.  It suggests to
> > me something in the database is going awry.  It could be any number of
> > things (see http://gmod.org/PostgreSQL_Performance_Tips for some of
> > them).
> > 
> > It could also be the default encoding for the database.  If Postgres
> > is using a multibyte character encoding, that can slow things down
> > by a couple of orders of magnitude.  See
> > 
> > http://sourceforge.net/mailarchive/forum.php?thread_name=200711082012.lA8KCtV15976%40cricket.bio.indiana.edu&forum_name=gmod-schema
> > 
> > for a discussion of that.
> > 
> > Has anything improved or have you discovered anything new since you
> > sent the e-mail?
> > 
> > I am also cross-posting this to the GMOD Schema list as people there
> > may have suggestions.
> > 
> > Thanks,
> > 
> > Dave C
> > GMOD Help Desk
> > 
> > On Wed, May 7, 2008 at 7:10 PM, Stephen Ficklin
> > <FICKLIN at exchange.clemson.edu> wrote:
> >> Hello,
> >>
> >> We have an installation of Chado that has about 7 million records in the
> >> feature table.  We're uploading our data as GFF files using
> >> gmod_bulk_load_gff3.pl, and we find that it is taking a very long time.  It
> >> has taken about 28 hours to upload 190,220 entries in two GFF files.  Is
> >> this normal?  It seems the more entries we add to the database, the slower
> >> these uploads become.  We still have over a million more records to add to
> >> the database.  Is there any way we can speed up this upload?
> >>
> >> Thanks,
> >>
> >> Stephen Ficklin
> >> Clemson University Genomics Institute
> >> http://www.genome.clemson.edu/
> >> 864-656-4298
> > 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory
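
[Editorial note] For a sense of scale, the figures in Stephen's original
message (190,220 entries in about 28 hours, with "over a million" still to
load) can be turned into a rough rate and time estimate. The sketch below is
only back-of-the-envelope arithmetic: the function names are hypothetical,
and the 1,000,000 remaining records is an assumed round number standing in
for "over a million". Because the thread reports that loading slows as the
table grows, the result is best read as an optimistic lower bound.

```python
# Rough load-rate estimate from the numbers quoted in the thread.
# Assumption: 1,000,000 records remain ("over a million" in the original).


def load_rate_per_hour(records_loaded: int, hours_elapsed: float) -> float:
    """Average number of records loaded per hour."""
    if hours_elapsed <= 0:
        raise ValueError("hours_elapsed must be positive")
    return records_loaded / hours_elapsed


def hours_remaining(records_left: int, rate_per_hour: float) -> float:
    """Hours needed to load records_left at a constant rate."""
    if rate_per_hour <= 0:
        raise ValueError("rate_per_hour must be positive")
    return records_left / rate_per_hour


rate = load_rate_per_hour(190_220, 28)        # figures from the thread
eta_hours = hours_remaining(1_000_000, rate)  # assumed remaining count

print(f"{rate:,.0f} records/hour ({rate / 3600:.2f} per second)")
print(f"about {eta_hours:.0f} hours ({eta_hours / 24:.1f} days) for the rest")
```

At the observed pace this works out to roughly 6,800 records per hour, or
under two per second, and about 147 hours (a little over six days) for the
remaining million if the rate did not degrade further.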



