[Gmod-help] gmod bulk upload

Scott Cain cain.cshl at gmail.com
Thu May 8 13:42:30 EDT 2008


Hi Stephen,

Sorry about the slowness, but that is pretty typical when the database
gets large.  What I usually do is break the GFF files into chunks of
about 200,000-300,000 features each, write a simple bash script to do
the loading, and then walk away (for a long time :-)
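
A minimal sketch of the chunking step, written in Python rather than
bash.  The function names, the default chunk size, and the output file
naming are illustrative assumptions, not part of the loader.  It also
assumes that child features follow their parents in the file, and it
ignores any ##FASTA section, so treat it as a starting point only.

```python
from pathlib import Path
from typing import Iterable, Iterator, List


def split_gff3(lines: Iterable[str], max_features: int = 250_000) -> Iterator[List[str]]:
    """Yield lists of GFF3 lines holding roughly max_features features each."""
    header: List[str] = []   # leading '##' directives, repeated in every chunk
    chunk: List[str] = []    # feature lines collected for the current chunk
    seen_feature = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("##FASTA"):
            break  # the sequence section is not handled by this sketch
        if not line.strip() or line.startswith("#"):
            # Keep only directives that precede the first feature; drop comments.
            if line.startswith("##") and not line.startswith("###") and not seen_feature:
                header.append(line)
            continue
        seen_feature = True
        columns = line.split("\t")
        attributes = columns[8] if len(columns) > 8 else ""
        has_parent = "Parent=" in attributes
        # Only start a new chunk on a feature without a Parent, so children
        # that directly follow their parent stay in the same file.
        if len(chunk) >= max_features and not has_parent:
            yield header + chunk
            chunk = []
        chunk.append(line)
    if chunk:
        yield header + chunk


def write_chunks(gff_path: str, out_prefix: str, max_features: int = 250_000) -> List[Path]:
    """Write each chunk to <out_prefix>.NNN.gff3 and return the paths written."""
    written: List[Path] = []
    with open(gff_path) as handle:
        for i, chunk in enumerate(split_gff3(handle, max_features), start=1):
            path = Path(f"{out_prefix}.{i:03d}.gff3")
            path.write_text("\n".join(chunk) + "\n")
            written.append(path)
    return written
```

Calling write_chunks("all_features.gff3", "chunk") (both names are
hypothetical) would produce chunk.001.gff3, chunk.002.gff3, and so on.
Because a chunk is only closed on a feature without a Parent attribute,
chunks can run somewhat over the requested size.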

The problem comes down to the loader having to know which features are
already in the database so that no unique indexes are violated.  I have
added some options to the loader script to experiment with speeding it
up, but honestly I don't know if any of them do any good:

  --notransact
     Prevents the loader from using a single transaction for a given
     load.  This is probably the one that would help the most with
     speed, but it is also less safe (since any loading error has the
     potential to hose your database).

  --drop_indexes
     I put this in to see if index rebuilding during a given load was
     causing a slowdown.  My recollection is that it didn't help.

  --skip_vacuum
     By default, the loader vacuums the tables to make sure performance
     is good after the load, but when the tables are large, this can
     take quite a while.  When I am loading several files in a row, I
     skip the vacuum on most of them and only vacuum periodically.
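
As a rough illustration of loading several files in a row while only
vacuuming periodically, here is a Python sketch that builds one loader
command per file.  The --skip_vacuum flag is the one described above;
the --gfffile option, the vacuum_every interval, and the function names
are assumptions, so check them against the loader's own documentation
and add your usual connection options through extra_args.

```python
import subprocess
from typing import List, Sequence

LOADER = "gmod_bulk_load_gff3.pl"


def build_commands(gff_files: Sequence[str], vacuum_every: int = 5,
                   extra_args: Sequence[str] = ()) -> List[List[str]]:
    """Return one loader command per file, skipping the vacuum on most of them."""
    commands: List[List[str]] = []
    for i, path in enumerate(gff_files, start=1):
        # --gfffile is assumed here; verify it against your loader version.
        cmd = [LOADER, "--gfffile", path, *extra_args]
        is_last = i == len(gff_files)
        # Let the loader vacuum on every Nth file and on the final file.
        if i % vacuum_every != 0 and not is_last:
            cmd.append("--skip_vacuum")
        commands.append(cmd)
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run the commands in order, stopping at the first one that fails."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
```

Stopping at the first failure (check=True) is a deliberate choice: it
makes it easier to see which file broke before anything else is loaded
on top of it.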

Scott



On Wed, 2008-05-07 at 22:10 -0400, Stephen Ficklin wrote:
> Hello,
> 
>  
> 
> We have an installation of chado that has about 7 million records in
> the feature table.  We’re uploading our data as GFF files using
> gmod_bulk_load_gff3.pl and we find that it is taking a very long
> time.  It has taken about 28 hours to upload 190,220 entries in two
> GFF files.  Is this normal?  It seems the more entries we add to the
> database the slower these uploads become.  We still have over a
> million more records to add to the database.  Is there any way we can
> speed up this upload?
> 
>  
> 
> Thanks,
> 
> Stephen Ficklin
> 
> Clemson University Genomics Institute
> 
> http://www.genome.clemson.edu/
> 
> 864-656-4298
> 
> 
-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                         cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory



