[Gmod-help] gmod bulk upload
Scott Cain
cain.cshl at gmail.com
Thu May 8 13:42:30 EDT 2008
Hi Stephen,
Sorry about the slowness, but that is pretty typical when the database
gets large. What I usually do is break the GFF files into chunks
of about 200,000-300,000 features each, write a simple bash script
to do the loading, and then walk away (for a long time :-).
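As a rough illustration of the chunking step, here is a minimal Python
sketch. It is not part of the GMOD tools; the function name chunk_gff3 and
the max_features parameter are made up for this example. It assumes child
features (those with a Parent= attribute) come right after their parents in
the file, and it only starts a new chunk on a line without Parent= so that
parents and children stay together. It drops comments and directives, puts a
fresh ##gff-version 3 line at the top of each chunk, and stops at a ##FASTA
section rather than trying to split it.

```python
from typing import Iterable, Iterator, List


def chunk_gff3(lines: Iterable[str], max_features: int = 250_000) -> Iterator[List[str]]:
    """Yield lists of GFF3 lines, each holding roughly max_features features.

    Assumption: children follow their parents, so a chunk boundary is only
    placed in front of a feature that has no Parent= attribute.
    """
    header = "##gff-version 3\n"
    chunk: List[str] = []
    count = 0
    for line in lines:
        if line.startswith("##FASTA"):
            break  # the sequence section is not handled in this sketch
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives, and blank lines
        attributes = line.rstrip("\n").split("\t")[-1]
        is_child = "Parent=" in attributes
        if count >= max_features and not is_child:
            yield [header] + chunk
            chunk, count = [], 0
        chunk.append(line)
        count += 1
    if chunk:
        yield [header] + chunk


def write_chunks(gff_path: str, prefix: str, max_features: int = 250_000) -> List[str]:
    """Write each chunk to prefix.0001.gff3, prefix.0002.gff3, ... and return the paths."""
    paths: List[str] = []
    with open(gff_path) as handle:
        for number, chunk in enumerate(chunk_gff3(handle, max_features), start=1):
            out_path = f"{prefix}.{number:04d}.gff3"
            with open(out_path, "w") as out:
                out.writelines(chunk)
            paths.append(out_path)
    return paths
```

Because a boundary is only placed before a top-level feature, chunks can run
somewhat over max_features when a gene has many children.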
The problem comes down to the loader having to know which features
are already in the database, to make sure that no unique indexes are
violated. I have added some options to the loader script to experiment
with speeding it up, but honestly I don't know if any of them do any
good:
--notransact
Prevents the loader from using a single transaction for a given
load; this is probably the option that would speed things up the most, but
it is also less safe (since any loading error has the potential to hose
your database).
--drop_indexes
I put this in to see if index rebuilding during a given load was
causing a slowdown. My recollection is that it didn't help.
--skip_vacuum
By default, the loader vacuums the tables to make sure the
performance is good after the load, but when the tables are large, this
can take quite a while. When I am loading several files in a row, I
will skip vacuum on many of them and only periodically do the vacuum.
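To make the "skip vacuum on most files, vacuum periodically" idea concrete,
here is a small Python sketch that builds one loader command per chunk file.
The names build_commands, run_all and vacuum_every are invented for this
example. I am assuming the file is passed with --gfffile, and I have left
out the database connection options entirely, so add whatever your
installation needs. Every file gets --skip_vacuum except every Nth file and
the last one.

```python
import subprocess
from typing import List, Sequence

LOADER = "gmod_bulk_load_gff3.pl"


def build_commands(gff_files: Sequence[str], vacuum_every: int = 5) -> List[List[str]]:
    """Return one loader command per file, vacuuming only every Nth load.

    Assumption: --gfffile names the input file; connection options are omitted.
    """
    commands: List[List[str]] = []
    for index, path in enumerate(gff_files, start=1):
        cmd = [LOADER, "--gfffile", path]
        is_last = index == len(gff_files)
        if index % vacuum_every != 0 and not is_last:
            cmd.append("--skip_vacuum")  # let a later load do the vacuum
        commands.append(cmd)
    return commands


def run_all(commands: Sequence[Sequence[str]]) -> None:
    """Run the loads one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(list(cmd), check=True)
```

Stopping at the first failure is deliberate: if one chunk fails to load, it
is easier to fix and rerun that chunk than to untangle later loads that went
in on top of it.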
Scott
On Wed, 2008-05-07 at 22:10 -0400, Stephen Ficklin wrote:
> Hello,
>
>
>
> We have an installation of chado that has about 7 million records in
> the feature table. We’re uploading our data as GFF files using the
> gmod_bulk_load_gff3.pl and we find that it is taking a very long
> time. It has taken about 28 hours to upload 190,220 entries in two
> GFF files. Is this normal? It seems the more entries we add to the
> database the slower these uploads become. We still have over a
> million more records to add to the database. Is there any way we can
> speed up this upload?
>
>
>
> Thanks,
>
> Stephen Ficklin
>
> Clemson University Genomics Institute
>
> http://www.genome.clemson.edu/
>
> 864-656-4298
>
>
--
------------------------------------------------------------------------
Scott Cain, Ph. D. cain at cshl.edu
GMOD Coordinator (http://www.gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory