[Gmod-schema] [Gmod-help] using gmod_bulk_load_gff3.pl in a multiuser environment

Scott Cain scott at scottcain.net
Tue Sep 1 09:49:58 EDT 2009


Hi Cornel,

That's what I thought too, and that's why I assigned a higher priority  
to making the bulk loader more multiuser friendly (by at least  
specifically disallowing multiple processes).  If you want to take a  
look at gmod_load_gff3.pl, it is installed along with the bulk loader,  
so it is probably either in /usr/local/bin or /usr/bin.

Scott



On Sep 1, 2009, at 9:44 AM, Ghiban, Cornel wrote:

> Hi Scott,
>
> Thanks for investigating this. :)
>
> So, this is a different program than gmod_bulk_load_gff3.pl?
> If so, being 10 times slower than the bulk version might not help;  
> I'm not sure it's worth the effort. And where can I find it?
>
> As for the bulk version, I could easily run it through a queuing  
> system to allow only one running instance of the program.
>
> Cornel
>
> -----Original Message-----
> From: Scott Cain [mailto:scott at scottcain.net]
> Sent: Tuesday, September 01, 2009 8:04 AM
> To: Scott Cain
> Cc: Ghiban, Cornel; gmod-schema at lists.sourceforge.net; help at gmod.org
> Subject: Re: [Gmod-schema] [Gmod-help] using gmod_bulk_load_gff3.pl  
> in a multiuser environment
>
> Hi Cornel,
>
> The good (and frankly, somewhat surprising to me) news is that  
> gmod_load_gff3.pl still works for the most part.  It does not handle  
> CDS and exon features in the "standard practices" way, because it  
> was written before those standards existed, but it does load the  
> database in a multiuser friendly way (I think!  I haven't tested  
> it).  It is slow; I haven't performed timing tests, but it may be as  
> much as an order of magnitude slower than the bulk loader.
>
> So, things I am going to do (in this order):
>
> 1. Fix the bulk loader so that it will at least behave correctly in  
> a multiuser environment.  That is, it will only allow one process at  
> a time by implementing a locking table in the database.  It should  
> make implementing a queue much easier.
>
> 2. Fix CDS/exon handling in gmod_load_gff3.pl.
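
A minimal sketch of the locking-table idea in item 1, under stated assumptions: the table name `loader_lock`, the lock name `bulk_load`, and both function names are hypothetical, and SQLite stands in for the PostgreSQL database that Chado normally runs on. It illustrates the general technique only and is not the bulk loader's actual implementation.

```python
import sqlite3


def acquire_lock(conn, owner):
    """Try to take the single loader lock; return True on success.

    The primary key on lock_name lets only one row named 'bulk_load'
    exist, so a second INSERT fails while the lock is held.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loader_lock "
        "(lock_name TEXT PRIMARY KEY, owner TEXT)"
    )
    try:
        conn.execute(
            "INSERT INTO loader_lock (lock_name, owner) VALUES ('bulk_load', ?)",
            (owner,),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # Another process already holds the lock.
        conn.rollback()
        return False


def release_lock(conn, owner):
    """Remove the lock row, but only if this owner holds it."""
    conn.execute(
        "DELETE FROM loader_lock WHERE lock_name = 'bulk_load' AND owner = ?",
        (owner,),
    )
    conn.commit()
```

A loader built this way would call `acquire_lock` before touching any tables and exit (or wait and retry) when it returns False, which is what would make putting a queue in front of it straightforward.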
>
> Also, if you want me to, I can do timing tests with the two loaders  
> so you can get an idea of just how slow gmod_load_gff3.pl is.
>
> Scott
>
>
> On Aug 31, 2009, at 10:59 AM, Scott Cain wrote:
>
>> Just to follow up a little bit, I will test gmod_load_gff3.pl today,
>> which is based on Class::DBI, to see if it still works.  I haven't
>> tried it in a long time, so it may need updating, but it may be more
>> multiuser friendly.
>>
>> Scott
>>
>>
>> On Mon, Aug 31, 2009 at 10:52 AM, Scott Cain<scott at scottcain.net>
>> wrote:
>>> Hi Cornel,
>>>
>>> The bulk loader absolutely cannot be run by more than one user on
>>> the same database at a time.  It will fail horribly, and not just
>>> because of temporary tables/files: the bulk loader also gets the
>>> values of table primary keys and increments them inside the loader,
>>> but not in the database, while it is running.  If another process
>>> comes along and tries to do the same thing, it will reuse primary
>>> keys and then fail to load the data because of that.  It is a fairly
>>> fundamental design decision that makes the loader faster (if it had
>>> to update the sequences in the database, it would slow down a
>>> lot).
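
The primary key problem described above can be reproduced in a small simulation. This is only a sketch under assumptions: SQLite stands in for PostgreSQL, the one-column `feature` table is a simplified stand-in for a Chado table, and the function names are hypothetical. Each "loader" reads the current maximum key once and then assigns new keys locally, without updating anything in the database.

```python
import sqlite3


def plan_rows(conn, names):
    """Read the current max key once, then hand out new keys locally."""
    (max_id,) = conn.execute(
        "SELECT COALESCE(MAX(feature_id), 0) FROM feature"
    ).fetchone()
    return [(max_id + i, name) for i, name in enumerate(names, start=1)]


def load_rows(conn, rows):
    """Write the pre-numbered rows in one batch."""
    conn.executemany(
        "INSERT INTO feature (feature_id, name) VALUES (?, ?)", rows
    )
    conn.commit()


def simulate_collision():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE feature (feature_id INTEGER PRIMARY KEY, name TEXT)"
    )
    conn.execute("INSERT INTO feature VALUES (1, 'existing')")
    conn.commit()

    # Both loaders take their snapshot before either one has written.
    rows_a = plan_rows(conn, ["a1", "a2"])
    rows_b = plan_rows(conn, ["b1", "b2"])

    load_rows(conn, rows_a)
    try:
        load_rows(conn, rows_b)
        collided = False
    except sqlite3.IntegrityError:
        # The second loader reused keys the first one already inserted.
        conn.rollback()
        collided = True
    return rows_a, rows_b, collided
```

Because both runs plan the same key values, the second batch is rejected regardless of how the temporary tables are named.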
>>>
>>> It may be possible to make a much slower loader that is multiuser
>>> friendly, but if I were doing that, I would design it from the ground
>>> up to use either Class::DBI, because that has been done before, but
>>> isn't used much (at all?) any more, or DBIx::Class, because it is a
>>> more modern ORM system.  In the meantime, the best you can do is
>>> force the processes into a queue.
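
One way to force loads into a queue inside a single web application process is a lone worker thread that runs jobs strictly one at a time. The following is a sketch only: `run_serially` is a hypothetical helper, and each job is assumed to be a callable that would shell out to the loader (the command line is deliberately left out here).

```python
import queue
import threading


def run_serially(jobs):
    """Run callables one at a time, in submission order; return results."""
    pending = queue.Queue()
    results = []

    def worker():
        while True:
            job = pending.get()
            if job is None:  # sentinel: no more work
                break
            # Only this thread runs jobs, so two loads never overlap.
            results.append(job())

    thread = threading.Thread(target=worker)
    thread.start()
    for job in jobs:
        pending.put(job)
    pending.put(None)
    thread.join()
    return results
```

This serializes only the jobs submitted through the one queue; separate server processes would each have their own queue and would still need a shared lock or an external queuing system.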
>>>
>>> Scott
>>>
>>>
>>> On Mon, Aug 31, 2009 at 10:25 AM, Ghiban, Cornel<ghiban at cshl.edu>
>>> wrote:
>>>> Hi all,
>>>>
>>>> I'm building a web application that uses this script to load various
>>>> analysis results into a Chado database. We often notice that loading
>>>> fails, and it wasn't difficult to see why. Since
>>>> gmod_bulk_load_gff3.pl takes a good few seconds to execute, a second
>>>> or third instance of this script will fail to work properly if the
>>>> first instance is still running.
>>>>
>>>> This is because of the temporary tables.
>>>>
>>>> The first script creates a series of temporary tables; meanwhile,
>>>> a second script tries to do the same, but it fails. It may still
>>>> continue to run, I'm not sure, but then the first script finishes
>>>> (dropping the temporary tables) and the second script can't do its job.
>>>>
>>>> Maybe the names of the temporary objects should contain a random
>>>> string (unique per run); that would prevent name clashes. Could this
>>>> be easily done? Or is there another solution?
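
The per-run naming suggestion could look roughly like the sketch below. The base table names and the helper `per_run_table_names` are hypothetical, not the names the bulk loader actually uses. As the reply earlier in this thread explains, unique names alone would not address the reuse of primary keys.

```python
import uuid

# Hypothetical base names, for illustration only.
BASE_TABLES = ("tmp_feature", "tmp_featureloc")


def per_run_table_names(base_names=BASE_TABLES, run_id=None):
    """Map each base table name to a name carrying a per-run suffix."""
    # A random hex suffix makes a clash between two runs very unlikely.
    run_id = run_id or uuid.uuid4().hex[:12]
    return {base: f"{base}_{run_id}" for base in base_names}
```

Each run would generate its mapping once at startup and use the suffixed names in every CREATE, COPY, and DROP statement it issues.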
>>>>
>>>> Also in the web environment, Bio::GMOD::DB::Adapter->file_handles
>>>> fails to create the temp files unless I change the TEMPLATE to
>>>> "/tmp/chado-$key-XXXX".
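
The same idea, anchoring the temp file in an absolute temp directory rather than relying on the web server's working directory, can be sketched in Python. This is an analogy only, not the Perl code in Bio::GMOD::DB::Adapter, and `open_temp_file` is a hypothetical helper.

```python
import os
import tempfile


def open_temp_file(key, directory=None):
    """Create a uniquely named temp file in an explicit directory."""
    # Fall back to the system temp directory rather than the current
    # working directory, which a web server process may not be able
    # to write to.
    directory = directory or tempfile.gettempdir()
    fd, path = tempfile.mkstemp(prefix=f"chado-{key}-", dir=directory)
    return os.fdopen(fd, "w"), path
```

The caller is responsible for closing the handle and deleting the file when the load finishes.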
>>>>
>>>> Thanks,
>>>> Cornel
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                       scott at scottcain dot net
>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>>
>>
>>
>>
>>
>
>
>
>
>

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research






