[Gmod-schema] [Gmod-help] using gmod_bulk_load_gff3.pl in a multiuser environment

Scott Cain scott at scottcain.net
Tue Sep 1 08:04:22 EDT 2009


Hi Cornel,

The good (and frankly, somewhat surprising to me) news is that
gmod_load_gff3.pl still works for the most part.  It does not handle
CDS and exon features in the "standard practices" way, because it was
written before those standards existed, but it does load the database
in a multiuser-friendly way (I think!  I haven't tested it).  It is
slow: I haven't performed timing tests, but it may be as much as an
order of magnitude slower than the bulk loader.

So, things I am going to do (in this order):

1. Fix the bulk loader so that it will at least behave correctly in a
multiuser environment.  That is, it will allow only one process to run
at a time, by implementing a locking table in the database.  That
should also make implementing a queue much easier.

2. Fix CDS/exon handling in gmod_load_gff3.pl.
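
To make the locking-table idea in item 1 concrete, here is a minimal sketch. It is not the bulk loader's code: it uses Python with an in-memory SQLite database as a stand-in for a Chado (PostgreSQL) database, and the table name loader_lock, the lock name, and all function names are hypothetical. The primary-key constraint on the lock row is what guarantees that only one process can hold the lock at a time.

```python
import sqlite3

# Hypothetical lock name; a real loader would pick its own.
LOCK_NAME = "gff3_bulk_load"


def init_lock_table(conn):
    """Create the (hypothetical) lock table if it does not exist yet."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS loader_lock ("
        " name  TEXT PRIMARY KEY,"   # one row per lock; uniqueness enforces exclusivity
        " owner TEXT NOT NULL)"
    )
    conn.commit()


def acquire_lock(conn, owner):
    """Return True if `owner` now holds the lock, False if someone else does."""
    try:
        conn.execute(
            "INSERT INTO loader_lock (name, owner) VALUES (?, ?)",
            (LOCK_NAME, owner),
        )
        conn.commit()
        return True
    except sqlite3.IntegrityError:
        # The row already exists, so another process holds the lock.
        conn.rollback()
        return False


def release_lock(conn, owner):
    """Release the lock, but only if `owner` is the one holding it."""
    conn.execute(
        "DELETE FROM loader_lock WHERE name = ? AND owner = ?",
        (LOCK_NAME, owner),
    )
    conn.commit()


def run_exclusive(conn, owner, load):
    """Run `load()` only if the lock can be taken; always release afterwards."""
    if not acquire_lock(conn, owner):
        return False
    try:
        load()
    finally:
        release_lock(conn, owner)
    return True
```

A caller that gets False back from run_exclusive can wait and retry, which amounts to a simple queue. One thing this sketch does not handle is a process that dies while holding the lock; a real implementation would need some way to detect and clear a stale lock row.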

Also, if you want me to, I can do timing tests with the two loaders so  
you can get an idea of just how slow gmod_load_gff3.pl is.

Scott


On Aug 31, 2009, at 10:59 AM, Scott Cain wrote:

> Just to follow up a little bit, I will test gmod_load_gff3.pl today,
> which is based on Class::DBI, to see if it still works.  I haven't
> tried it in a long time, so it may need updating, but it may be more
> multiuser-friendly.
>
> Scott
>
>
> On Mon, Aug 31, 2009 at 10:52 AM, Scott Cain <scott at scottcain.net> wrote:
>> Hi Cornel,
>>
>> The bulk loader absolutely cannot be run by more than one user on the
>> same database at a time.  It will fail horribly, and not just because
>> of temporary tables/files: the bulk loader also reads the current values
>> of the table primary keys and increments them inside the loader, but not
>> in the database, while it is running.  If another process comes along and
>> tries to do the same thing, it will reuse primary keys and then fail
>> to load the data because of that.  This is a fairly fundamental design
>> decision that makes the loader faster (if it had to update the sequences
>> in the database, it would slow down a lot).
>>
>> It may be possible to make a much slower loader that is multiuser
>> friendly, but if I were doing that, I would design it from the ground
>> up to use either Class::DBI, which has been done before but isn't
>> used much (at all?) any more, or DBIx::Class, which is a more modern
>> ORM system.  In the meantime, the best you can do is force the
>> processes into a queue.
>>
>> Scott
>>
>>
>> On Mon, Aug 31, 2009 at 10:25 AM, Ghiban, Cornel <ghiban at cshl.edu> wrote:
>>> Hi all,
>>>
>>> I'm building a web application that uses this script to load various
>>> analysis results into a chado database. We often notice that the
>>> loading fails, and it wasn't difficult to see why. Since
>>> gmod_bulk_load_gff3.pl takes a good few seconds to execute, a second
>>> or third instance of the script will fail to work properly if the
>>> first instance is still running.
>>>
>>> This is because of the temporary tables.
>>>
>>> The first script will create a series of temporary tables;
>>> meanwhile a 2nd script tries to do the same, but it fails. It may
>>> still continue to run, I'm not sure, but then the 1st script
>>> finishes (drops the temporary tables) and the 2nd script can't do
>>> its job.
>>>
>>> Maybe the temporary objects' names should contain a random string
>>> (unique per run); that would prevent name clashes. Could this be
>>> easily done? Or is there another solution?
>>>
>>> Also, in the web environment, Bio::GMOD::DB::Adapter->file_handles
>>> fails to create the temp files unless I change the TEMPLATE to
>>> "/tmp/chado-$key-XXXX".
>>>
>>> Thanks,
>>> Cornel
>>>
>>>
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D.                          scott at scottcain dot net
>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>> Ontario Institute for Cancer Research
>>

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research