[Gmod-help] get a new GBrowse to work

Scott Cain cain.cshl at gmail.com
Thu Sep 25 10:10:30 EDT 2008


Hi Zhiliang,

If you do a SELECT * FROM ftype in your database, I think you will get
this as a result:

mysql> select * from ftype;
+---------+----------+---------+
| ftypeid | fmethod  | fsource |
+---------+----------+---------+
|       1 | region   | Genbank |
|       2 | gap      | Genbank |
|       3 | gene     | Genbank |
|       4 | mRNA     | Genbank |
|       5 | exon     | Genbank |
|       6 | CDS      | Genbank |
|       7 | RNA      | Genbank |
|       8 | misc_RNA | Genbank |
+---------+----------+---------+

The "fmethod" values (feature types) are the only things you will be able
to display in GBrowse.  The region is the chromosome, so you can display
gaps, genes, mRNAs (with exons if you use the processed_transcript
aggregator/glyph), CDSes (with the cds aggregator/glyph) and RNAs.
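
As a rough illustration of the point above, here is a small Python sketch
that lists the method:source pairs found in an ftype table, which is the
form Bio::DB::GFF uses to name feature types.  It uses an in-memory SQLite
table as a stand-in for the MySQL database, so the connection, the helper
names (build_demo_db, displayable_types) and the hard-coded rows are
assumptions made for illustration, not part of GBrowse itself.

```python
import sqlite3

# Rows copied from the ftype listing in this message.
# SQLite is only a stand-in here; the real database is MySQL.
FTYPE_ROWS = [
    (1, "region", "Genbank"),
    (2, "gap", "Genbank"),
    (3, "gene", "Genbank"),
    (4, "mRNA", "Genbank"),
    (5, "exon", "Genbank"),
    (6, "CDS", "Genbank"),
    (7, "RNA", "Genbank"),
    (8, "misc_RNA", "Genbank"),
]


def build_demo_db():
    """Create an in-memory table shaped like ftype and fill it (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ftype (ftypeid INTEGER PRIMARY KEY, fmethod TEXT, fsource TEXT)"
    )
    conn.executemany("INSERT INTO ftype VALUES (?, ?, ?)", FTYPE_ROWS)
    return conn


def displayable_types(conn):
    """Return one 'method:source' string per ftype row, in ftypeid order."""
    cur = conn.execute("SELECT fmethod, fsource FROM ftype ORDER BY ftypeid")
    return [f"{method}:{source}" for method, source in cur.fetchall()]


if __name__ == "__main__":
    for feature_type in displayable_types(build_demo_db()):
        print(feature_type)
```

Anything that does not show up in that list has not been loaded, so a track
that asks for it would have nothing to draw.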

To map your other features to what is in your current database
(assuming it's the same as mine), you need to have "NC_007320" in the
first column, since that is the ID used by the genbank loading script.
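
If the other features are in a tab-delimited GFF file (an assumption on my
part), renaming the first column could look something like the Python
sketch below.  The function name remap_seqid, the old name
"Chr_placeholder" and the sample feature line are all made up for
illustration; only "NC_007320" comes from the database discussed here.

```python
def remap_seqid(gff_lines, mapping):
    """Yield GFF lines with the first column renamed according to `mapping`.

    Names that are not in `mapping` are passed through unchanged.
    """
    for line in gff_lines:
        stripped = line.rstrip("\n")
        # Leave blank lines, comments and ## directives untouched.
        if not stripped or stripped.startswith("#"):
            yield stripped
            continue
        fields = stripped.split("\t")
        fields[0] = mapping.get(fields[0], fields[0])
        yield "\t".join(fields)


if __name__ == "__main__":
    # "Chr_placeholder" stands in for whatever name the other data set uses.
    sample = [
        "##gff-version 2",
        'Chr_placeholder\tmylab\tgene\t1000\t2000\t.\t+\t.\tGene "example"',
    ]
    for out in remap_seqid(sample, {"Chr_placeholder": "NC_007320"}):
        print(out)
```

Only the first column is touched, so coordinates and attributes stay as
they were; whether the coordinates actually match the GenBank sequence is a
separate question.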

Scott


On Wed, Sep 24, 2008 at 4:55 PM, Zhiliang Hu <zhu at iastate.edu> wrote:
> Hi Scott,
>
> I used ncbi powerscript to download the chromosome genbank file (it took 2
> hrs 10 min), then used your bp_genbank2gff.pl to load the db.  It looks like
> the tables are populated:
>
> o fattribute ( 13 records )
> o fattribute_to_feature ( 7782 records )
> o fdata ( 10939 records )
> o fdna ( 44259 records )
> o fgroup ( 1953 records )
> o fmeta ( 4 records )
> o ftype ( 8 records )
>
> but I still cannot bring up the graphs:
> http://www.animalgenome.org/cgi-bin/gbrowse/cattle/
>
> Could you help me see whether any key part is missing from the config file:
> http://www.animalgenome.org/hu/share/scott/cow.conf
>
> Thank you,
>
> Zhiliang
>
>
> At 10:33 AM 9/24/2008 -0400, Scott Cain wrote:
>
> There seems to be a problem with BioPerl related to getting the
> sequence directly from GenBank: if I download NC_007320 and then run
>
>    bp_genbank2gff.pl --file NC_007320.gbk  --dsn dbi:mysql:test --create
>
> it works fine in a couple of seconds.  If however I run
>
>   bp_genbank2gff.pl --accession NC_007320  --dsn dbi:mysql:test --create
>
> I get these two lines over and over again as it runs for a long time
> (I'm letting it go now so I can see how long it will take and what
> will eventually happen):
>
> Use of uninitialized value in pattern match (m//) at
> /usr/local/share/perl/5.8.8/Bio/SeqIO/genbank.pm line 663, <GEN1> line
> 115.
> Use of uninitialized value in pattern match (m//) at
> /usr/local/share/perl/5.8.8/Bio/SeqIO/genbank.pm line 667, <GEN1> line
> 115.
>
> Scott
>
>
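
The download-first workaround described in the quoted message can be
sketched in Python roughly as follows.  The bp_genbank2gff.pl options are
the ones shown in the quoted command; the NCBI E-utilities efetch URL and
its parameters, the local file name, and the helper names are assumptions
added for illustration, so check them before relying on this.

```python
import subprocess
import urllib.parse
import urllib.request

# Assumed download endpoint (NCBI E-utilities efetch); not from the thread.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_url(accession):
    """Build a URL asking for the full GenBank flat file of `accession`."""
    params = {
        "db": "nuccore",
        "id": accession,
        "rettype": "gbwithparts",
        "retmode": "text",
    }
    return EFETCH + "?" + urllib.parse.urlencode(params)


def loader_command(gbk_path, dsn):
    """Return the file-based loader command from the quoted message as a list."""
    return ["bp_genbank2gff.pl", "--file", gbk_path, "--dsn", dsn, "--create"]


def download_then_load(accession, dsn):
    """Save the record to <accession>.gbk, then load it from that file."""
    gbk_path = accession + ".gbk"
    urllib.request.urlretrieve(efetch_url(accession), gbk_path)
    # check=True raises CalledProcessError if the loader exits with an error.
    subprocess.run(loader_command(gbk_path, dsn), check=True)
    return gbk_path
```

Calling download_then_load("NC_007320", "dbi:mysql:test") would fetch the
record once and then run the loader against the local file, which is the
path that finished in a couple of seconds in the quoted message.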
> On Wed, Sep 24, 2008 at 9:56 AM, Zhiliang Hu <zhu at iastate.edu> wrote:
>> I repeated it on the same machine (RHEL/RedHat Linux 2.4.21-20.ELsmp, 8GB
>> RAM) - I counted 7 minutes before it quit with "Out of memory!" this time.
>>
>> I then installed Bioperl/GBrowse/etc last night on another machine (Linux
>> CentOS, 8GB RAM) and tried running the same thing in the background.
>>
>> This morning I found the processes had died without loading the db. I
>> didn't find a core dump or anything else, only a file in the /tmp dir
>> created shortly after I started it:
>> http://nagrp2.ansci.iastate.edu/zhu/tmp/RJQApIbFbh.txt -- this doesn't seem
>> to be right, because in the browser it seems to be HUGE:
>>
>> http://www.ncbi.nlm.nih.gov/entrez/viewer.fcgi?val=194719399&view=gbwithparts
>>
>> Zhiliang
>>
>>
>> At 11:17 PM 9/23/2008 -0400, Lincoln Stein wrote:
>>
>> It may take a long time to run - try overnight.
>>
>> Lincoln
>>
>> On Tue, Sep 23, 2008 at 10:46 PM, Scott Cain <cain.cshl at gmail.com> wrote:
>> When I ran it, it spun its wheels for a long time (30+ minutes) and I
>> killed it.   I tried the analogous thing with bp_genbank2gff.pl and
>> had to kill it too.  I thought it was a problem with bioperl until
>> just now, when I tried it with an E coli chromosome from genbank and
>> it worked fine (it ran a couple of minutes).
>>
>> Scott
>>
>>
>> On Tue, Sep 23, 2008 at 10:13 PM, Lincoln Stein <lincoln.stein at gmail.com >
>> wrote:
>>> Oh no! I've never seen a memory problem before. How long a time elapses
>>> between the original loading message and the first Out of memory!
>>> message?
>>>
>>> Lincoln
>>>
>>> On Tue, Sep 23, 2008 at 1:25 PM, Zhiliang Hu <zhu at iastate.edu> wrote:
>>>>
>>>> Scott,
>>>>
>>>> I decided to take an "easier" approach as a start - I will try to load
>>>> NCBI and UCSC cattle genomes to GBrowse.  Once that works, I can move on
>>>> with more customized data sets.
>>>>
>>>> I have the following questions in doing that:
>>>>
>>>> 1.
>>>> A technical one: When I tried to load a cattle chromosome using your
>>>> 'load_genbank.pl', I got a memory problem (there is 8 GB of RAM on the
>>>> machine - I bet there must be a workaround?)
>>>>
>>>> > load_genbank.pl --create -dsn dbi:mysql:gb_cattle -user --pass
>>>> > --accession NC_007320
>>>> Loading NC_007320...
>>>> Out of memory!
>>>> Out of memory!
>>>> Out of memory!
>>>> Segmentation fault (core dumped)
>>>>
>>>> 2.
>>>> Can I load all chromosomes into one database?  Or should I create a
>>>> separate database for each chromosome? (I assume the former but am not
>>>> sure).
>>>>
>>>> 3.
>>>> If I also bring in UCSC golden tracks, should I set up a different
>>>> database, or can I put them into one db, naming UCSC chromosomes a
>>>> little differently?
>>>>
>>>> Thank you,
>>>>
>>>> Zhiliang
>>>>
>>>>
>>>> --
>>>> Zhi-Liang Hu (PhD)
>>>> Associate Scientist,
>>>> Department of Animal Science,
>>>> Center for Integrated Animal Genomics,
>>>> National Animal Genome Research Program,
>>>> Iowa State University
>>>> Tel: 901-759-0643
>>>> Mob: 901-212-2820
>>>> Web: http://www.animalgenome.org
>>>>
>>>> "Not everything that counts can be counted, and
>>>>     not everything that can be counted counts."
>>>
>>>
>>> --
>>> Lincoln D. Stein
>>>
>>> Ontario Institute for Cancer Research
>>> 101 College St., Suite 800
>>> Toronto, ON, Canada M5G0A3
>>> 416 673-8514
>>> Assistant: Stacey Quinn <Stacey.Quinn at oicr.on.ca >
>>>
>>> Cold Spring Harbor Laboratory
>>> 1 Bungtown Road
>>> Cold Spring Harbor, NY 11724 USA
>>> (516) 367-8380
>>> Assistant: Sandra Michelsen <michelse at cshl.edu>
>>>
>>
>>
>>
>> --
>> ------------------------------------------------------------------------
>> Scott Cain, Ph. D. cain.cshl at gmail.com
>> GMOD Coordinator (http://gmod.org/) 216-392-3087
>> Cold Spring Harbor Laboratory
>>
>>
>>
>>
>
>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Cold Spring Harbor Laboratory


