[Gmod-help] get a new GBrowse to work

Tue Sep 23 15:22:34 EDT 2008

Hi Zhiliang,

(I cc'ed this to the gbrowse mailing list, since there may be advice
or comments there on what course to take.)

I haven't tried to use load_genbank.pl in a few years, so I can't say
I'm surprised it doesn't work.  Its cousin script in BioPerl,
bp_genbank2gff.pl, doesn't appear to load a Bio::DB::GFF database
either, though it purports to.  I am currently of the opinion that the
functionality you are looking for (providing a GenBank accession and
having it loaded into a GBrowse database) should be rewritten to use
more "modern" technology: bp_genbank2gff3.pl and
Bio::DB::SeqFeature::Store.  The SeqFeature::Store database is the one
you'd want to use with GFF3.

While I ponder the best way to go about doing that, my suggestion for
setting up a cow GBrowse is to first ask the folks at
bovinegenome.org (Chris Elsik's group) if they have a set of GFF3 they
could just give you.  Failing that, I would suggest getting all of the
GenBank files for the bovine genome and running bp_genbank2gff3.pl on
them.  With the created GFF3 files, you could load them into a
SeqFeature::Store database using bp_seqfeature_load.pl.
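
For the GenBank route, the two steps might be scripted roughly as
below.  This is only a sketch, not a tested recipe.  The function name
genbank_to_seqfeature, the DRY_RUN switch (a convenience of this
sketch, not a BioPerl option) and the database name "cow" are made up
for illustration.  It also assumes bp_genbank2gff3.pl names its output
<input>.gff, which you should check against your BioPerl version.
--create and --dsn are bp_seqfeature_load.pl options; add --user and
--password if your MySQL setup needs them.

```sh
#!/bin/sh
# Sketch only: GenBank -> GFF3 -> Bio::DB::SeqFeature::Store (MySQL).
# Hypothetical names: genbank_to_seqfeature, DRY_RUN, the "cow" database.
# ASSUMPTION: bp_genbank2gff3.pl names its output <input>.gff; verify this
# against the --help output of your installed BioPerl before relying on it.

# Run a command, or just print it when DRY_RUN=1 (a switch of this sketch).
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: genbank_to_seqfeature DSN FILE.gbk...
genbank_to_seqfeature() {
    if [ "$#" -lt 2 ]; then
        echo "usage: genbank_to_seqfeature DSN FILE.gbk..." >&2
        return 1
    fi
    dsn=$1
    shift
    n=$#
    # The for-list is expanded once up front, so appending to "$@" is safe.
    for gbk in "$@"; do
        run bp_genbank2gff3.pl "$gbk" || return 1
        set -- "$@" "$gbk.gff"    # expected output name (ASSUMPTION above)
    done
    shift "$n"                    # drop the GenBank names, keep the GFF3 names
    # One load call puts every chromosome into the same database.
    # --create (re)initializes the database, erasing any existing data.
    run bp_seqfeature_load.pl --create --dsn "$dsn" "$@"
}
```

After sourcing the file, "DRY_RUN=1 genbank_to_seqfeature
dbi:mysql:cow NC_*.gbk" would just print the commands so you can check
them before running for real.  Passing all of the chromosome files to a
single bp_seqfeature_load.pl call keeps them in one database.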

For your other questions: yes, you probably should put all of the cow
chromosomes in one database.  For the UCSC data, you could either put
it in the same database or not--if you aren't relating them to one
another, it probably would make sense to put them in a separate
database.

Scott

On Tue, Sep 23, 2008 at 1:25 PM, Zhiliang Hu <zhu at iastate.edu> wrote:
> Scott,
>
> I decided to take an "easier" approach as a start - I will try to load NCBI
> and UCSC cattle genomes to GBrowse.  Once that works, I can move on with
> more customized data sets.
>
> I have the following questions about doing that:
>
> 1.
> A technical one: When I try to load a cattle chromosome using your
> 'load_genbank.pl', I run into a memory problem (the machine has 8 GB of RAM;
> I bet there must be a workaround?)
>
>> load_genbank.pl --create -dsn dbi:mysql:gb_cattle -user --pass --accession
>> NC_007320
> Loading NC_007320...
> Out of memory!
> Out of memory!
> Out of memory!
> Segmentation fault (core dumped)
>
> 2.
> Can I load all chromosomes into one database?  Or should I create separate
> databases for each chromosome? (I assume the former but am not sure.)
>
> 3.
> If I also bring in UCSC golden tracks, should I set up a different database,
> or can I put them into one db, naming UCSC chromosomes a little differently?
>
> Thank you,
>
> Zhiliang
>
>
> --
> Zhi-Liang Hu (PhD)
> Associate Scientist,
> Department of Animal Science,
> Center for Integrated Animal Genomics,
> National Animal Genome Research Program,
> Iowa State University
> Tel: 901-759-0643
> Mob: 901-212-2820
> Web: http://www.animalgenome.org
>
> "Not everything that counts can be counted, and
>     not everything that can be counted counts."

-- 
------------------------------------------------------------------------
Scott Cain, Ph. D. cain.cshl at gmail.com
GMOD Coordinator (http://gmod.org/) 216-392-3087
Cold Spring Harbor Laboratory