[Gmod-help] RE: [Gmod-gbrowse] Entering GenBank / Refseq file: fail to display in GBrowse2

Chevreux, Bastien Bastien.Chevreux at dsm.com
Wed Jul 14 08:02:14 EDT 2010


(sorry, forgot the list on the CC the first time)

Hello Dave,

thank you very much for following up. You've got a good tracking system for unanswered questions :-)

I think I found the reason yesterday evening and wanted to work out a solution today, so that I could propose something workable before answering my own question on the list. But since you asked, here's what I found and where my thoughts are heading.

My first guess was that something in the MySQL adaptor was broken, but after reproducing the problem with SQLite I dropped that assumption. I then tried the Volvox test data (as I should have done much earlier), and it worked flawlessly in SQLite (I haven't tried MySQL yet, but I expect it to work there as well). This pointed to the GFF3 itself. Comparing the Volvox GFF3 with the GFF3 created by bp_genbank2gff3, I stumbled across this comment in the Volvox GFF3:


# The contig establishes coordinates for segment of sequence
# It needs a name so that users can look it up, but it doesn't need an ID
# because it has no children.

Indeed, the GFF3 produced by bp_genbank2gff3 does not have a single "Name=" in it. It has "ID=", but no "Name=". After adding a few "Name=" attributes by hand, functionality was restored for the entries which then had a name.
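
A quick way to see how many entries are affected is to list the features that carry "ID=" but no "Name=". The following is only a minimal Python sketch: the function names are made up for illustration, and it assumes a tab-separated GFF3 file with nine columns and the attributes in the last one.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict (first value wins on repeats)."""
    attrs = {}
    for field in column9.strip().split(";"):
        if "=" in field:
            key, value = field.split("=", 1)
            attrs.setdefault(key.strip(), value.strip())
    return attrs


def features_without_name(lines):
    """Yield (line_number, ID) for feature lines that have ID= but no Name=."""
    for number, line in enumerate(lines, start=1):
        if line.startswith("##FASTA"):
            break  # only sequence data follows, no more features
        if line.startswith("#") or not line.strip():
            continue  # skip comments, directives and blank lines
        columns = line.rstrip("\r\n").split("\t")
        if len(columns) != 9:
            continue  # not a regular feature line
        attrs = parse_attributes(columns[8])
        if "ID" in attrs and "Name" not in attrs:
            yield number, attrs["ID"]
```

Passing an open file handle for nc_000964.gbk.gff to features_without_name() and iterating over the result lists the line number and ID of each affected entry.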

It looks like the search functionality in GBrowse2 has changed and now relies on data that the bp_genbank2gff3 script does not currently provide.

How to fix? Well, a couple of solutions come to mind, but I do not know which one is "the best".


 1.  As an immediate fix, I just hacked together a sed command which simply adds a "Name=" to every entry that has an "ID=":

sed -e 's/ID=\([^;]*\)/&;Name=\1/' nc_000964.gbk.gff >nc_000964.gbk.gff.forgb2

This will of course cause problems for GFF files already having "ID=" and "Name=", but for files coming from the current bp_genbank2gff3 it seems OK.
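
For files that may already contain "Name=", a slightly more careful variant could add the attribute only where it is missing. This is a minimal Python sketch, not a tested replacement for the sed command: the function name is hypothetical, it assumes nine tab-separated columns, and unlike the sed version it appends "Name=" at the end of the attribute column instead of directly after "ID=".

```python
import re

# Match ID= / Name= only at the start of an attribute, not inside another key.
ID_PATTERN = re.compile(r"(?:^|;)ID=([^;]*)")
NAME_PATTERN = re.compile(r"(?:^|;)Name=")


def add_name_from_id(line):
    """Return the GFF3 line with Name=<ID> appended if it has ID= but no Name=."""
    if line.startswith("#"):
        return line  # comments and directives stay untouched
    body = line.rstrip("\r\n")
    ending = line[len(body):]  # preserve the original line ending
    columns = body.split("\t")
    if len(columns) != 9:
        return line  # e.g. sequence lines in a trailing FASTA section
    attributes = columns[8]
    match = ID_PATTERN.search(attributes)
    if match and not NAME_PATTERN.search(attributes):
        columns[8] = attributes.rstrip(";") + ";Name=" + match.group(1)
    return "\t".join(columns) + ending
```

Applied line by line to nc_000964.gbk.gff, this leaves entries that already have a name unchanged.
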

The next two possibilities are tightly linked to how GBrowse2 performs its searches and how the authors define search functionality. As I currently don't know enough about this, please read the following just as "ideas":


 2.  I could imagine that changing "bp_genbank2gff3" itself might be a good option. Just adapt it to the fact that GBrowse2 wants "Name=" to be able to search.
 3.  Changing the (search) functionality of GBrowse2 might also be something worthwhile: every time a "Name=" is missing, use "ID=" (if existing) as substitute. Though this might have side-effects I don't know of yet.
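
The fallback idea in the last item can be illustrated independently of GBrowse2. The sketch below is plain Python and not GBrowse2 code: build_search_index is a hypothetical helper, features are represented as simple attribute dicts, and the case-insensitive lookup is an assumption made only for this example.

```python
from collections import defaultdict


def build_search_index(features):
    """Map lower-cased search keys to features, using ID when Name is absent.

    `features` is an iterable of attribute dicts, a stand-in for whatever
    feature objects the real search code works with.
    """
    index = defaultdict(list)
    for attrs in features:
        # Prefer Name; fall back to ID only if Name is missing or empty.
        key = attrs.get("Name") or attrs.get("ID")
        if key:
            index[key.lower()].append(attrs)
    return index
```

With this rule a feature that has both attributes is found only under its name, which is one of the side-effects that would need to be thought through.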

What do you think?

Best,
  Bastien

PS: while we are at it: bp_genbank2gff3 replaces a number of characters with percent-encoded strings (e.g. %3C for "<"). GBrowse2 does not decode these back to normal characters.
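
To illustrate what decoding would involve: the attribute column has to be split on ";", "=" and "," first, and only then can each piece be percent-decoded. This is a minimal Python sketch using the standard library's urllib.parse.unquote; decode_attributes is a made-up name, and this is not how GBrowse2 or BioPerl handle it internally.

```python
from urllib.parse import unquote


def decode_attributes(column9):
    """Parse a GFF3 attribute column and percent-decode keys and values."""
    decoded = {}
    for field in column9.strip().split(";"):
        if "=" not in field:
            continue
        key, value = field.split("=", 1)
        # Split on "," before decoding so an encoded comma (%2C) stays
        # inside its value instead of being mistaken for a separator.
        decoded[unquote(key)] = [unquote(part) for part in value.split(",")]
    return decoded
```

Each attribute maps to a list because GFF3 allows several comma-separated values per key.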



--
DSM Nutritional Products AG
R&D Human Nutrition & Health
Bioinformatics - Bldg. 203.4 / 188
P.O. Box 2676
CH-4002 Basel / Switzerland
Tel. +41 61 815 8264

________________________________
From: gmodhelp at googlemail.com [mailto:gmodhelp at googlemail.com] On Behalf Of Dave Clements, GMOD Help Desk
Sent: Wednesday, 14 July 2010 01:15
To: Chevreux, Bastien
Cc: gmod-gbrowse at lists.sourceforge.net
Subject: Re: [Gmod-gbrowse] Entering GenBank / Refseq file: fail to display in GBrowse2

Hi Bastien,

Did you ever figure out this problem?  If not, I'll spend some time seeing what I can figure out.

Dave C.
On Mon, Jul 5, 2010 at 6:24 AM, Chevreux, Bastien <Bastien.Chevreux at dsm.com> wrote:
Hello there,

I've got GBrowse2 (the latest version I could get when installing it last Friday) up and running fine with the demo data. I also got it running with some small test data of my own using the MySQL database, but now I'm running into a problem when I try it with "real" data from the NCBI.

Synopsis:

After loading GenBank data from NCBI into a MySQL database, displaying search results yields a blank page in the browser and the following entry in the Apache error log:
  [Mon Jul 05 14:28:55 2010] [error] [client 172.25.140.138] Can't call method "seq_id" on an undefined value at /usr/local/lib/perl/5.10.0/Bio/Graphics/Browser2/Render.pm line 3640., referer: http://chkau66uxas150.kau12.dsm-group.com:2014/cgi-bin/gbrowse/bsub1682/"

How to reproduce:

I looked up the current "Tutorial" file in the GBrowse installation and used that as template. Here's what I did:


 1.  Download NC_000964 in GenBank format (including the sequence) from NCBI: http://www.ncbi.nlm.nih.gov/nuccore/NC_000964 The file should be some 11 to 12 MiB.
 2.  Transform to GFF3:
   bp_genbank2gff3 nc_000964.gbk
 3.  In a file (bla.sql), save
  drop database if exists gb2db_bs1682;
  create database gb2db_bs1682;
  grant all privileges on gb2db_bs1682.* to gbrowse2@localhost;
  grant select on gb2db_bs1682.* to nobody@localhost;
  quit
 4.  Then create the MySQL database:
  mysql -uroot -p <bla.sql
 5.  Load the GFF3 file into the MySQL database:
  bp_seqfeature_load -c -u gbrowse2 -p ... -a DBI::mysql -d gb2db_bs1682 nc_000964.fasta nc_000964.gbk.gff
 6.  In /etc/gbrowse2/GBrowse.conf, add:
  [bsub1682]
  description   = Bacillus subtilis 168
  path          = bsub1682.conf
 7.  In /etc/gbrowse2/bsub1682.conf:
  [GENERAL]
  description   = B. subtilis 168 test in MySQL DB
  db_adaptor = Bio::DB::SeqFeature::Store
  db_args = -dsn gb2db_bs1682
            -user nobody
            -pass ""
+ the usual other configuration for a simple genome browser.
 8.  Navigate with a browser to the search page of the new genome, enter a search term (e.g. "hisE") and search -> blank page

I'll readily admit that I'm a novice with GBrowse, so the likelihood that I have done something wrong is not zero. E.g., in the above "bp_seqfeature_load" call I did not add a file with FASTA data as I thought it to be unnecessary (the GFF3 contains everything needed). On the other hand, loading a FASTA file as well didn't change anything in the results.

What have I been doing wrong, or is this a genuine bug?

Best,
  Bastien

--
DSM Nutritional Products AG
R&D Human Nutrition & Health
Bioinformatics - Bldg. 203 / 115
P.O. Box 2676
CH-4002 Basel / Switzerland
Tel. +41 61 815 8264





--
===> PLEASE KEEP RESPONSES ON THE LIST <===
http://gmod.org/wiki/GMOD_News
http://gmod.org/wiki/ISMB_2010
http://gmod.org/wiki/Help_Desk_Feedback