[Gmod-help] RE: [Gmod-gbrowse] Entering GenBank / Refseq file: fail to display in GBrowse2
Chevreux, Bastien
Bastien.Chevreux at dsm.com
Wed Jul 14 08:02:14 EDT 2010
(sorry, forgot the list on the CC the first time)
Hello Dave,
thank you very much for following up. You've got a good tracking system for unanswered questions :-)
I think I found the reason yesterday evening and wanted to work out a solution today, so that I could propose something workable before replying to my own post on the list. But since you asked, here's what I found out and where my thoughts are heading.
My first guess was that something in the MySQL adaptor was broken, but after reproducing the problem with SQLite I dropped that assumption. I then tried the Volvox test data (as I should have done much earlier), and it worked flawlessly in SQLite (I haven't tried MySQL yet, but I expect it to work there too). This pointed at the GFF3 itself. Comparing the Volvox GFF3 with the GFF3 created by bp_genbank2gff3, I stumbled across this comment in the Volvox GFF3:
# The contig establishes coordinates for segment of sequence
# It needs a name so that users can look it up, but it doesn't need an ID
# because it has no children.
Indeed, the GFF3 produced by bp_genbank2gff3 does not contain a single "Name=" attribute. It has "ID=", but no "Name=". After adding a few "Name=" attributes by hand, functionality was restored for the entries that then had a name.
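A quick way to check whether a GFF3 file is affected is to list the features that carry an "ID=" but no "Name=". The following is only a minimal sketch: the function names and the sample lines are made up for illustration, and it assumes plain tab-separated GFF3 with the attributes in column 9.

```python
def parse_attributes(column9):
    """Split a GFF3 attribute column into a dict of tag -> raw value."""
    attrs = {}
    for pair in column9.strip().split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            attrs[tag.strip()] = value
    return attrs


def features_missing_name(lines):
    """Return the IDs of features that have an ID but no Name attribute."""
    missing = []
    for line in lines:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        columns = line.rstrip("\n").split("\t")
        # Anything that is not a 9-column feature line (e.g. FASTA) is ignored.
        if len(columns) != 9:
            continue
        attrs = parse_attributes(columns[8])
        if "ID" in attrs and "Name" not in attrs:
            missing.append(attrs["ID"])
    return missing


# Hypothetical sample lines, not taken from the real files.
sample = [
    "##gff-version 3\n",
    "ctgA\texample\tcontig\t1\t50000\t.\t.\t.\tName=ctgA\n",
    "NC_000964\tGenBank\tgene\t410\t1750\t.\t+\t.\tID=dnaA\n",
]
print(features_missing_name(sample))
```

For a real file, the same function can be called on an open file handle instead of the sample list.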
It looks like the search functionality in GBrowse2 has changed and now relies on data that the bp_genbank2gff3 script does not provide at the moment.
How to fix? Well, a couple of solutions come to my mind, but I do not know which one is "the best".
1. As an immediate fix, I just hacked together a sed command that simply adds a "Name=" to every entry that has an "ID=":
sed -e 's/ID=\([^;]*\)/&;Name=\1/' nc_000964.gbk.gff >nc_000964.gbk.gff.forgb2
This will of course cause problems for GFF files that already have both "ID=" and "Name=", but for files coming from the current bp_genbank2gff3 it seems OK.
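A slightly more careful variant of the same idea would add "Name=" only where it is missing, so that files which already have both attributes pass through unchanged. This is just a sketch under the assumption of standard 9-column GFF3; the function name and the example lines are hypothetical.

```python
def add_missing_name(line):
    """Append Name=<ID> to a GFF3 feature line that has an ID but no Name."""
    if line.startswith("#"):
        return line
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 9:
        # Not a feature line (blank line, FASTA section, ...): leave untouched.
        return line
    tags = {}
    for pair in columns[8].split(";"):
        if "=" in pair:
            tag, value = pair.split("=", 1)
            tags[tag.strip()] = value
    if "ID" in tags and "Name" not in tags:
        columns[8] = columns[8].rstrip(";") + ";Name=" + tags["ID"]
        return "\t".join(columns) + "\n"
    return line


# Hypothetical example lines.
without_name = "NC_000964\tGenBank\tgene\t410\t1750\t.\t+\t.\tID=dnaA\n"
with_name = "NC_000964\tGenBank\tgene\t410\t1750\t.\t+\t.\tID=dnaA;Name=dnaA\n"
print(add_missing_name(without_name), end="")
print(add_missing_name(with_name), end="")
```

Applied line by line to the output of bp_genbank2gff3, this should give the same result as the sed command for files without any "Name=".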
The next two possibilities are tightly linked to how GBrowse2 performs its searches and how the authors define search functionality. As I currently don't know enough about this, please read the following just as "ideas":
2. I could imagine that changing "bp_genbank2gff3" itself might be a good option: just adapt it to the fact that GBrowse2 wants "Name=" in order to be able to search.
3. Changing the (search) functionality of GBrowse2 might also be worthwhile: every time a "Name=" is missing, use "ID=" (if it exists) as a substitute. This might have side effects I don't know of yet, though.
What do you think?
Best,
Bastien
PS: while we are at it: bp_genbank2gff3 replaces a number of characters with encoded strings (e.g. %3C). GBrowse2 does not decode these back to normal characters.
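For illustration, the encoded strings are ordinary percent-escapes, so decoding a single attribute value is straightforward. The sketch below uses Python's standard urllib.parse.unquote; the helper name and the sample values are made up, and this says nothing about where such decoding would belong inside GBrowse2.

```python
from urllib.parse import unquote


def decode_attribute_value(value):
    """Decode percent-escapes (e.g. %3C) in one GFF3 attribute value."""
    # Decode only after the attribute column has been split on ';' and '=',
    # because decoded values may themselves contain those characters.
    return unquote(value)


# Hypothetical attribute values.
print(decode_attribute_value("a%3Cb"))
print(decode_attribute_value("x%3By%2Cz"))
```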
--
DSM Nutritional Products AG
R&D Human Nutrition & Health
Bioinformatics - Bldg. 203.4 / 188
P.O. Box 2676
CH-4002 Basel / Switzerland
Tel. +41 61 815 8264
________________________________
From: gmodhelp at googlemail.com [mailto:gmodhelp at googlemail.com] On Behalf Of Dave Clements, GMOD Help Desk
Sent: Wednesday, 14 July 2010 01:15
To: Chevreux, Bastien
Cc: gmod-gbrowse at lists.sourceforge.net
Subject: Re: [Gmod-gbrowse] Entering GenBank / Refseq file: fail to display in GBrowse2
Hi Bastien,
Did you ever figure out this problem? If not, I'll spend some time seeing what I can figure out.
Dave C.
On Mon, Jul 5, 2010 at 6:24 AM, Chevreux, Bastien <Bastien.Chevreux at dsm.com> wrote:
Hello there,
I've got GBrowse2 (the latest version one could get by installing it last Friday) up and running fine with the demo data. I also got it running with some small test data of my own using the MySQL database, but now I'm running into a problem when I try it with "real" data from the NCBI.
Synopsis:
After loading GenBank data from NCBI into a MySQL database, displaying search results yields a blank page in the browser and the following entry in the Apache error log:
[Mon Jul 05 14:28:55 2010] [error] [client 172.25.140.138] Can't call method "seq_id" on an undefined value at /usr/local/lib/perl/5.10.0/Bio/Graphics/Browser2/Render.pm line 3640., referer: http://chkau66uxas150.kau12.dsm-group.com:2014/cgi-bin/gbrowse/bsub1682/
How to reproduce:
I looked up the current "Tutorial" file in the GBrowse installation and used that as template. Here's what I did:
1. Download NC_000964 as GenBank format (including the sequence) from NCBI: http://www.ncbi.nlm.nih.gov/nuccore/NC_000964 The file should be some 11 to 12 MiB.
2. Transform to GFF3:
bp_genbank2gff3 nc_000964.gbk
3. In a file (bla.sql), save
drop database if exists gb2db_bs1682;
create database gb2db_bs1682;
grant all privileges on gb2db_bs1682.* to gbrowse2@localhost;
grant select on gb2db_bs1682.* to nobody@localhost;
quit
4. Then create MySQL database:
mysql -uroot -p <bla.sql
5. Enter GFF3 file into MySQL database:
bp_seqfeature_load -c -u gbrowse2 -p ... -a DBI::mysql -d gb2db_bs1682 nc_000964.fasta nc_000964.gbk.gff
6. In /etc/gbrowse2/GBrowse.conf, add:
[bsub1682]
description = Bacillus subtilis 168
path = bsub1682.conf
7. In /etc/gbrowse2/bsub1682.conf:
[GENERAL]
description = B. subtilis 168 test in MySQL DB
db_adaptor = Bio::DB::SeqFeature::Store
db_args = -dsn gb2db_bs1682
          -user nobody
          -pass ""
+ the usual other configuration for a simple genome browser.
8. Navigate with a browser to the search page of the new genome, enter a search term (e.g. "hisE") and search -> blank page
I'll readily admit that I'm a novice regarding GBrowse, so the likelihood of having done something wrong is not 0. E.g., my first "bp_seqfeature_load" call did not include a file with FASTA data, as I thought it unnecessary (the GFF3 contains everything needed). On the other hand, also loading a FASTA file (as in the call shown above) didn't change anything in the results.
What have I been doing wrong, or is this a genuine bug?
Best,
Bastien