[Gmod-help] cvterm dbxref errors on loading into Chado
Barry Dancis
bdancis at digiconasp.com
Mon May 4 16:17:35 EDT 2009
Hi--
I am trying to load the Anopheles genome as per the GMODTools TestCase. As per instructions, I added
"pseudogenic_tRNA" => "pseudogenic_region",
to the TypeMapper.pm FT_SO_map sub hash but got the error:
MSG: no cvterm for pseudotRNA
so I added
"pseudotRNA" => "pseudogenic_region",
which didn't help and then I realized that I needed to change Chado. I added the records:
cvterm table: 60001;10;"pseudotRNA";"A non functional descendent of a tRNA.";60000;0;0
dbxref table: 60000;149;"194303198";"1";""
and the pseudotRNA errors disappeared but then I got:
MSG: no cvterm for processed_transcript
There is no mention of this in the TestCase docs. Is there an error in the docs, or is my db defective? When I added the records
cvterm table: 60003;10;"processed_transcript";"most general form of RNA";60002;0;0
dbxref table: 60002;152;"194303200";"1";""
the cvterm errors disappeared, but I did get:
DBD::Pg::db pg_endcopy failed: ERROR: duplicate key value violates unique constraint "dbxref_pkey"
CONTEXT: COPY dbxref, line 513: "60000 149 158289335 1 \N" at /usr/local/share/perl/5.10.0/Bio/GMOD/DB/Adapter.pm line 2723, <$fh> line 6597.
I am not too surprised that adding records to Chado by inventing values (I did at least try to make them unique) still resulted in errors. Are there specific values I should have used? Are there other tables/records that needed to be changed?
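One possible reading of the duplicate-key error, offered as a toy model rather than a diagnosis: if the loader hands out dbxref_id values from a running counter (an assumption; nothing above shows how Adapter.pm chooses ids), a hand-inserted row whose id lies ahead of that counter will collide as soon as the counter reaches it. The Python sketch below reproduces that situation in an in-memory SQLite table. The function name, the simplified two-column table, and the starting counter value are all hypothetical; this is neither PostgreSQL nor the real Chado schema.

```python
import sqlite3


def simulate_collision(manual_id, sequence_start, rows_to_load):
    """Return the id at which a counter-driven load hits a hand-inserted row.

    Returns None if every row loads without a primary-key conflict.
    Toy model only: the table is a simplified stand-in for Chado's dbxref.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE dbxref (dbxref_id INTEGER PRIMARY KEY, accession TEXT)"
    )
    # The row added by hand, with an invented primary key.
    conn.execute("INSERT INTO dbxref VALUES (?, ?)", (manual_id, "hand-made"))

    next_id = sequence_start  # hypothetical stand-in for the loader's counter
    try:
        for i in range(rows_to_load):
            conn.execute(
                "INSERT INTO dbxref VALUES (?, ?)", (next_id, f"loaded-{i}")
            )
            next_id += 1
    except sqlite3.IntegrityError:
        # The counter reached an id that is already taken.
        return next_id
    finally:
        conn.close()
    return None


# A hand-inserted id of 60000 and a counter that starts a little below it.
print(simulate_collision(manual_id=60000, sequence_start=59488, rows_to_load=1000))
```

If this model applies, the invented ids would need to come from (or be followed by an update of) the database's own id sequence rather than being picked by hand.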
Is there some reason why the mapping
"pseudotRNA" => "pseudogenic_region",
didn't work? There is a pseudogenic_region in the cvterm table. I had to add a pseudotRNA cvterm record, and then the mapping wasn't even needed.
When I looked in the gff3 file for pseudo (egrep -n 'pseudo' gff3/NC_004818.gbk.gff), I found records of the sort:
5347:NC_004818 GenBank pseudotRNA 13578567 13578639 . - . ID=AgaP_AGAP000742.r01;Parent=AgaP_AGAP000742;Dbxref=VectorBase:AGAP000742;locus_tag=AgaP_AGAP000742;old_locus_tag=AgaP_ENSANGG00000025243;product=tRNA-OTHER;pseudo=_no_value
The gbk file (egrep -n -A 1 -B 4 'pseudo' gbk/NC_004818.gbk) had:
32229- tRNA complement(13578567..13578639)
32230- /locus_tag="AgaP_AGAP000742"
32231- /old_locus_tag="AgaP_ENSANGG00000025243"
32232- /product="tRNA-OTHER"
32233: /pseudo
32234- /db_xref="VectorBase:AGAP000742"
Thus, it appears that the gbk2gff process creates the pseudotRNA identifier, but the mapping was then not applied during the load, so the cvterm table needed an extra entry. Is that correct?
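To see in advance which feature types a GFF3 file will ask the loader to resolve, one option is to tally column 3 and compare it with the term names already present in cvterm. The Python sketch below is a minimal illustration and not part of GMODTools: the function names and the known_terms set are hypothetical, and it assumes a tab-delimited GFF3 file (the record quoted above appears space-separated only because of how it was pasted).

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count the values in column 3 (feature type) of GFF3 lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if line.startswith("##FASTA"):
            break  # an embedded sequence section follows; no more features
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete nine-column feature line
        counts[columns[2]] += 1
    return counts


def missing_types(counts, known_terms):
    """Return the feature types that are absent from known_terms, sorted."""
    return sorted(t for t in counts if t not in known_terms)


# Example with one record modelled on the pseudotRNA line quoted above.
sample = [
    "##gff-version 3\n",
    "\t".join([
        "NC_004818", "GenBank", "pseudotRNA", "13578567", "13578639",
        ".", "-", ".",
        "ID=AgaP_AGAP000742.r01;Parent=AgaP_AGAP000742;pseudo=_no_value",
    ]) + "\n",
]
known_terms = {"gene", "tRNA", "pseudogenic_region"}  # hypothetical cvterm names
print(missing_types(count_feature_types(sample), known_terms))
```

In practice the lines would come from opening the file (for example gff3/NC_004818.gbk.gff), and known_terms would be filled from the name column of the cvterm table.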
Thanks for your help,
Barry