[Gmod-help] Loading sequences with no GI in chado

Scott Cain scott at scottcain.net
Sat Sep 26 22:39:38 EDT 2009


Hi Paul,

My best suggestion for you is to ask on the BioPerl mailing list if
the GenBank parser could be modified to work with this sort of input.
When you ask, I would suggest that you explain that you got this input
from someone else who will not reformat it into something a little
easier to use.

If BioPerl cannot be made to work, I guess this would have to be handled
with a custom parser.
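
As a rough illustration of what such a custom parser might look like, here is
a minimal Python sketch. It is not part of BioPerl or GMOD, and the function
names (parse_location, parse_features, to_gff3) are made up for this example.
It assumes the input contains only feature-table lines in the usual
GenBank/EMBL column layout (feature key in columns 6-20, location and
qualifiers from column 22, with or without a leading "FT"), that locations are
simple ranges optionally wrapped in join(...) or complement(...), and that the
sequence name is supplied by the caller because the file has no header.
Feature keys are passed through unchanged as the GFF3 type, and no Parent
links between gene and CDS features are created, so the output would likely
need further work before loading into Chado.

```python
import re
from urllib.parse import quote


def parse_location(loc):
    """Return (strand, sorted [(start, end), ...]) for a simple location.

    Handles start..end, single positions, join(...) and complement(...).
    Partial markers (< and >) are dropped; remote references and
    mixed-strand joins are out of scope for this sketch.
    """
    strand = "-" if "complement" in loc else "+"
    spans = []
    for m in re.finditer(r"[<>]?(\d+)(?:\.\.[<>]?(\d+))?", loc):
        start = int(m.group(1))
        end = int(m.group(2)) if m.group(2) else start
        spans.append((start, end))
    return strand, sorted(spans)


def parse_features(lines):
    """Collect features from feature-table lines (assumed column layout)."""
    features = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith(("ORIGIN", "SQ", "//")):
            break  # sequence data follows; not handled here
        if line.startswith("FT"):
            line = "  " + line[2:]  # treat EMBL-style lines like GenBank ones
        if not line.strip() or line[:5].strip():
            continue  # blank line or header line such as FEATURES / FH
        key, rest = line[:21].strip(), line[21:].strip()
        if key:
            features.append({"type": key, "loc": rest, "quals": {}})
        elif features and rest.startswith("/"):
            name, _, value = rest[1:].partition("=")
            features[-1]["quals"][name] = value.strip('"')
        elif features and not features[-1]["quals"]:
            features[-1]["loc"] += rest  # location continued on next line
        # continuation lines of multi-line qualifier values are ignored
    return features


def to_gff3(features, seqid, source="custom"):
    """Return GFF3 lines, one per location segment of each feature."""
    out = ["##gff-version 3"]
    for n, feat in enumerate(features, 1):
        strand, spans = parse_location(feat["loc"])
        quals = feat["quals"]
        attrs = [f"ID={feat['type']}_{n}"]
        name = quals.get("locus_tag") or quals.get("gene")
        if name:
            attrs.append("Name=" + quote(name, safe=" "))
        if "product" in quals:
            attrs.append("product=" + quote(quals["product"], safe=" "))
        # CDS phase: start from /codon_start (default 1), walk segments 5'->3'
        phase = int(quals.get("codon_start", 1)) - 1
        ordered = spans if strand == "+" else spans[::-1]
        for start, end in ordered:
            col8 = str(phase) if feat["type"] == "CDS" else "."
            out.append("\t".join([seqid, source, feat["type"], str(start),
                                  str(end), ".", strand, col8,
                                  ";".join(attrs)]))
            phase = (phase - (end - start + 1)) % 3
    return out
```

To use it, one would read the feature file line by line, pass the lines to
parse_features, and write the lines returned by to_gff3 to a .gff3 file,
giving a seqid that matches the name of the reference sequence being loaded.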

Scott


On Sat, Sep 26, 2009 at 3:51 AM, Paul Visendi <P.Visendi at cgiar.org> wrote:
> Hi Scott,
>
> Thanks for your quick reply.
>
> I had tried to do that but the parser did not work. I know I missed
> something.
>
> The files were from a sequencing project but the sequence has not been
> submitted to GenBank.
>
> I have an original sample file in EMBL/GenBank format with no headers too
>  ;-)
>
> I had removed the "FT" declarations so that the section resembles GenBank
> files, and added header sections.
>
> How would I best transform this file into a GenBank file so that I can parse
> it with bp_genbank2gff3.pl, as below, and load it into the database (Chado)?
>
>
>          bp_genbank2gff3.pl -noCDS -s chr_1.gb
>
> Please find the file attached.
>
>
> On Sep 25, 2009, at 4:17 PM, Scott Cain wrote:
>
>> Or, would it be possible to add text to it to make it look more like a
>> GenBank file so the BioPerl parser would be able to work with it?
>>
>>
>> On Fri, Sep 25, 2009 at 9:16 AM, Scott Cain <scott at scottcain.net> wrote:
>>>
>>> Hi Paul,
>>>
>>> So what you have is files formatted like GenBank files, but not actual
>>> GenBank files?  Where did the data come from?  If converting those
>>> isn't working properly, it is really a BioPerl problem (though still
>>> ultimately our problem :-)
>>>
>>> What does happen?  Would it be possible to go back to the originator
>>> of this file to get it in some other format?
>>>
>>> Scott
>>>
>>>
>>>
>>>
>>> On Fri, Sep 25, 2009 at 2:06 AM, Paul Visendi <P.Visendi at cgiar.org>
>>> wrote:
>>>>
>>>> Hello Help desk,
>>>>
>>>> We at the International Livestock Research Institute have sequenced a
>>>> strain of Theileria parva.
>>>>
>>>> We have set up Chado on PostgreSQL and we are trying to store this
>>>> sequence information in the Chado schema.
>>>>
>>>> Converting these sequence files to GFF3 has been a problem. How can this
>>>> be done, given that the files we have are in GenBank format without
>>>> accession numbers, GI identifiers, etc.? Only the features section exists.
>>>>
>>>> Thank you in advance,
>>>>
>>>> visendi
>>>>
>>>
>>>
>>>
>>> --
>>> ------------------------------------------------------------------------
>>> Scott Cain, Ph. D.                                   scott at scottcain
>>> dot net
>>> GMOD Coordinator (http://gmod.org/)                     216-392-3087
>>> Ontario Institute for Cancer Research
>>>
>>
>>
>>
>
>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research



