[Gmod-help] no cvterm for ... errors with gmod_bulk_load_gff3.pl

Scott Cain cain.cshl at gmail.com
Thu Dec 4 22:47:29 EST 2008


Hello Michael,

First, I'm guessing you know Brian O'Connor?  He's still in the Nelson
lab, right?  Tell him I said hi.

Anyway, yes, string comparisons in Postgres are case sensitive, so it
should be 'region', not 'Region'.  The term 'processed_transcript'
doesn't appear in SO, though 'mature_transcript' does.  As an aside,
it's not clear to me why processed_transcript isn't a synonym of
mature_transcript, but it doesn't matter at the moment, because the GFF
loader doesn't support synonyms (the GFF3 spec is relatively silent on
synonyms; it says that the term must come from SO, which to me implies
it must be the proper term, not a synonym).
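
Since every type in the file has to match an SO term exactly, it can help
to list the distinct types a file uses before loading it.  This is only a
rough sketch: sample.gff and its contents are made up for illustration,
and it assumes the type sits in the third tab-delimited column with no
embedded ##FASTA section in the file.

```shell
# Work in a scratch directory; sample.gff is a hypothetical file for illustration
workdir=$(mktemp -d) && cd "$workdir"
printf '##gff-version 3\n' > sample.gff
printf 'chr1\tRefSeq\tRegion\t1\t1000\t.\t+\t.\tID=r1\n' >> sample.gff
printf 'chr1\tRefSeq\tgene\t10\t900\t.\t+\t.\tID=g1\n' >> sample.gff
printf 'chr1\tRefSeq\tprocessed_transcript\t10\t900\t.\t+\t.\tID=t1;Parent=g1\n' >> sample.gff

# Skip comment and directive lines, take column 3, and list each distinct type once
types=$(grep -v '^#' sample.gff | cut -f3 | LC_ALL=C sort -u)
echo "$types"
```

Any type in that list that is not spelled exactly like an SO term is a
candidate for the "no cvterm for ..." error.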

At one time, there was a gmod_load_gff3.pl, but it used a very slow
technology for loading features and has fallen behind in terms of its
capabilities, which is why it wasn't included in the release.  Believe
it or not, the bulk loader is much faster.

Anyway, for fixing the GFF files you have: probably the easiest thing
to do is to use perl on the command line to fix the files manually.
For instance, this command:

  perl -pi.bak -e 's/\tRegion\t/\tregion\t/' name.of.gff.file

will fix the region problem you mentioned.  For processed_transcript,
I would convert it to mRNA, as it is a more standardly used term and
the bulk loader will behave better for the combination of gene, mRNA,
and exon or CDS.

Finally, when you do get the GFF fixed so that it conforms to the GFF3
standard, you might want to send it back to whoever generated it so
that they can (hopefully) fix their generation process.

Scott


On Thu, Dec 4, 2008 at 10:04 PM, Michael Yourshaw <myourshaw at ucla.edu> wrote:
> I'm getting exceptions while attempting to load RefSeq into CHADO with
> gmod_bulk_load_gff3.pl. I was able to load S. cerevisiae OK, so I assume I
> have a mostly good installation. But the human ref seq files are failing
> with errors, depending on chromosome, such as
>
> MSG: no cvterm for processed_transcript
>
> or
>
> MSG: no cvterm for Region
>
>
>
> I found that there is a "region" but not a "Region", so this means pgsql is
> case sensitive?
>
>
>
> I followed the instructions for loading ontologies in the INSTALL.Chado file
> and selected all optional ontologies. Then I followed the Load RefSeq into
> Chado wiki up to the point where it says to run gmod_load_gff3.pl, which
> could not be found, so I'm using gmod_bulk_load_gff3.pl.
>
>
>
> Can you give me some advice?
>
>
>
> Michael Yourshaw
>
> UCLA Geffen School of Medicine
> Department of Human Genetics, Nelson Lab
> 695 Charles E Young Drive S
> Gonda 5554
> Los Angeles CA 90095-8348 USA
> myourshaw at ucla.edu
> 970.691.8299
>
>



-- 
------------------------------------------------------------------------
Scott Cain, Ph. D.                                   scott at scottcain dot net
GMOD Coordinator (http://gmod.org/)                     216-392-3087
Ontario Institute for Cancer Research


