[Gmod-gbrowse] [Gmod-help] Re: [Gmod-schema] Error loading GFF3 file into chado

Scott Cain scott at scottcain.net
Fri Aug 14 17:04:01 EDT 2009


Hi Dave, Manny and everyone else now following this thread,

There are two problems that might arise when trying to load this  
into Chado:

1. The missing srcfeature/seq_id info.  That is, there are no entries  
for the chromosomes and super contigs in the database.  One could  
use gmod_fasta2gff3.pl to generate GFF3 from a set of FASTA files and  
load that before loading this file, which takes care of the problem.

2. "Description" is not a valid GFF3 tag.  It should be "Note" or  
alternatively "description" (but Note is better).
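
To make the two points above concrete, here is a rough Python sketch.
It is only an illustration, not what gmod_fasta2gff3.pl does
internally: the function names, the default "chromosome" type and the
"." source are my own assumptions.  It builds one full-length
reference feature line per FASTA record (point 1) and renames a
Description tag to Note in column 9 (point 2).

```python
import re


def fasta_lengths(lines):
    """Yield (seq_id, length) for each record in FASTA-formatted lines."""
    seq_id, length = None, 0
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, length
            # The ID is the first whitespace-delimited token after '>'.
            seq_id, length = line[1:].split()[0], 0
        elif line and seq_id is not None:
            length += len(line)
    if seq_id is not None:
        yield seq_id, length


def reference_lines(fasta, so_type="chromosome", source="."):
    """Build one full-length GFF3 feature line per FASTA record."""
    out = []
    for seq_id, length in fasta_lengths(fasta):
        attrs = f"ID={seq_id};Name={seq_id}"
        cols = [seq_id, source, so_type, "1", str(length), ".", ".", ".", attrs]
        out.append("\t".join(cols))
    return out


def fix_description_tag(gff_line):
    """Rename a 'Description' attribute tag to 'Note' in column 9."""
    line = gff_line.rstrip("\n")
    if line.startswith("#"):
        return line
    cols = line.split("\t")
    if len(cols) != 9:
        return line
    cols[8] = re.sub(r"(^|;)Description=", r"\1Note=", cols[8])
    return "\t".join(cols)
```

For the super contigs you would presumably pass a different so_type
(for example "supercontig") so the reference features get an
appropriate SO term.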

Finally, when loading this data, you'd want to supply the --noexon  
flag so that the loader doesn't create exons that correspond to the  
CDS lines in the GFF, since the exons are already explicitly in the GFF.
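
If you want to confirm that the exons really are explicit before
relying on --noexon, a quick check along these lines may help.  This
is a minimal sketch under my own assumptions (the function name is
hypothetical, and it only looks at the Parent tag of exon and CDS
lines); it is not part of the loader.

```python
from collections import defaultdict


def parents_missing_exons(gff_lines):
    """Return sorted Parent IDs that have CDS lines but no exon lines."""
    seen = defaultdict(set)  # parent ID -> set of child feature types
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] not in ("exon", "CDS"):
            continue
        for attr in cols[8].split(";"):
            tag, _, value = attr.partition("=")
            if tag.strip() == "Parent":
                # A feature may list several parents, comma-separated.
                for parent in value.split(","):
                    seen[parent].add(cols[2])
    return sorted(p for p, types in seen.items() if "exon" not in types)
```

An empty result means every parent that has CDS lines also has exon
lines of its own, which is the situation described above.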

Scott

On Aug 14, 2009, at 4:52 PM, Dave Clements wrote:

> Hi Scott,
>
> That was my guess on the file Manny is using.  He's actually using ftp://ftp.jgi-psf.org/pub/JGI_data/phytozome/v4.0/Sbicolor/annotation/Sbi1.4/Sbi1.4.gff3.gz
>
> Manny explicitly said that and Shu picked up on that (Thanks for  
> reading Manny's response carefully!).  I'm looking at the file now.   
> It does not have full-length entries, but I can't see why it  
> complains about super_10 in particular.
>
> Dave C.
>
> On Fri, Aug 14, 2009 at 1:47 PM, Scott Cain <scott at scottcain.net>  
> wrote:
> Hi Shu,
>
> The file in question:
>
> ftp://ftp.jgi-psf.org/pub/JGI_data/Sorghum_bicolor/v1.0/Sorbi1_GeneModels_Sbi1_4_Sbi1_4.gff.gz
>
> is definitely not GFF3.  The first couple of lines look like this:
>
> chr_1   JGI     exon    2164    2829    .       +       .       name "Sb01g000200"; transcriptId 5257096
> chr_1   JGI     CDS     2164    2829    .       +       0       name "Sb01g000200"; proteinId 5027909; exonNumber 1
>
> The 9th column needs to be in "tag=value" format and "name" should be
> capitalized, and it would be nice to have gene and mRNA features to go
> with the exon and CDS features, rather than depending on inferring
> their existence (the Chado loader will fail to do that, and the
> Bio::DB::SeqFeature::Store will probably be successful, but getting
> GBrowse to display it properly will likely not be easy).
>
> Scott
>
>
> On Aug 14, 2009, at 4:35 PM, Shengqiang Shu wrote:
>
> > This file is in GFF3 format. Is the loading problem due to the file
> > not conforming to the GFF3 standard? In terms of SO terms, it uses
> > gene, mRNA, exon, CDS. No problem there, is there?
> >
> > The only thing that looks to be missing is the Name attribute. Is the
> > Name attribute required? From scanning through the docs on the
> > Sequence Ontology site, it does not say Name is required.
> >
> > I know this file is produced by one of our members. If you guys
> > point out what the problem is with the file in terms of conforming
> > to the GFF3 standard, it would be very helpful.
> >
> > By the way, FASTA in GFF3 is optional, isn't it? I don't know why the
> > loader cannot handle it. Ah, the file is an annotation GFF3 file; it
> > does not contain genomic features. Is this causing the loading problem?
> >
> >
> > Shu
> >
> >
> > On Aug 14, 2009, at 12:54 PM, Chris Fields wrote:
> >
> >> Yikes, another GFF2 variant?
> >>
> >> chris
> >>
> >> On Aug 14, 2009, at 2:50 PM, Jason Stajich wrote:
> >>
> >>> I also have the script jg_gff2gff3.pl in gff_tools of http://github.com/hyphaltip/genome-scripts/tree/master
> >>> and
> >>> gff3_to_JGIgff.pl for converting back into JGI gff from GFF3.
> >>>
> >>> -jason
> >>>
> >>> On Aug 14, 2009, at 11:01 AM, Don Gilbert wrote:
> >>>
> >>>>
> >>>>> Has anyone out there written a JGI GFF2 to GFF3 parser
> >>>>
> >>>> I've written both ways: jgi to gff3, and gff3 to jgi-gff (which is a
> >>>> variant of gff2).
> >>>> Find here
> >>>> http://wfleabase.org/release1/current_release/supplement/jgi_gff/
> >>>> as jgi2gff.pl  (and dpulex_gff2jgi.pl  for the other way)
> >>>>
> >>>> This script will want some revision for your data, and the variant
> >>>> of JGI's gff I worked with may differ from yours.  Look at the sample
> >>>> at the top of the script to see if it matches yours.
> >>>>
> >>>> - Don
> >>>>
> >>>>  
> ------------------------------------------------------------------------------
> >>>> Let Crystal Reports handle the reporting - Free Crystal Reports
> >>>> 2008
> >>>> 30-Day
> >>>> trial. Simplify your report design, integration and deployment -
> >>>> and
> >>>> focus on
> >>>> what you do best, core application coding. Discover what's new  
> with
> >>>> Crystal Reports now.  http://p.sf.net/sfu/bobj-july
> >>>> _______________________________________________
> >>>> Gmod-gbrowse mailing list
> >>>> Gmod-gbrowse at lists.sourceforge.net
> >>>> https://lists.sourceforge.net/lists/listinfo/gmod-gbrowse
> >>>
> >>> --
> >>> Jason Stajich
> >>> jason.stajich at gmail.com
> >>> jason at bioperl.org
> >>>
> >>>
> >>
> >>
> >
>
> -----------------------------------------------------------------------
> Scott Cain, Ph. D. scott at scottcain dot net
> GMOD Coordinator (http://gmod.org/) 216-392-3087
> Ontario Institute for Cancer Research
>
>
>
>
>
>
>
>
> -- 
> GMOD News: http://gmod.org/wiki/GMOD_News

-----------------------------------------------------------------------
Scott Cain, Ph. D. scott at scottcain dot net
GMOD Coordinator (http://gmod.org/) 216-392-3087
Ontario Institute for Cancer Research






