suggestions requested for plant trait ontology

Susan McCouch srm4 at cornell.edu
Wed Aug 22 17:41:24 EDT 2001


Dear Michael and Chris,

 From an agricultural perspective, we are very interested in allelic 
variation and therefore it is essential to be able to do the kind of 
quantitative queries that you say the GO database is not designed to 
do.

For example, a plant breeder will certainly want to be able to query 
the database to find all germplasm samples that are resistant to 
disease X, or to find those that are both resistant to disease X at a 
minimal level and highly resistant to disease Y.  We aim to structure 
the TO to facilitate these kinds of queries and are trying to come up 
with the best way to curate the data we have on mutants and 
quantitative trait locus alleles and phenotypes.
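
As a rough illustration of the kind of query described above, here is a minimal Python sketch. Everything in it is assumed for illustration: the record layout, the sample and disease names, the function name `resistant_to`, and the convention that a lower score on the 0-9 scale means stronger resistance. None of it is taken from the TO or from any existing database.

```python
from typing import Dict, List

# Hypothetical annotations: germplasm sample -> {disease: score on a 0-9 scale}.
# Assumption: lower scores mean stronger resistance.
SCORES: Dict[str, Dict[str, int]] = {
    "sample_A": {"disease_X": 3, "disease_Y": 1},
    "sample_B": {"disease_X": 7, "disease_Y": 1},
    "sample_C": {"disease_X": 5, "disease_Y": 0},
    "sample_D": {"disease_X": 1},  # never scored for disease_Y
}

def resistant_to(limits: Dict[str, int]) -> List[str]:
    """Return samples scoring at or below the limit for every listed disease.

    Samples with no score for a listed disease are excluded.
    """
    hits = []
    for sample, scores in SCORES.items():
        if all(d in scores and scores[d] <= limit for d, limit in limits.items()):
            hits.append(sample)
    return sorted(hits)

# At least minimally resistant to X (score <= 5) and highly resistant to Y (score <= 1).
print(resistant_to({"disease_X": 5, "disease_Y": 1}))
```

The point of the sketch is only that such a query needs the scores stored as comparable values per sample and per disease, rather than as free text.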

Perhaps you have some insights as to how we can proceed.  The 
solution suggested by Chris and Leonore involving an orthogonal 
combination of terms may work fine if we can combine it with Lincoln's 
Option #2 (suggested in the previous e-mails), in which we present some 
description of the meanings that the scores will have for the 
different traits.  This will be required to avoid massive confusion 
(i.e., the same scores will be used to indicate levels of 
resistance/susceptibility, relative grain weight, plant height, 
levels of pigmentation, etc.).

To implement this approach in the further development of monocot 
trait ontologies, we'd love to know when a working version of the new 
tool being developed at FlyBase might be available to us to see how 
it meets our needs.

Susan




I also dislike option 1 for the same reasons as Michael.

                However, option 2 worries me in that, if I understand your
                solution, there appears to be useful information buried in
                free text.

                It seems to me that there are actually 3 classes of concept
                here: diseases, susceptibility scores, and the actual
                phenotypes.

                So what about 3 subgraphs: one minimal one for diseases,
                another with 10 entries (one per score) - these two subgraphs
                should be orthogonal - and a final subgraph of phenotypes.
                Each phenotype node should have two parents, one from disease,
                one from score. This way you're representing the knowledge
                that not all scores can be applied to all diseases. This is
                similar to the way GO is headed with anatomy + development.

                The nodes should be named unambiguously, e.g. leaf-blast-3.
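
The three-subgraph layout described above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not how GO or TO is implemented: the `parents` dictionary, the helper names `add_node`, `add_phenotype` and `valid_scores`, and the choice of 1, 3, 5, 7, 9 as the scores used for leaf blast are all hypothetical.

```python
from typing import Dict, List

# Each node name maps to the list of its parent node names.
parents: Dict[str, List[str]] = {}

def add_node(name: str, *parent_names: str) -> None:
    """Add a node; every parent must already exist in the graph."""
    for p in parent_names:
        if p not in parents:
            raise KeyError(f"unknown parent: {p}")
    parents[name] = list(parent_names)

# Subgraph 1: diseases (minimal).
add_node("disease")
add_node("leaf-blast", "disease")

# Subgraph 2: one node per score, 0 through 9.
add_node("score")
for s in range(10):
    add_node(f"score-{s}", "score")

def add_phenotype(disease: str, score: int) -> str:
    """Subgraph 3: a phenotype node with one disease parent and one score parent."""
    name = f"{disease}-{score}"
    add_node(name, disease, f"score-{score}")
    return name

# Assumption for illustration only: leaf blast is scored on 1, 3, 5, 7, 9.
for s in (1, 3, 5, 7, 9):
    add_phenotype("leaf-blast", s)

def valid_scores(disease: str) -> List[int]:
    """Scores that actually have a phenotype node under the given disease."""
    found = []
    for node_parents in parents.values():
        if disease in node_parents:
            found.extend(int(p.split("-")[1]) for p in node_parents if p.startswith("score-"))
    return sorted(found)

print(parents["leaf-blast-3"])     # ['leaf-blast', 'score-3']
print(valid_scores("leaf-blast"))  # [1, 3, 5, 7, 9]
```

Because phenotype nodes exist only for the disease and score pairs that are actually used, the graph itself records which scores apply to which disease.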

                This still isn't ideal - the GO database wasn't designed with
                quantitative data in mind, so it will be impossible to do
                quantitative queries (e.g. find me everything susceptible
                (threshold X) to disease Y), although with 10 scores this
                could be hacked (either at the application level or by adding
                less-than/more-than arcs between score nodes). This is the
                kind of thing that will spur us to make the database more
                generic and useful to wider groups of people.
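
One way the "less-than arcs between score nodes" idea might look is sketched below in Python. It is a guess at the approach, not an existing GO feature: the `LESS_THAN` table, the annotation list, the item names, "disease-Y", and the convention that a higher score means more susceptible are all assumptions made for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical "less-than" arcs: each score node points to the next higher score.
LESS_THAN: Dict[int, int] = {s: s + 1 for s in range(9)}  # 0->1, 1->2, ..., 8->9

def scores_at_or_above(threshold: int) -> Set[int]:
    """Follow less-than arcs from the threshold, collecting it and every higher score."""
    reached = {threshold}
    current = threshold
    while current in LESS_THAN:
        current = LESS_THAN[current]
        reached.add(current)
    return reached

# Hypothetical annotations: (item, phenotype node name such as "leaf-blast-3").
ANNOTATIONS: List[Tuple[str, str]] = [
    ("item_1", "leaf-blast-3"),
    ("item_2", "leaf-blast-7"),
    ("item_3", "leaf-blast-9"),
    ("item_4", "disease-Y-7"),
]

def susceptible(disease: str, threshold: int) -> List[str]:
    """Items annotated to the disease with a score at or above the threshold.

    Assumption: a higher score means more susceptible.
    """
    wanted = {f"{disease}-{s}" for s in scores_at_or_above(threshold)}
    return sorted(item for item, term in ANNOTATIONS if term in wanted)

print(susceptible("leaf-blast", 7))  # ['item_2', 'item_3']
```

The same result could be had at the application level by parsing the score out of the node name and comparing integers; the arcs simply keep the ordering inside the graph.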

                This is a slightly experimental suggestion; it may be best to
                stick with option 2, effectively making TO a disease ontology,
                but then you lose scale information.

                On Tue, 21 Aug 2001, Pankaj Jaiswal wrote:

                > Dear Michael
                >
                > Thanks for the suggestions. We are also worried about the
                > scoring system. There is a generic way of evaluating by
                > scoring from 0-9; however, not all 10 scores (0-9) are used
                > for each disease. (The important thing to note here is that
                > traits will cover not only damage by pathogens, but also
                > response to environmental stress and the plant's own
                > characteristics, e.g. morphology/anatomy/growth, and again
                > each of these sub-sub-instances has a score of 0-9 for
                > evaluation.) Some may have all 10, others only 1,3,5,7,9, or
                > else only 1,5,9, and so on. Of my options, no. 2 seems fine,
                > but it never represents the usage of the CV/O with respect
                > to resistance or susceptibility. Do you think that for each
                > disease I can have susceptibility and resistance nodes whose
                > definitions will suggest the respective infectivity score,
                > if one wants to evaluate?
                >
                >
                > -pankaj
                >
                >
                > "Michael Ashburner (Genetics)" wrote:
                > >
                > > Pankaj
                > >
                > > I do not like option 1 at all: over and above the fact
                > > that each term must be lexically unique, it suffers from
                > > mixing chalk (the diseases) and cheese (the scoring
                > > system). It is a very good principle that within a DAG all
                > > terms should belong to the same "semantic family" - if I
                > > may create a phrase. So, option 2 it should be. I thought
                > > of creating a separate "score" DAG but think that would be
                > > too cumbersome - especially as the scores are disease
                > > specific.
                > >
                > > Michael

*****************************************************************
NOW HIRING BIOLOGICAL CURATORS & BIOINFORMATICS POSTDOCTORAL FELLOWS.
http://ars-genome.cornell.edu/rice/employment.html.
*****************************************************************

Susan McCouch                          Phone: 607-255-0420
Dept of Plant Breeding                 Fax: 607-255-6683
418 Bradfield Hall                     E-Mail: srm4 at cornell.edu
Cornell University
Ithaca, NY  14853-1901






