suggestions requested for plant trait ontology
Susan McCouch
srm4 at cornell.edu
Wed Aug 22 17:41:24 EDT 2001
Dear Michael and Chris,
From an agricultural perspective, we are very interested in allelic
variation and therefore it is essential to be able to do the kind of
quantitative queries that you say the GO database is not designed to
do.
For example, a plant breeder will certainly want to be able to query
the database to find all germplasm samples that are resistant to
disease X, or to find those that are both resistant to disease X at a
minimal level and highly resistant to disease Y. We aim to structure
the TO to facilitate these kinds of queries and are trying to come up
with the best way to curate the data we have on mutants and
quantitative trait locus alleles and phenotypes.
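To make the breeder's query above concrete, here is a minimal sketch in Python. It is only an illustration, not a proposal for how the TO or the GO database stores data: the `Observation` record, the accession and disease names, and the cut-off values are all hypothetical, and it assumes a 0-9 scale on which a lower score means more resistant.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Observation:
    germplasm: str  # hypothetical accession identifier
    disease: str    # hypothetical disease name
    score: int      # 0-9; ASSUMPTION: lower score means more resistant


def resistant_to(observations, disease, max_score):
    """Return the germplasm whose score for `disease` is at most `max_score`."""
    return {
        o.germplasm
        for o in observations
        if o.disease == disease and o.score <= max_score
    }


# Illustrative data only.
observations = [
    Observation("acc-1", "disease-X", 4),
    Observation("acc-1", "disease-Y", 1),
    Observation("acc-2", "disease-X", 3),
    Observation("acc-2", "disease-Y", 7),
    Observation("acc-3", "disease-X", 8),
    Observation("acc-3", "disease-Y", 0),
]

# Resistant to X at a minimal level (score <= 5, an assumed cut-off)
# AND highly resistant to Y (score <= 1, an assumed cut-off).
hits = resistant_to(observations, "disease-X", 5) & resistant_to(
    observations, "disease-Y", 1
)
print(sorted(hits))
```

The point of the sketch is that the query needs the score as a number it can compare, which is exactly what is lost if the score only appears in a term name or in free text.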
Perhaps you have some insights as to how we can proceed. The
solution suggested by Chris and Leonore involving an orthogonal
combination of terms may work fine if we can combine it with
Lincoln's Option #2 (suggested in the previous e-mails), in which we
present some description of the meanings that the scores will have
for the different traits. This will be required to avoid massive
confusion (i.e., the same scores will be used to indicate levels of
resistance/susceptibility, relative grain weight, plant height,
levels of pigmentation, etc).
To implement this approach in the further development of monocot
trait ontologies, we'd love to know when a working version of the new
tool being developed at Fly-Base might be available to us to see how
it meets our needs.
Susan
I also dislike option 1 for the same reasons as Michael.
However, option 2 worries me in that, if I understand your solution,
there appears to be useful information buried in free text.
It seems to me that there are actually 3 classes of concept here:
diseases, susceptibility scores, and the actual phenotypes.
So what about 3 subgraphs: one minimal one for diseases, another with
10 entries (one per score) - these two subgraphs should be
orthogonal - and a final subgraph of phenotypes. Each phenotype node
should have two parents, one from disease, one from score. This way
you're representing the knowledge that not all scores can be applied
to all diseases. This is similar to the way GO is headed with
anatomy + development.
The nodes should be named unambiguously, e.g. leaf-blast-3.
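A rough sketch of the three-subgraph idea in Python, using plain dictionaries rather than any real ontology tool. The disease "disease-b" and the table of which scores apply to which disease are assumptions made up for the example; only the leaf-blast-3 naming pattern comes from the message above.

```python
# Subgraph 1: diseases ("disease-b" is hypothetical).
diseases = {"leaf-blast", "disease-b"}

# Subgraph 2: one node per score, 0-9, independent of the diseases.
scores = set(range(10))

# ASSUMPTION: which scores are actually used for each disease.
allowed_scores = {
    "leaf-blast": list(range(10)),  # all ten scores
    "disease-b": [1, 5, 9],         # a coarser scale
}

# Subgraph 3: phenotype nodes. Each node has exactly two parents,
# one from the disease subgraph and one from the score subgraph.
phenotypes = {
    f"{disease}-{score}": (disease, score)
    for disease, used in allowed_scores.items()
    for score in used
}


def parents(term):
    """Return the (disease, score) parents of a phenotype term."""
    return phenotypes[term]


def is_valid(disease, score):
    """A score applies to a disease only if the phenotype node exists."""
    return f"{disease}-{score}" in phenotypes


print(parents("leaf-blast-3"))
print(is_valid("disease-b", 2))
```

Because a phenotype node is only created for combinations that are actually scored, the graph itself records that not every score applies to every disease.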
This still isn't ideal - the GO database wasn't designed with
quantitative data in mind, so it will be impossible to do
quantitative queries (e.g. find me everything susceptible
(threshold X) to disease Y), although with 10 scores this could be
hacked (either at the application level or by adding
less-than/more-than arcs between score nodes). This is the kind of
thing that will spur us to make the database more generic and useful
to wider groups of people.
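For what it's worth, the less-than arc hack could look something like the following Python sketch. Everything in it is hypothetical: the annotation data, the term names other than the leaf-blast pattern, and the assumption that a higher score means more susceptible. It is not how the GO database works today.

```python
# Less-than arcs between adjacent score nodes: 0 -> 1 -> ... -> 9.
less_than = {s: [s + 1] for s in range(9)}


def scores_at_or_above(threshold, arcs):
    """Follow less-than arcs to collect `threshold` and every higher score."""
    seen, stack = set(), [threshold]
    while stack:
        score = stack.pop()
        if score not in seen:
            seen.add(score)
            stack.extend(arcs.get(score, []))
    return seen


# Phenotype node -> (disease parent, score parent). Illustrative only.
phenotype_parents = {
    "leaf-blast-3": ("leaf-blast", 3),
    "leaf-blast-7": ("leaf-blast", 7),
    "disease-b-9": ("disease-b", 9),
}

# Hypothetical germplasm annotations to phenotype nodes.
annotations = {
    "acc-1": ["leaf-blast-3"],
    "acc-2": ["leaf-blast-7", "disease-b-9"],
}


def susceptible(disease, threshold):
    """Find germplasm annotated to `disease` at `threshold` or worse.

    ASSUMPTION: a higher score means more susceptible.
    """
    wanted = scores_at_or_above(threshold, less_than)
    return {
        accession
        for accession, terms in annotations.items()
        for term in terms
        if phenotype_parents[term][0] == disease
        and phenotype_parents[term][1] in wanted
    }


print(sorted(susceptible("leaf-blast", 5)))
```

The same result could be had at the application level by parsing the score out of the term and comparing integers; the arcs simply move that ordering into the graph.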
This is a slightly experimental suggestion; it may be best to stick
with option 2, effectively making TO a disease ontology, but you
would then lose the scale information.
On Tue, 21 Aug 2001, Pankaj Jaiswal wrote:
> Dear Michael
>
> Thanks for the suggestions. We are also worried about the scoring
> system, since there is a generic way of evaluation by scoring from
> 0-9; however, not all 10 scores (0-9) are used for each disease.
> (The important thing to note here is that traits will cover not only
> the damage by pathogens, but will also include response to
> environmental stress and the plant's own characteristics, e.g.
> morphology/anatomy/growth, and again each of these sub-sub instances
> has a score of 0-9 for evaluation.) Some may have all 10, others
> only 1,3,5,7,9, or else only 1,5,9, and so on. From my options, no. 2
> seems fine, but it never represents the usage of the CV/O with
> respect to resistance or susceptibility. Do you think that for each
> disease I can have the susceptibility and resistance nodes, whose
> definitions will suggest the respective infectivity score, if one
> wants to evaluate?
>
>
> -pankaj
>
>
> "Michael Ashburner (Genetics)" wrote:
> >
> > Pankaj
> >
> > I do not like option 1 at all (over and above the fact that each
> > term must be lexically unique, it suffers from mixing chalk (the
> > diseases) and cheese (the scoring system)). It is a very good
> > principle that within a DAG all terms should belong to the same
> > "semantic family" - if I may create a phrase. So, option 2 it
> > should be. I thought of creating a separate "score" DAG but think
> > that would be too cumbersome - especially as the scores are
> > disease specific.
> >
> > Michael
*****************************************************************
NOW HIRING BIOLOGICAL CURATORS & BIOINFORMATICS POSTDOCTORAL FELLOWS.
http://ars-genome.cornell.edu/rice/employment.html.
*****************************************************************
Susan McCouch Phone: 607-255-0420
Dept of Plant Breeding Fax: 607-255-6683
418 Bradfield Hall E-Mail: srm4 at cornell.edu
Cornell University
Ithaca, NY 14853-1901