FYI from GO FW: A _draft_ proposal re unknown terms

Mary (MaizeGDB) SchaefferM at missouri.edu
Thu Dec 15 11:16:33 EST 2005


I believe we have discussed this in the PO, and the use by GO was one reason
to allow it in the PO. However, this seems to be changing a bit.
  mary
------ Forwarded Message
From: "Michael Ashburner (Genetics)" <ma11 at gen.cam.ac.uk>
Reply-To: "Michael Ashburner (Genetics)" <ma11 at gen.cam.ac.uk>
Date: Thu, 15 Dec 2005 08:52:05 +0000 (GMT)
To: go at genome.stanford.edu
Cc: phismith at buffalo.edu
Subject: A _draft_ proposal re unknown terms

GO folk and Barry.  I attach a draft proposal and justification re
eliminating the pesky *_unknown terms in the GO.  I welcome feedback.

Michael

A proposal to eliminate *_unknown functions from the Gene Ontology.

MA. Draft 1. December 15 2005.

Each of the three axes of the Gene Ontology, molecular function, biological
process and cellular component, includes an 'unknown' term, i.e.,

molecular function unknown ; GO:0005554
biological process unknown ; GO:0000004
cellular component unknown ; GO:0008372

The GO documentation clearly states that these nodes are to be used only
when a curator has searched for information concerning a particular gene
product, either computationally or in the literature, and has failed to find
any evidence for a _particular_ molecular function (or biological process or
cellular component).  The evidence code to be used is ND (no data), unless
(a) there is a literature reference which explicitly states that (e.g.) the
function of the gene product is unknown (in which case use the evidence code
NAS or TAS and cite the reference), or (b) the 'unknown' property is
inferred by ISS with, e.g., an InterPro domain which is itself stated to have
unknown function.

There has been concern about these three nodes for some time and I am now
suggesting that we change both the GO and documentation to eliminate them.
Instead, gene products that would (on the criteria set out above) have been
annotated to these terms will now be annotated to the relevant root term,
i.e.,

molecular function ; GO:0003674
biological process ; GO:0008150
cellular component ; GO:0005575
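
The remapping described above can be sketched in a few lines of Python.
This is only an illustration: the names UNKNOWN_TO_ROOT, remap_term and
remap_annotations are hypothetical and not part of any GO tool, and the
sketch assumes the evidence code is carried over unchanged, which the
proposal does not specify.  The GO ids are those listed above.

```python
# Hypothetical sketch of the proposed remapping; not an official GO tool.
# Keys are the three 'unknown' terms, values are the matching root terms.
UNKNOWN_TO_ROOT = {
    "GO:0005554": "GO:0003674",  # molecular function unknown -> molecular function
    "GO:0000004": "GO:0008150",  # biological process unknown -> biological process
    "GO:0008372": "GO:0005575",  # cellular component unknown -> cellular component
}


def remap_term(go_id):
    """Return the root term for an 'unknown' term, or go_id unchanged."""
    return UNKNOWN_TO_ROOT.get(go_id, go_id)


def remap_annotations(annotations):
    """Remap (gene_product, go_id, evidence_code) tuples.

    Assumption: the evidence code (e.g. ND) is kept as it was.
    """
    return [(gp, remap_term(go_id), ev) for gp, go_id, ev in annotations]
```

Annotations to any other term pass through untouched, so the function
could be run over a whole annotation set.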

There is an assumption behind this proposal: it is that no gene product is
wholly 'useless'.  My view is that this is not an unreasonable assumption and
that if there is ever rigorous evidence for a 'useless' gene product then we
will cross that bridge.

The reason for this change is that the three 'unknown' nodes violate a
principle of ontology development. At the moment they are direct 'isa'
children of their root.  But this is, of course, nonsense: 'molecular
function unknown' is NOT a type of 'molecular function'.  This relationship
is simply wrong.

One of the great benefits of the subsumption hierarchy built into the GO is
that objects can be annotated with a parent node, rather than a more granular
leaf, so as to indicate the shallow depth of knowledge at the time of
annotation.  That is precisely what we are suggesting.  A gene product that
is now annotated with an '* unknown' term would be annotated with the root -
which says 'this gene product _has_ a molecular function (etc) but we do not
know anything more specific about it'.
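
The subsumption argument can be illustrated with a toy 'isa' chain.  This
is a minimal sketch, not GO software: the IS_A table is a simplified
fragment with intermediate terms omitted, and the helper names ancestors
and implies are hypothetical.

```python
# Toy 'isa' fragment (child -> parent); intermediate GO terms are omitted.
IS_A = {
    "molecular function": None,  # root
    "catalytic activity": "molecular function",
    "kinase activity": "catalytic activity",
}


def ancestors(term):
    """Return the chain of 'isa' parents from term up to the root."""
    chain = []
    parent = IS_A[term]
    while parent is not None:
        chain.append(parent)
        parent = IS_A[parent]
    return chain


def implies(annotated_term, query_term):
    """True if an annotation to annotated_term also asserts query_term."""
    return query_term == annotated_term or query_term in ancestors(annotated_term)
```

An annotation to 'kinase activity' implies 'molecular function', whereas an
annotation to the root implies nothing more specific than the root itself.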





------ End of Forwarded Message



