[Po-dev] Shaping OBO Foundry naming conventions - survey: input solicited
Daniel Schober
schober at ebi.ac.uk
Wed Jan 16 06:04:31 EST 2008
Dear Plant Ontology curators.
/[ This email is being sent at the request of the OBO Foundry
coordinators Barry Smith, Suzi Lewis, Chris Mungall and Michael Ashburner]/
I am approaching you regarding your role as an editor/curator of the
following ontologies:
*Plant growth and developmental stage,
http://www.obofoundry.org/cgi-bin/detail.cgi?id=po_temporal
Plant structure,
http://www.obofoundry.org/cgi-bin/detail.cgi?id=po_anatomy*
This is an opportunity for you and your ontology community to *help the
OBO Foundry shape guidelines for naming conventions*.
I would appreciate *your input to our survey* on your current practice
in naming (labeling) entities within your ontology.
The questionnaire is copied below and it is also available at the OBO
Foundry wiki http://obofoundry.org/wiki/index.php/Naming
Please fill out the attached *questionnaire* and send it back to me, or
alternatively let me know whether you are available for a *phone* or
*Skype interview* and send me your contact details.
The first phase of the survey has been completed and we have already
collected answers from *34* ontologies!
This second and last part of the survey will be closed by Feb 8th and
results posted at the OBO Foundry wiki page.
If you have any questions, please don't hesitate to contact me. Thanks in
advance for your reply!
Best regards,
Daniel Schober
***
The OBO Foundry paper is out in Nature Biotechnology
http://www.nature.com/nbt/journal/v25/n11/pdf/nbt1346.pdf
***
_naming conventions - SURVEY_
Scope
The overall goals of this survey are twofold:
1. to evaluate the current practice in naming entities within the
OBO ontology working groups
2. to move towards a common set of naming conventions for the OBO Foundry
Further details on the rationale for this survey can be found in Schober
/et al/. 2007, "Towards naming conventions for use in controlled
vocabulary and ontology engineering",
_http://bio-ontologies.org.uk/download/Bio-Ontologies2007.pdf_, pages 29-32.
Results
Any questions as well as the completed questionnaires should be sent to
Daniel Schober (schober at ebi.ac.uk). The outcome will be posted on the
OBO Foundry wiki pages (http://obofoundry.org/wiki/index.php/Naming) and
all participants will get notified accordingly.
Questionnaire
_1. Questions on your ontology and its engineering and maintenance
process_
*1.1* Are you and your co-workers familiar with the OBO Foundry
principles (http://www.obofoundry.org/crit.shtml)?
*1.2* If yours is an OBO Foundry ontology, have you started implementing
these principles and aligned your development process accordingly?
*1.3* Is work on your ontology closely associated with the need to
manage and formulate queries about a specific body of data? If so, can
you specify the type of data?
This question refers to the instance level and annotated data bodies. Is
the ontology developed in collaboration with the maintenance of a
database or collection of databases (as the GO is developed in
collaboration with UniProt and with various MOD databases)?
*1.4* Which ontology editor tool(s) do you use to build and manage your
ontology?
*1.5* Please describe the editing process, i.e. is your ontology
developed in a centralised or distributed manner? In either case, please
state the number of people directly handling and editing the ontology
file and whether they are physically distributed or located in one
place.
*1.6* Do you introduce special /helper classes/ or /bins/ that refer to
metadata to facilitate the engineering process? If yes, please describe
them and explain how you indicate their special status.
Examples are 'obsolete classes', 'imported classes', 'deleted classes'.
_2. General questions on your current practice in naming entities
and their documentation_
*2.1* Have you developed naming conventions within your ontology
community? If yes, please specify how these are formulated, e.g. as a
specific, standalone document or as part of other documentation and
where these are available, e.g. provide URL.
*2.2* (If your answer to 2.1 is positive) Which entities are tackled by
your naming conventions?
Some examples are provided below, add if required:
* class names
* relation names, e.g. colour vs has_colour vs colour_of
* instance names
* the name of the ontology, its versions, namespaces and term IDs
*2.3* (If your answer to 2.1 is negative) Have you re-used existing
naming conventions from other ontology groups?
For example have you applied the GO editor style guide or conventions
from other sources, e.g. ontology tutorials or guidelines from
standardization bodies such as ISO?
_3. Questions on the implementation of names_
*3.1* Which of the following categories of names (or name types) do you
record in your ontology, and for which of them do you anticipate that
common naming conventions would be useful?
Some examples are provided below, add if required:
· preferred name
· short name or display name
· formal name
· synonym
· foreign language translations
· broader or narrower term
*3.2* (If applicable) Which ontology language idioms do you use to
capture the categories of names, listed in question 3.1? Please, provide
examples.
In OWL, the preferred class name is probably captured using the
rdfs:label idiom and foreign language translations via another
rdfs:label idiom with the lang attribute set. Synonyms are probably
captured by some self-created OWL annotation properties or, in OBO, by
the 'exact_synonym' idiom.
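
As a rough illustration of the OBO case, the sketch below groups the
names found in one term stanza by the tag that carries them. The stanza,
the 'EX:0000001' identifier and the function name name_categories are
made up for this example; only the 'name' and 'exact_synonym' tags come
from the OBO format itself.

```python
import re

# Hypothetical OBO-style term stanza, written for illustration only.
STANZA = '''[Term]
id: EX:0000001
name: leaf
exact_synonym: "foliage leaf" []
'''

def name_categories(stanza):
    """Group the names in one stanza by the tag (idiom) that carries them."""
    names = {}
    for line in stanza.splitlines():
        tag, sep, value = line.partition(": ")
        if not sep or tag == "id":
            continue  # skip the [Term] header and the identifier line
        # Synonym values are quoted and followed by a dbxref list;
        # keep only the quoted text. Plain values are kept as they are.
        quoted = re.match(r'"([^"]*)"', value)
        names.setdefault(tag, []).append(quoted.group(1) if quoted else value)
    return names

print(name_categories(STANZA))
```

Answers to 3.2 could be given in this spirit: one line per category of
name, stating which tag or annotation property holds it.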
*3.3* Do you think there is a need to expand the expressivity of the
ontology representation languages in order to provide more naming
flexibility? What elements are missing?
*3.4* Did you use any features/functions of your ontology editor to
check for consistency within the names?
In Protégé these features would be the browser key, PROMPT or
StringSearchTab; in OBO-Edit, the redundancy check functionality.
Specialized software tools also exist to check naming conventions, e.g.
Validator (http://www.kismeta.com/Validtr.html)
_4. In depth questions on specific naming conventions_
*4.1 Explicit and concise names and context independence*
*4.1.1* Do you put any constraints on the use of natural language?
For example, do you omit nouns, articles or other words to ensure
shorter names, e.g. /'two dimensional J-resolved'/ in place of /'the
two dimensional J-resolved pulse sequence'/? If yes, please describe.
*4.1.2* Do you make sure all the names are understandable on their own,
even when viewed outside of the immediate context?
*4.2 Compound names*
*4.2.1* Do you apply any conventions that help string matching? For
example, when creating compound names do you try to build the names out
of already defined building blocks, re-using the same words or
word-parts (affixes) present in other names and other representational
units?
For example, use 'x_part_of_process', 'y_part_of_process' and
'z_part_of_process', consistently using the string 'part of' (used and
defined elsewhere), instead of mixing in synonymous strings, e.g.
'x_component_of_process', 'y_part_of_process', 'z_portion_of_process',
which introduces heterogeneity. GO, for example, re-uses the string
'development' in such a defined way throughout its class names.
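
A minimal sketch of how such heterogeneity could be detected
automatically, assuming a hand-maintained table of preferred building
blocks and their discouraged synonyms. The table PREFERRED and the
function heterogeneous_names are hypothetical names introduced for this
example, not part of any OBO tool.

```python
# Hypothetical table: preferred building block -> synonymous strings to avoid.
PREFERRED = {"part_of": ["component_of", "portion_of"]}

def heterogeneous_names(names, preferred=PREFERRED):
    """Return (name, variant, preferred block) for each discouraged variant found."""
    findings = []
    for name in names:
        for block, variants in preferred.items():
            for variant in variants:
                if variant in name:
                    findings.append((name, variant, block))
    return findings

names = ["x_component_of_process", "y_part_of_process", "z_portion_of_process"]
for name, variant, block in heterogeneous_names(names):
    print(f"{name}: uses '{variant}', preferred building block is '{block}'")
```

Plain substring matching is enough for this illustration; a real check
would probably need to respect word boundaries and the separator
convention in use.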
*4.2.2* Do your names contain defined strings that have a special
defined meaning in each occurrence?
Defined strings could indicate administrative metadata, e.g. as in the
names '?device' or 'device refine'. GO used the 'sensu' string to
constrain validity, e.g. species specificity as in 'fruiting body
development (sensu Bacteria)' (GO:0030583). Note that the 'sensu'
practice is now discouraged in GO in favour of stating appropriate
differentia explicitly and avoiding the taxon in names and definitions,
e.g. 'cell wall (sensu bacteria)' becomes 'peptidoglycan-based cell
wall'.
*4.2.3* Have you developed any guidelines to create compound names?
For example, conventions that demand a specific order of types of words
within the name.
*4.3 Homonyms*
*4.3.1* How do you cope with homonyms and highly ambiguous names?
For example, one can try to avoid or disambiguate homonyms like 'set',
which can indicate a plurality as in 'protocol set', as well as an
action as in 'parameter set'. OpenCyc uses qualifiers as name suffixes
to disambiguate homonyms, e.g. 'Plant-the factory' vs. 'Plant-the organism'.
*4.4 Consistency of language*
*4.4.1* Do you consistently apply British or US English word forms?
For example using 'polymerising' vs. 'polymerizing' throughout.
*4.4.2* Do you encounter cases where inconsistency arises from including
words from different languages? If yes, please explain how you tackle
these inconsistencies.
For example, 'gut' is the English word for the Latin-derived 'intestine'.
*4.5 Noun and verb forms*
*4.5.1* Do you encode word forms consciously and in a consistent way
within your names and do you have any guidelines regarding word forms?
For example, using 'to be measured' (future) and 'measured' (past) or
the time-neutral noun form 'measurement' in certain circumstances only.
*4.6 Abbreviations and acronyms*
*4.6.1* How do you handle acronyms and abbreviations, and how do you
deal with widely used acronyms, e.g. 'NMR', that would result in a very
long name when resolved? Could a /cut-off/ be defined for when an
acronym can be used in a name and still be intuitive and understandable
throughout the domain?
*4.7 Singularity*
*4.7.1* Do you capture plural or singular word forms throughout your
ontology? When capturing pluralities, do you use a consistent naming
convention?
For example, one could restrict the usage of plurality indicators, e.g.
using either 'Xs', 'X collection', 'X set' or 'aggregate of X'
throughout.
*4.8 Positive names*
*4.8.1* Do you apply negative names such as 'non-separation device'?
*4.8.2* Do you explicitly exclude things within your names?
For example, in the Gene Ontology one can find 'hydrolase activity,
acting on carbon-nitrogen (but not peptide) bonds, in cyclic amides'
(GO:0016812, http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0016812).
*4.9 Conjunctions*
*4.9.1* Do the names in your ontology contain logical connectives and
Boolean operators such as 'or' or 'and', e.g. as in 'strain or line' or
'antigen processing and presentation of peptide or polysaccharide
antigen via MHC class II' (GO:0002504,
http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0002504)?
*4.9.2* Do the names in your ontology contain hyphens, slashes,
parentheses or other symbols and if so, do you use them consistently? If
yes, please explain their meaning and use.
In many ontologies, hyphens and slashes are currently used with
different underlying meanings. Hyphens can indicate omissions in names
(ellipsis or apocope), e.g. 'gene-technology' for 'gene modification
technology', logical connectives, e.g. 'black-white', or ranges, e.g.
'10-100'.
*4.10 Taboo words*
*4.10.1* Do names in your ontology contain words that refer to the
representational units (e.g. class or relation) they are encoded in,
rather than to what is represented?
For example, as in 'protocol class', 'animal type', 'color attribute',
'ligand relation'.
*4.11 Typography*
*4.11.1* Which typographical convention, e.g. lower case, UPPERCASE,
mixed Case or CamelCase do you use for the categories of names (listed
in question 3.1)?
Under the OBO umbrella one can find 'MyClass', 'My Class', 'My-Class',
'My_Class', 'My_class' and 'my class' conventions, even within one
ontology and across different representational idioms. In the AI
community the convention of starting class names in upper case and
relation and instance names in lower case is common.
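
To get a quick overview of which conventions co-occur in one ontology,
names can be labelled by separator and capitalisation. The sketch below
is a rough heuristic written for this purpose; the function naming_style
and its category labels are assumptions of this example, not an
established classification.

```python
import re
from collections import Counter

def naming_style(name):
    """Return a rough (separator, capitalisation) label for one name."""
    if " " in name:
        separator = "space"
    elif "_" in name:
        separator = "underscore"
    elif "-" in name:
        separator = "hyphen"
    elif re.search(r"[a-z][A-Z]", name):
        separator = "CamelCase"
    else:
        separator = "single word"

    words = re.split(r"[ _-]", name)
    if name.isupper():
        case = "UPPERCASE"
    elif name.islower():
        case = "lower case"
    elif all(word[:1].isupper() for word in words):
        case = "capitalised words"
    else:
        case = "mixed case"
    return separator, case

names = ["MyClass", "My Class", "My-Class", "My_Class", "My_class", "my class"]
for style, count in Counter(naming_style(n) for n in names).items():
    print(style, count)
```

Running this over all class names, and separately over relation and
instance names, would show at a glance whether one convention dominates
or several are mixed.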
*4.11.2* Do you use sub- or superscripts or other text formatting to
encode additional information?
*4.11.3* Do you use any character as a word separator (such as '_', '-',
' ', etc.) within compound names? If yes, please explain the reason for
your choice.
For example, names in XML-based languages such as OWL cannot contain
the space separator, because they need to be valid when used as part of
URIs, where spaces are not allowed. CamelCase is problematic for text
mining, since indicators for word borders are lost in CamelCase.
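
Both points can be sketched in a few lines. The helper names to_fragment
and split_camel are hypothetical, and replacing spaces with underscores
is only one possible choice of separator, assumed here for illustration.

```python
import re
from urllib.parse import quote

def to_fragment(label):
    """Make a label usable inside a URI by replacing spaces with underscores."""
    # quote() percent-encodes any remaining characters that are unsafe in URIs.
    return quote(label.strip().replace(" ", "_"))

def split_camel(name):
    """Try to recover word borders by splitting at lower-to-upper case changes."""
    return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", name)

print(to_fragment("peptidoglycan-based cell wall"))
print(split_camel("MyClass"))
print(split_camel("NMRMagnet"))  # the border after the acronym is not recovered
```

The last call shows the text-mining problem: once an acronym is fused
with the next word, this simple heuristic can no longer tell where one
word ends and the next begins.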
*4.11.4* How do you handle chemical element symbols, Greek symbols like
*α*, and other special characters like *°C*?
*4.11.5* Do you have to handle product names or registered brand names?
If yes, how do you render their names intuitive (or do you capture them
as they are)?
A brand name 'US 2', describing an NMR magnet, could be renamed by using
the company name as prefix, the product brand name as infix and the
product type (superclass) as headword/suffix, e.g. use 'Bruker US 2 NMR
magnet'.
*Lastly, is there any final comment you want to make, including
additional questions that should have been in this questionnaire? Has
every naming issue you came across been covered?*
*THANKS*