[Po-dev] Shaping OBO Foundry naming conventions - survey: input solicited

Wed Jan 16 06:04:31 EST 2008

Dear Plant Ontology curators,

/[This email is being sent at the request of the OBO Foundry
coordinators Barry Smith, Suzi Lewis, Chris Mungall and Michael Ashburner.]/

I am approaching you regarding your role as an editor/curator of the
following ontologies:

    * Plant growth and developmental stage,
      http://www.obofoundry.org/cgi-bin/detail.cgi?id=po_temporal
    * Plant structure,
      http://www.obofoundry.org/cgi-bin/detail.cgi?id=po_anatomy

This is an opportunity for you and your ontology community to *help the
OBO Foundry shape guidelines for naming conventions*.
I would appreciate *your input to our survey* on your current practice 
in naming (labeling) entities within your ontology.
The questionnaire is copied below and is also available on the OBO
Foundry wiki at http://obofoundry.org/wiki/index.php/Naming

Please fill out the attached *questionnaire* and send it back to me, or
alternatively let me know if you are available for a *phone* or *Skype
interview*, along with your contact details.

The first phase of the survey has been completed and we have already
collected answers from *34* ontologies!
This second and final part of the survey will close on Feb 8th, and the
results will be posted on the OBO Foundry wiki page.

If you have any questions, please don't hesitate to contact me. Thanks in
advance for your reply!

Best regards,
    Daniel Schober

The OBO Foundry paper is out in Nature Biotechnology:
http://www.nature.com/nbt/journal/v25/n11/pdf/nbt1346.pdf

  _naming conventions - SURVEY_

      Scope

The overall goals of this survey are twofold:

1.      to evaluate the current practice in naming entities within the 
OBO ontology working groups

2.      to move towards a common set of naming conventions for the OBO Foundry

Further details on the rationale for this survey can be found in Schober 
/et al/. 2007, "Towards naming conventions for use in controlled 
vocabulary and ontology engineering", 
_http://bio-ontologies.org.uk/download/Bio-Ontologies2007.pdf_, pages 29-32.

      Results

Any questions as well as the completed questionnaires should be sent to 
Daniel Schober (schober at ebi.ac.uk). The outcome will be posted on the 
OBO Foundry wiki pages (http://obofoundry.org/wiki/index.php/Naming) and 
all participants will be notified.


Questionnaire

      _1. Questions on your ontology and its engineering and maintenance
      process_

*1.1* Are you and your co-workers familiar with the OBO Foundry
principles (http://www.obofoundry.org/crit.shtml)?

*1.2* If yours is an OBO Foundry ontology, have you started implementing
these principles and aligned your development process accordingly?

*1.3* Is work on your ontology closely associated with the need to 
manage and formulate queries about a specific body of data? If so, can 
you specify the type of data?

This question refers to the instance level and to bodies of annotated data. Is
the ontology developed in collaboration with the maintenance of a 
database or collection of databases (as the GO is developed in 
collaboration with UniProt and with various MOD databases)?

*1.4* Which ontology editor tool(s) do you use to build and manage your 
ontology?

*1.5* Please describe the editing process: is your ontology developed
in a centralised or distributed manner? In either case, please state the
number of people directly handling and editing the ontology file and
whether they are physically distributed or located in one place.

*1.6* Do you introduce special /helper classes/ or /bins/ that refer to
metadata to facilitate the engineering process? If yes, please describe
them and explain how you indicate their special status.

Examples are 'obsolete classes', 'imported classes', 'deleted classes'.

      _2. General questions on your current practice in naming entities
      and their documentation_

*2.1* Have you developed naming conventions within your ontology
community? If yes, please specify how these are formulated, e.g. as a
standalone document or as part of other documentation, and where they
are available (e.g. provide a URL).

*2.2* (If your answer to 2.1 is positive) Which entities are covered by
your naming conventions?

Some examples are provided below; add others if required:

    * class names
    * relation names, e.g. colour vs has_colour vs colour_of
    * instance names
    * the name of the ontology, its versions, namespaces and term IDs

*2.3* (If your answer to 2.1 is negative) Have you re-used existing 
naming conventions from other ontology groups?

For example, have you applied the GO editor style guide or conventions
from other sources, e.g. ontology tutorials or guidelines from
standardization bodies such as ISO?

      _3. Questions on the implementation of names_

*3.1* Which of the following categories of names (or name types) do you
record in your ontology, and for which of them do you anticipate that
common naming conventions would be useful?

Some examples are provided below; add others if required:

    * preferred name
    * short name or display name
    * formal name
    * synonym
    * foreign language translations
    * broader or narrower term

*3.2* (If applicable) Which ontology language idioms do you use to
capture the categories of names listed in question 3.1? Please provide
examples.

In OWL, the preferred class name is probably captured using the
rdfs:label idiom, and a foreign language translation via another
rdfs:label with its language tag (xml:lang) set. Synonyms are probably
captured by self-created OWL annotation properties, or in OBO by the
'exact_synonym' idiom.
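To make these idioms concrete, here is a minimal Python sketch that writes them as N-Triples lines using plain string formatting. Only rdfs:label and its namespace are standard; the class URI and the exact_synonym annotation property are hypothetical placeholders under example.org, not identifiers from any real ontology.

```python
# Standard RDFS label property.
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
# Hypothetical self-created annotation property for exact synonyms.
EXACT_SYNONYM = "http://example.org/annotation#exact_synonym"


def literal(text, lang=None):
    """Build an N-Triples literal, escaping backslashes and double quotes,
    with an optional language tag such as 'en' or 'de'."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


def triple(subject, predicate, obj):
    """Format one N-Triples statement: <subject> <predicate> object ."""
    return f"<{subject}> <{predicate}> {obj} ."


if __name__ == "__main__":
    leaf = "http://example.org/onto#Leaf"  # hypothetical class URI
    print(triple(leaf, RDFS_LABEL, literal("leaf", "en")))          # preferred name
    print(triple(leaf, RDFS_LABEL, literal("Blatt", "de")))         # translation
    print(triple(leaf, EXACT_SYNONYM, literal("foliage leaf", "en")))  # synonym
```

The language tag, not a separate property, is what distinguishes the translated label from the preferred English one.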

*3.3 *Do you think there is a need to expand the expressivity of the 
ontology representation languages in order to provide more naming 
flexibility? What elements are missing?

*3.4* Did you use any features/functions of your ontology editor to
check the names for consistency?

Examples of such features are the browser key, PROMPT or StringSearchTab
in Protégé, and the redundancy check functionality in OBO-Edit.
Specialized software tools also exist to check naming conventions, e.g.
Validator (http://www.kismeta.com/Validtr.html).

      _4. In depth questions on specific naming conventions_

*4.1 Explicit and concise names and context independence*

*4.1.1* Do you put any constraints on the use of natural language?

For example, do you omit nouns, articles or other words to keep names
short, e.g. /'two dimensional J-resolved'/ in place of /'the two
dimensional J-resolved pulse sequence'/? If yes, please describe.

*4.1.2* Do you make sure all the names are understandable on their own, 
even when viewed outside of the immediate context?

*4.2 Compound names*

*4.2.1* Do you apply any conventions that help string matching? For 
example, when creating compound names do you try to build the names out 
of already defined building blocks, re-using the same words or 
word-parts (affixes) present in other names and other representational 
units?

For example, use 'x_part_of_process', 'y_part_of_process' and
'z_part_of_process', consistently using the string 'part of' (used and
defined elsewhere), rather than mixing in synonymous strings, e.g.
'x_component_of_process', 'y_part_of_process', 'z_portion_of_process',
which introduces heterogeneity. GO, for example, re-uses the string
'development' in this defined way throughout its class names.
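A check for this kind of heterogeneity could look roughly like the Python sketch below. The table mapping a preferred string to its synonymous variants is a hypothetical example built from the names above; each community would supply its own.

```python
# Hypothetical table: preferred building block -> synonymous variants
# that should not appear in names (illustrative only).
PREFERRED_STRINGS = {
    "part_of": ["component_of", "portion_of"],
}


def find_variant_strings(names):
    """Return (name, variant, preferred) for every name that uses a
    synonymous variant instead of the preferred building block."""
    hits = []
    for name in names:
        for preferred, variants in PREFERRED_STRINGS.items():
            for variant in variants:
                if variant in name:
                    hits.append((name, variant, preferred))
    return hits


if __name__ == "__main__":
    names = ["x_component_of_process", "y_part_of_process",
             "z_portion_of_process"]
    for name, variant, preferred in find_variant_strings(names):
        print(f"{name}: uses '{variant}', prefer '{preferred}'")
```

Plain substring matching is deliberately simple here; it only works because the building blocks are used consistently, which is the point of the convention.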

*4.2.2* Do your names contain defined strings that carry the same
special meaning in every occurrence?

Defined strings could indicate administrative metadata, e.g. as in the
names '?device' or 'device refine'. GO used the 'sensu' string to
constrain validity, e.g. species specificity, as in 'fruiting body
development (sensu Bacteria)' (GO:0030583). Note that the 'sensu'
practice is now discouraged in GO in favour of stating appropriate
differentia explicitly and avoiding the taxon in names and definitions,
e.g. 'cell wall (sensu bacteria)' becomes 'peptidoglycan-based cell
wall'.

*4.2.3* Have you developed any guidelines to create compound names?

For example, conventions that require a specific order of word types
within the name.

*4.3 Homonyms*

*4.3.1* How do you cope with homonyms and highly ambiguous names?

For example, one can try to avoid or disambiguate homonyms like 'set', 
which can indicate a plurality as in 'protocol set', as well as an 
action as in 'parameter set'. OpenCyc uses qualifiers as name suffixes 
to disambiguate homonyms, e.g. 'Plant-the factory' vs. 'Plant-the organism'.

*4.4 Consistency of language*

*4.4.1* Do you consistently apply British or US English word forms?

For example using 'polymerising' vs. 'polymerizing' throughout.
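One crude way to detect mixed usage is a suffix heuristic, sketched below in Python. It is an assumption-laden illustration rather than a spelling checker: it treats -ise/-ised/-ising/-isation endings as British and the corresponding -iz- endings as US, and it will misfire on words such as 'precise' or 'noise'.

```python
import re

# Heuristic patterns (an assumption, not a dictionary-based check).
# Words like 'precise' or 'noise' will be false positives for BRITISH.
BRITISH = re.compile(r"\b\w+is(e|ed|es|ing|ation)\b", re.IGNORECASE)
US = re.compile(r"\b\w+iz(e|ed|es|ing|ation)\b", re.IGNORECASE)


def spelling_styles(names):
    """Map each style ('British', 'US') to the names that appear to use it."""
    styles = {"British": [], "US": []}
    for name in names:
        if BRITISH.search(name):
            styles["British"].append(name)
        if US.search(name):
            styles["US"].append(name)
    return styles


if __name__ == "__main__":
    styles = spelling_styles(["polymerising activity",
                              "polymerizing activity", "cell wall"])
    if styles["British"] and styles["US"]:
        print("Mixed British/US forms:", styles)
```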

*4.4.2* Do you encounter cases where inconsistency arises from including 
words from different languages? If yes, please explain how you tackle 
these inconsistencies.

For example, 'gut' is the everyday English word, while 'intestine' derives from the Latin 'intestinum'.

*4.5 Noun and verb forms*

*4.5.1* Do you encode word forms consciously and in a consistent way 
within your names and do you have any guidelines regarding word forms?

For example, using 'to be measured' (future) and 'measured' (past) or 
the time-neutral noun form 'measurement' in certain circumstances only.

*4.6 Abbreviations and acronyms*

*4.6.1* How do you handle acronyms and abbreviations, and how do you
deal with widely used acronyms, e.g. 'NMR', that would result in a very
long name when expanded? Could a /cut-off/ be defined for when an
acronym can be used in a name and still be intuitive and understandable
throughout the domain?

*4.7 Singularity*

*4.7.1* Do you capture plural or singular word forms throughout your 
ontology? When capturing pluralities, do you use a consistent naming 
convention?

For example, one could restrict the use of plurality indicators to a
single form, e.g. using either 'Xs', 'X collection', 'X set' or
'aggregate of X' throughout.

*4.8 Positive names*

*4.8.1* Do you use negative names such as 'non-separation device'?

*4.8.2* Do you explicitly exclude things within your names?

For example, in the Gene Ontology one can find 'hydrolase activity,
acting on carbon-nitrogen (but not peptide) bonds, in cyclic amides'
(GO:0016812, http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0016812).

*4.9 Conjunctions*

*4.9.1* Do the names in your ontology contain logical connectives and
Boolean operators such as 'or' or 'and', e.g. as in 'strain or line' or
'antigen processing and presentation of peptide or polysaccharide
antigen via MHC class II' (GO:0002504,
http://www.ebi.ac.uk/ego/DisplayGoTerm?id=GO:0002504)?

*4.9.2* Do the names in your ontology contain hyphens, slashes,
parentheses or other symbols, and if so, do you use them consistently?
If yes, please explain their meaning and use.

In many ontologies, hyphens and slashes are currently used with
different underlying meanings. Hyphens can indicate omissions in names
(ellipsis or apocope), e.g. 'gene-technology' for 'gene modification
technology', logical connectives, e.g. 'black-white', or ranges, e.g.
'10-100'.

*4.10 Taboo words*

*4.10.1* Do names in your ontology contain words that refer to the
representational units (e.g. class or relation) they are encoded in,
rather than to what is represented?

For example, 'protocol class', 'animal type', 'color attribute' or
'ligand relation'.
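Such words can be flagged mechanically for review. The Python sketch below assumes a hypothetical taboo list taken from the examples above; a flagged word is only a candidate, since a word like 'type' may be legitimate in some domains.

```python
# Hypothetical taboo list: words naming the representational unit
# rather than the thing represented (from the examples above).
TABOO_WORDS = {"class", "type", "attribute", "relation"}


def taboo_words_in(name):
    """Return the taboo words found in a name, in order of appearance.
    Underscores are treated as word separators."""
    words = name.lower().replace("_", " ").split()
    return [word for word in words if word in TABOO_WORDS]


if __name__ == "__main__":
    for name in ["protocol class", "animal_type", "color attribute",
                 "ligand relation", "leaf"]:
        found = taboo_words_in(name)
        if found:
            print(f"review '{name}': contains {found}")
```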

*4.11 Typography*

*4.11.1* Which typographical convention, e.g. lower case, UPPERCASE, 
mixed Case or CamelCase do you use for the categories of names (listed 
in question 3.1)?

Under the OBO umbrella one can find 'MyClass', 'My Class', 'My-Class',
'My_Class', 'My_class' and 'my class' conventions, even within one
ontology and across different representational idioms. In the AI
community, it is common for class names to start with an upper-case
letter and for relation and instance names to start with a lower-case
letter.
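To answer this question for an existing ontology, one could classify every name and count the conventions in use. The Python sketch below is one possible classifier; the category labels are this sketch's own, not an OBO Foundry vocabulary.

```python
import re
from collections import Counter


def typography(name):
    """Classify a name into one of the conventions listed above."""
    if " " in name:
        # 'my class' vs. 'My Class'
        return "lower case with spaces" if name == name.lower() else "mixed case with spaces"
    if "_" in name:
        return "underscore separated"  # 'My_Class', 'My_class'
    if "-" in name:
        return "hyphen separated"  # 'My-Class'
    if re.fullmatch(r"[A-Z][a-z0-9]+(?:[A-Z][a-z0-9]+)+", name):
        return "UpperCamelCase"  # 'MyClass'
    if re.fullmatch(r"[a-z][a-z0-9]*(?:[A-Z][a-z0-9]+)+", name):
        return "lowerCamelCase"  # 'hasColour'
    if name.isupper():
        return "UPPERCASE"  # 'NMR'
    if name.islower():
        return "lower case"  # 'leaf'
    return "other"


if __name__ == "__main__":
    names = ["MyClass", "My Class", "My-Class", "My_Class", "My_class", "my class"]
    # More than one key in the counter means mixed conventions.
    print(Counter(typography(n) for n in names))
```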

*4.11.2* Do you use sub- or superscripts or other text formatting to 
encode additional information?

*4.11.3* Do you use any character as a word separator (such as '_', '-',
' ', etc.) within compound names? If yes, please explain the reason for
your choice.

For example, in XML-based languages such as OWL, a name that becomes
part of a URI cannot contain a space separator, because spaces are not
allowed in URIs. CamelCase is problematic for text mining, since
explicit word-border indicators are lost.
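Both points can be illustrated with Python's standard library. In the sketch below, a space has to be percent-encoded before a name can sit in a URI, and recovering words from CamelCase requires a heuristic split that can go wrong: it handles 'NMRMagnet' but breaks 'pHValue' into 'p', 'H', 'Value'. The splitting regex is this sketch's own assumption, not a standard algorithm.

```python
import re
from urllib.parse import quote


def to_uri_fragment(name):
    """Percent-encode a name for use in a URI; a space becomes %20."""
    return quote(name, safe="")


def split_camel_case(name):
    """Heuristically recover words from CamelCase: keep runs of capitals
    (acronyms) together and split before a capital followed by lowercase."""
    return re.findall(r"[A-Z]+(?=[A-Z][a-z]|$)|[A-Z]?[a-z0-9]+|[A-Z]+", name)


if __name__ == "__main__":
    print(to_uri_fragment("two dimensional J-resolved"))
    print(split_camel_case("NMRMagnet"))  # acronym kept intact
    print(split_camel_case("pHValue"))    # 'pH' is wrongly split
```

An explicit separator such as '_' avoids both problems: it is legal in URIs and preserves word borders for text mining.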

*4.11.4* How do you handle chemical element symbols, Greek symbols like
*α*, and other special characters like *°C*?

*4.11.5* Do you have to handle product names or registered brand names?
If yes, how do you make their names intuitive (or do you capture them
as they are)?

For example, a brand name 'US 2', describing an NMR magnet, could be
renamed using the company name as prefix, the product brand name as
infix and the product type (superclass) as headword/suffix, e.g. 'Bruker
US 2 NMR magnet'.

*Lastly, is there any final comment you want to make, including 
additional questions that should have been in this questionnaire? Has 
every naming issue you came across been covered?*

*THANKS*
