QTL schema

Tue Apr 23 17:05:06 EDT 2002

On Mon Apr 22, 2002, Doreen Ware wrote:
> Hi Cornell,
> 
> I am working with Kuan this morning.  My understanding is that on
> Friday you had a meeting discussing the QTL and mutant database
> structure.  Kuan has communicated with me that there was some
> agreement that the QTL structure would not accommodate the mutants.
> Can you please generate a list of items that were in disagreement
> and a list of items that were confusing so Kuan and I can take a
> look at these together.

Hello Doreen,

The consensus was that mutant and QTL studies, although conceptually
related, are operationally quite different and need to be curated
differently.  The closer you get to locus/sequence, of course, the
more similar they become, but getting down to that level involves a
very dissimilar data trail.  Trying to force one into the other
doesn't make a whole lot of sense.

A QTL study is an instance of a map study.  You cannot separate a QTL
from its underlying genetic map.  A QTL is inherently defined as a
statistically significant correlation between mapped markers and
phenotype.  The critical part is that the mapped markers define the
distribution of parental alleles in a population of segregants and the
information derived from those markers allows you to divide the
population into genotypic categories that are compared to the
phenotypic differences between groups.  If the difference is
significant, a putative QTL is declared.
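
To make the comparison concrete, here is a rough sketch of the simplest
case: a single marker, with the segregants split by parental allele and
the phenotype means compared.  The function names, the line names and
the data are made up for illustration, and Welch's t statistic is my
stand-in for whatever test a given study uses.  The statistic still has
to be checked against a significance threshold before a putative QTL is
declared.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Welch's t statistic for two independent samples."""
    na, nb = len(group_a), len(group_b)
    va, vb = variance(group_a), variance(group_b)
    return (mean(group_a) - mean(group_b)) / sqrt(va / na + vb / nb)


def single_marker_test(genotypes, phenotypes):
    """Split segregants by parental allele at one marker, then compare
    the phenotype means of the two genotypic categories."""
    groups = {}
    for line, allele in genotypes.items():
        groups.setdefault(allele, []).append(phenotypes[line])
    return welch_t(groups["A"], groups["B"])


# Hypothetical data: parental allele ("A" or "B") at one mapped marker,
# and a trait score, for six segregating lines.
genotypes = {"line1": "A", "line2": "A", "line3": "A",
             "line4": "B", "line5": "B", "line6": "B"}
phenotypes = {"line1": 10.0, "line2": 12.0, "line3": 11.0,
              "line4": 15.0, "line5": 17.0, "line6": 16.0}

t_stat = single_marker_test(genotypes, phenotypes)
print(f"t = {t_stat:.3f}")
```

Note that the test cannot even be set up without the marker genotypes,
which is why the QTL cannot be separated from its map.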

In a mutant study, there is a much simpler kind of comparison employed
- a mutant individual is compared to its isogenic "wild type"
("reference variety").  This requires no map, no markers and no
statistics.  The mutation is defined as a variant of the reference
phenotype.

For these reasons, the schemas required to curate these two types of
studies are fundamentally different.  The same controlled vocabulary
can still be used to describe the phenotypes, the environments in
which the phenotypes are observed, the germplasm, the developmental
stages, and the locations (GIS, etc.).  Indeed, many of the mutants
are themselves variants at the same loci as the QTLs.

Though there is still little data that allows us to determine which
variant sequences are responsible for the mutant or quantitative
phenotypes we are curating, Gramene should facilitate the users'
ability to make inferences about the relationship between genotype and
phenotype by showing positional, phenotypic, functional, etc.,
correspondences.

There are certainly items in the mutant schema that Kuan is presenting
that are general-purpose enough to be reused, not only for QTL, but
for Gramene as a whole:

    object_to_*
    geographical_location
    image
    developmental_stage

Some need to be modified to make them more general-purpose, because
they appear to be quite mutant-specific at the moment:

    germplasm_info
    allele

These tables are mutant-specific, and probably should remain so:

    study [rename to mutant study?]
    mutant
    rough_mutant_data
    mutant_synonym
    mutagenesis

Here are some strange tables:

    phenotype_expression
        - has a field for trait_score; what is it supposed to be used for?

    map_panel
        - This is a term used in MaizeDB for mapping populations, but
          as defined herein appears to be a quite different concept.
          This should be defined as in Ken's cmap_map_study.  

We might also want to separate the map_study from the map_panel info.
Also, we Cornellians dislike the term "mapping panel" and prefer
"mapping population".  So for maps and markers we may have:

    map_study
    map_population
    linkage_group
    locus
    marker
    marker_correspondence

Locus/marker go into the map viewer as "features".  Currently, the
distinction between locus and marker is rather blurred in Gramene.
I'd really like to maintain them as discrete objects: A marker is an
actual physical probe or phenotype while a (genetic) locus is the
position to which it maps.  This will make it easier to infer
correspondences based on marker+position.  A QTL can then be
associated with a particular set of loci on a particular linkage group
mapped in a specific population.
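
As a rough illustration of why keeping the two apart helps, here is a
sketch in Python rather than SQL.  It is not the proposed schema: every
class, field, marker and study name below is hypothetical.  A marker
exists once, a locus records where that marker maps in one study, and
correspondences between maps fall out of comparing loci that share a
marker.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Marker:
    name: str                 # the physical probe or phenotype


@dataclass(frozen=True)
class Locus:
    marker: Marker            # what was mapped
    map_study: str            # the study (and population) it was mapped in
    linkage_group: str
    position_cm: float        # where it mapped


@dataclass(frozen=True)
class QTL:
    trait: str
    map_study: str
    linkage_group: str
    loci: tuple               # the set of loci associated with the QTL


def correspondences(loci):
    """Pair loci from different map studies that share the same marker."""
    pairs = []
    for i, a in enumerate(loci):
        for b in loci[i + 1:]:
            if a.marker == b.marker and a.map_study != b.map_study:
                pairs.append((a, b))
    return pairs


marker_x = Marker("marker_x")
marker_y = Marker("marker_y")
loci = [
    Locus(marker_x, "study_1", "1", 12.5),
    Locus(marker_y, "study_1", "1", 20.0),
    Locus(marker_x, "study_2", "1", 14.0),
]

pairs = correspondences(loci)
qtl = QTL("hypothetical_trait", "study_1", "1", (loci[0], loci[1]))
print(len(pairs), "correspondence(s) found")
```

Here marker_x yields one correspondence between study_1 and study_2,
while the QTL stays tied to loci from a single study.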

I'm developing some QTL-specific tables:

    qtl_experiment
    trait_description [or link to trait ontology?]
    trait_evaluation
    environment
    qtl_analysis
    qtl

Give me some time to do the conversion to SQL.  I'm also going to
develop some HTML mockups for QTL, so you can see how it all hangs
together.

Noel [& Susan]
-- 
Immanuel V. Yap <ivy1 at cornell.edu>        | It is the fate of 
Department of Plant Breeding and Biometry | operating systems 
Cornell University                        | to become free.  
G15 Bradfield Hall, Ithaca, NY 14853      |    - Neal Stephenson
office: 1-607-255-3103                    |