
Biodiversitätsinformatik / Biodiversity Informatics
Botanischer Garten und Botanisches Museum Berlin-Dahlem

MoreTax

Definition of semantics for serial relationships and formulation of a rule catalogue based on it

Table of contents:  Introduction | Inference rules | Rule adjustments 

1. Introduction

The entry of relationships between potential taxa [1], and the consequences of these relationships for the quality of the transmitted factual information, can be conceived as parts of an "expert system":

  • The knowledge base module comprises the potential taxa, the relationships between them, the factual information linked to these potential taxa, and the rules governing relationships and factual information.

  • The inference engine module derives new knowledge by applying the rules to given facts (potential taxa, relationships and factual information).

  • The explanation module makes the system's procedure transparent and informs the user about both the solution method and the reasons for the inferred results.

  • Finally, the data acquisition module comprises the mechanisms used for the maintenance and update of the knowledge base. These mechanisms are the taxonomic editor (for the input of potential taxa and of their relationships) and the rule adjustment interface (for parameterizing some inference rules and output functions).

The following illustration outlines the architecture of such an expert system with a World Wide Web interface.


 

2. Inference rules

The inference rules comprise in particular:

  • Rules that define the operations on relationships

  • Rules for the computation of the derived final relationship between two potential taxa

  • The rule for evaluating transmitted factual information

 

2.1. Rules that define the operations on relationships[2]

  1. Strong agreement:
    Intersection of two combined relationships.

  2. Weak agreement: 
    Union of two combined relationships.

  3. Reversal:
    Computation of the combined relationship B2 between potential taxa PT2 and PT1, if B1 is the combined relationship between PT1 and PT2.

  4. Negation:
    Computation of the combined relationship B2, if the relationship B1 is negated.

  5. Concatenation:
    Computation of the combined relationship B3 between potential taxa PT1 and PT3, if B1 is the combined relationship between PT1 and PT2 and B2 is the one between PT2 and PT3.
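
The five operations can be sketched in Python if a combined relationship is modelled as a set of basic relationships. This is only an illustration under stated assumptions: the five basic relationship symbols, the reading of negation as the set complement, and the composition table (derived from plain set semantics for non-empty taxa) are not taken from the MoreTax standard list, which remains the authoritative definition.

```python
from itertools import product

# Assumed basic relationships between potential taxa A and B:
# "=" congruent, "<" A included in B, ">" A includes B,
# "o" overlapping, "!" excluding.
ALL = frozenset("=<>o!")


def rel(symbols):
    """Build a combined relationship from a string of basic symbols."""
    return frozenset(symbols)


def strong_agreement(b1, b2):
    return b1 & b2  # intersection of two combined relationships


def weak_agreement(b1, b2):
    return b1 | b2  # union of two combined relationships


_REVERSE = {"=": "=", "<": ">", ">": "<", "o": "o", "!": "!"}


def reversal(b1):
    return frozenset(_REVERSE[s] for s in b1)


def negation(b1):
    return ALL - b1  # assumption: negation is the complement


# Assumed composition table for basic relationships other than "=".
_COMPOSE = {
    ("<", "<"): "<", ("<", ">"): "=<>o!", ("<", "o"): "<o!", ("<", "!"): "!",
    (">", "<"): "=<>o", (">", ">"): ">", (">", "o"): ">o", (">", "!"): ">o!",
    ("o", "<"): "<o", ("o", ">"): ">o!", ("o", "o"): "=<>o!", ("o", "!"): ">o!",
    ("!", "<"): "<o!", ("!", ">"): "!", ("!", "o"): "<o!", ("!", "!"): "=<>o!",
}


def _compose_basic(r1, r2):
    if r1 == "=":
        return rel(r2)  # congruence leaves the other relationship unchanged
    if r2 == "=":
        return rel(r1)
    return rel(_COMPOSE[(r1, r2)])


def concatenation(b1, b2):
    """Union of the compositions of every pair of basic relationships."""
    result = frozenset()
    for r1, r2 in product(b1, b2):
        result |= _compose_basic(r1, r2)
    return result
```

For example, concatenation(rel("<"), rel("!")) gives rel("!"): if PT1 is included in PT2 and PT2 excludes PT3, then PT1 excludes PT3. In contrast, concatenation(rel("<"), rel(">")) gives ALL, that is, no information about PT1 and PT3.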

 

2.2. Rules (and conditions) for the computation of the derived final relationship between an origin potential taxon PTo and a target potential taxon PTt

  1. Paths between PTo and PTt are built as successions of contiguous edges. Within a path, no potential taxon may appear more than once (avoidance of cycles). Two contiguous edges with systematic (biological) relationships may only occur within a path if they have the same hierarchical direction (in the sense of the rank hierarchy).

  2. If a path uses an edge for which other edges with exactly the same pair of end-point potential taxa exist (with relationships specified by different relationship authors), a result relationship valid for this pair of potential taxa is calculated on the basis of the weights [3].

  3. Each path is evaluated by assigning it a relationship obtained by concatenating the relationships of its edges, taking the edge weights into account. In addition, each path receives a weight derived from the edge weights.

  4. A path has to be broken off as soon as its assigned combined relationship consists of all basic relationships, because it then no longer constrains the relationship between its end points.

  5. The derived final relationship between PTo and PTt is calculated from the evaluations of all paths between them and from the path weights.
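
The path rules above can be illustrated with a short Python sketch. It covers cycle avoidance, a maximum path length, a minimum edge weight, path weighting and the breaking off of uninformative paths. It is a simplified illustration, not the MoreTax implementation: the function and parameter names are hypothetical, the condition on the hierarchical direction of systematic edges and the merging of edges from different authors are left out, and coarse_compose is a deliberately coarse stand-in for the real concatenation rule.

```python
ALL = frozenset("=<>o!")  # assumed: every basic relationship is possible
EQ, LT, GT = frozenset("="), frozenset("<"), frozenset(">")


def coarse_compose(b1, b2):
    """Deliberately coarse stand-in for concatenation (sound but imprecise)."""
    if b1 == EQ:
        return b2
    if b2 == EQ:
        return b1
    if b1 == b2 and b1 in (LT, GT):
        return b1  # inclusion in one direction is transitive
    return ALL  # give up: any relationship is possible


def evaluate_paths(graph, origin, target, compose=coarse_compose,
                   max_length=8, min_weight=0, combine_weights=min):
    """Yield (nodes, relationship, weight) for every usable cycle-free path.

    graph maps a potential taxon to a list of
    (neighbour, combined relationship, edge weight) tuples.
    """
    stack = [([origin], None, None)]
    while stack:
        nodes, relationship, weight = stack.pop()
        if nodes[-1] == target and relationship is not None:
            yield nodes, relationship, weight
            continue
        if len(nodes) - 1 >= max_length:
            continue  # maximum number of edges reached
        for neighbour, edge_rel, edge_weight in graph.get(nodes[-1], []):
            if neighbour in nodes or edge_weight < min_weight:
                continue  # no cycles, no edges below the threshold
            new_rel = (edge_rel if relationship is None
                       else compose(relationship, edge_rel))
            if new_rel == ALL:
                continue  # path has lost all information: break it off
            new_weight = (edge_weight if weight is None
                          else combine_weights(weight, edge_weight))
            stack.append((nodes + [neighbour], new_rel, new_weight))


graph = {
    "PT1": [("PT2", LT, 100), ("PT3", LT, 80)],
    "PT2": [("PT4", LT, 80)],
    "PT3": [("PT4", GT, 100)],
}
found = list(evaluate_paths(graph, "PT1", "PT4"))
```

In this example only the path PT1, PT2, PT4 survives, with relationship "<" and weight 80 (the minimum of its edge weights). The path through PT3 is broken off, because an inclusion followed by its reverse leaves every basic relationship possible.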

 

2.3. The rule for evaluating transmitted factual information

  1. The quality of transmitted factual information[4] is calculated on the basis of the quality category at PTo and of the derived final relationship between PTo and PTt.
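
One way to picture this rule is a lookup table indexed by the quality category at PTo and by each basic relationship contained in the derived final relationship. The sketch below is an assumption made for illustration only: the category names follow footnote [4], the relationship symbols and the table entries are derived here from plain set reasoning (read as "PTo <symbol> PTt"), and the choice of the most cautious result for a combined relationship is not taken from the MoreTax rule catalogue.

```python
from enum import IntEnum


class Quality(IntEnum):
    NOT_APPLICABLE = 0
    DOUBTFUL = 1   # might apply to some elements
    PARTIAL = 2    # applies to some elements
    FULL = 3       # applies to all elements


N, D, P, F = (Quality.NOT_APPLICABLE, Quality.DOUBTFUL,
              Quality.PARTIAL, Quality.FULL)

# Assumed symbols: "=" congruent, "<" PTo included in PTt,
# ">" PTo includes PTt, "o" overlapping, "!" excluding.
TRANSFER = {
    F: {"=": F, ">": F, "<": P, "o": P, "!": N},
    P: {"=": P, "<": P, ">": D, "o": D, "!": N},
    D: {"=": D, "<": D, ">": D, "o": D, "!": N},
    N: dict.fromkeys("=<>o!", N),
}


def transmitted_quality(quality_at_origin, final_relationship):
    """Return the most cautious quality over all basic relationships.

    An empty relationship (nothing known) yields NOT_APPLICABLE.
    """
    return min((TRANSFER[quality_at_origin][s] for s in final_relationship),
               default=N)
```

For instance, information that applies to all elements of PTo stays fully applicable if PTo includes PTt, but becomes only partially applicable if the final relationship is "congruent or included in", since PTt may then contain elements outside PTo.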


 

3. Rule adjustments (parameter definition)

Administrators must be able to define user-role-specific frameworks within which users can choose further modifications and restrictions when they formulate queries to the system. Such a framework is simply a set of stored values that influence the following processes:

  • The inclusion of edges in path building is additionally tuned by weighting edges according to their relationship authors and by fixing a minimum weight below which edges are not considered.
    These weights and the exclusion threshold are passed as parameters to rules 1 and 2 of section 2.2. The exclusion threshold is also passed to rule 5 of section 2.2.

  • The calculation of paths is restricted by fixing the maximum path length allowed and the maximum number of included systematic classification edges (higher and lower potential taxa are treated separately, and the maximum can be set either for all sources together or per source), as well as by weighting relationships (e.g. a congruence relationship could be ignored in the calculation of the path length).
    These four values and the relationship weights for the length calculation are passed as parameters to rule 1 of section 2.2.

  • The weighting of paths depends on the mathematical operation applied to the edge weights.
    The choice of this operation is passed as a parameter to rules 2 and 3 of section 2.2.

  • The handling of "simultaneous" paths is specified by admissible intervals (which depend on the actual maximum weights) and by the relationship operator ("strong agreement" or "weak agreement").
    These values are passed as parameters to rules 2 and 5 of section 2.2.

  • The output for users is controlled not by the inference module but by an as yet unspecified "output module" with its own procedures and rules. Nevertheless, it seems appropriate to handle the initialisation parameters for differentiated outputs within the present framework. These are:

    a)    The lowest quality of transmitted factual information, below which no factual information may be output.

    b)   Combinations of user role, source of factual information, lowest quality of transmitted factual information and comment. The output of factual information (with a corresponding comment) is controlled by the user role and the source.

    c)   Combinations of user role, access restriction on factual information, lowest quality of transmitted factual information and comment. The output of factual information (with a corresponding comment) is controlled by the user role and the access restriction.
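
The handling of "simultaneous" paths described in the list above might look as follows in Python. This is a sketch under assumptions: the reading of the admissible interval as a maximum distance from the highest path weight, the function name and the parameter names (which mirror the StandardDistance, StandardExclude and StandardOperator attributes of the sample configuration) are interpretations made here, not a documented MoreTax procedure.

```python
def final_relationship(path_results, distance=20, exclude=10,
                       operator="Intersection"):
    """Combine the relationships of all sufficiently heavy paths.

    path_results is a list of (combined relationship, path weight) pairs,
    each relationship being a frozenset of basic relationship symbols.
    """
    # Drop paths whose weight lies below the exclusion threshold.
    candidates = [(r, w) for r, w in path_results if w >= exclude]
    if not candidates:
        return frozenset()  # nothing usable is known
    # Keep only paths whose weight is close enough to the maximum weight.
    top = max(w for _, w in candidates)
    selected = [r for r, w in candidates if top - w <= distance]
    if operator == "Intersection":   # "strong agreement"
        return frozenset.intersection(*selected)
    return frozenset.union(*selected)  # "weak agreement"


results = [
    (frozenset("<="), 100),
    (frozenset("<o"), 85),
    (frozenset("!"), 50),
    (frozenset(">"), 5),
]
strong = final_relationship(results)
weak = final_relationship(results, operator="Union")
```

With these values the paths weighted 50 and 5 are ignored, so strong agreement yields the single basic relationship "<", while weak agreement yields the set containing "=", "<" and "o".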

In summary, rules and procedures are influenced here by setting parameter values. These parameters have to be stored, for example in a configuration file. XML files are a suitable option for managing and structuring such a configuration.

 

Example of an XML configuration file:

  <?xml version="1.0" encoding="UTF-8"?>
  <Parameters Who="Mgeo" When="20-05-02" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:noNamespaceSchemaLocation="Parameters2.xsd">
    <AuthorPriority NumOperation="Minimum">
      <AuthorID Weight="100">10</AuthorID>
      <AuthorID Weight="80">24</AuthorID>
    </AuthorPriority>
    <Simultaneity StandardDistance="20" StandardOperator="Intersection" StandardExclude="10">
      <IntersectionByWeight FromWeight="10" ToWeight="75" Distance="30">false</IntersectionByWeight>
    </Simultaneity>
    <PathLength Length="8" MaxLowerTaxa="0">
      <RelationshipWeight RelationshipNumber="1">0</RelationshipWeight>
    </PathLength>
    <MinInfoCategory>maybeForSome</MinInfoCategory>
    <SourceEvaluation>
      <Source Id="12">
        <allUsers>
          <Comment>No scientific source</Comment>
          <MinInfoCategory Doubt="false">forSome</MinInfoCategory>
        </allUsers>
      </Source>
    </SourceEvaluation>
    <InfoEvaluation>
      <ConfidentialityCat Confidentiality="restricted">
        <allUsers>
          <MinInfoCategory>forSome</MinInfoCategory>
        </allUsers>
        <User>
          <UserType>Layman</UserType>
          <MinInfoCategory>doNotUse</MinInfoCategory>
        </User>
      </ConfidentialityCat>
    </InfoEvaluation>
  </Parameters>
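
A configuration of this shape could be read with Python's standard xml.etree.ElementTree module, as in the following minimal sketch. The element and attribute names are those of the example above; the function name load_parameters, the keys of the returned dictionary and the shortened excerpt in CONFIG (without XML declaration and schema attributes) are hypothetical and serve only as illustration.

```python
import xml.etree.ElementTree as ET

# Shortened excerpt of the example configuration.
CONFIG = """<Parameters Who="Mgeo" When="20-05-02">
  <AuthorPriority NumOperation="Minimum">
    <AuthorID Weight="100">10</AuthorID>
    <AuthorID Weight="80">24</AuthorID>
  </AuthorPriority>
  <Simultaneity StandardDistance="20" StandardOperator="Intersection"
                StandardExclude="10"/>
  <PathLength Length="8" MaxLowerTaxa="0"/>
  <MinInfoCategory>maybeForSome</MinInfoCategory>
</Parameters>"""


def load_parameters(xml_text):
    """Extract a few rule parameters into a plain dictionary."""
    root = ET.fromstring(xml_text)
    authors = root.find("AuthorPriority")
    simultaneity = root.find("Simultaneity")
    path_length = root.find("PathLength")
    return {
        "weight_operation": authors.get("NumOperation"),
        # Maps author ID (element text) to the weight of that author.
        "author_weights": {int(a.text): int(a.get("Weight"))
                           for a in authors.findall("AuthorID")},
        "distance": int(simultaneity.get("StandardDistance")),
        "operator": simultaneity.get("StandardOperator"),
        "exclude": int(simultaneity.get("StandardExclude")),
        "max_path_length": int(path_length.get("Length")),
        # findtext only looks at direct children of the root element.
        "min_info_category": root.findtext("MinInfoCategory"),
    }


params = load_parameters(CONFIG)
```

Keeping the parsed values in one dictionary makes it easy to hand them to the individual rules as parameters, as section 3 describes.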

 

The following table makes clear the structured content of this XML file:

An appropriate grammar for possible configuration files is defined in the file Parameters1.xsd[5] (an XML Schema[6]).


Marc Geoffroy, January 2002

First version (German only): August 2001
Revised second (German and English) version: June 2002


[1] Here a potential taxon is regarded as the set of elements that belong, implicitly or explicitly, to its circumscription.

[2] See http://www.bgbm.org/BioDivInf/Projects/MoreTax/Standard_liste_en.htm

[3] Initially, weights characterise edges depending on the relationship authors. See section 3.

[4] We consider four categories describing the scope of a statement with respect to the elements of a potential taxon:
"fully applicable" if the factual information applies to all elements,
"partially applicable" if the factual information applies to some elements,
"doubtfully applicable" if the factual information might apply to some elements, and
"not applicable" if there is no reason whatsoever why the factual information should apply to any element.

[5] For a detailed description of the schema see: http://www.bgbm.org/BioDivInf/Projects/MoreTax/Parameters_Schema.html.
The schema itself can be viewed in any browser that supports XML at: http://www.bgbm.org/BioDivInf/Projects/MoreTax/Parameters.xsd

[6] See http://www.w3.org/XML/Schema

 

____________________________________________________________________________

 

MoreTax (Rule-based association of taxonomic concepts) is a research and development project financed by the Federal Agency for Nature Conservation of the German Ministry of the Environment.

Project co-ordinator: Walter Berendsohn
Project scientist: Marc Geoffroy

 

This page last updated on 21-06-2002


© Freie Universität Berlin, Botanischer Garten und Botanisches Museum Berlin-Dahlem
Page editor: M. Geoffroy.