
Biodiversitätsinformatik / Biodiversity Informatics
Botanischer Garten und Botanisches Museum Berlin-Dahlem

MoreTax

Choosing a formal language for the formulation of rules

Table of contents:  The starting point | The "potential taxon"-graph | User queries |
The transmission of factual information | The formal description | Examples | Editing the rule system

The starting point

Biological factual information stored in independent sources (literature, databases, etc.) is mostly linked to taxon names and can be combined by means of these names. Substantial difficulties arise when attempting to merge this knowledge, because different sources may hold different points of view about the systematic position of a taxon (and, in consequence, about its correct name) and/or about its circumscription. MoreTax is a research project that analyses this problem with the aim of developing mechanisms to convey factual data from and between different databases in which facts are linked to taxon names.


The "Potential Taxon"-graph

The concept of a taxon as expressed in a specific source can formally be described by a combination of the taxon name with the source reference (we call this a "Potential Taxon"). Since taxa are classes of organisms, we can describe them as the set of all organisms belonging to that taxon. The concept of the taxon defines its circumscription and thus the elements contained within the set. Elsewhere in the project we describe a special editor (the Taxonomic Editor), which allows users to enter and manage such Potential Taxa and enables experts to define and edit the relationship between two Potential Taxa by means of a set relationship.

The following basic relationships from set theory are relevant for the description of the relationship between two potential taxa PT1 and PT2:

 

R1. PT1 and PT2 are congruent
PT1 ≡ PT2                  x ∈ PT1 ⇔ x ∈ PT2

R2. PT1 is included in PT2
PT1 ⊂ PT2                  x ∈ PT1 ⇒ x ∈ PT2, ∃y ∈ PT2 | y ∉ PT1

R3. PT1 includes PT2
PT1 ⊃ PT2                  x ∈ PT2 ⇒ x ∈ PT1, ∃y ∈ PT1 | y ∉ PT2

R4. PT1 and PT2 overlap each other
PT1 ⊕ PT2                  ∃x ∈ PT1 | x ∉ PT2, ∃y ∈ PT2 | y ∉ PT1, ∃z ∈ PT1 | z ∈ PT2

R5. PT1 and PT2 exclude each other
PT1 ! PT2                  x ∈ PT1 ⇒ x ∉ PT2
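
The five definitions can be checked mechanically when the member sets are known explicitly. The following is only a minimal Python sketch, not part of the MoreTax software: the names `Rel` and `classify` are hypothetical, and both sets are assumed to be non-empty (in practice the circumscriptions are judged by experts rather than enumerated).

```python
from enum import Enum


class Rel(Enum):
    """The five basic set relationships R1 to R5 (hypothetical names)."""
    CONGRUENT = "congruent"        # R1
    INCLUDED_IN = "included_in"    # R2
    INCLUDES = "includes"          # R3
    OVERLAPS = "overlaps"          # R4
    EXCLUDES = "excludes"          # R5


def classify(pt1: set, pt2: set) -> Rel:
    """Return the basic relationship of pt1 to pt2 (both assumed non-empty)."""
    if pt1 == pt2:
        return Rel.CONGRUENT
    if pt1 < pt2:      # proper subset: every x in pt1 is in pt2, some y is not in pt1
        return Rel.INCLUDED_IN
    if pt1 > pt2:      # proper superset
        return Rel.INCLUDES
    if pt1 & pt2:      # shared elements, but each set also has elements of its own
        return Rel.OVERLAPS
    return Rel.EXCLUDES
```

Exactly one of the five relationships holds for any pair of non-empty sets, which is why the function can return a single value.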

 

The relationships between several Potential Taxa thus form an oriented (directed) graph, in which the nodes are the Potential Taxa and the edges are those pairs of Potential Taxa to which the expert(s) have assigned set relationships:

[Figure: example Potential Taxon graph; image not preserved]

 

 


User queries

Suppose that there is a pool of connected Potential Taxa from different sources. Two kinds of queries about factual information are of interest to the user:

  • To which taxa (actually: Potential Taxa) does certain factual information apply?

  • Which factual information applies to certain taxon names (actually: Potential Taxa)?

The result should not depend on which Potential Taxon the factual information was originally linked to. For this purpose, a rule system based on the Potential Taxon graph has to be developed. Moreover, it should be possible to formulate flexible rules that restrict the result (e.g. depending on factors such as an assessment of the expertise of sources or of the authors of relationships). As a result, users can be notified about qualitative aspects of the linkage between the transmitted facts and the Potential Taxon with which they started their query.


The transmission of factual information

There are four categories for the applicability of factual information with respect to "their" Potential Taxon:
1) fully applicable, if the factual information applies to every element of the taxon,
2) partially applicable, if the factual information applies only to a subset of elements of the taxon, 
3) doubtful applicable, if the factual information may apply to some elements of the taxon and
4) not applicable, if the factual information does not apply to any element of the taxon.

Suppose that some factual information is fully applicable to the Potential Taxon PT1. Taking into account the graph and its relationships, there are at least three possible outcomes for the quality of the factual information when it is transmitted to the Potential Taxon PT2:

  • fully applicable, if PT1 ≡ PT2 (congruent) or PT1 ⊃ PT2 (PT1 includes PT2)

  • partially applicable, if PT1 ⊂ PT2 (PT1 is included in PT2) or PT1 ⊕ PT2 (overlap)

  • not applicable, if PT1 ! PT2 (mutual exclusion)
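
For a single edge, the three cases above amount to a lookup table. A small Python sketch, assuming hypothetical relationship names as keys (the relationship is read as "PT1 to PT2"):

```python
# Quality at PT2 of a fact that is fully applicable to PT1,
# keyed by the relationship of PT1 to PT2 (key names are illustrative).
TRANSMIT_FULL = {
    "congruent": "fully applicable",
    "includes": "fully applicable",        # PT1 includes PT2
    "included_in": "partially applicable", # PT1 is included in PT2
    "overlaps": "partially applicable",
    "excludes": "not applicable",
}


def transmit_full(relationship: str) -> str:
    """Look up the transmitted quality for one edge PT1 -> PT2."""
    return TRANSMIT_FULL[relationship]
```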

As shown, the quality of the factual information applying to PT2 depends on both the quality of the same factual information when applying to PT1 and on the set relationship between both of them.

In the graph it is evident that an edge does not exist for every pair of Potential Taxa, although a path (a sequence of edges) between them might exist. In our example this is the case for PTi and PTk. There must therefore be a rule that calculates the resulting set relationship when two contiguous edges with their respective set relationships are concatenated. If, for example, Bij and Bjk are both the set relationship "⊂" (is included in), then it is easy to see that the resulting relationship between PTi and PTk is also "⊂", and hence factual information that is fully applicable to PTi is only partially applicable to PTk. Now assume that Bij remains "⊂" but that Bjk is "⊕" (overlaps). The resulting relationship between PTi and PTk is then no longer unique: it could be "⊂", "⊕" or even "!" (excludes). This forces the introduction of "combined" relationships and a corresponding extension of the rule. With this extended rule it is possible to associate a unique "combined" relationship with each path in the graph.
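
The two cases just discussed can be reproduced with a composition table. The Python sketch below represents a "combined" relationship as a set of basic relationships; the table entries follow the Visual Basic `concatenate` function in the Examples section, while the constant names and the set representation are assumptions made for illustration. The Doubtful flag is left out for brevity.

```python
EQ, PP, PPI, PO, DR = "congruent", "included_in", "includes", "overlaps", "excludes"
ALL = {EQ, PP, PPI, PO, DR}

# Possible relationships PTi -> PTk for basic relationships (PTi -> PTj, PTj -> PTk).
# Congruence is handled separately in concatenate().
TABLE = {
    (PP, PP): {PP},                (PP, PPI): ALL,
    (PP, PO): {PP, PO, DR},        (PP, DR): {DR},
    (PPI, PP): {EQ, PP, PPI, PO},  (PPI, PPI): {PPI},
    (PPI, PO): {PPI, PO},          (PPI, DR): {PPI, PO, DR},
    (PO, PP): {PP, PO},            (PO, PPI): {PPI, PO, DR},
    (PO, PO): ALL,                 (PO, DR): {PPI, PO, DR},
    (DR, PP): {PP, PO, DR},        (DR, PPI): {DR},
    (DR, PO): {PP, PO, DR},        (DR, DR): ALL,
}


def concatenate(rel_ij, rel_jk):
    """Combine two contiguous combined relationships into one."""
    result = set()
    for a in rel_ij:
        for b in rel_jk:
            if a == EQ:
                result.add(b)          # congruence leaves the other side unchanged
            elif b == EQ:
                result.add(a)
            else:
                result |= TABLE[(a, b)]
    return frozenset(result)
```

With this table, `concatenate({PP}, {PP})` yields only "included_in", whereas `concatenate({PP}, {PO})` yields the three possibilities "included_in", "overlaps" and "excludes".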

In fact, two Potential Taxa can be connected in the graph by several paths. This is the case for PTi and PTl, because there is a "direct" path (the corresponding edge) and also an "indirect" path via PTj. A "combined" relationship is associated with each path. Additional rules must therefore specify how the system derives a single resulting "combined" relationship from two such "combined" relationships. This leads to at least two alternative rules.
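
With combined relationships represented as sets of basic relationships, the two alternative rules correspond to intersection and union. This is a hedged Python sketch only: the function names and the sample values for the two paths are hypothetical, and the Doubtful flag handled by the Visual Basic versions in the Examples section is omitted.

```python
def strong_agreement(rel_a: frozenset, rel_b: frozenset) -> frozenset:
    """Keep only the basic relationships allowed by both paths (intersection)."""
    return rel_a & rel_b


def weak_agreement(rel_a: frozenset, rel_b: frozenset) -> frozenset:
    """Keep every basic relationship allowed by at least one path (union)."""
    return rel_a | rel_b


# Illustrative values: a direct edge and an indirect path between the same two taxa.
direct = frozenset({"included_in"})
indirect = frozenset({"included_in", "overlaps", "excludes"})

narrowed = strong_agreement(direct, indirect)   # frozenset({'included_in'})
widened = weak_agreement(direct, indirect)      # all three relationships
```

An empty intersection means that the two paths contradict each other, which the interpretation rule in the Examples section reports as a contradiction.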

For each oriented relationship between two potential taxa PT1 and PT2 there exists a reverse oriented relationship between PT2 and PT1, which can be likewise defined by an appropriate rule. This results in altogether at least four different rules.
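
Reversal only swaps the two asymmetric relationships; congruence, overlap and exclusion are symmetric. A minimal Python sketch on the set representation (the names are illustrative, not taken from the project):

```python
SWAP = {"included_in": "includes", "includes": "included_in"}


def reverse(rel: frozenset) -> frozenset:
    """Relationship of PT2 to PT1, given the combined relationship of PT1 to PT2."""
    return frozenset(SWAP.get(basic, basic) for basic in rel)
```

Applying `reverse` twice returns the original relationship.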

 


The formal description

The quality of factual information when transmitted from an "original" PTo to a "target" PTt thus depends (i) on the Potential Taxon graph, or more precisely on all paths from PTo to PTt and on the oriented relationships that are assigned to the edges included in these paths and (ii) on the applicability of the factual data. Computing the quality of transmitted factual information is therefore based on: 

  • algorithms that find all paths from  PTo to PTt in an oriented graph 

  • rules that assign to each path a relationship on the basis of the relationships corresponding to the included edges and which then assign a unique final relationship to the pair (PTo, PTt) based on  all paths from  PTo to PTt. This last relationship is used to compute the quality of the transmitted factual information.

  • a rule that combines the resulting relationship with the applicability of the factual information to arrive at a relevant result.
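
The first step, finding the paths, could be sketched as follows in Python. Two assumptions go beyond the text: only simple paths (no node visited twice) are enumerated, so that the search terminates on graphs with cycles, and the edge labels in the sample graph are invented for illustration. Reverse edges are not followed here.

```python
def all_simple_paths(graph, origin, target):
    """Yield every path from origin to target that visits no node twice.

    graph maps a node to a dict {neighbour: relationship label}.
    """
    stack = [(origin, [origin])]
    while stack:
        node, path = stack.pop()
        if node == target:
            yield path
            continue
        for neighbour in graph.get(node, {}):
            if neighbour not in path:          # avoid cycles
                stack.append((neighbour, path + [neighbour]))


# Sample graph with hypothetical edge labels.
graph = {
    "PTi": {"PTj": "included_in", "PTl": "included_in"},
    "PTj": {"PTk": "overlaps", "PTl": "included_in"},
}

paths_to_l = sorted(all_simple_paths(graph, "PTi", "PTl"))
# [['PTi', 'PTj', 'PTl'], ['PTi', 'PTl']]
```

Each path found this way would then be reduced to one combined relationship by concatenating its edges, and the per-path results merged by one of the unification rules.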

For the formal description of such a graph, as well as of the algorithms and rules, any high-level programming language can be used. These rules do not need to be edited, since they do not depend on the specific contents of the included data. As an example, we used Visual Basic to define a "relationship data type" as well as the above-mentioned rules.


Examples

Definition of a datatype for "combined relationship"-objects:

Public Type Relationship
   Congruent_to As Boolean
   Is_included_in As Boolean
   Includes As Boolean
   Overlaps As Boolean
   Excludes As Boolean
   Doubtful As Boolean
End Type

Reversal rule for "combined relationships":

Public Function reverse(Rel1 As Relationship) As Relationship
   reverse = Rel1
   reverse.Is_included_in = Rel1.Includes
   reverse.Includes = Rel1.Is_included_in
End Function

Unification rule for two "combined relationships" (strong agreement - intersection):

Public Function cons(Rel1 As Relationship, Rel2 As Relationship) As Relationship
   If Rel1.Doubtful = Rel2.Doubtful Then
      ' Equal certainty: keep only the relationships on which both agree
      cons.Congruent_to = Rel1.Congruent_to And Rel2.Congruent_to
      cons.Is_included_in = Rel1.Is_included_in And Rel2.Is_included_in
      cons.Includes = Rel1.Includes And Rel2.Includes
      cons.Overlaps = Rel1.Overlaps And Rel2.Overlaps
      cons.Excludes = Rel1.Excludes And Rel2.Excludes
      cons.Doubtful = Rel1.Doubtful
   ElseIf Rel1.Doubtful = False Then
      ' Only Rel1 is certain: it overrides the doubtful Rel2
      cons = Rel1
   Else
      ' Only Rel2 is certain: it overrides the doubtful Rel1
      cons = Rel2
   End If
End Function

Unification rule for two "combined relationships" (weak agreement - union):

Public Function large_cons(Rel1 As Relationship, Rel2 As Relationship) As Relationship
   large_cons.Congruent_to = Rel1.Congruent_to Or Rel2.Congruent_to
   large_cons.Is_included_in = Rel1.Is_included_in Or Rel2.Is_included_in
   large_cons.Includes = Rel1.Includes Or Rel2.Includes
   large_cons.Overlaps = Rel1.Overlaps Or Rel2.Overlaps
   large_cons.Excludes = Rel1.Excludes Or Rel2.Excludes
   large_cons.Doubtful = Rel1.Doubtful Or Rel2.Doubtful
End Function

Concatenation rule for two contiguous "combined relationships":

Public Function concatenate(Rel1 As Relationship, Rel2 As Relationship) As Relationship
Dim RelNull As Relationship
Dim RelFull As Relationship
Dim TempRelResult As Relationship

   RelNull.Congruent_to = False
   RelNull.Is_included_in = False
   RelNull.Includes = False
   RelNull.Overlaps = False
   RelNull.Excludes = False
   RelNull.Doubtful = False

   RelFull.Congruent_to = True
   RelFull.Is_included_in = True
   RelFull.Includes = True
   RelFull.Overlaps = True
   RelFull.Excludes = True
   RelFull.Doubtful = False


   concatenate = RelNull
   TempRelResult = RelNull

   If Rel1.Congruent_to Then
      concatenate = Rel2
   End If
   If Rel2.Congruent_to Then
      TempRelResult = Rel1
      concatenate = large_cons(concatenate, TempRelResult)
      TempRelResult = RelNull
   End If
   If Rel1.Is_included_in Then
      If Rel2.Is_included_in Then
         TempRelResult.Is_included_in = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Includes Then
         TempRelResult = RelFull
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Overlaps Then
         TempRelResult.Is_included_in = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Excludes Then
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
   End If

   If Rel1.Includes Then
      If Rel2.Is_included_in Then
         TempRelResult.Congruent_to = True
         TempRelResult.Is_included_in = True
         TempRelResult.Includes = True
         TempRelResult.Overlaps = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Includes Then
         TempRelResult.Includes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Overlaps Then
         TempRelResult.Includes = True
         TempRelResult.Overlaps = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Excludes Then
         TempRelResult.Includes = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
   End If

   If Rel1.Overlaps Then
      If Rel2.Is_included_in Then
         TempRelResult.Is_included_in = True
         TempRelResult.Overlaps = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Includes Then
         TempRelResult.Includes = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Overlaps Then
         TempRelResult = RelFull
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Excludes Then
         TempRelResult.Includes = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
   End If

   If Rel1.Excludes Then
      If Rel2.Is_included_in Then
         TempRelResult.Is_included_in = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Includes Then
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Overlaps Then
         TempRelResult.Is_included_in = True
         TempRelResult.Overlaps = True
         TempRelResult.Excludes = True
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
      If Rel2.Excludes Then
         TempRelResult = RelFull
         concatenate = large_cons(concatenate, TempRelResult)
         TempRelResult = RelNull
      End If
   End If

   concatenate.Doubtful = Rel1.Doubtful Or Rel2.Doubtful

End Function

Interpretation rule for "combined relationships":

Public Function evaluate(Category as String, Rel1 As Relationship) As String
   If (Not Rel1.Congruent_to) And (Not Rel1.Is_included_in) And (Not Rel1.Includes) And (Not Rel1.Overlaps) And (Not Rel1.Excludes) Then
      evaluate = " Contradiction !"
   ElseIf Category = " fully applicable " Then
      If (Not Rel1.Doubtful) Then
         If (Not Rel1.Excludes) Then
            If (Not Rel1.Is_included_in) And (Not Rel1.Overlaps) Then
               evaluate = " fully applicable !"
            Else
               evaluate = " partially applicable !"
            End If
         Else
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
               evaluate = " doubtful applicable !"
            Else
               evaluate = " not applicable !"
            End If
         End If
      Else
         If (Not Rel1.Excludes) Then
            If (Not Rel1.Is_included_in) And (Not Rel1.Overlaps) Then
               evaluate = " fully applicable ?"
            Else
               evaluate = " partially applicable ?"
            End If
         Else
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
               evaluate = " doubtful applicable ?"
            Else
               evaluate = " not applicable ?"
            End If
         End If
      End If
   ElseIf Category = " partially applicable " Then
      If (Not Rel1.Doubtful) Then
         If (Not Rel1.Excludes) Then
            If (Not Rel1.Includes) And (Not Rel1.Overlaps) Then
               evaluate = " partially applicable !"
            Else
               evaluate = " doubtful applicable !"
            End If
         Else
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
               evaluate = " doubtful applicable !"
            Else
               evaluate = " not applicable !"
            End If
         End If
      Else
         If (Not Rel1.Excludes) Then
            If (Not Rel1.Includes) And (Not Rel1.Overlaps) Then
               evaluate = " partially applicable ?"
            Else
               evaluate = " doubtful applicable ?"
            End If
         Else
            If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
               evaluate = " doubtful applicable ?"
            Else
               evaluate = " not applicable ?"
            End If
         End If
      End If
   Else
      If (Not Rel1.Doubtful) Then
         If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
            evaluate = " doubtful applicable !"
         Else
            evaluate = " not applicable !"
         End If
      Else
         If (Rel1.Congruent_to Or Rel1.Is_included_in Or Rel1.Includes Or Rel1.Overlaps) Then
            evaluate = " doubtful applicable ?"
         Else
            evaluate = " not applicable ?"
         End If
      End If
   End If
End Function


Editing the rule system

Sometimes, depending on specific characteristics of the data or data sources under consideration, new rules have to be added and/or existing ones adjusted, e.g. to:

  • include or exclude certain data sources which are available in the system

  • give preferential treatment to certain data sources for data output

  • weight edges depending on their source (e.g. give a higher weight to the opinion held by a certain expert for a certain taxonomic group)

  • define a special treatment for queries that entail some special risk (e.g. medical information or information concerning the protection of species)

Since rules of this kind are not generally foreseeable and since they may refer directly to data contents and metadata of the source, they should not be incorporated in the core rules and analysis algorithms, but should be read and applied at run-time.

To allow these rules to be adjusted, they could be formulated in a formal language suited to propositional calculus. This would also facilitate the implementation of a user interface for this purpose. The programming language Prolog fulfils these requirements, and we shall use it for the further description of the system. For the implementation, however, other languages can be taken into account. An implementation could also be based on a complex configuration file, from which parameters are passed to the core rules at run-time.
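
The configuration-file variant mentioned last might look roughly like the following Python sketch. Everything in it is hypothetical: the JSON keys `excluded_sources` and `source_weights`, the source names, and the function `usable_edges` are invented to illustrate how run-time parameters could be applied before the core rules run.

```python
import json

# Hypothetical run-time configuration; in practice this would be read from a file.
CONFIG = json.loads("""
{
  "excluded_sources": ["SourceB"],
  "source_weights": {"SourceA": 2.0}
}
""")


def usable_edges(edges, config):
    """Drop edges from excluded sources and attach a weight to the remaining ones."""
    excluded = set(config.get("excluded_sources", []))
    weights = config.get("source_weights", {})
    return [
        {**edge, "weight": weights.get(edge["source"], 1.0)}  # default weight 1.0
        for edge in edges
        if edge["source"] not in excluded
    ]


edges = [
    {"from": "PTi", "to": "PTj", "source": "SourceA"},
    {"from": "PTj", "to": "PTk", "source": "SourceB"},
    {"from": "PTi", "to": "PTl", "source": "SourceC"},
]
filtered = usable_edges(edges, CONFIG)
```

Because the filter runs before path finding, the core concatenation and unification rules stay unchanged when the configuration is edited.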


Marc Geoffroy, Anton Güntsch & Walter G. Berendsohn

First version (German only): August 2001
Revised second (German and English) version: June 2002

__________________________________________________________________________

 

MoreTax (Rule-based association of taxonomic concepts) is a research and development project  financed by the Federal Agency for Nature Conservation of the German Ministry of the Environment.

Project co-ordinator: Walter Berendsohn
Project scientist: Marc Geoffroy

This page last updated on 12-11-2002


© Freie Universität Berlin, Botanischer Garten und Botanisches Museum Berlin-Dahlem,
Seitenverantwortlicher / Page editor: M. Geoffroy.     BGBM Impressum / Imprint