Botanischer Garten und Botanisches Museum, Berlin-Dahlem

Biodiversity Informatics

Natural Substances in the Compositae:
The Bohlmann Files

Introduction

In the course of their research on the chemistry of the Compositae at the Technical University of Berlin, Prof. Bohlmann and his assistant C. Zdero began compiling a card index of natural substances and taxa of the Compositae in the early 1960s. Before his untimely death, Bohlmann entrusted the files to Dr. J. Jakupovic, and C. Zdero has kept the card index up to date to the present day.

The files cover all natural substances occurring in the Compositae, with some references to other families for compounds of particular chemotaxonomic relevance to the Compositae. The card index consists of two files: about 18,000 cards with structures, taxon names, and original literature references, and about 6,400 cards arranged by taxon name, listing all compounds found in the respective literature references. The data stem from literature reviews and from Bohlmann's and Jakupovic's own work; about 95% of the data have been published. Apart from some journals whose issues from the last few years still had to be reviewed, the literature review was considered complete.

In 1994, Zdero and Jakupovic began transferring the data to an ISIS/PC database, including a partial revision of the literature references. Dr. W. Berendsohn, a botanist specialising in biological informatics at the Botanical Garden and Botanical Museum Berlin-Dahlem, joined the team in 1996 to assist with database design, project execution, and botanical data. Data entry was carried out almost exclusively by C. Zdero.

A project funded by the German Federal Ministry of Research, which began in April 2000, has been carried out to make the data accessible on the World Wide Web.

The database project

The original flat-file database consisted of the following attributes: (1) (chemical) structure, (2) (trivial) name of the compound (if present), (3) molecular weight and (4) formula, both calculated from (1), (5) taxon or taxa in which the compound was found, with (6) reference to the literature, (7) reference citation for the original publication of the compound, (8) revision note(s), and (9) other notes. Literature citations were kept in a separate list of references. The ISISBase program allowed searches on structures and partial structures as well as on the contents of the text fields. However, ISISBase's hierarchical data structure is rather inflexible when the database has to be extended or linked to external data. The data were therefore to be converted to a relational format as defined by the following data structure (draft, initial concept of data structure for the Bohlmann Files):

The attribute "Structure" was to contain a reference to a record in a table holding the chemical structure of the natural product. The Accord for Access add-in was initially selected to manage the structure data because it supports searching on chemical substructures (structure diagrams); it was later replaced by ChemAxon's JChem software. In addition to the contents of the nine fields held in the ISIS database, the relational data model was to include fields for a quality assessment of the assignment of a compound to a taxon name, as well as the taxonomic status and synonymy of the names cited. The database was initially to be created in Microsoft Access 2000 and later upgraded to an SQL Server system.
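The relational model described above can be illustrated with a small SQL schema. This is a minimal sketch, not the project's actual schema: the table and column names (`compound`, `taxon`, `occurrence`, `literature`, `quality`, `accepted_id`), the helper functions, and the use of SQLite through Python's standard `sqlite3` module are all assumptions made for the example. It shows how storing synonymy as a pointer to the accepted name lets a search by either name return the same compound records.

```python
import sqlite3
from typing import List

# Hypothetical schema sketch; table and column names are illustrative only.
# Numbers in comments refer to the nine attributes of the original database.
SCHEMA = """
CREATE TABLE literature (
    lit_id   INTEGER PRIMARY KEY,
    citation TEXT NOT NULL
);
CREATE TABLE taxon (
    taxon_id    INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE,
    status      TEXT NOT NULL CHECK (status IN ('accepted', 'synonym')),
    accepted_id INTEGER REFERENCES taxon(taxon_id)  -- NULL for accepted names
);
CREATE TABLE compound (
    compound_id  INTEGER PRIMARY KEY,
    trivial_name TEXT,                 -- (2) trivial name, if any
    structure    TEXT NOT NULL,        -- (1) structure, e.g. as a molfile
    mol_weight   REAL,                 -- (3) derived from the structure
    formula      TEXT,                 -- (4) derived from the structure
    original_lit INTEGER REFERENCES literature(lit_id),  -- (7)
    notes        TEXT                  -- (8)/(9) revision and other notes
);
-- (5)/(6): one row per compound-taxon-literature assignment.
CREATE TABLE occurrence (
    compound_id INTEGER NOT NULL REFERENCES compound(compound_id),
    taxon_id    INTEGER NOT NULL REFERENCES taxon(taxon_id),
    lit_id      INTEGER NOT NULL REFERENCES literature(lit_id),
    quality     TEXT,  -- quality assessment of the assignment
    PRIMARY KEY (compound_id, taxon_id, lit_id)
);
"""


def create_db() -> sqlite3.Connection:
    """Create an in-memory database with the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def compounds_for_taxon(conn: sqlite3.Connection, name: str) -> List[str]:
    """Trivial names of compounds recorded for `name`, including records
    filed under its accepted name or under any of its synonyms."""
    # Resolve the queried name to the id of its accepted taxon.
    row = conn.execute(
        "SELECT COALESCE(accepted_id, taxon_id) FROM taxon WHERE name = ?",
        (name,),
    ).fetchone()
    if row is None:
        return []
    # Collect compounds whose taxon resolves to the same accepted taxon.
    rows = conn.execute(
        """SELECT DISTINCT c.trivial_name
           FROM occurrence AS o
           JOIN compound AS c ON c.compound_id = o.compound_id
           JOIN taxon AS t ON t.taxon_id = o.taxon_id
           WHERE COALESCE(t.accepted_id, t.taxon_id) = ?
           ORDER BY c.trivial_name""",
        (row[0],),
    ).fetchall()
    return [r[0] for r in rows]
```

The sketch assumes a single level of synonymy (each synonym points directly at an accepted name); keeping the quality assessment on the `occurrence` row means each compound-to-taxon assignment can be rated separately for every literature source.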

World Wide Web access was to allow searching by substructure as well as by taxon names and chemical names.

The stated objective was to provide free access to this wealth of information, serving applied sciences (e.g. identifying potential sources of substances) as well as basic research (e.g. providing additional clues to the taxonomy of groups within the Compositae), and to help elucidate the end-products of gene expression and biosynthetic pathways.

The database

The project database was first set up in MS Access 2000, adapting the taxonomic module of the BoGART database system (see http://www.bgbm.org/BioDivInf/Projects/Bogart-e.htm) to handle identification and nomenclature, and was later migrated to MS SQL Server.

Data from the flat-file format ISIS database were parsed to separate and atomise plant names, synonyms, and literature data. Access forms were adapted to the task of nomenclatural editing. For checking and input of chemical data, the ISIS database was amended to hold additional information and to provide a better starting point for final conversion.
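The parsing rules actually used for this conversion are not documented here. As a minimal sketch, assuming each name string has the form "Genus [epithet] [rank infraspecific-epithet] [author]", a regular expression can split a name into atomised fields. The `PlantName` class, the `parse_name` function, and the set of recognised rank abbreviations are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlantName:
    """Atomised plant name (hypothetical structure for illustration)."""
    genus: str
    species: Optional[str] = None
    rank: Optional[str] = None           # infraspecific rank, e.g. "var."
    infra_epithet: Optional[str] = None
    author: Optional[str] = None


# Capitalised genus, optional lower-case species epithet, optional
# infraspecific rank plus epithet; any remaining text is taken as the author.
NAME_RE = re.compile(
    r"^(?P<genus>[A-Z][a-z-]+)"
    r"(?:\s+(?P<species>[a-z][a-z-]+))?"
    r"(?:\s+(?P<rank>subsp\.|ssp\.|var\.|f\.)\s+(?P<infra>[a-z][a-z-]+))?"
    r"(?:\s+(?P<author>.+))?$"
)


def parse_name(text: str) -> PlantName:
    """Split a name string into its parts; raise ValueError if it does not fit."""
    m = NAME_RE.match(text.strip())
    if m is None:
        raise ValueError(f"unparseable name: {text!r}")
    return PlantName(m["genus"], m["species"], m["rank"], m["infra"], m["author"])
```

For example, `parse_name("Artemisia absinthium L.")` yields genus `Artemisia`, epithet `absinthium`, and author `L.`. Real nomenclatural data include many cases this pattern does not handle, such as hybrid signs or author citations placed between the species and the infraspecific epithet; such cases would need manual editing.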

The new relational database has the following features:

  • Chemical structures are stored using the JChem software. Its structure search capabilities are available via the WWW and form the core of the web publishing component, accessible at http://bohlmann.bgbm.org/bohlmann/ccq.
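JChem's search internals are not described here, and the following does not use its API. As a minimal sketch of what a substructure search does, the code below runs a simple backtracking subgraph match over hydrogen-suppressed molecular graphs: every query atom must map to a distinct target atom of the same element, and every query bond must exist in the target with the same bond order. The graph representation and the `has_substructure` function are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# Hydrogen-suppressed molecular graph: element symbols per atom, plus
# bonds as (atom_i, atom_j, bond_order).
Molecule = Tuple[List[str], List[Tuple[int, int, int]]]


def _adjacency(mol: Molecule) -> Dict[int, Dict[int, int]]:
    """Map each atom index to {neighbour index: bond order}."""
    atoms, bonds = mol
    adj: Dict[int, Dict[int, int]] = {i: {} for i in range(len(atoms))}
    for a, b, order in bonds:
        adj[a][b] = order
        adj[b][a] = order
    return adj


def has_substructure(query: Molecule, target: Molecule) -> bool:
    """True if `query` maps injectively onto `target` with matching element
    symbols and matching bond orders for every query bond."""
    q_atoms, t_atoms = query[0], target[0]
    q_adj, t_adj = _adjacency(query), _adjacency(target)
    mapping: Dict[int, int] = {}  # query atom -> target atom
    used: Set[int] = set()        # target atoms already assigned

    def extend(qi: int) -> bool:
        if qi == len(q_atoms):
            return True  # all query atoms mapped
        for ti in range(len(t_atoms)):
            if ti in used or t_atoms[ti] != q_atoms[qi]:
                continue
            # Bonds from qi to already-mapped query atoms must exist in the
            # target with the same order.
            if all(t_adj[ti].get(mapping[qj]) == order
                   for qj, order in q_adj[qi].items() if qj in mapping):
                mapping[qi] = ti
                used.add(ti)
                if extend(qi + 1):
                    return True
                # Backtrack and try the next candidate atom.
                del mapping[qi]
                used.discard(ti)
        return False

    return extend(0)
```

For example, a query for the ester group O=C-O matches γ-butyrolactone, the lactone ring found in many sesquiterpene lactones, but not acetone. Production chemistry engines add aromaticity and stereochemistry handling and fast pre-screening (for example with structural fingerprints) on top of this basic idea.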

The data

The database now holds data on 6,258 botanical names from about 839 genera, representing all 17 tribes within the three subfamilies of the Compositae, with a total of about 20,000 chemical structures, which can be subdivided into the following eight compound classes:

1. Sesquiterpene lactones

  • Characteristics: Many highly bioactive compounds, though most of them are cytotoxic. Probably high potential for pharmaceutical utilisation. The characteristic compounds of the Compositae, occurring in most species. Outside the Compositae, found only in some Apiaceae and in micro-organisms.
  • Current records: 5685 structures, 9200 references to 2112 species, 2120 literature references.
  • Completeness: >98% of all structures recorded. Revision of primary literature sources excellent.

2. Monoterpenes

  • Characteristics: Very important constituents of essential oils. Of limited taxonomic importance because they are seldom characteristic of a specific taxon.
  • Current records: 880 structures.

3. Sesquiterpenes

  • Current records: 3588 structures.
  • Completeness: >95% of all occurring compounds. Revision of primary literature sources very good.

4. Diterpenes

  • Current records: 2616 structures.
  • Completeness: >99% of all occurring compounds. Revision of primary literature sources very good.

5. Acetylenes

  • Current records: 1233 structures.

6. Aromatic compounds (includes flavonoids, coumarins, benzofurans, chromenes etc.)

  • Current records: 4199 structures.
  • Completeness: >95% of all occurring compounds. Revision of primary literature sources excellent.

7. Triterpenes

  • Current records: 919 structures.
  • Completeness: about 95% of all occurring compounds.

8. Alkaloids

  • Current records: 304 structures.
  • Completeness: >95% of all occurring compounds.

Publication

For further information, please contact Walter Berendsohn.

[W. Berendsohn, J. Jakupovic & C. Zdero]
