In the course of their research on the chemistry of the Compositae at the Technical University of Berlin, Prof. Bohlmann and his assistant C. Zdero began in the early 1960s to compile a card index of natural substances and taxa of the Compositae. Before his untimely death, Bohlmann entrusted the file to Dr. J. Jakupovic, and C. Zdero has kept the card index up to date to the present day.
The files cover all natural substances occurring in Compositae, with some reference made to other families for compounds of particular chemotaxonomic relevance for the Compositae. The card index consists of two files: about 18,000 cards with structures plus taxon names and original literature references, and about 6,400 cards by taxon name, with all compounds found in the respective literature references. The data stem from literature reviews and from Bohlmann and Jakupovic's own work; about 95% of the data are published. With the exception of some journals, which still had to be reviewed for the last few years, the literature review was considered complete.
In 1994, Zdero and Jakupovic started to transfer the data to an ISIS/PC database, including a partial revision of literature references. Dr. W. Berendsohn, a botanist specialised in biological informatics who works at the Botanical Garden and Botanical Museum Berlin-Dahlem, joined the team in 1996 to assist with questions of database design, project execution, and botanical data. Data entry was carried out almost exclusively by C. Zdero.
Since April 2000, a project funded by the German Federal Ministry of Research has been under way to make the data accessible on the World Wide Web.
The original flat-file format database consisted of the following attributes: (1) (Chemical) structure, (2) (trivial) name of the compound (if present), (3) molecular weight and (4) formula, both calculated from (1), (5) taxon or taxa where the compound was found with (6) reference to the literature, (7) reference citation for the original publication of the compound, (8) revision note(s), and (9) other notes. Literature citations were kept in a separate list of references. The ISISBase program allowed for a search on structures and partial structures as well as on the contents of the text fields. However, ISISBase's hierarchical data structure is rather inflexible if the database is to be extended or linked to external data. Therefore, the data were to be converted to a relational format as defined by the following data structure (draft, initial concept of data structure for the Bohlmann Files):

The attribute "Structure" was to contain a reference to a record in a table holding the chemical structure of the natural product. The software add-in Accord for Access was selected to manage the structure data because it allows searching on chemical substructures (structure diagrams); it was later replaced by the JChem software of ChemAxon. In addition to the data content of the nine fields held in the ISIS database, the relational data model was to include fields for a quality assessment of the assignment of a compound to a taxon name, as well as the taxonomic status and synonymy of the names cited. The database was initially to be created using Microsoft Access 2000 and later to be upgraded to an SQL Server system.
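The relational layout described above can be illustrated with a minimal sketch. It is not the project's actual data model: all table and column names are assumptions made for illustration, SQLite stands in for Access or SQL Server, and the taxon and compound names are invented placeholders. The sketch shows how a per-occurrence quality field and a synonymy link on taxon names might sit next to the compound data.

```python
import sqlite3

# Hypothetical schema; every table and column name here is an assumption.
SCHEMA = """
CREATE TABLE compound (
    compound_id   INTEGER PRIMARY KEY,
    structure_ref TEXT NOT NULL,   -- key of a record in a separate structure table
    trivial_name  TEXT,
    mol_weight    REAL,
    formula       TEXT
);
CREATE TABLE taxon_name (
    name_id          INTEGER PRIMARY KEY,
    full_name        TEXT NOT NULL UNIQUE,
    status           TEXT,         -- e.g. 'accepted' or 'synonym'
    accepted_name_id INTEGER REFERENCES taxon_name(name_id)
);
CREATE TABLE literature (
    literature_id INTEGER PRIMARY KEY,
    citation      TEXT NOT NULL
);
CREATE TABLE occurrence (
    compound_id   INTEGER NOT NULL REFERENCES compound(compound_id),
    name_id       INTEGER NOT NULL REFERENCES taxon_name(name_id),
    literature_id INTEGER NOT NULL REFERENCES literature(literature_id),
    quality       TEXT,            -- assessment of the compound-to-taxon assignment
    PRIMARY KEY (compound_id, name_id, literature_id)
);
"""


def build_demo():
    """Create an in-memory database filled with invented placeholder rows."""
    con = sqlite3.connect(":memory:")
    con.executescript(SCHEMA)
    con.executemany(
        "INSERT INTO compound VALUES (?, ?, ?, NULL, NULL)",
        [(1, "S-0001", "compound A"), (2, "S-0002", "compound B")],
    )
    con.executemany(
        "INSERT INTO taxon_name VALUES (?, ?, ?, ?)",
        [(1, "Examplea alba", "accepted", None),
         (2, "Examplea candida", "synonym", 1)],
    )
    con.execute("INSERT INTO literature VALUES (1, 'Placeholder citation')")
    con.executemany(
        "INSERT INTO occurrence VALUES (?, ?, ?, ?)",
        [(1, 1, 1, "verified"), (2, 2, 1, "doubtful")],
    )
    return con


def compounds_for_taxon(con, name):
    """List compounds recorded under a name or under any name sharing its accepted name."""
    rows = con.execute(
        """
        SELECT DISTINCT c.trivial_name
        FROM taxon_name q
        JOIN taxon_name n
          ON COALESCE(n.accepted_name_id, n.name_id)
           = COALESCE(q.accepted_name_id, q.name_id)
        JOIN occurrence o ON o.name_id = n.name_id
        JOIN compound c ON c.compound_id = o.compound_id
        WHERE q.full_name = ?
        ORDER BY c.trivial_name
        """,
        (name,),
    ).fetchall()
    return [row[0] for row in rows]
```

In this sketch, occurrences stay attached to the name as cited in the literature, and synonymy is resolved only at query time, so a search under either the accepted name or a synonym returns the same compounds.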
World Wide Web access was to allow for searching by substructure as well as by names of taxa and chemical names.
The stated objective was to provide free access to this wealth of information to serve applied sciences (e.g. to identify potential sources for substances) as well as basic research (e.g. by providing additional clues to the taxonomy of groups within the Compositae) and to elucidate end-products of gene expression and biosynthetic pathways.
The project database was first set up under MS Access 2000, adapting the taxonomic module of the BoGART database system (see http://www.bgbm.org/BioDivInf/Projects/Bogart-e.htm) for the treatment of identification and nomenclature, and was later migrated to MS SQL Server.
Data from the flat-file format ISIS database were parsed to separate and atomise plant names, synonyms, and literature data. Access forms were adapted to the task of nomenclatural editing. For checking and input of chemical data, the ISIS database was amended to hold additional information and to provide a better starting point for final conversion.
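The atomisation step can be sketched as follows. The layout of the original ISIS text fields is not documented here, so the input format is purely an assumption for illustration: taxa separated by semicolons, an optional "(= accepted name)" synonym note, and bracketed reference numbers. The names `TaxonEntry` and `atomise` are hypothetical, as are the placeholder taxon names.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Assumed entry layout: "Name (= Accepted name) [3, 12]"; both suffixes optional.
ENTRY = re.compile(
    r"""
    ^\s*(?P<name>[^\[(]+?)\s*                   # taxon name as cited
    (?:\(=\s*(?P<synonym_of>[^)]+?)\s*\))?\s*   # optional synonym note
    (?:\[(?P<refs>[^\]]*)\])?\s*$               # optional reference numbers
    """,
    re.VERBOSE,
)


@dataclass
class TaxonEntry:
    name: str
    synonym_of: Optional[str]
    reference_numbers: List[int]


def atomise(field_text):
    """Split one flat text field into separate name, synonym, and reference parts."""
    entries = []
    for chunk in field_text.split(";"):
        if not chunk.strip():
            continue  # skip empty pieces such as a trailing separator
        match = ENTRY.match(chunk)
        if match is None:
            raise ValueError("unrecognised entry: %r" % chunk)
        refs = [int(r) for r in (match.group("refs") or "").split(",") if r.strip()]
        entries.append(
            TaxonEntry(match.group("name"), match.group("synonym_of"), refs)
        )
    return entries
```

Raising an error on unrecognised entries, rather than skipping them silently, leaves irregular records visible for manual editing, which fits a workflow in which nomenclatural data are checked by hand.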
The new relational database has the following features:
Chemical structures are stored using the JChem software. Its structural search capabilities allow searches via the WWW and thus form the core of the web publishing component which is accessible at http://bohlmann.bgbm.org/bohlmann/ccq.
The database now holds data on 6,258 botanical names from about 839 genera representing all 17 tribes within the three subfamilies of the Compositae, with a total of about 20,000 chemical structures, which can be subdivided into the following 8 compound classes:
For further information please contact Walter Berendsohn.
[W. Berendsohn, J. Jakupovic & C. Zdero]