TDWG Subgroup on Accession Data

Konstantin Savov: Notes on the Accession Data Standard (in reply to convenor's report 2)

I've prepared this text as a short summary of my most important comments on the Accession Data Standard at the moment. Most of the comments concern HISPID3.

1. The Standard should deal with original (raw) data.

Original information can be changed (interpreted, transformed) when it is entered into a database, exported, or imported. This is the most dangerous form of data loss. An exchange format should avoid fields and structures that require interpretation of the original data.
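One way to picture this principle is a field that always carries the original text and records any interpretation beside it rather than in place of it. The following is only a minimal sketch: the names RawField, Interpretation and export_field, and the sample date string, are hypothetical and are not taken from HISPID3 or any existing system.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Interpretation:
    """A derived value, kept apart from the original text (hypothetical type)."""
    value: str
    interpreted_by: str


@dataclass
class RawField:
    """Holds the text exactly as written; interpretations are only added alongside."""
    verbatim: str
    interpretations: List[Interpretation] = field(default_factory=list)

    def interpret(self, value: str, interpreted_by: str) -> None:
        # The verbatim text is never modified here.
        self.interpretations.append(Interpretation(value, interpreted_by))


def export_field(f: RawField) -> str:
    """Export only the original text, so no interpretation leaks into the exchange."""
    return f.verbatim


collection_date = RawField("12.VI.95")            # as written on the label (made-up example)
collection_date.interpret("1995-06-12", "curator")
exported = export_field(collection_date)
```

In this sketch the receiving database gets "12.VI.95" and is free to make its own interpretation, which is the behaviour point 1 asks for.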

2. The Standard should be primarily arranged for institutions and software developers.

Personal databases vary widely in purpose, scope of information, the technical experience of their authors, etc. They are often designed for particular projects and contain interpreted information instead of original data. If the standard is accepted and used by institutions and software developers, it will work. It will then also have a better chance of being accepted by owners of personal databases.

3. The core data model should precede the exchange format.

We can define an exchange format either as a set of files (a structured model) or as a flat file. The latter approach has many disadvantages, but it is possible. In either case, the core data model should be generally accepted before the details of the exchange format are discussed. At the least, a preliminary draft of the model is urgently needed. The core model determines the set of files and their relations for a structured exchange format, and the restrictions to be applied to file content for a flat-file format. For example, we have to establish rules for maintaining identification history and collection management information in a flat-file format.
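To illustrate how a core model constrains a flat file, here is a small sketch in which an accession has a history of determinations, and the flat-file rule is "one row per determination, with a sequence number that preserves the order". The types Accession and Determination, the field names, and the rule itself are assumptions made for this example only; they are not a proposal for the actual model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Determination:
    name_on_label: str
    determined_by: str
    date: str  # kept as written, not parsed


@dataclass
class Accession:
    accession_id: str
    determinations: List[Determination] = field(default_factory=list)


def to_flat_rows(acc: Accession) -> List[dict]:
    """Flatten one accession: one row per determination, numbered in history order."""
    return [
        {
            "accession_id": acc.accession_id,
            "det_seq": seq,
            "name_on_label": det.name_on_label,
            "determined_by": det.determined_by,
            "det_date": det.date,
        }
        for seq, det in enumerate(acc.determinations, start=1)
    ]


def from_flat_rows(rows: List[dict]) -> List[Accession]:
    """Rebuild the structured model; det_seq restores the order of the history."""
    accessions: Dict[str, Accession] = {}
    for row in sorted(rows, key=lambda r: (r["accession_id"], r["det_seq"])):
        acc = accessions.setdefault(row["accession_id"], Accession(row["accession_id"]))
        acc.determinations.append(
            Determination(row["name_on_label"], row["determined_by"], row["det_date"])
        )
    return list(accessions.values())
```

Even this toy rule has a gap: an accession with no determinations produces no rows at all and would disappear from the flat file. Gaps of this kind are why the rules need to follow from an agreed core model rather than be invented per format.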

4. The most important requirements for the import/export format (considering HISPID as an example).

Taxonomy (hierarchy, synonyms, basionyms, etc.) should be excluded. Only the taxonomic and/or common name written on the label is needed. It can be represented as a set of fields or as a text string, to be linked to taxonomic data after it has been loaded into a database.
Geocodes, administrative units, biogeographic units and their hierarchies should be maintained separately. Descriptions of gathering sites are too variable to be represented as a simple hierarchy.
The import/export format should not follow any particular software system or include technical details (internal numbers, codes, flags, etc.) specific to that system. We would, however, need some fields that indicate the rules for loading the data.
Fields may contain codes (abbreviations) only if those are widely accepted in practice. Incidentally, some standards propose many abbreviations that are not widely accepted.
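The first requirement above (exchange only the name as written on the label, and link it to taxonomy after loading) could look roughly like this on the receiving side. This is a sketch under stated assumptions: the function names, the record layout, the taxon identifiers and the whitespace and case normalisation are all hypothetical, and a real system would need a more careful matching policy.

```python
from typing import Dict, List, Optional


def normalise(name: str) -> str:
    """Collapse whitespace and ignore case for lookup only; the label text is not altered."""
    return " ".join(name.split()).lower()


def build_taxon_index(local_taxa: Dict[str, str]) -> Dict[str, str]:
    """Map normalised names from the receiver's own taxonomy to its own taxon ids."""
    return {normalise(name): taxon_id for name, taxon_id in local_taxa.items()}


def link_label_names(records: List[dict], taxon_index: Dict[str, str]) -> List[dict]:
    """Attach a local taxon id where the label name matches; leave it None otherwise."""
    linked = []
    for rec in records:
        taxon_id: Optional[str] = taxon_index.get(normalise(rec["name_on_label"]))
        # The original label string is carried through unchanged.
        linked.append({**rec, "taxon_id": taxon_id})
    return linked


# Made-up data: the receiver's taxonomy and two imported records.
index = build_taxon_index({"Quercus robur L.": "T-001"})
imported = [
    {"accession_id": "A1", "name_on_label": "Quercus  robur L."},
    {"accession_id": "A2", "name_on_label": "oak, unidentified"},
]
result = link_label_names(imported, index)
```

Records that do not match simply stay unlinked, so nothing in the exchanged file has to anticipate the receiver's taxonomy.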

Konstantin Savov
October 6-7, 1996.

Contact: Walter G. Berendsohn, subgroup convener, wgb@zedat.fu-berlin.de. This page last updated Oct. 29, 1997.