TDWG

Subgroup on Accession Data: Preparing for a Collection Data Standard


First Convener's Report


Contents

Introduction

1. Purpose of the Accessions Standard

2. Form of standard

3. Terminology

4. Scope of "Accession Data"

5. Data areas

6. Liaison (unofficial, in most cases)

7. Other comments received since October 1995

8. Procedural items


Introduction

This first report tries to provide some structure for the discussion. All points made here are completely open to discussion. The first things to be decided are the purpose and the form of the standard to be published, and a preliminary delimitation and subdivision of its contents.

Special consideration should be given to liaison with groups which are or have been working on projects involving collection data. I included a loose list of such projects under point 6 of this report.

The reports will be provided in a highly structured form to allow reference to previous issues (pages and paragraphs do not provide citation anchors in a WWW document). After the report has been made public, only orthographic or grammatical errors will be corrected (native speakers, to the rescue!).


1. Purpose of the Accessions Standard

1.1. Provide field content definitions for biological collection databases which facilitate access, query, and data exchange between and over different or distributed databases.

1.2. Provide guidance to authors of datasets of any scale as to the structuring of their collection data, with the aim of minimizing loss of information or data quality in the case of import to a database.

1.3. Mirror the current state of knowledge as to the inherent structure of collection data, as far as a consensus can be reached.


2. Form of standard

To start with, the following three forms can be discussed:

2.1. Information model.

I think it is agreed that logical data modelling techniques are an indispensable tool for the analysis of complex data such as those present in our field. However, truly implementation-independent models reach a degree of abstraction which may make them very complicated reading for the average system designer, not to speak of the average individual creating a dataset. Furthermore, of the examples provided in the References document, none is implementation independent; most of them are based on the relational data model, and some are even directed at the implementation of a specific database. A recently proposed project (Allison & Blum, "An Interdisciplinary Information Model for Biological Collections", supported by the ASC) promises to change this situation to a certain extent. If this project is funded, close liaison should be maintained to ensure that these efforts are not duplicated by TDWG.

2.2. Contents metastandard or complex data dictionary.

"Metadata is data about data. It is used to document information about datasets and is often used, at least in part, as an "index" or "directory" to the data." [ERIN 1995]

An example (and an explanation of the use and the no-no's of metadata) is provided by the Content Standard for Geographical Metadata (FGDC 1994). The "data structure diagrams" in the CDEFD model (Berendsohn & al.) also go in this direction. The concept of a hierarchical decomposition of data items into successively more restricted data elements is intriguing. It allows one to quickly pick and choose the data elements relevant to a defined task at hand, and it maintains a great deal of flexibility as to the delimitation of individual data fields, because higher hierarchical levels may effectively represent concatenations of atomic data items. It can cover the needs of large-scale databases as well as those of small-scale databases or datasets set up by individuals. It is in the latter category that most of the original data are stored; their incorporation into large-scale databases is often made difficult by imprecise structural definitions.
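
As a rough illustration of the hierarchical decomposition described above, the following Python sketch models data elements as a tree in which a higher-level element is rendered as the concatenation of its atomic children. The `DataElement` class and all element names (`gathering`, `collector`, `gathering_date`, etc.) are hypothetical and are not taken from the FGDC standard or the CDEFD model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class DataElement:
    """A node in a hierarchical decomposition of data elements (hypothetical)."""
    name: str
    children: List["DataElement"] = field(default_factory=list)
    separator: str = " "

    def is_atomic(self) -> bool:
        # An element without children cannot be decomposed any further.
        return not self.children

    def atomic_names(self) -> List[str]:
        """Names of all atomic elements at or below this node, in order."""
        if self.is_atomic():
            return [self.name]
        names: List[str] = []
        for child in self.children:
            names.extend(child.atomic_names())
        return names

    def render(self, values: Dict[str, str]) -> str:
        """Concatenate the available atomic values into one higher-level field."""
        if self.is_atomic():
            return values.get(self.name, "")
        parts = [child.render(values) for child in self.children]
        return self.separator.join(p for p in parts if p)


# Assumed example hierarchy; the names are illustrative only.
gathering_date = DataElement(
    "gathering_date",
    [DataElement("year"), DataElement("month"), DataElement("day")],
    separator="-",
)
collector = DataElement(
    "collector", [DataElement("given_names"), DataElement("family_name")]
)
gathering = DataElement(
    "gathering",
    [collector, DataElement("collection_number"), gathering_date],
    separator=", ",
)

record = {
    "given_names": "A. B.",
    "family_name": "Example",
    "collection_number": "1234",
    "year": "1995",
    "month": "10",
    "day": "11",
}

print(gathering.atomic_names())
print(gathering.render(record))
```

In this sketch a large database could store the six atomic elements separately, while a small dataset might hold only the concatenated `gathering` string; only the former can be converted into the latter without guesswork.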

2.3. Data dictionary

A fixed list of fields and their descriptions. This would have been very useful to have by the end of the last decade (see Beach's document for the TDWG Accession subgroup). It definitely has the advantage of being quite readable for any biologist, and it could still play a role as a simple exchange format. However, in its basic form it does not support structure at all, and we would fail to mirror the consensus reached about at least some of the structural elements present in collection data. Also, such a fixed list is inflexible; the standard should allow for different forms of data elements to express the same content, as long as these data elements can be converted from one to the other easily and exactly (i.e. without loss of data quality; example: treatment of geographical coordinates in the CDEFD model and in the ITF standard).


3. Terminology

Some terms are used in this document which have to be defined in advance.

Biological object: Anything which could be the object of a biological observation or study; i.e. a unit, a site, or a concept like a taxon or syntaxon.

Biological collection data: All information related to units which have been incorporated into a collection or a system of observations such as those used in floristic mapping projects.

Collection: An artificial assemblage of units.

Gathering: The act of collecting physical objects and/or information.

Site: A defined point or area in the biosphere.

Unit: A physical object which contains organisms, represents an organism, or is/was part of an organism.


4. Scope of "Accession Data"

To achieve a productive discussion, it is necessary to define the scope of the subgroup's task and to subdivide the data into data areas. Topics for discussion:

4.1. Data areas to be excluded

Existing TDWG standards should be respected and their improvement, if necessary, should be left to the respective subgroups. This concerns the following standards: Plant Names, Descriptors, Phytogeographic areas, geographical recording units. As a consequence, the present subgroup should not discuss taxonomic data. At least presently, I would also like to suggest exclusion of morphological, anatomical or physiological descriptors of the sampled organisms or the samples themselves. Such data are highly specific to individual collections or even groups of organisms. However, in a modelling context their links to general collection data should be considered.

4.2. Data areas to be covered completely

a) The activities of internal collection management, such as storage conditions, localization within collections, accession codes, preservation treatments, etc.

b) Transaction management in natural history or living collections. This includes data items related to itemizing, packaging, sending, transporting, receiving, accounting, etc. of loans, sales, gifts, or exchanges made between collections and/or individuals.

c) The gathering (i.e. collector(s)/observer(s), point in time, expedition information, collection numbering, etc.).

d) Taxon identification events, including data items such as the identifier, the date of identification, modifiers (e.g. "cf.") and verification level (IUCN) of the identification; not including the data pertaining to the result of the identification (the taxon name) itself.

e) Relationships between units

f) Unit descriptive data as far as relevant for collection management (Material category, non-taxonomic identifications).

4.3. Data areas where a selection of data items has to be made

a) Gathering site information as to geographic and/or geological position

b) Ecological gathering site information

c) Person-related information

d) Literature references


5. Data areas

This item is wide open to discussion. However, because the TDWG standard will probably not be published in one piece, it is a matter of priority to provide a working subdivision of the task at hand.

Some data areas have already been suggested under 4.2 and 4.3. All models or data dictionaries cited in the sources section provide some kind of subdivision. Apart from the taxon-group oriented "modules" mentioned by Charles below, we have to consider singling out data areas which apply only to certain types of units (unit subtypes in the CDEFD model). Examples include common descriptors like quantitative measurements (number, weight, etc.) as well as "exotic" things like chemical substance identification (rather important in natural substances collections, however). Comments received so far:

Charles Hussey, 11 Oct 1995: I should like to propose that any standards that may be produced are built around modules. In our own institution [British Museum] we are coping with the disparate requirements of our different Science Departments by identifying a set of core fields that will apply to all groups, whilst allowing each Department the flexibility to add fields in order to cope with their more specialised needs. It strikes me that, as TDWG expands its standards to embrace all organisms, it might be helpful to define a set of common "CORE" fields (perhaps related to those in the ITF) and subsequently add further fields to cope with the needs of: Herbaria, Gardens with living collections, Zoological collections, Zoos, Animal breeders who keep Stud Books.
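
The modular approach proposed above might be sketched as follows: a set of core fields shared by all collections, plus optional module-specific extensions. All field and module names in this Python sketch are invented for illustration; they are not taken from the ITF or from any existing TDWG standard.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical core fields that would apply to every kind of collection.
CORE_FIELDS: FrozenSet[str] = frozenset(
    {"accession_code", "collector", "gathering_date", "gathering_site"}
)

# Hypothetical extension modules, one per type of collection.
MODULE_FIELDS: Dict[str, FrozenSet[str]] = {
    "herbarium": frozenset({"sheet_number"}),
    "living_collection": frozenset({"bed_location", "propagation_history"}),
    "zoological_collection": frozenset({"preservation_medium"}),
}


def allowed_fields(*modules: str) -> FrozenSet[str]:
    """Core fields plus the fields of every requested module."""
    fields = set(CORE_FIELDS)
    for module in modules:
        fields |= MODULE_FIELDS[module]
    return frozenset(fields)


def split_record(
    record: Dict[str, str], *modules: str
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split a record into its core part and its module-specific part.

    Raises ValueError if the record uses a field that is defined neither
    in the core nor in one of the given modules.
    """
    unknown = set(record) - allowed_fields(*modules)
    if unknown:
        raise ValueError(f"fields not defined in core or modules: {sorted(unknown)}")
    core = {k: v for k, v in record.items() if k in CORE_FIELDS}
    extension = {k: v for k, v in record.items() if k not in CORE_FIELDS}
    return core, extension


core, extension = split_record(
    {"accession_code": "X1", "collector": "A. B. Example", "sheet_number": "7"},
    "herbarium",
)
print(core)
print(extension)
```

Under this assumed design, any two institutions could exchange the core part of a record, while the extension part would only be interpreted by partners that implement the same module.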


6. Liaison (unofficial, in most cases)

ASC (Association of Systematics Collections)

Elaine Hoagland, May 15, 1996: The Assoc. of Systematics Collections would like to listen in on your discussion. .... Note that Stan Blum and Allen Allison (Bishop Museum) are talking about an extension of the ASC data model, and ASC supports their effort. Please get in contact with them.

W. Berendsohn: Stan Blum kept me informed about the proposal.

Stuart G. Poss, 15 May 1996: see under ASIH

Gary Rosenberg, 15 May 1996: I am interested in participating in the working group. I was a participant in the 1992 ASC Workshop, the 1996 Species 2000 conference in Manila and will attend the NAPC invertebrate paleontology database workshop next month. I maintain databases of mollusks on Internet, gopher://erato.acnatsci.org.

ASIH (American Society of Ichthyologists and Herpetologists)

Stuart G. Poss, 15 May 1996: My own interest in seeing that such standards become a reality in the relatively near term are substantial. I serve as chair of the American Society of Ichthyologists and Herpetologists subcommittee on data standards and am a member of the Association of Systematics Collections Computer Networking Committee. Efforts of both groups are directed toward standardization, issues fundamental to exchange of information. Although I should not attempt to speak for either committee at this time, ..... In both groups we are using various modeling methodologies to implement our ideas, in particular Object Role Modeling and Entity Relation Modeling, the former being particularly useful in conceptual design, while the latter is more widely known within the community. I would be interested to learn what approaches do you anticipate using to further discuss specific standardization issues.
Thank you for your initiative. I suspect we will be in further discussion through our respective committees.

CDEFD project

The project is essentially concluded and most of its results referring to our subject have been published (see Berendsohn & al. in References).

DNFM (Direktorenkonferenz der Naturwissenschaftlichen Forschungsmuseen), EDP working group

June 19, 1996: The head of the working group, D. Walossek (Ulm), agreed that liaison is to be maintained by David Lazarus (Berlin).

Flora Europaea

John Edmondson, 17 May 96: ... My other interest is Flora Europaea, and the question of a Europe-wide floristic database project is under active discussion. It is my belief that a multi-centre project such as this can only succeed through inter-institutional cooperation if the various institutions agree to work to common standards. Hence the success of the project is critically dependent on the work you are doing and I would see it as the key to progress.

FLORIN

Konstantin Savov, 21 May 1996: I'm a botanist and one of the authors of FLORIN Information System designed to deal with information about plants.
... Here at DataX/FLORIN we have a team of botanists and professional software developers working with FLORIN. We try to implement FLORIN as rather universal information system. So, we had to establish data structure for botanical information, which can be applicable for different tasks. Our basic ideas are often similar to those described in CDEFD. ... I'm very interested in joining the discussion on standards.... You may also find more information about FLORIN Project at http://www.florin.ru/florin/
... This year, we've transferred FLORIN project from Yourdon structured method to Object Modelling Technique published by Rumbaugh (both methods are supported at our site by CASE system from former Westmount Technology, B.V.). So, we're going to publish FLORIN data model described in OMT terms (classes, associations, objects, etc.). ...

LASSI (??)

John Edmondson, 17 May 1996: ... I am very interested to participate in the TDWG discussion on this subject, particularly from the point of view of wider compatibility. Working in a museum environment which ranges from systematic biology collections to ethnology and archaeology, we are involved in developing 'broad-brush' collections management systems to handle our 1.5 million biological specimens and artefacts. The current system, known as LASSI, was developed by a consortium of ten UK museums ....

Berendsohn: Information request sent to S. Keene, who forwarded it to Alice Grant.

SMASCH project

Tom Duncan, 15 May 1996: I am writing in response to your message about the TDWG Collection Data Standard. As you may know, the SMASCH project has been developing a relational database for botanical collections in California. In addition, the Museum Informatics Project (http://www.mip.berkeley.edu) has worked on data models for the Museum of Vertebrate Zoology. I believe that Berkeley has much to offer in the upcoming TDWG discussion and therefore I am very interested in participating in the discussions about the TDWG Collection Data Standard.


7. Other comments received since October 1995

(excerpts, please protest if I left out something essential!)

7.1. Anita F. Cholewa, 15 May 1996: While I do not wish to become a member of the collections database group, ... for those of us who may have already developed some sort of collections database, what help will there be to convert our databases to conform to the new standards?

7.2. Charles Hussey: What I would wish to see is an evolution of the existing TDWG Data Exchange Standard to embrace Animal and Palaeo requirements. The format that you use to exchange information can differ (to an extent) from proprietary systems (for example you might choose to record latitude and longitude within a single Museum as having separate fields for Degrees, Minutes, Seconds And Direction; but choose for data exchange to send Latitude as a single field - or you might decide that for exchange, all latitudes should be converted to decimal degrees).
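
The coordinate example in 7.2 can be made concrete with a small Python sketch that converts between separate degree/minute/second/direction fields and a single decimal-degree value. The function names and the rounding to six decimal places of a second are assumptions made for this illustration; no existing standard is being quoted.

```python
from typing import Tuple


def dms_to_decimal(degrees: int, minutes: int, seconds: float, direction: str) -> float:
    """Convert degrees, minutes, seconds and a direction letter to decimal degrees.

    Southern latitudes and western longitudes are returned as negative values.
    """
    if direction not in ("N", "S", "E", "W"):
        raise ValueError(f"unknown direction: {direction!r}")
    value = degrees + minutes / 60 + seconds / 3600
    return -value if direction in ("S", "W") else value


def decimal_to_dms(value: float, is_latitude: bool) -> Tuple[int, int, float, str]:
    """Convert decimal degrees back to (degrees, minutes, seconds, direction)."""
    if is_latitude:
        direction = "N" if value >= 0 else "S"
    else:
        direction = "E" if value >= 0 else "W"
    # Work in seconds and round (an assumed precision) to absorb
    # floating-point noise introduced by the decimal representation.
    total_seconds = round(abs(value) * 3600, 6)
    degrees, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return int(degrees), int(minutes), round(seconds, 6), direction


latitude = dms_to_decimal(52, 30, 0, "N")
longitude = dms_to_decimal(13, 15, 0, "W")
print(latitude, longitude)
print(decimal_to_dms(latitude, is_latitude=True))
print(decimal_to_dms(longitude, is_latitude=False))
```

Note that decimal degrees do not record the precision of the original reading (for instance, whether seconds were measured at all), so an exchange format based on them may need an additional precision or source field if data quality is to be preserved.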

7.3. Clinton Morse, 17 May 96: I am interested in the ongoing evolution of a database standard for collections information. I am a professional horticulturist, not a scientist nor programmer so my contributions to such a project would likely be minimal but I am interested in monitoring any ongoing debates etc concerning this subject.
For background, I manage the greenhouse collections for the Department of Ecology & Evolutionary Biology at the University of Connecticut USA. We currently have around 10,000 SQ FT (~950 SQ Meters?) of glasshouse space housing a general collection of over 3000 accessions in 165 families. All of this information is currently databased in FoxPro and is presented in an automatically generated/updated WWW system comprising some 6000+ HTML documents. It is a constantly evolving system and I would like to ensure that as a standard evolves, I incorporate these facets into future changes. One area I would like to expand upon is geographic occurrence data and the ability to generate distribution maps (on the WWW) on the fly for individuals in our collection.
PS - I saw this note on the AABGA list...

7.4. Stuart G. Poss: ..... as a marine ichthyologist I am particularly interested in ensuring that evolving standards, which have arisen primarily from information gathering on terrestrial organisms, also take into consideration some of the unique attributes associated with collecting information about marine organisms ..
As a curator of a major collection of fishes and invertebrates, I am particularly keen that the emerging standards also do a more adequate job of ensuring that the essential archival aspects of original information source materials are preserved in the standardization process. Although there are a variety of conceptual approaches one may take to organize data, electronic data storage has the considerable potential to "lose" information, if the sources and upgradability of certain data elements are not given adequate consideration in the modeling process.

7.5. Adrian Rissone: The system *currently* in use in our Palaeontology department (UniData, which uses a data structure similar to PICK) has no limit on field length in any field. Data are stored as ASCII characters, even numerics, dates and times. Also, fields and groups of fields are repeatable. This has many advantages! As you probably know we are trying to choose a database architecture common to the whole Museum so there is doubt as to whether these advantages will be outweighed by other considerations.


8. Procedural items

Name of subgroup

Gregor Hagedorn suggested renaming the subgroup to "Natural History Collection Standards Working Group", because the name "Subgroup on Accession Data" was confusing.

W. Berendsohn: I am very reluctant to change an established name for a project or group, if its aims actually remain the same. If the name has to be changed, I would prefer something like "Biological Collection Standards Subgroup". However, if members of the subgroup agree on a new name, I will bring up the topic during the next TDWG meeting.


Contact: Walter G. Berendsohn, subgroup convener, wgb@zedat.fu-berlin.de. This page last updated June 23, 1996.