Walter G. Berendsohn
Preprint of an article to be published in the Proceedings of the
Second National Colloquium on Global Change Research, Bad Honnef, Jan.
26-27, 2001
Introduction

The term "Biodiversity Informatics" was coined to circumscribe the application of IT tools and technology to biodiversity information, principally at the organismic level. It thus deals with information capture, storage, provision, retrieval, and analysis, focused on individual organisms, populations, and taxa, and their interaction. It covers the information generated by the fields of systematics (including molecular systematics), evolutionary biology, population biology, behavioural sciences, and synecological fields ranging from pollination biology to parasitism and phytosociology. Biodiversity Informatics is considered a part of biological informatics, sandwiched between, and strongly overlapping with, environmental informatics and molecular bioinformatics. It will provide the skeleton for a generalized scientific information infrastructure in biology.

Presently, the overriding objective of research and other activities in Biodiversity Informatics is to provide a sound information management infrastructure for biodiversity and Global Change research. The basic objectives are: ·
These are general problems, common to many scientific disciplines, so why is there a need for a new sub-discipline? The reason lies in the inherent complexity not only of the problem domain itself but also of its basic data. In meteorology, for example, the basic data are essentially location-referenced numeric data plus information on the methods used to obtain these measurements. This is not to deny that the interpretation of the data, e.g. those obtained with remote sensing techniques, may be highly complex. However, in biodiversity research and data capture we are dealing with concept-related data at the very basic level. To cite but one example: the presence or absence of a certain species at a specific place is not only a question of recording the name, location, and time; it also relates to the definition of the species. What we find is the organism, what we record is its classification. The matter becomes even more complex in palaeontology, where even the location (stratum) where the species is found is the designation of a concept.

There are also a number of specific sociological and political obstacles to general access to biodiversity information. For example, primary information that has been made accessible for research purposes may be used for commercial aims (bioprospecting). According to the Convention on Biological Diversity and to many national laws, the rights of stakeholders such as nations or indigenous people must be respected.

Current priorities in biodiversity informatics

Data capture is proceeding in most institutions in a more or less organized form, ranging from the individual scientist to entire organizations and thematic networks, from collections of text documents or spreadsheets to sophisticated databases. However, as mentioned before, the basic information we are dealing with is complex, and so structural features (e.g. in database design) and content definitions (e.g.
in controlled vocabularies, i.e., lists of applicable terms) vary widely. Biodiversity Informatics has to meet this challenge by providing consensus reference systems. On the structural side, much progress has been made in the last decade, when several groups published standard formats or even comprehensive information models for subject areas such as natural history collections and biological collections in general, palaeontological collections, and taxon names (see TDWG/BioCISE 2001 for an overview centred on biological collections).

On the content side, progress is also notable, but less rapid. The main reference system for biodiversity information, the classification of animals, plants, fungi, and micro-organisms into groups called taxa, is essentially a classification of concepts, and consequently no "standard" system can be devised without impeding progress in systematics and evolutionary research. Taxon-based information systems (or systems using taxon names) must find ways to map individual taxon concepts reliably (see Berendsohn 1999 for an introduction to the subject area).

However, taxon concepts are but one of the problem areas that designers of biodiversity information systems face. Most biodiversity research is associated with organisms in the field, which must be properly identified. Practical species identification tools can and must be delivered for field and laboratory research. Up to now, identification is still largely a process of formal scientific description, and the terminology used and the ontology needed for the description of organisms are very extensive (the discipline of medical informatics has been struggling for years to create a comprehensive ontology for just one species of mammal, Homo sapiens; see e.g. Baud & al. 1998). Striving for universal coverage of descriptive terms is clearly not the way to go.
The current challenge for biodiversity informatics lies in the creation of descriptive systems (and identification tools based on these) that are also useful to the non-specialist.

In scientific research, results must be falsifiable, so in biodiversity research the possibility to re-examine the organism itself should be maintained wherever possible. This is and has been done by depositing voucher specimens, i.e. suitably conserved organisms or parts of them, in natural history collections. These institutions not only provide a solid base for modern biodiversity research, they also serve as archives of biodiversity. The specimens deposited there over several hundreds of years represent samples of biological space-time. They document the existence of an organism at a certain time at a defined place. They are samples of biological material, which can be analysed, for example, to measure environmental parameters or genetic variation and to document their change over time. Consequently, biological collections are a cornerstone of biodiversity research, and the electronic representation of their content (an estimated 2.5 billion specimens world-wide; Duckworth et al. 1993) is a priority task for Biodiversity Informatics.

One of the principal reference systems for biodiversity research actually lies outside biology: the spatial reference system, including palaeontological space. Biodiversity information systems need sophisticated spatial reference systems that are able to unite several layers of information related to a geographic reference point, and that are able to scale horizontally and vertically from very large features right down to the small (e.g. strata in a rain forest) and the very small (e.g. soil communities inside a dung heap). Since many of the historic references found on specimens are not exact (referencing historic locality names and not co-ordinates), sophisticated gazetteers have to be incorporated which include the historical component of place names.
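The requirement for a gazetteer with a historical component can be made concrete with a small sketch. The following Python fragment is an illustration only, not a description of any existing system: the `PlaceName` record, the `GAZETTEER` list, and the `resolve` function are hypothetical names introduced here, and the entries are simplified, with approximate co-ordinates. It assumes that each locality name carries a period of validity and a language code, so that a name written on a specimen label can be resolved for the year of collection.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class PlaceName:
    """One name a locality carried during a period, in one language (hypothetical record)."""
    name: str
    language: str              # ISO 639-1 code, e.g. "de"
    valid_from: Optional[int]  # first year of use; None means open-ended
    valid_to: Optional[int]    # last year of use; None means still in use
    latitude: float
    longitude: float


# Simplified, illustrative entries; co-ordinates are approximate.
GAZETTEER = [
    PlaceName("Breslau", "de", None, 1945, 51.11, 17.03),
    PlaceName("Wrocław", "pl", 1945, None, 51.11, 17.03),
    PlaceName("Leningrad", "ru", 1924, 1991, 59.94, 30.31),
]


def resolve(label_name: str, year: int) -> Optional[Tuple[float, float]]:
    """Return co-ordinates for a locality name as written on a label dated `year`."""
    for entry in GAZETTEER:
        starts_ok = entry.valid_from is None or entry.valid_from <= year
        ends_ok = entry.valid_to is None or year <= entry.valid_to
        if entry.name.casefold() == label_name.casefold() and starts_ok and ends_ok:
            return (entry.latitude, entry.longitude)
    return None  # name unknown, or not in use in that year
```

With these entries, `resolve("Breslau", 1890)` yields the co-ordinates of the locality, while the same name with a collection year of 1990 yields no match and would have to be checked by hand. A real gazetteer would additionally need spelling variants, administrative hierarchies, and a measure of positional uncertainty.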
In contrast to the domain of taxon names, the problem of language representation becomes very important in geographic names.

Meeting the challenge

Information structures and data references

As mentioned before, research in information modelling and standardization of biological databases has been an important (though sadly under-funded) issue in the past decade. The IUBS Commission for Taxonomic Databases (TDWG 2001) has been a driving force and an important forum for information exchange. However, most of the experience was gained in the course of the implementation of information systems in individual biological institutions, large and small, with a high failure rate due to underestimation of the complexity of the tasks and lack of expertise and liaison. With respect to taxon names and biological collection data, detailed information models exist which can serve as a base for implementation projects. Some of the hot issues in that realm include concept mapping, data quality, and automatic linkage evaluation tools.

On the data side, we have extensive data collections, covering the names of higher plants (IPNI 1999), mosses (MOST 2000), fungi (CABI 2001), and (only from 1978 onwards in electronic form) zoological names (BIOSIS 2001). Apart from these name lists, promising attempts exist to unite information on the species level. However, a global federation of species databases, as envisioned by the Species 2000 project (Bisby 1998), still lacks the coordinated funding effort that is needed mainly to create the underlying group-specific information systems. The FishBase (2001) information system provides a particularly fine example of what several years of consistent and well-managed funding can achieve in terms of international research collaboration and data provision. For collections, attempts are made to create metadata based systems (e.g.
BioCISE and collaborating projects) which are conceived as an intermediate step on the way to specimen level information systems (Berendsohn & al. 2000). The research potential of specimen level information networks has been demonstrated by the Australian Environmental Resources Information Network (ERIN 2001), the NSF funded Species Analyst project (KUNHM 2001), the European Natural History Specimen Information Network (ENHSIN 2001), the Comisión Nacional para el Conocimiento y Uso de la Biodiversidad (CONABIO 2001), and other systems at the national level.

Investigating user interests

Two EU-financed projects have recently examined the interests of (potential) users and data providers for an electronic information system on biological collections: ENHSIN, the ongoing European Natural History Specimen Information Network, a research infrastructure network initiated by CETAF, the Consortium of European large-scale Taxonomic Facilities; and BioCISE, a concerted action project which finished in December 1999 and published its findings (Berendsohn 2000). Since collection data include many of the aforementioned aspects of biodiversity information, their results may here be used in a broader context. BioCISE differentiated four major groups of users of collection information (after Felinks & al. 2000): ·
This grouping is essentially valid for biodiversity information in general, although basic biodiversity research includes wider questions such as the actual basis for species richness, evolutionary mechanisms, etc. BioCISE also concluded that the primary data access points are geographical areas and taxon names. Users want user-group specific, visual, interactive, intuitive user interfaces, with scalable, standardized, high-quality and documented output.

The information provider's view

Information providers face a different set of challenges and requirements. Efficient data capture software is needed, which is flexible enough to be adapted to ever changing research needs. With respect to the networking of information sources, the protection of intellectual property rights, the proper crediting of data sources, the maintenance of control over the use of their information, and (sometimes) the commercialisation of the information are issues ardently discussed among, e.g., the big natural history museums. Long term archiving of primary data from research projects that are not directly connected to specimens is an unsolved problem. Here, cooperation with the library community must be sought.

Technical realization

Biodiversity data are mostly international in scope and interdisciplinary in content, so their acquisition and distribution calls for wide-ranging collaboration. The World Wide Web has had a tremendous levelling effect on information distribution techniques, from early attempts (Gopher etc.) to HTTP and now XML standards. Web browsers offer universal common user interface functionality independent of hardware and of the location of the user. Information access through a network of distributed and largely independent nodes is now feasible. A prerequisite for this is the provision of standardized interfaces to data resources on the Internet, and international working groups are tackling this aim (e.g.
the recently instituted CODATA working group on access to biological collection data). However, data retrieval is but one side of the equation. Biodiversity informatics will also be in the forefront of the development of user interfaces adequately representing highly complex data structures in distributed systems.

Prospects: The Global Biodiversity Information Facility

In the 1990s, in the wake of the Convention on Biological Diversity, governments increasingly realized the scope of the informatics infrastructures needed for biodiversity research. Consequently, a (limited) dedication of research funding took place, spearheaded by Australia and the United States, but also e.g. by Canada, Great Britain, and Mexico. In the course of tackling the problem on the national level, the global scope and magnitude of the problem became apparent. In January 1996, the U.S. submitted a proposal to the Organization for Economic Cooperation and Development's (OECD) Megascience Forum to create a Working Group on Biological Informatics, with a subgroup on Biodiversity Informatics (Edwards, pers. comm.). Over the following three years, this international expert panel elaborated a proposal for the creation of a Global Biodiversity Information Facility (GBIF; Edwards 1999). With the signing of an intergovernmental Memorandum of Understanding by 14 states, the GBIF finally came into existence on 1 March 2001.

The creation of the GBIF, and the process leading to it, is of twofold importance. On the one hand, the Secretariat will provide funding bodies, institutions, and individuals with much needed guidance on the allocation of resources and the availability of solutions. Current priorities for coordination and (to a limited extent) direct action address the problems specified above. Topmost items on the agenda are access to collection information and the creation of a global checklist of species names (for details see the GBIF Business Plan, GBIF 2000).
On the other hand, the discussion process has alerted governmental funding bodies to the existence of the problem domain, and much discussion has been set in motion on the national level. The recent creation of specific programmes for Biodiversity Informatics in, e.g., Germany ("Biolog - Biodiversitätsinformatik", BMBF 1999) and Belgium (OSTC 2000), and the creation of a European Biodiversity Information Network (EC 2001) with dedicated funding, can be considered results of that process. Apart from the Secretariat, the GBIF is to be created by national research funding, so further programmes will have to follow. A first step to be taken by signatory states is to create national GBIF nodes, which are to organize and coordinate the contributions of member states. This process challenges existing structures, many of which have so far worked in splendid isolation, to co-ordinate their efforts and to clearly identify priorities.

Conclusion: use of biodiversity information systems

The realm of biodiversity informatics is information systems which use taxa, specimens, and (species) observation records as their reference and index systems. They are able to link geographic, climatic, and environmental information with data on the molecular or physiological aspect of organisms, with functional data, bioindicative value, human usage, natural substances, etc. Specimen and observation data are parts of a global environmental monitoring process: they document, for example, changes in the biotic spectrum of an area, including the occurrence and expansion of biological invasions, the distribution of indicator species (for example those mirroring air quality), the distribution of pests and diseases, and changes in the behaviour of species, to name but a few. Biodiversity information systems also form part of obligations taken on by governments, e.g. in the context of the Convention on Biological Diversity, the Bern Convention, the Bonn Convention, and EU directives.
Knowledge discovery and discovery of gaps in knowledge will be the direct result of the creation of a Global Biodiversity Information Facility. The role of biodiversity informatics in this context currently lies primarily in the documentation of biodiversity as a resource and of its change, the networking of inhomogeneous information resources, the development of adequate user interfaces for data capture and retrieval, the standardization of information exchange and content creation methods, the development of methods for quality control, and archiving, as well as securing the reuse of research results.

Literature cited

Baud, R. H., Lovis, C., Rassinoux, A.-M. & Scherrer, J.-R. 1998: Alternative ways for knowledge collection, indexing and robust language retrieval. Meth. Inform. Med. 37: 315-326.
Berendsohn, W. G. 1999: Names, Taxa, and Information. In: Blum, S. (ed.): Proceedings of the Taxonomic Authority Files Workshop, Washington, DC, June 22-23, 1998. - San Francisco. [http://research.calacademy.org/research/informatics/taf/proceedings/Berendsohn.html]
Berendsohn, W. G. (ed.) 2000: Resource Identification for a Biological Collection Information Service in Europe (BioCISE). - Berlin.
Berendsohn, W. G., Costello, M. J., Emblow, C., Güntsch, A., Hahn, A., Koenemann, J., Thomas, C., Thomson, N. & White, R. 2000: Concepts for a European Portal to Biological Collections. Pp. 59-70 in: Berendsohn, W. G. (ed.), Resource Identification for a Biological Collection Information Service in Europe (BioCISE). - Berlin.
BIOSIS 2001: Zoological Record. Zoological Society of London and BIOSIS U. K. - York. [http://www.biosis.org.uk/products_services/zoorecord.html]
Bisby, F. A. 1998: Putting names to things and keeping track: the Species 2000 programme for a coordinated catalogue of life. - Pp. 59-68 in: Bridge, P., Jeffries, P., Morse, D. R. & Scott, P. R. (ed.), Information technology, plant pathology & biodiversity. - Oxon, New York.
BMBF 2000: Federal Ministry of Education and Research. Announcement of the Funding Regulations Governing "Biodiversity and Global Change (BIOLOG)" under the Federal Government's Programme on "Research for the Environment" of 7 April 1999. - Bonn.
CABI 2001: The CABI Bioscience Database of Fungal Names (Funindex). - Egham. [http://194.131.255.3/cabipages/Names/Names.asp]
CONABIO 2001: Comisión Nacional para el Conocimiento y Uso de la Biodiversidad. Homepage. [http://www.conabio.gob.mx/]
Duckworth, W. D., Genoways, H. H. & Rose, C. L. 1993: Preserving natural science collections: Chronicle of our environmental heritage. - Washington.
EC 2001: European Commission. Advance notice of a joint call for proposals of the specific programmes for research, technological development and demonstration on "Quality of life and management of living resources" and "Energy, environment and sustainable development", Part A: Environment and sustainable development (1998 to 2002) to establish a European network of biodiversity information (ENBI). Off. J. Europ. Comm. 20.2.2001: C53/18.
Edwards, J. L. 1999: The Global Biodiversity Information Facility: An international network of interoperable biodiversity databases. ASC Newsletter June/August 1999: 6-7.
ENHSIN 2001: European Natural History Specimen Information Network. Homepage. [http://www.nhm.ac.uk/science/rco/enhsin/]
ERIN 2001: Environmental Resources Information Network. Environment Australia. - Canberra. [http://www.ea.gov.au/sdd/erin/index.htm]
Felinks, B., Hahn, A., Olsvig-Whittaker, L. & Los, W. 2000: Users and uses of biological collections. Pp. 19-32 in: Berendsohn, W. G. (ed.), Resource Identification for a Biological Collection Information Service in Europe (BioCISE). - Berlin.
FishBase 2001: The FishBase Consortium, FishBase, a global information system on fishes. - Manila. [http://www.fishbase.org/home.htm]
GBIF 2000: Business Plan for the Global Biodiversity Information Facility. Discussion Draft 5. - Canberra. [http://www.gbif.org]
IPNI 1999: The Plant Names Project. International Plant Names Index. - London, Harvard, Canberra. [http://www.ipni.org]
KUNHM 2001: The Species Analyst. Homepage. [http://habanero.nhm.ukans.edu/]
MOST 2000: Moss Tropicos at the Missouri Botanical Garden. - St. Louis. [http://mobot.mobot.org/W3T/Search/most.html]
OSTC 2000: Belgian Federal Office for Science, Technology and Cultural Affairs. Multiannual Information Society Support Programme. Call for Proposals. - Brussels.
TDWG 2001: International Union of Biological Sciences, Taxonomic Database Working Group (TDWG). Homepage. - York. [http://www.tdwg.org]
TDWG/BioCISE 2001: Standards, information models, and data dictionaries for biological collections. Ed.: W. G. Berendsohn. - Berlin. [http://www.bgbm.org/TDWG/acc/Referenc.htm]