The Mammal Networked Information System

Project Summary (from proposal submitted to NSF, 8 Jan 2001)

The Museum of Vertebrate Zoology (MVZ), in collaboration with 16 other North American institutions, seeks funding to develop an integrated network for distributed databases of mammal specimen data. The objectives of this Mammal Networked Information System (MaNIS) are to 1) facilitate open access to combined specimen data from a web browser, 2) enhance the value of specimen collections, 3) conserve curatorial resources, and 4) use a design paradigm that can be easily adopted by other disciplines with similar needs. MaNIS is designed to achieve these objectives while avoiding both the long-term, external maintenance of a network and the centralized management of data. Development of this networked information system addresses the urgent call for natural history museums to come together to build and support a biodiversity informatics infrastructure in an open, collaborative manner.

This community-based approach to a common problem offers several distinct advantages. It is cost-effective, achieving economies of scale and imposing standardization in a way that is both scalable and extensible. The increasing demand for large databases of natural history information as resources for inquiry learning at all educational levels is also easily addressed through the simplicity of a networked solution. Development of a networked information system allows institutions that could not finance, develop, or support such an enterprise on their own to distribute this information. Only by pooling and integrating collections data in a networked environment can museums enable their global use in research, education, and informed decision making. In an era of increasing population growth and accelerated environmental degradation, it is imperative that factual information about the earth’s biodiversity be readily available. With greatly improved access to these critical data, we can hope to maintain human health and welfare and wisely manage the earth's dwindling natural resources into the 21st century.

The proposed information system will consist of a network of equivalent servers that communicate using the ANSI/NISO Z39.50 standard for information retrieval, which has already proven successful in enabling data sharing and exchange of knowledge among natural history collections. Each MaNIS server will consist of a web server with database and communication software on a dedicated workstation. Each MaNIS server will maintain a repository of public data derived from an existing local master database as well as summaries of the data that are resident on all servers on the network. This configuration will 1) insulate local master databases from unpredictable traffic and record locking, 2) obviate the need for additional security to protect master databases and the networks on which they reside, and 3) build a community capable of addressing shared technical problems. With each server as an equal partner, the network will also be protected from the numerous disadvantages of a centralized data warehouse. Though conceptually simple in design, the proposed network will make use of novel enhancements to existing technologies to allow efficient access to the combined data from any server on the network.
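
The arrangement of equivalent servers described above can be illustrated with a minimal sketch. It is not the MaNIS implementation: the class and function names, the record fields, the second institution, and the choice of per-genus counts as the summary are all assumptions made for illustration, and in-process calls stand in for the Z39.50 exchange between servers. The sketch shows only how a query entered at any one server might use the shared summaries to decide which peers hold matching records, with no central server involved.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class PeerServer:
    """One hypothetical node: local public records plus summaries of every peer."""
    name: str
    records: list = field(default_factory=list)    # public specimen records (dicts)
    summaries: dict = field(default_factory=dict)  # peer name -> Counter of genus counts

    def local_summary(self):
        # Count local specimens per genus; this is the summary shared with peers.
        return Counter(r["genus"] for r in self.records)

    def search_local(self, genus):
        # Search only this server's own public repository.
        return [r for r in self.records if r["genus"] == genus]


def exchange_summaries(servers):
    """Give every server a copy of every server's summary (no central hub)."""
    snapshot = {s.name: s.local_summary() for s in servers}
    for s in servers:
        s.summaries = {name: Counter(counts) for name, counts in snapshot.items()}


def query(entry, servers_by_name, genus):
    """Answer a query at any entry server, contacting only peers whose summary shows matches."""
    hits = []
    for name, counts in entry.summaries.items():
        if counts.get(genus, 0) > 0:
            hits.extend(servers_by_name[name].search_local(genus))
    return hits


# Made-up records for two servers; "OtherMuseum" is a placeholder, not a real partner.
mvz = PeerServer("MVZ", [{"genus": "Peromyscus", "catalog": "MVZ-1"},
                         {"genus": "Neotoma", "catalog": "MVZ-2"}])
other = PeerServer("OtherMuseum", [{"genus": "Peromyscus", "catalog": "OTHER-1"}])
servers = [mvz, other]
exchange_summaries(servers)
by_name = {s.name: s for s in servers}

# The same query can be entered at either server.
results = query(other, by_name, "Peromyscus")
```

Because every server holds the same summaries, the query gives the same answer whichever server receives it, and a peer with no matching records is never contacted.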

The proposed design is novel in two respects. First, there is no central repository or central server. Each MaNIS server will automatically maintain summary data (i.e., counts of specimen data and data dictionaries) from all servers on the network as well as tables of specimen data from its local master database. Second, the configuration of the replicated databases will be optimized for query performance rather than the curatorial utility for which the master databases are optimized. The design also allows institutions to retain control over public access to their data without creating new structures in their master databases. This control will be achieved by migrating data from the master databases to the MaNIS servers through scripts tailored to the rules of each institution.
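
The migration step can be sketched in the same hedged spirit. The function below is a hypothetical illustration, not one of the project's scripts: the field names, the `restricted` flag, and the form of the institutional rule are assumptions. It shows the idea that each institution supplies its own rule for which records and fields become public, while the master database itself is left unchanged.

```python
def migrate(master_rows, is_public, hidden_fields):
    """Copy publishable records out of a master database, dropping withheld fields.

    master_rows   -- iterable of dicts standing in for master database records
    is_public     -- institution-specific rule: record -> True if it may be published
    hidden_fields -- institution-specific set of field names never published
    """
    public = []
    for row in master_rows:
        if not is_public(row):
            continue  # the institution's rule withholds this record entirely
        # Build a new dict so the master record is never modified.
        public.append({k: v for k, v in row.items() if k not in hidden_fields})
    return public


# Made-up master records and a made-up institutional rule.
master = [
    {"catalog": "A-1", "genus": "Peromyscus", "restricted": False, "internal_notes": "x"},
    {"catalog": "A-2", "genus": "Neotoma", "restricted": True, "internal_notes": "y"},
]
public = migrate(
    master,
    is_public=lambda row: not row["restricted"],
    hidden_fields={"restricted", "internal_notes"},
)
```

Keeping the rule outside the master database is what lets an institution change its public-access policy by editing a script rather than by adding new structures to its curatorial data.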

The creation of a distributed database of mammal collections will represent the first time that many of these data will be available online and the first time that such data will be accessible together. This combined store of biodiversity knowledge will allow the predictive use of these data to reveal patterns and processes of evolutionary and ecological phenomena that have not previously been apparent. The simplicity of our network design will provide a low-cost opportunity for any institution to increase the visibility and use of its collections using a model that can be easily implemented within or across disciplines. It also provides ready access to detailed knowledge about the earth's biodiversity as we face the challenges of the 21st century.

John Wieczorek, 27 June 2001
Rev. 5 Sep 2002, JRW
University of California, Berkeley, CA 94720, Copyright © 2001, The Regents of the University of California.