Metadata in digital preservation
Metadata describes the attributes and characteristics of objects in a digital archive or repository. This makes it an essential part of any digital preservation strategy. Assigning and defining metadata allows us to identify and locate the digital objects in a digital archive. Furthermore, it makes objects accessible, understandable and reusable and also ensures the integrity and authenticity of digital resources.
The metadata used in digital preservation strategies represents a cross-section of all the different types of metadata, including:
Descriptive metadata primarily describes the content of a digital resource. It puts a resource in its context and ensures it is retrievable over the long term and can be properly understood in the future. To secure long-term access to archived resources, the description of each resource’s content (e.g. author, title, publishing institution, date of publication) must be preserved as fully as possible in the form of metadata within an information package.
The Rosetta digital preservation system, which is employed by ZB MED as its technical infrastructure, uses the Dublin Core (Dublin Core: dublincore.org) metadata element set to describe information resources. This metadata set is a popular and widely recognised standard for internet resources.
Broadly speaking, ZB MED’s digital preservation system uses two methods to access metadata sources, in each case choosing sources that have sufficient and viable metadata for the collections being archived.
The first method relies on the cataloguing of ZB MED’s holdings. During this process, teams of library staff enter comprehensive bibliographic metadata for each item in the Union Catalogue run by the North Rhine-Westphalian Library Service Centre (hbz). When digital objects are transferred to the archive for permanent storage, they are simultaneously enriched with metadata from the Union Catalogue. In this process, the Aleph fields used in the Union Catalogue are mapped (i.e. assigned) to Dublin Core elements.
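The field mapping described above can be sketched as a simple lookup table. This is a minimal illustration only: the field codes (AUTHOR, TITLE, PUBLISHER, YEAR, LOCAL_NOTE), the function name and the sample record are hypothetical placeholders, not the actual Aleph configuration or mapping rules used by ZB MED or hbz.

```python
# Hypothetical catalogue field codes mapped to Dublin Core element names.
# The codes are placeholders for illustration, not real Aleph field tags.
FIELD_TO_DC = {
    "AUTHOR": "dc:creator",
    "TITLE": "dc:title",
    "PUBLISHER": "dc:publisher",
    "YEAR": "dc:date",
}


def map_to_dublin_core(record):
    """Return (mapped, unmapped) for one catalogue record.

    mapped is a dict of Dublin Core elements with lists of values;
    unmapped lists the source fields that have no mapping, so they can
    be reviewed instead of being silently dropped.
    """
    mapped = {}
    unmapped = []
    for field, value in record.items():
        element = FIELD_TO_DC.get(field)
        if element is None:
            unmapped.append(field)
            continue
        # Dublin Core elements are repeatable, so values are kept in lists.
        mapped.setdefault(element, []).append(value)
    return mapped, unmapped


# Invented sample record for demonstration purposes.
record = {
    "AUTHOR": "Example, Anna",
    "TITLE": "An example title",
    "YEAR": "2015",
    "LOCAL_NOTE": "shelf copy 2",
}
dc_metadata, skipped = map_to_dublin_core(record)
print(dc_metadata)
print(skipped)
```

Collecting unmapped fields separately is a design choice of this sketch; a real mapping might instead discard them or route them to a different metadata section.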
The second method incorporates Dublin Core metadata delivered through OAI interfaces set up by publishing platforms such as the German Medical Science portal (ZB MED/BfArM). In each case, the most appropriate metadata source is selected based on the requirements of the digital archive and the collection being archived.
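As a rough sketch of the second method, the following reads the Dublin Core elements out of a single record in the oai_dc format that OAI interfaces commonly deliver. The sample record is invented and the function name is hypothetical. The example parses a local string rather than contacting a real interface, and it does not reflect how Rosetta itself processes harvested records.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"

# Invented sample record in the oai_dc format.
SAMPLE_OAI_DC = """\
<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
           xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>An example article</dc:title>
  <dc:creator>Example, Anna</dc:creator>
  <dc:creator>Sample, Ben</dc:creator>
  <dc:date>2015-03-01</dc:date>
</oai_dc:dc>
"""


def parse_oai_dc(xml_text):
    """Collect Dublin Core elements from one oai_dc record into a dict."""
    root = ET.fromstring(xml_text)
    elements = {}
    prefix = "{" + DC_NS + "}"
    for child in root:
        # ElementTree exposes qualified tags as "{namespace}localname".
        if child.tag.startswith(prefix):
            name = child.tag[len(prefix):]
            elements.setdefault(name, []).append((child.text or "").strip())
    return elements


harvested = parse_oai_dc(SAMPLE_OAI_DC)
print(harvested)
```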
Technical metadata is information about the creation, conversion and formatting aspects of the process used to preserve digital objects. Technical metadata is essential for ensuring the technical integrity, intactness and long-term readability of digital objects. It forms the basis of a successful preservation management strategy and includes the following elements:
- file name
- original path
- file size
- file format
- file well-formedness, file validity
- results of the virus test
Whenever a digital object is transferred to the archive (ingest), technical metadata is automatically collected, extracted and documented in the information package. The system identifies any missing information, which can then be added manually by specialist staff where required.
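The automatic collection step can be illustrated with a small sketch that gathers some of the elements listed above for a single file. It is not the procedure used by the preservation system. The function names are hypothetical, the format is only guessed from the file extension, the SHA-256 checksum is an added assumption (a common way to support integrity checks), and well-formedness, validity and virus checks are left out because they require dedicated tools.

```python
import hashlib
import mimetypes
import os
import tempfile


def collect_technical_metadata(path):
    """Gather basic technical metadata for the file at path."""
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()
    # guess_type only inspects the file name; it returns None if unknown.
    mime_type, _ = mimetypes.guess_type(path)
    return {
        "file_name": os.path.basename(path),
        "original_path": os.path.abspath(path),
        "file_size": os.path.getsize(path),  # size in bytes
        "file_format": mime_type,
        "sha256": digest,  # assumption: checksum recorded for integrity
    }


def missing_fields(metadata):
    """List the fields that could not be determined automatically."""
    return [key for key, value in metadata.items() if value is None]


# Demonstration on a temporary file that is removed afterwards.
with tempfile.TemporaryDirectory() as folder:
    sample_path = os.path.join(folder, "report.txt")
    with open(sample_path, "wb") as handle:
        handle.write(b"hello")
    technical = collect_technical_metadata(sample_path)

print(technical)
print(missing_fields(technical))
```

Fields reported by missing_fields correspond to the gaps that, as described above, specialist staff could then fill in manually.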
Structural metadata provides information on how the components of a digital information resource are organised and related so that the resource can be reconstructed and made available for use.
To map the logical structure (hierarchy, original, versions) of the metadata and of the digital objects, links are created between and within the data and metadata of an information entity (intellectual entity). The standard for structural metadata, METS (Metadata Encoding & Transmission Standard), divides a METS file into sections, each of which can hold a different type of metadata. These include, for example, the descriptive metadata section and the administrative metadata section.
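To make the section layout more concrete, the following sketch assembles a skeletal METS document with a descriptive metadata section, an administrative metadata section, a file section and a structural map that links the two. The identifiers (dmd-1, amd-1, file-1), the function name and the sample values are hypothetical. The output is a simplified illustration that has not been validated against the METS schema and is not the structure Rosetta generates.

```python
import xml.etree.ElementTree as ET

METS_NS = "http://www.loc.gov/METS/"
DC_NS = "http://purl.org/dc/elements/1.1/"
XLINK_NS = "http://www.w3.org/1999/xlink"
ET.register_namespace("mets", METS_NS)
ET.register_namespace("dc", DC_NS)
ET.register_namespace("xlink", XLINK_NS)


def m(tag):
    """Qualify a tag name with the METS namespace."""
    return "{%s}%s" % (METS_NS, tag)


def build_mets(title, file_id, file_location):
    """Build a skeletal METS tree for one object with one file."""
    root = ET.Element(m("mets"))

    # Descriptive metadata section wrapping a Dublin Core title.
    dmd = ET.SubElement(root, m("dmdSec"), ID="dmd-1")
    wrap = ET.SubElement(dmd, m("mdWrap"), MDTYPE="DC")
    xml_data = ET.SubElement(wrap, m("xmlData"))
    ET.SubElement(xml_data, "{%s}title" % DC_NS).text = title

    # Administrative metadata section; left empty in this sketch.
    ET.SubElement(root, m("amdSec"), ID="amd-1")

    # File section listing the content file.
    file_sec = ET.SubElement(root, m("fileSec"))
    group = ET.SubElement(file_sec, m("fileGrp"))
    file_el = ET.SubElement(group, m("file"), ID=file_id)
    ET.SubElement(
        file_el,
        m("FLocat"),
        {"LOCTYPE": "URL", "{%s}href" % XLINK_NS: file_location},
    )

    # Structural map: links the object to its description and its file.
    struct_map = ET.SubElement(root, m("structMap"))
    div = ET.SubElement(struct_map, m("div"), LABEL=title, DMDID="dmd-1")
    ET.SubElement(div, m("fptr"), FILEID=file_id)
    return root


mets_root = build_mets("An example title", "file-1", "file:///data/report.pdf")
mets_xml = ET.tostring(mets_root, encoding="unicode")
print(mets_xml)
```

The div element in the structural map is where the links mentioned above become visible: its DMDID attribute points to the descriptive section and its fptr child points to the file.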
Administrative metadata provides traceable documentation of all the internal processes relating to digital objects that are carried out within an institution over time. It covers:
- the origin of digital objects (creation, acquisition),
- the measures taken to preserve them (software, logs, methods, author),
- persistent identifiers,
- contractual arrangements for archiving,
- and information on rights.
By providing information on rights, the metadata clarifies whether there are any restrictions on accessing the object or whether it is freely accessible.
Before responsibility for the data passes to the digital archive, the legal status of digital objects must be known and recorded in the metadata generated for them.
Information is stored in the metadata to indicate which rights of use have been granted and clarify the general archiving rights that apply (right of redundant storage and transformation/migration into other formats).