Metadata in the ZBW Digital Archive

In digital archiving, metadata are crucial for information retrieval and for ensuring long-term availability. The ZBW Digital Archive distinguishes five types: bibliographic, technical, administrative, structural and access rights metadata.

Bibliographic metadata

The data and files belonging to an intellectual entity are archived together with all bibliographic metadata that already exist for them. This also applies to catalogue data from the union catalogue GVK (Gemeinsamer Verbundkatalog): if the archived object already has an entry in the GVK, the archived intellectual entity adopts its metadata from the union catalogue.

Within the Digital Archive, bibliographic metadata are recorded in the Dublin Core schema. If the original metadata are in another format, such as the PICA format used in the GVK, they are mapped to Dublin Core. For this purpose, the ZBW department “A2 Integrated Acquisition and Cataloguing” has developed a mapping from PICA to Dublin Core.
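A crosswalk of this kind can be pictured as a lookup table from PICA field/subfield pairs to Dublin Core elements. The following Python sketch assumes a record that has already been parsed into (field, subfields) pairs; the field codes in the table and the function name `pica_to_dc` are illustrative assumptions and do not reproduce the actual A2 mapping.

```python
# Illustrative PICA+ field/subfield pairs mapped to Dublin Core elements.
# These rules are assumptions for the example, NOT the ZBW A2 mapping.
PICA_TO_DC = {
    ("021A", "a"): "title",
    ("011@", "a"): "date",
    ("033A", "n"): "publisher",
    ("010@", "a"): "language",
}


def pica_to_dc(record):
    """Map a parsed PICA+ record to a Dublin Core dictionary.

    `record` is a list of (field, subfields) pairs, where `subfields` is a
    list of (code, value) pairs, e.g. ("021A", [("a", "Some title")]).
    Subfields without a mapping rule are ignored. Dublin Core elements are
    repeatable, so each element maps to a list of values.
    """
    dc = {}
    for field_code, subfields in record:
        for code, value in subfields:
            element = PICA_TO_DC.get((field_code, code))
            if element is not None:
                dc.setdefault(element, []).append(value)
    return dc
```

A real crosswalk also has to handle repeated fields, combined subfields (e.g. title plus subtitle) and normalisation of values, which this sketch leaves out.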

Technical metadata

To ensure the long-term readability of the data, technical metadata are extracted automatically during the transfer to the archive. These metadata are:

  • File name
  • Original path
  • File size
  • File format and version of the file format
  • Well-formedness and validity
  • Checksums
  • Results of the virus check
  • Creation date of the file

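As a rough illustration of the automatic part of this extraction, the Python sketch below collects file name, path, size, checksums and a file timestamp using only the standard library. The function name and dictionary keys are hypothetical. Format identification, validation and virus checking require dedicated tools and are left empty here, and the modification time serves as a stand-in for the creation date because file systems do not expose creation times portably.

```python
import hashlib
import os
from datetime import datetime, timezone


def extract_technical_metadata(path, chunk_size=65536):
    """Collect basic technical metadata for one file (illustrative sketch)."""
    sha256 = hashlib.sha256()
    sha512 = hashlib.sha512()
    # Read in chunks so large files do not have to fit into memory.
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha256.update(chunk)
            sha512.update(chunk)
    st = os.stat(path)
    return {
        "file_name": os.path.basename(path),
        "original_path": os.path.abspath(path),
        "file_size": st.st_size,
        "checksums": {"sha256": sha256.hexdigest(), "sha512": sha512.hexdigest()},
        # Stand-in for the creation date: portable creation times are not
        # available on every platform, so the modification time is used.
        "creation_date": datetime.fromtimestamp(st.st_mtime, tz=timezone.utc).isoformat(),
        # These need dedicated tools (format identification, validation,
        # virus scanning) and are left unset in this sketch.
        "format": None,
        "format_version": None,
        "well_formed": None,
        "valid": None,
        "virus_check": None,
    }
```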
If extraction of any of these metadata fails, the archiving process pauses, and the responsible staff member either adds the missing data manually or documents that parts of the metadata are unavailable. In some cases, missing metadata (e.g. an unknown file format) can pose a risk to long-term availability; the affected objects are then monitored by the risk management workflows.
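The pause-and-resolve step could be modelled roughly as follows. This is a hypothetical sketch, not Rosetta behaviour: `IngestItem`, `REQUIRED_FIELDS` and the helper functions are invented for illustration, and treating an undetermined format as the trigger for risk monitoring is an assumption.

```python
from dataclasses import dataclass, field

# Hypothetical set of fields that must be resolved before ingest continues.
REQUIRED_FIELDS = ("file_name", "original_path", "file_size", "format", "checksums")


@dataclass
class IngestItem:
    metadata: dict
    unavailable: dict = field(default_factory=dict)  # field name -> documented reason
    under_risk_watch: bool = False


def missing_fields(item):
    """Required fields that are neither filled in nor documented as unavailable."""
    return [name for name in REQUIRED_FIELDS
            if item.metadata.get(name) is None and name not in item.unavailable]


def ingest_status(item):
    """The item stays paused until every gap is either filled or documented."""
    return "paused" if missing_fields(item) else "continue"


def supply_value(item, name, value):
    """A staff member adds the missing value manually."""
    item.metadata[name] = value


def document_unavailable(item, name, reason):
    """A staff member records that the value cannot be determined."""
    item.unavailable[name] = reason
    if name == "format":
        # Assumption: an unknown format flags the object for risk monitoring.
        item.under_risk_watch = True
```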

Administrative metadata

All edits and updates to the objects are documented in the metadata, including which programs and tools were used to alter or migrate the files and which staff member made the modifications.
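One way to picture such documentation is an append-only list of events per object, each naming the action, the tool and the staff member. The class and field names below are assumptions, loosely inspired by event-based provenance models such as PREMIS, and do not describe Rosetta's internal data model.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class ProvenanceEvent:
    event_type: str       # e.g. "edit", "metadata update", "migration"
    description: str
    agent: str            # staff member who made the change
    tool: Optional[str]   # program or tool (and version) used, if any
    timestamp: str        # ISO 8601, UTC


@dataclass
class ArchivedObject:
    identifier: str
    events: list = field(default_factory=list)

    def record(self, event_type, description, agent, tool=None):
        """Append a new event; existing events are never modified."""
        event = ProvenanceEvent(
            event_type=event_type,
            description=description,
            agent=agent,
            tool=tool,
            timestamp=datetime.now(timezone.utc).isoformat(),
        )
        self.events.append(event)
        return event
```

Freezing each event keeps earlier entries from being altered after the fact, so the history only grows.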

In addition, persistent identifiers ensure that objects can be retrieved in the archive. For material from DSpace document servers, for instance, the handle is used. The archival system Rosetta also assigns its own identifiers.
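Because one object may carry several identifiers, retrieval can be thought of as an index that resolves any of them to the same object. In this hypothetical sketch, `IE1001` stands in for a Rosetta identifier and `12345/678` for a handle; both values and the `IdentifierIndex` class are invented for illustration. Handles are commonly cited through the `https://hdl.handle.net/` resolver, so that prefix is stripped before lookup.

```python
HANDLE_RESOLVER = "https://hdl.handle.net/"


def normalize(identifier):
    """Strip the resolver prefix so a handle URL and a bare handle match."""
    if identifier.startswith(HANDLE_RESOLVER):
        return identifier[len(HANDLE_RESOLVER):]
    return identifier


class IdentifierIndex:
    """Resolve any registered identifier to the object's internal ID."""

    def __init__(self):
        self._index = {}

    def register(self, internal_id, *external_ids):
        keys = [normalize(i) for i in (internal_id, *external_ids)]
        # Check every key first so a conflict leaves the index unchanged.
        for key in keys:
            owner = self._index.get(key)
            if owner is not None and owner != internal_id:
                raise ValueError(f"{key} is already assigned to {owner}")
        for key in keys:
            self._index[key] = internal_id

    def resolve(self, identifier):
        """Return the internal ID, or None if the identifier is unknown."""
        return self._index.get(normalize(identifier))
```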

Structural metadata

For some data, the relations among the archived units and the hierarchy of the objects are themselves meaningful. Such data are archived as a so-called collection. Within a collection, sub-collections can be created to represent the hierarchy and structure of the objects.
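The nesting of collections and sub-collections forms a tree. The Python sketch below, with a hypothetical `Collection` class and made-up identifiers, shows how walking that tree recovers the hierarchical position of each archived unit.

```python
from dataclasses import dataclass, field


@dataclass
class Collection:
    name: str
    items: list = field(default_factory=list)           # identifiers of archived units
    subcollections: list = field(default_factory=list)  # nested Collection objects

    def add_subcollection(self, name):
        child = Collection(name)
        self.subcollections.append(child)
        return child

    def walk(self, prefix=""):
        """Yield (path, item) pairs; the path reflects the collection hierarchy."""
        path = f"{prefix}/{self.name}" if prefix else self.name
        for item in self.items:
            yield path, item
        for child in self.subcollections:
            yield from child.walk(path)
```

For example, a journal could be a collection with one sub-collection per volume and per issue, with each archived article placed in its issue.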

If the structure of and the relations between the files within an archived unit are meaningful, the original file paths are recorded in the metadata.
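Recording the original paths is enough to reconstruct the internal structure of an archived unit later. This minimal sketch assumes POSIX-style relative paths with forward slashes; `build_tree` is a hypothetical helper that nests the paths into a dictionary of directories, with files as leaves.

```python
from pathlib import PurePosixPath


def build_tree(original_paths):
    """Nest recorded original paths into a dict-of-dicts directory tree.

    Directories map to nested dicts; files are leaves mapped to None.
    """
    tree = {}
    for p in original_paths:
        parts = PurePosixPath(p).parts
        if not parts:
            continue  # skip empty paths
        node = tree
        for part in parts[:-1]:
            node = node.setdefault(part, {})
        node[parts[-1]] = None
    return tree
```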

Access rights metadata

Within Rosetta, access rights policies hold all information about access rights and any restrictions. Usually, a specific access rights policy is chosen for each workflow and assigned automatically to every item archived through it. However, an access rights policy can also be assigned to an individual item.
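The relation between workflow defaults and individual assignments can be sketched as a simple precedence rule. The names `AccessPolicy`, `Item`, `assign_on_ingest` and `set_policy`, and the example policies, are assumptions for illustration and do not reflect Rosetta's API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    name: str
    restricted: bool


@dataclass
class Item:
    identifier: str
    workflow: str
    policy: Optional[AccessPolicy] = None  # set individually or by workflow default


def assign_on_ingest(item, workflow_defaults):
    """Give the item its workflow's default policy unless one was set individually."""
    if item.policy is None:
        item.policy = workflow_defaults[item.workflow]
    return item.policy


def set_policy(item, policy):
    """Assign a policy to a single item, or change it at a later point."""
    item.policy = policy
```

Keeping the policy as a separate object referenced by each item means that restrictions can be adjusted later without touching the archived files themselves.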

The Digital Archive is a dark archive without external access; users access the data via other platforms. Within Rosetta, only ZBW staff members from the digital curation team can access the data. Nevertheless, the Digital Archive may allow external access in the future. Because the archive contains data with limited access rights, the restrictions in the access rights policies assigned to the affected data during archiving can be changed at any time later.