Risk management and preservation planning in the ZBW Digital Archive

In principle, all digital data carry the risk of becoming inaccessible in the future. Possible risks are:

  • File format obsolescence (decoder software is no longer available)
  • The file is invalid, i.e. it does not match the file format specification, and is therefore not (or no longer) accessible
  • The bitstream is no longer intact (this risk is mitigated by the backup strategy and the integrity check)
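
To illustrate the third risk, an integrity check is commonly implemented by comparing a file's current checksum with one recorded at ingest. The following is a minimal sketch only: the choice of SHA-256 and the function names `sha256_of` and `is_intact` are assumptions made for illustration and do not describe the tooling actually used in the ZBW Digital Archive.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large files do not fill memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, recorded_digest: str) -> bool:
    """Compare the current digest with the digest recorded at ingest."""
    return sha256_of(path) == recorded_digest.lower()
```

A file whose digest no longer matches the recorded value would then be restored from a backup copy.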

The ZBW Digital Archive offers detailed risk management. Where a risk is known, it can be defined for the affected file format. A risk can also be defined so that it applies only under certain conditions. For instance, a file format may pose a risk to long-term availability only if the file does not match the file format specification or if the file exceeds a certain size.
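
Conditional risks of this kind can be pictured as rules that pair a file format with a condition on the individual file. The sketch below is purely illustrative: the names `FileInfo`, `RiskRule`, `risks_for` and the size limit `SIZE_LIMIT_BYTES` are hypothetical and are not taken from the archive's risk management tool.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical size threshold, chosen only for this example.
SIZE_LIMIT_BYTES = 2 * 1024**3


@dataclass(frozen=True)
class FileInfo:
    format_name: str
    is_valid: bool      # result of a format validation step
    size_bytes: int


@dataclass(frozen=True)
class RiskRule:
    description: str
    format_name: str
    applies: Callable[[FileInfo], bool]  # condition under which the risk holds


def risks_for(file: FileInfo, rules: List[RiskRule]) -> List[str]:
    """Return the descriptions of all rules that apply to the given file."""
    return [
        rule.description
        for rule in rules
        if rule.format_name == file.format_name and rule.applies(file)
    ]


RULES = [
    RiskRule("File does not match the format specification", "PDF",
             lambda f: not f.is_valid),
    RiskRule("File exceeds the size threshold", "PDF",
             lambda f: f.size_bytes > SIZE_LIMIT_BYTES),
]
```

Under these assumptions, a valid PDF below the threshold yields no risks, while an invalid one is flagged with the first rule.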

The digital curation team always describes risks in as much detail as possible. All content in the digital archive is tested with the risk management tool on a regular basis. Identified risks can then be mitigated, for instance by migration to a more recent file format that poses fewer risks to long-term availability.

The EconStor Deposit Licence of the Open Access document server EconStor includes the permission to convert to another format (Chapter 4).

For content under national licences hosted at the ZBW, the ZBW usually also has the right to convert the material, as most of the national and alliance licences contain the following paragraph:

“The Licensee is further permitted to make such copies or re-format the Licensed Material contained in the archival copies supplied by the Publisher in any way to ensure their future preservation and accessibility in accordance with this Licence.”

For content created at the ZBW, e.g. in the context of retrodigitisation, the ZBW has the right to convert and edit the data.

Possible risks and countermeasures in the ZBW Digital Archive:

  • Risk: The file does not meet the format specifications (example: invalid PDF).
    Countermeasure: Create an additional representation of the file in the same file format that meets the format specifications and is valid.
  • Risk: A PDF file uses non-embedded fonts, so the correct display of the file cannot be guaranteed on every reading device.
    Countermeasure: Embed the fonts used retroactively (if they are not copyright-protected).
  • Risk: The format identification tool cannot identify the file format correctly because there is data after the EOF tag (End of File, which marks the end of a file).
    Countermeasure: Use a script to delete the superfluous data after the EOF tag so that the file format can be identified correctly.
  • Risk: The data is incomplete, e.g. parts of the image are missing and/or there is no EOF tag.
    Countermeasure: If possible, ask the data producer for the complete file. Alternative: Save the file again so that it is structurally complete (including the EOF tag). The missing image data will still be missing, but some tools cannot display files without an EOF tag at all, so such files pose a particular risk to long-term access.
  • Risk: The file format is unknown to the format library in use.
    Countermeasure: If the sample of files in the unknown format is sufficiently large, add the file format to the format library. Alternative: Files in an unknown format are especially risky because their specific risks are unknown as well; document the data as thoroughly as possible.
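
The countermeasure for data after the EOF tag can be sketched in a few lines. This example assumes a PDF file, whose end-of-file marker is `%%EOF`; the function name `trim_after_eof` is hypothetical, and the sketch is not the script used by the ZBW. It searches for the last marker, because a PDF with incremental updates may legitimately contain several.

```python
EOF_MARKER = b"%%EOF"


def trim_after_eof(data: bytes) -> bytes:
    """Return the data truncated directly after the last %%EOF marker."""
    position = data.rfind(EOF_MARKER)
    if position == -1:
        # Without a marker there is nothing safe to cut; leave this to a person.
        raise ValueError("no %%EOF marker found")
    # Keep everything up to and including the marker, then end with a newline.
    return data[: position + len(EOF_MARKER)] + b"\n"
```

The result would be written to a new file rather than over the source file. Note that this simple search would be misled if the trailing data itself contained the byte sequence `%%EOF`, so a real script would need additional checks.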

Such countermeasures are planned and executed within the preservation planning module of the Digital Archive. All measures taken to ensure the long-term availability of the archived content are part of preservation planning.

All actions executed during preservation planning are permanently documented in the archive, as is the preservation plan itself. The software used is also archived permanently, so that all alterations and conversions are documented as transparently as possible.

The original file is generally kept in long-term archiving, even if it is obsolete and can no longer be accessed easily. All derivatives are additional representations that are archived alongside the original, not in its place.
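
The principle that derivatives are added to the original rather than replacing it can be expressed as a small data model. This is only a sketch; the class names `Representation` and `ArchivedObject` and their fields are hypothetical and do not reflect the archive's actual data model.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Representation:
    label: str
    file_name: str


@dataclass
class ArchivedObject:
    original: Representation
    derivatives: List[Representation] = field(default_factory=list)

    def add_derivative(self, label: str, file_name: str) -> Representation:
        """Store a new representation next to the original, never over it."""
        representation = Representation(label, file_name)
        self.derivatives.append(representation)
        return representation

    def representations(self) -> List[Representation]:
        """Return the original first, followed by all derivatives."""
        return [self.original, *self.derivatives]
```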

The detailed workflow of the preservation planning in the ZBW Digital Archive is documented on the website of the TIB - Leibniz Information Centre for Science and Technology and University Library. The ZBW closely collaborates with the TIB on long-term preservation issues.

As part of preservation watch, the digital preservation staff keep themselves informed about which risks apply to which file formats and which countermeasures are current best practice. The workflows are regularly evaluated and updated to meet the aim of long-term readability for the users of the ZBW.