Preferred file formats in the ZBW Digital Archive

In principle, the ZBW Digital Curation team cannot influence the file formats in which archived material is acquired and provided. The first priority in collection development is acquiring the content; obtaining the objects in file formats suitable for long-term archiving is secondary.

If the ZBW itself is the data producer, as is the case in retrodigitization, the Digital Curation team recommends the target file formats. In such cases, the recommended format should meet the following criteria:

  • standardized (e.g. by an ISO standard)
  • not dependent on a single rendering tool or only a few of them
  • widely distributed and widely used, preferably worldwide
  • an open format, or a format that has been made open retroactively

The decision to preserve an object for the long term is based on content only [1]. Objects are always archived in their original file format in the Digital Archive. If another file format is known to be more suitable for the long-term availability of that kind of object, an additional representation in that format is created during Preservation Planning [2].
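The rule described above (always keep the original, and add a second representation only when a more suitable format is known) can be illustrated with a minimal sketch. The mapping and the function name below are hypothetical and chosen purely for illustration; they do not reflect the actual Preservation Planning configuration of the ZBW.

```python
# Hypothetical mapping from an original format to a format assumed to be
# more suitable for long-term availability. Illustrative only; this is
# not the ZBW's actual policy.
PREFERRED_ALTERNATIVE = {
    "PDF": "PDF/A",
}


def planned_representations(original_format: str) -> list[str]:
    """Return the formats in which an object would be held in the archive."""
    # The original file format is always kept.
    representations = [original_format]
    # An additional representation is added only if a more suitable
    # format is known for this kind of object.
    alternative = PREFERRED_ALTERNATIVE.get(original_format)
    if alternative is not None and alternative != original_format:
        representations.append(alternative)
    return representations
```

Under these assumptions, an object delivered as PDF would be held in two representations, while an object in a format without a known better alternative would be held in its original format only.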

Furthermore, the ZBW Digital Curation team always aims to find file formats for the archived objects that pose as few risks as possible to long-term availability.

Text-based content

One example of a file format that fulfills the above-mentioned guidelines is PDF. It has been an ISO standard (ISO 32000-1:2008) since 1 July 2008 and is an open standard. Numerous software programs support viewing or editing PDF files, and many of them are non-proprietary. The PDF format was first published in 1993 and is known and used all over the world. There is also the PDF/A specification (ISO 19005-1:2005), which is specifically designed to support the long-term availability of digital content.

The majority of the digital content hosted at the ZBW is acquired via the open access server EconStor. The guidelines of EconStor state that the content should be delivered in PDF format only [3]. ZBW users prefer the PDF format for text-based content, so this guideline is in accordance with users’ wishes.

However, the PDF format is not suitable for every kind of content; it is aimed mainly at text-based content. For simpler content such as images, less complex file formats (such as TIFF and JPEG/JPEG 2000) are preferable.

Digitized material and image data

If image data contains only text for which a full-text search is either unnecessary or impossible because of the quality of the text, the objects are usually saved as images only. The preferred file format is TIFF. The Tagged Image File Format (TIFF) is an open standard that has been stable since 1992 and is very widely used. Because many TIFF files do not comply with the standard, file format validation is especially important [2].
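Full validation is a task for a dedicated validation tool, but the idea can be illustrated with a minimal sketch that checks only the 8-byte header of a classic TIFF file: the byte-order mark ("II" or "MM"), the magic number 42, and the offset of the first image file directory (IFD). The function name and the returned messages are assumptions made for this example; the sketch does not cover BigTIFF and does not inspect the tags themselves.

```python
import struct


def check_tiff_header(data: bytes) -> str:
    """Return 'ok' or a short reason why the classic TIFF header is invalid."""
    if len(data) < 8:
        return "too short for a TIFF header"

    # Bytes 0-1: byte order, "II" = little-endian, "MM" = big-endian.
    byte_order = data[:2]
    if byte_order == b"II":
        endian = "<"
    elif byte_order == b"MM":
        endian = ">"
    else:
        return "unknown byte-order mark"

    # Bytes 2-3: magic number 42; bytes 4-7: offset of the first IFD.
    magic, ifd_offset = struct.unpack(endian + "HI", data[2:8])
    if magic != 42:
        return "magic number is not 42"

    # The first IFD must lie after the header and inside the file.
    if ifd_offset < 8 or ifd_offset >= len(data):
        return "first IFD offset points outside the file"
    return "ok"
```

A check like this only catches grossly malformed or mislabeled files; a file that passes it can still violate the standard in its tags or image data, which is why a complete validation step remains necessary.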

AV media

AV media and executable content are not currently a focus of the archive and play only a minor part in the collections of the ZBW. For that reason, no preferred file formats have been defined yet for this kind of content.

Future work

Experience with the various file formats is growing continually. The ZBW Digital Curation team aims to limit the number of different file formats within the archive in order to keep maintenance manageable and to mitigate the risk of obsolescence.

File formats that are preferred today for ensuring long-term availability might become obsolete in the near future. For that reason, this text is reviewed at least once a year to check that it is still up to date.