PREFERRED FILE FORMATS IN THE DIGITAL ARCHIVE OF THE ZBW

In principle, the ZBW Digital Curation team cannot influence the file formats in which archived material is acquired and provided. The first priority in collection development is the acquisition of content; obtaining the objects in file formats that are suitable for long-term archiving is secondary.

If the ZBW itself is the data producer, as is the case in retrodigitization, the Digital Curation team recommends the target file formats. In such cases a target file format should meet the following guidelines:

  • standardized (e.g. via an ISO standard)
  • not dependent on only one or a few display tools
  • widely distributed and widely used (preferably worldwide)
  • open format, or a format that has been made open retroactively
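
The four guidelines above can be read as a simple checklist. The following is a minimal sketch in Python, not a tool used by the ZBW. The class `FormatProfile`, its field names, and the second example format are hypothetical and chosen only for illustration; the PDF values follow the assessment given in this document.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class FormatProfile:
    """Hypothetical description of a candidate target file format."""
    name: str
    standardized: bool       # e.g. published as an ISO standard
    tool_independent: bool   # not tied to only one or a few display tools
    widely_used: bool        # widely distributed, preferably worldwide
    open_format: bool        # open, or made open retroactively


def unmet_guidelines(profile: FormatProfile) -> list[str]:
    """Return the names of the guidelines the format does not meet."""
    checks = {
        "standardized": profile.standardized,
        "tool_independent": profile.tool_independent,
        "widely_used": profile.widely_used,
        "open_format": profile.open_format,
    }
    return [name for name, ok in checks.items() if not ok]


# PDF, as assessed in this document: all four guidelines are met.
pdf = FormatProfile("PDF", True, True, True, True)

# An invented vendor-specific format, for contrast only.
vendor = FormatProfile("ExampleVendorFormat", False, False, True, False)

print(unmet_guidelines(pdf))
print(unmet_guidelines(vendor))
```

A format with an empty result would be a candidate for recommendation; any other result lists the guidelines that still need to be weighed as risks.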

The decision to preserve an object for the long term is based on content only [1]. Objects are always archived in the Digital Archive in their original file format. If another file format that is more suitable for long-term availability is known for that kind of object, an additional representation in that format is created during Preservation Planning [2].
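
The principle that the original is always kept and a more suitable format is only ever added as a further representation can be sketched as a small data structure. This is an illustrative assumption, not the data model of the Digital Archive; the class `ArchivedObject`, its methods, the identifier, and the example formats are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ArchivedObject:
    """Hypothetical record of one archived object and its representations."""
    identifier: str
    original_format: str
    # Representations added later, e.g. during Preservation Planning.
    additional_formats: list = field(default_factory=list)

    def add_representation(self, target_format: str) -> None:
        """Record an extra representation; the original is never replaced."""
        is_new = (
            target_format != self.original_format
            and target_format not in self.additional_formats
        )
        if is_new:
            self.additional_formats.append(target_format)

    def all_formats(self) -> list:
        """Original format first, followed by any additional representations."""
        return [self.original_format, *self.additional_formats]


# Invented example: an object delivered as DOC gains a PDF/A representation.
obj = ArchivedObject("example-0001", "DOC")
obj.add_representation("PDF/A")
obj.add_representation("PDF/A")  # adding the same format twice has no effect
print(obj.all_formats())
```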

Furthermore, the ZBW Digital Curation team always aims to find file formats for the archived objects that pose as few risks as possible to long-term availability.

Text-based content

One example of a file format that fulfills the above-mentioned guidelines is PDF. It has been an ISO standard (ISO 32000-1:2008) since 1 July 2008 and is an open standard. Numerous software programs support viewing or editing PDF files, and many of them are not proprietary. The PDF format was first published in 1993 and is known and used all over the world. There is also the PDF/A specification (ISO 19005-1:2005), which specifically supports the long-term availability of digital content.

The majority of the digital content hosted at the ZBW is acquired via the open access server EconStor. The guidelines of EconStor state that the content should be delivered in PDF format only [3]. ZBW users prefer the PDF format for text-based content, so this guideline is in accordance with users’ wishes.

However, PDF is not suitable for all kinds of content; it focuses mainly on text-based content. For simpler content such as images, less complex file formats (such as TIFF and JPEG/JPEG 2000) are preferable.

Digitized material and image data

If image data contains only text for which a full-text search is either unnecessary or impossible due to the quality of the text, the objects are usually saved only as images. The preferred file format is TIFF. The Tagged Image File Format (TIFF) is an open standard that has been stable since 1992 and is very widely used. As there are many TIFF files which do not comply with the standard, file format validation is especially important [2].
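
As a rough illustration of what the very first step of such a validation can look like, the sketch below checks only the 4-byte signature at the start of a classic TIFF file: a byte-order mark (`II` for little-endian, `MM` for big-endian) followed by the number 42 in that byte order. It is a minimal sketch and no substitute for full validation against the specification; the function name `looks_like_tiff` is a hypothetical choice, and the BigTIFF variant (which uses 43 instead of 42) is deliberately not covered.

```python
import struct


def looks_like_tiff(header: bytes) -> bool:
    """Check the classic TIFF signature: byte-order mark plus the number 42."""
    if len(header) < 4:
        return False
    byte_order = header[:2]
    if byte_order == b"II":      # little-endian byte order
        fmt = "<H"
    elif byte_order == b"MM":    # big-endian byte order
        fmt = ">H"
    else:
        return False
    # The next two bytes hold an unsigned 16-bit integer that must equal 42.
    (magic,) = struct.unpack(fmt, header[2:4])
    return magic == 42


print(looks_like_tiff(b"II*\x00"))   # little-endian TIFF signature
print(looks_like_tiff(b"MM\x00*"))   # big-endian TIFF signature
print(looks_like_tiff(b"%PDF"))      # start of a PDF file, not a TIFF
```

For a file on disk, the first four bytes can be read with `open(path, "rb").read(4)` and passed to the function. A file that passes this check may still violate the standard elsewhere, which is why full validation remains necessary.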

AV Media

AV media and executable content are not currently a focus of the archive and play only a minor part in the collections of the ZBW. For this reason, no preferred file formats have been defined yet for this kind of content.

Future work

Experience with the various file formats is growing continually. The ZBW Digital Curation team aims to limit the number of different file formats within the archive in order to keep maintenance manageable and to mitigate the risk of obsolescence.

File formats that are preferred today for ensuring long-term availability may become obsolete in the near future. For that reason, this text will be reviewed at least once a year to check that it is still up to date.