AUTOMATIC GENERATION OF METADATA

The ZBW collects and indexes literature and subject-specific information in economics from all over the world. Formal and subject indexing produce high-quality metadata for use in the ZBW’s portal EconBiz and for re-use by third parties.

The number of publications in economics is rising. A growing share of them is digital, which opens up new technical possibilities. The strategy for indexing publications takes these changes into account. The working group “Automatic Indexation” at the ZBW is looking for automatic methods to complement manual, i.e. intellectual, indexing, in order to continue providing high-quality and homogeneous indexing and to cover as many publications as possible. To this end, the ZBW continually evaluates automatic procedures, develops them further through its own research and implements them in productive indexing.

Formal metadata

One speciality of the ZBW is the bibliographic description of articles from journals and compilations, which creates on average 65,000 metadata records per year for this type of document. The ZBW identifies and tests procedures that enable (semi-)automatic generation of metadata for articles.

So far, the ZBW has tested procedures that create metadata for printed articles from the scanned and processed tables of contents of individual issues. For digital articles, the ZBW has tested methods that re-use bibliographic records already available on the web. One of our research approaches is to use text and structure recognition procedures to extract formal and content-descriptive metadata elements automatically from electronic full texts.
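As an illustration of the first approach, the following minimal sketch turns one line of a processed table of contents into a metadata record. The line layout ("Author: Title ..... page"), the pattern TOC_LINE and the function parse_toc_line are assumptions made for this example only; they do not describe the procedures actually tested at the ZBW.

```python
import re

# Hypothetical layout of a processed table-of-contents line:
#   "Author: Title ..... 17"
# Real tables of contents vary widely; this pattern is only an assumption.
TOC_LINE = re.compile(
    r"^(?P<author>[^:]+):\s*(?P<title>.+?)\s*\.{2,}\s*(?P<page>\d+)$"
)


def parse_toc_line(line):
    """Return a small metadata record, or None if the line does not match."""
    match = TOC_LINE.match(line.strip())
    if match is None:
        return None
    return {
        "author": match.group("author").strip(),
        "title": match.group("title"),
        "start_page": int(match.group("page")),
    }


record = parse_toc_line("Smith, J.: Wage dynamics in Europe ..... 17")
```

Lines that do not fit the assumed layout return None, so in a semi-automatic workflow they could be passed on for manual description.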

Content-descriptive metadata

The ZBW works with tools, procedures and processes for automatic subject indexing, and also examines quality management and the copyright aspects of text and data mining.

Practice

The ZBW uses a fusion architecture for automatic subject indexing. It combines associative and lexical methods in a way that retains the strengths of the individual procedures as far as possible. This architecture also makes it possible to meet individual quality requirements through incremental post-processing.
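The general idea of such a fusion can be sketched as follows. This is a toy example under stated assumptions, not the ZBW's system: the vocabulary THESAURUS, the weights in ASSOCIATIONS, the weighted-sum fusion and the threshold in post_process are all hypothetical and chosen only to show how lexical and associative suggestions might be merged and then filtered.

```python
from collections import defaultdict

# Hypothetical toy vocabulary: label found in the text -> subject term.
THESAURUS = {
    "inflation": "Inflation",
    "monetary policy": "Monetary policy",
    "labour market": "Labour market",
}

# Hypothetical association weights (word -> subject -> weight), standing in
# for what an associative method would learn from already indexed records.
ASSOCIATIONS = {
    "central": {"Monetary policy": 0.6},
    "bank": {"Monetary policy": 0.5, "Banking": 0.7},
    "prices": {"Inflation": 0.4},
}


def lexical_suggestions(text):
    """Suggest subjects whose label occurs literally in the text."""
    lowered = text.lower()
    return {subj: 1.0 for label, subj in THESAURUS.items() if label in lowered}


def associative_suggestions(text):
    """Suggest subjects associated with individual words of the text."""
    scores = defaultdict(float)
    for word in text.lower().split():
        for subj, weight in ASSOCIATIONS.get(word, {}).items():
            scores[subj] = max(scores[subj], weight)
    return dict(scores)


def fuse(lexical, associative, w_lex=0.6, w_assoc=0.4):
    """Merge both suggestion sets with a weighted sum (an assumed rule)."""
    subjects = set(lexical) | set(associative)
    return {
        s: w_lex * lexical.get(s, 0.0) + w_assoc * associative.get(s, 0.0)
        for s in subjects
    }


def post_process(scores, threshold=0.25, max_terms=3):
    """Post-processing step: drop weak suggestions and cap the list length."""
    kept = [(s, v) for s, v in scores.items() if v >= threshold]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [s for s, _ in kept[:max_terms]]


def index_title(title):
    return post_process(
        fuse(lexical_suggestions(title), associative_suggestions(title))
    )


subjects = index_title("Central bank responses to inflation and rising prices")
```

In this sketch the lexical match on "inflation" is reinforced by the associative signal from "prices", while the weaker, purely associative suggestion "Monetary policy" falls below the threshold. Stricter or looser quality requirements could be expressed by adjusting the post-processing parameters without touching the two underlying methods.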

A team of subject specialists with indexing experience evaluates the quality of the automatic subject indexing with regular sampling and evaluation, thus supporting the further development of the algorithms.
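One way such a sample-based evaluation could be quantified is to compare the automatically assigned subjects with those an indexer would assign and to compute precision, recall and F1 per document. The sketch below assumes exactly that; the function names and the choice of measures are illustrative and are not taken from the ZBW's evaluation procedure.

```python
import random


def sample_documents(doc_ids, k, seed=0):
    """Draw a reproducible random sample of document identifiers."""
    rng = random.Random(seed)
    pool = sorted(doc_ids)
    return rng.sample(pool, min(k, len(pool)))


def precision_recall_f1(automatic, reference):
    """Compare automatic subjects with intellectually assigned ones."""
    automatic, reference = set(automatic), set(reference)
    true_positives = len(automatic & reference)
    precision = true_positives / len(automatic) if automatic else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical example: one sampled document with two subjects each.
scores = precision_recall_f1(
    automatic={"Inflation", "Banking"},
    reference={"Inflation", "Monetary policy"},
)
```

Averaging these values over a regular sample would give a simple indicator that can be tracked as the algorithms are developed further.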

Current productive procedures are based on content-descriptive metadata such as titles and keywords. Specific filtering rules ensure that substantial parts of the ZBW’s printed and digital holdings can be indexed precisely and exhaustively.

Research and development

The analysis of indexing methods has led to the aforementioned fusion architecture and remains the focus of the working group’s research activities. This research must take into account how quickly topics in economic research change. The procedures are aimed mainly at processing short textual descriptions. Quality management is also an important subject of these research activities.

The working group looks at the latest research findings of the ZBW and other institutes and assesses their relevance for practical implementation. Several student papers have analysed various configurations of these procedures and individual methods of title processing, thus contributing to research transfer within the ZBW.