AUTOMATIZATION OF SUBJECT INDEXING USING METHODS FROM ARTIFICIAL INTELLIGENCE

The ZBW collects, processes and indexes literature and domain-specific information in the area of economics from all over the world. To ensure that these resources can be found, the ZBW annotates them with high-quality metadata, which then serve as the data basis for the ZBW search portal EconBiz and can also be reused by other parties.

The number of publications, and of digital publications in particular, is rising in economics as well. At the same time, current developments in computer science and information science are opening up new technological options, which we adopt and integrate into our indexing strategy in order to maintain the coverage and quality of our metadata.

The automatization of various workflows in the indexing process and their seamless integration with intellectual cataloguing and subject indexing is a permanent task at ZBW. We continuously evaluate automated procedures, develop them further in the context of our research activities, and transfer them into productive operations.

Automatization of Subject Indexing (AutoSE)

In a research-based project (AutoIndex, until 2018), a ZBW-specific solution for subject indexing was developed on the basis of open source machine learning software. It combines several associative and lexical methods in a fusion approach, thereby achieving a higher performance level. The solution takes "short texts" (title, author keywords) as input and generates suggestions for descriptors from the Standardthesaurus Wirtschaft (STW) that adequately summarize the resource in question. Various rule-based postprocessing routines increase and secure the quality of the metadata thus generated. In addition, a group of experienced subject indexers occasionally evaluates random samples of the output intellectually, and we use their feedback as a basis for the continuous development of the existing solution.
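As a rough illustration of what fusing a lexical and an associative method can look like, the following minimal sketch combines two score sources by a weighted average and keeps the descriptors above a threshold. The toy thesaurus, the association weights, the fusion weights, the threshold and all function names are assumptions made for this example; they are not taken from the ZBW solution or from the STW.

```python
import re
from collections import defaultdict

# Hypothetical toy thesaurus: descriptor -> label variants (not actual STW content).
TOY_THESAURUS = {
    "Monetary policy": ["monetary policy", "central bank policy"],
    "Inflation": ["inflation", "price increase"],
    "Labour market": ["labour market", "labor market"],
}

# Hypothetical word -> descriptor association weights, standing in for a trained model.
TOY_ASSOCIATIONS = {
    "interest": {"Monetary policy": 0.6},
    "wages": {"Labour market": 0.7, "Inflation": 0.2},
}


def lexical_scores(text):
    """Lexical method: score 1.0 for each descriptor whose label occurs in the text."""
    lowered = text.lower()
    return {
        descriptor: 1.0
        for descriptor, labels in TOY_THESAURUS.items()
        if any(label in lowered for label in labels)
    }


def associative_scores(text):
    """Associative method: sum the association weights of all tokens, capped at 1.0."""
    scores = defaultdict(float)
    for token in re.findall(r"[a-z]+", text.lower()):
        for descriptor, weight in TOY_ASSOCIATIONS.get(token, {}).items():
            scores[descriptor] = min(1.0, scores[descriptor] + weight)
    return dict(scores)


def fuse(score_dicts, weights, threshold):
    """Weighted average of the per-method scores; keep descriptors at or above the threshold."""
    combined = defaultdict(float)
    for scores, weight in zip(score_dicts, weights):
        for descriptor, score in scores.items():
            combined[descriptor] += weight * score
    total = sum(weights)
    ranked = sorted(
        ((d, s / total) for d, s in combined.items()),
        key=lambda pair: pair[1],
        reverse=True,
    )
    return [(d, s) for d, s in ranked if s >= threshold]


def suggest_descriptors(short_text, threshold=0.32):
    """Generate descriptor suggestions for a short text (title plus author keywords)."""
    methods = [lexical_scores(short_text), associative_scores(short_text)]
    return fuse(methods, weights=[0.5, 0.5], threshold=threshold)


if __name__ == "__main__":
    for descriptor, score in suggest_descriptors(
        "Inflation and interest rates: wages under pressure"
    ):
        print(f"{descriptor}: {score:.2f}")
```

In this sketch a descriptor that is supported by both methods ranks above one that is supported by only one of them, which is the basic motivation for a fusion. A real system would replace the hand-set weights with trained models and tune the fusion weights and threshold on evaluated data.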

Research and Development

On top of the methods that we already use, we continually evaluate state-of-the-art results from subfields of Artificial Intelligence, such as Deep Learning. Besides the automated generation of suggestions for descriptors, potential fields of application for neural networks include the control of the fusion approach in order to optimize the coordination of the individual methods, and an automated quality estimation at the document level so that documents can be routed directly to the (automated or intellectual) subject indexing method that suits them best.
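The routing idea can be sketched in a few lines. Note that the example below does not use a neural network: it swaps in a very simple heuristic (the mean confidence of the suggested descriptors) as a stand-in for a learned document-level quality estimate. The function names and the threshold are assumptions made for illustration only.

```python
from statistics import mean


def estimate_quality(suggestion_scores):
    """Hypothetical quality proxy: mean confidence of the suggestions, 0.0 if there are none."""
    return mean(suggestion_scores) if suggestion_scores else 0.0


def route_document(suggestion_scores, threshold=0.6):
    """Send a document to automated indexing only if its estimated quality is high enough."""
    if estimate_quality(suggestion_scores) >= threshold:
        return "automated"
    return "intellectual"


if __name__ == "__main__":
    print(route_document([0.9, 0.8]))  # confident suggestions
    print(route_document([0.4, 0.5]))  # weak suggestions
    print(route_document([]))          # no suggestions at all
```

The design point is that the decision is made per document rather than per descriptor, so that documents with weak or missing suggestions are passed to the subject indexers instead of receiving low-quality automated metadata.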

Additional tasks that lend themselves to automatization include the extraction of structural elements from electronic full texts for the enrichment of metadata records, and the extraction of frequent terms in the context of automated subject indexing for the continuous development of the Standardthesaurus Wirtschaft (STW). In both areas we have obtained first results from two Master's theses that were written in the context of AutoIndex.
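For the second of these tasks, a minimal sketch of frequent-term extraction might look as follows: it counts terms that occur in several titles but are not yet covered by a list of known thesaurus labels, so that they can be reviewed as candidates for new entries. The stopword list, the sample titles, the label list and the function name are hypothetical and do not reflect the method used in the Master's theses.

```python
import re
from collections import Counter

# Small hypothetical stopword list for the example.
STOPWORDS = {"the", "of", "and", "in", "a", "on", "for"}


def frequent_uncovered_terms(titles, known_labels, min_count=2):
    """Return (term, document count) pairs for terms not covered by the known labels."""
    known = {label.lower() for label in known_labels}
    counts = Counter()
    for title in titles:
        # Use a set so that each term is counted at most once per title.
        tokens = set(re.findall(r"[a-z]+", title.lower()))
        counts.update(t for t in tokens if t not in STOPWORDS and t not in known)
    return [(term, n) for term, n in counts.most_common() if n >= min_count]


if __name__ == "__main__":
    sample_titles = [
        "Cryptocurrency and inflation",
        "The rise of cryptocurrency",
        "Inflation in the euro area",
    ]
    print(frequent_uncovered_terms(sample_titles, known_labels=["Inflation"]))
```

Counting documents rather than raw occurrences keeps a single long text from dominating the candidate list; a realistic version would also handle multi-word terms and language-specific normalization.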

Transfer into Productive Operations

We focus specifically on the question of how the research results obtained at ZBW can be provided systematically and sustainably as working instruments in our productive environment. To make progress in that direction, we identify the stakeholders involved, analyze the characteristics that a software architecture suitable for productive use of our solutions must have, and establish a roadmap for its implementation in cooperation with the parties involved within ZBW. We also maintain an ongoing exchange with national and international partners from research and from other information infrastructure institutions that face similar issues and challenges.

Publications

Publications on the subject can be found in the ZBW Publication Archive. Please search for the keywords "Automatic Subject Indexing".

For further presentations and publications, see also the publication list of Anna Kasprzik.