The application of artificial intelligence methods plays an important role in certain areas at the ZBW. To continuously access and structure information according to specific requirements, to analyse information in its context, and to make it searchable with intelligent tools, the ZBW takes up the latest findings, methods, and tools of artificial intelligence, tests their practical viability, and either implements them in its own applications or uses them for further development.
In particular, the ZBW is engaged in applied research in the field of machine learning and focuses on the following research and development topics:
Subject indexing, i.e. the annotation of literature resources with semantic information, is a core task of the ZBW because it facilitates the discovery of relevant literature in the collection. Given the digital flood of publications, it is hardly possible to annotate all resources intellectually, that is, by human indexers, so automation strategies have to be considered. From the perspective of machine learning, subject indexing is a multi-label classification problem: each resource can receive several descriptors at once. With the rise of artificial intelligence in recent years, more and more methods, including open-source software, have become available to solve this task.
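To make the multi-label framing concrete, the following is a minimal sketch and not the ZBW's production method: a nearest-neighbour baseline that suggests for a new title every descriptor carried by its most similar training titles. The toy titles, the descriptors, and the function names (`tokens`, `jaccard`, `suggest_descriptors`) are illustrative assumptions.

```python
def tokens(text):
    """Lower-case a title and split it into a set of word tokens."""
    return set(text.lower().split())


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def suggest_descriptors(title, training, k=2):
    """Return the union of the descriptors of the k most similar training titles.

    Multi-label: the result is a set that may hold zero, one, or many descriptors.
    """
    query = tokens(title)
    ranked = sorted(training, key=lambda item: jaccard(query, tokens(item[0])), reverse=True)
    suggested = set()
    for _, descriptors in ranked[:k]:
        suggested |= descriptors
    return suggested


# Hypothetical training data: (title, set of descriptors).
TRAINING = [
    ("monetary policy and inflation in the euro area", {"Monetary policy", "Inflation"}),
    ("inflation expectations of private households", {"Inflation", "Private household"}),
    ("labour market effects of minimum wages", {"Labour market", "Minimum wage"}),
]

result = suggest_descriptors("inflation and monetary policy after the crisis", TRAINING)
print(sorted(result))
```

With `k=2` the second-nearest title also contributes its descriptors, which shows why such baselines usually need a score threshold or a stronger model.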
Deep learning methods (i.e. neural networks with multiple layers) in particular are currently attracting a lot of attention. A study on automatic subject indexing at the ZBW showed that, under the circumstances investigated, neural networks were superior to all classical methods available at the time (such as nearest-neighbour classifiers or support vector machines), and that the quality of the results remained at a comparable level when only the title was used for subject indexing instead of the full text. Furthermore, the study showed that neural networks using only title data can even provide better results than full-text models under certain circumstances.
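Comparisons such as the one between title-based and full-text models rest on a multi-label quality measure. The sketch below computes a sample-averaged F1 score, one common choice; that this particular metric applies here is an assumption, and the descriptor sets are invented purely to exercise the function, not to reproduce any reported result.

```python
def f1_for_document(predicted, gold):
    """F1 between the predicted and the gold descriptor set of one document."""
    if not predicted and not gold:
        return 1.0  # nothing to find and nothing suggested
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


def sample_averaged_f1(predictions, gold_standard):
    """Average the per-document F1 over all documents."""
    scores = [f1_for_document(p, g) for p, g in zip(predictions, gold_standard)]
    return sum(scores) / len(scores)


# Invented gold standard and invented outputs of two hypothetical models.
gold = [{"Inflation", "Monetary policy"}, {"Labour market"}]
model_a = [{"Inflation"}, {"Labour market"}]
model_b = [{"Inflation", "Monetary policy", "Euro area"}, {"Minimum wage"}]

score_a = sample_averaged_f1(model_a, gold)
score_b = sample_averaged_f1(model_b, gold)
print(round(score_a, 3), round(score_b, 3))
```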
Results from applied research at the ZBW have also shown that a combination of several methods produces better results, because the strengths of the individual methods still come into play while their weaknesses carry less weight. Machine learning methods can also be used to estimate the quality of the expected output of individual methods for different inputs and to decide accordingly which method should be applied to which resource. New research results on statistical and lexical-semantic methods as well as on neural networks are continually incorporated into the applications.
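The two ideas in this paragraph, combining methods and routing resources by estimated quality, can be sketched as follows. This is an illustration under simple assumptions, not the fusion architecture used at the ZBW: scores are averaged, the threshold of 0.5 is arbitrary, and the quality estimates are hard-coded where a trained estimator would normally supply them.

```python
def fuse_scores(score_maps, threshold=0.5):
    """Average per-descriptor confidence scores from several methods.

    A descriptor missing from one method's output counts as 0.0 there.
    Returns the descriptors whose mean score reaches the threshold.
    """
    descriptors = set().union(*score_maps)
    fused = {d: sum(m.get(d, 0.0) for m in score_maps) / len(score_maps) for d in descriptors}
    return {d for d, score in fused.items() if score >= threshold}


def choose_method(estimated_quality):
    """Pick the method with the highest estimated quality for one resource."""
    return max(estimated_quality, key=estimated_quality.get)


# Hypothetical outputs of a lexical and a neural method for the same resource.
lexical = {"Inflation": 0.9, "Euro area": 0.4}
neural = {"Inflation": 0.7, "Monetary policy": 0.8, "Euro area": 0.7}

fused = fuse_scores([lexical, neural])
best = choose_method({"lexical": 0.62, "neural": 0.48})
print(sorted(fused), best)
```

Averaging rewards descriptors on which the methods agree ("Inflation", "Euro area") and suppresses a descriptor that only one method proposes, even with a high score ("Monetary policy").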
The challenge in practical application is to adapt the AI methods to the library context of the ZBW, including its controlled vocabulary and metadata, and to find sufficient training data for these methods. In AutoSE (Automatic Subject Indexing), the ZBW is tackling the question of how machine learning solutions for automated subject indexing developed in-house can be integrated sustainably into the library indexing process as a productive procedure, and how they can be developed further during ongoing operations.
In addition to automated subject indexing, the methods of artificial intelligence also offer development opportunities for innovative downstream applications. With the help of Natural Language Processing, for example, it is possible to highlight similarities and differences between an existing collection of literature and a single document.
ZBW users usually search for literature in the search portal EconBiz. They formulate search queries and receive a sorted list of documents that match their query. The individual knowledge of the user is not taken into account.
Using the latest insights in computational linguistics, the ZBW is currently developing prototypes that meet this challenge. Natural language processing and understanding methods can indicate how a new text relates to the documents a user has already read. This helps users assess a search result, for example by showing whether its content is completely new or overlaps with documents already read. The algorithms used derive contexts and facts from individual words and sentences in the texts and from the relationships between them. Meaningful and appropriate keywords are identified, and associated concepts are recognised.
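As a rough illustration of comparing a new text with documents already read, the sketch below splits the keywords of a new text into known and new ones. It is deliberately simplistic and is not the prototype described above: keyword extraction is reduced to stop-word filtering, and the stop-word list, function names, and example texts are assumptions.

```python
STOPWORDS = {"the", "of", "and", "in", "a", "on", "for", "to"}


def keywords(text):
    """Very rough keyword extraction: lower-cased tokens minus stop words."""
    stripped = (word.strip(".,;:") for word in text.lower().split())
    return {word for word in stripped if word and word not in STOPWORDS}


def compare_with_read(new_text, read_texts):
    """Split the keywords of a new text into overlapping and novel ones."""
    known_vocabulary = set()
    for text in read_texts:
        known_vocabulary |= keywords(text)
    new_keywords = keywords(new_text)
    overlap = new_keywords & known_vocabulary
    novel = new_keywords - known_vocabulary
    novelty = len(novel) / len(new_keywords) if new_keywords else 0.0
    return overlap, novel, novelty


# Hypothetical reading history and a new search result.
read = ["Inflation and monetary policy in the euro area", "Monetary policy rules"]
overlap, novel, novelty = compare_with_read("Monetary policy and unemployment in the euro area", read)
print(sorted(overlap), sorted(novel), novelty)
```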
Currently, the ZBW is also researching the use of neural networks for further building blocks of literature search. In particular, the research focuses on possible applications and the factors influencing the developed models (e.g. network structure or the title of a publication). Recent developments for literature recommendation systems can, for example, point to possible missing citations based on the works cited or recommend further descriptors based on descriptors already indexed.
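Recommending further descriptors from descriptors already assigned can be illustrated with a simple co-occurrence count. This is a classical stand-in for the neural recommendation models mentioned above, not a description of them; the records and function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence(indexed_records):
    """Count how often two descriptors were assigned to the same record."""
    counts = Counter()
    for descriptors in indexed_records:
        for a, b in combinations(sorted(descriptors), 2):
            counts[(a, b)] += 1
            counts[(b, a)] += 1
    return counts


def recommend_descriptors(assigned, counts, top_n=2):
    """Rank descriptors not yet assigned by co-occurrence with the assigned ones."""
    scores = Counter()
    for (a, b), n in counts.items():
        if a in assigned and b not in assigned:
            scores[b] += n
    return [descriptor for descriptor, _ in scores.most_common(top_n)]


# Hypothetical, already indexed records.
RECORDS = [
    {"Inflation", "Monetary policy"},
    {"Inflation", "Monetary policy", "Central bank"},
    {"Inflation", "Wages"},
    {"Central bank", "Monetary policy"},
]

counts = build_cooccurrence(RECORDS)
recommended = recommend_descriptors({"Inflation", "Monetary policy"}, counts)
print(recommended)
```

The same counting scheme would apply to citations: replacing descriptor sets with sets of cited works yields candidates for possibly missing citations.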
Another research field deals with the use of word vectors and word matrices for literature search. Word vectors allow the detection of semantic similarities and connections between different words. With the help of machine learning, they can be generated from large amounts of unstructured text data. By taking the order of the words into account, which is usually neglected in classical word vectors, they can be extended to word matrices. Both concepts are being explored for their potential to improve the response to search engine queries.
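The difference between the two concepts can be shown with toy numbers. In the sketch below, similarity between word vectors is measured with the cosine, and word matrices are composed by matrix multiplication, which depends on word order; composing by multiplication is an assumption made for illustration, and all vectors and matrices are invented rather than learned.

```python
import math


def cosine(u, v):
    """Cosine similarity between two word vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


# Hypothetical 3-dimensional word vectors (real ones are learned from text).
VECTORS = {"bank": [0.9, 0.1, 0.3], "credit": [0.8, 0.2, 0.4], "weather": [0.1, 0.9, 0.0]}

# Hypothetical 2x2 word matrices.
MATRICES = {"interest": [[1, 2], [0, 1]], "rate": [[1, 0], [3, 1]]}

similar = cosine(VECTORS["bank"], VECTORS["credit"])
dissimilar = cosine(VECTORS["bank"], VECTORS["weather"])

# Adding vectors ignores word order; multiplying matrices does not.
interest_rate = matmul(MATRICES["interest"], MATRICES["rate"])
rate_interest = matmul(MATRICES["rate"], MATRICES["interest"])
print(similar > dissimilar, interest_rate, rate_interest)
```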
The existing metadata resources at the ZBW and in partner institutions are also well suited for research on machine learning processes. In the Q-Aktiv project, the ZBW is studying how representations can be learned in dynamic networks built from bibliographic metadata. The objective of the project is to learn a representation for the concepts of a controlled vocabulary. The network structure between research papers, authors, concepts, journals, and institutions serves as the data basis. So far, different techniques for learning representations of concepts have been compared. Currently, this methodology is being extended to dynamic networks that change over time, for example in order to analyse and predict scientific dynamics.
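How bibliographic metadata becomes a network that changes over time can be sketched as follows. The code builds yearly snapshots of a paper-author-concept-journal network and samples a random walk, a typical input to walk-based embedding methods. Whether Q-Aktiv uses such methods is not stated here and is an assumption; the records are hypothetical, and institutions are omitted for brevity.

```python
import random
from collections import defaultdict


def snapshot(records, year):
    """Keep only the records published up to and including `year`."""
    return [record for record in records if record["year"] <= year]


def build_network(records):
    """Link each paper to its authors, concepts, and journal (undirected)."""
    graph = defaultdict(set)
    for record in records:
        paper = ("paper", record["id"])
        neighbours = [("author", a) for a in record["authors"]]
        neighbours += [("concept", c) for c in record["concepts"]]
        neighbours.append(("journal", record["journal"]))
        for node in neighbours:
            graph[paper].add(node)
            graph[node].add(paper)
    return graph


def random_walk(graph, start, length, rng):
    """Visit `length` nodes, moving to a random neighbour at each step."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(sorted(graph[walk[-1]])))
    return walk


# Hypothetical bibliographic records.
RECORDS = [
    {"id": "p1", "year": 2019, "authors": ["A. Author"],
     "concepts": ["Inflation"], "journal": "Journal X"},
    {"id": "p2", "year": 2021, "authors": ["A. Author", "B. Author"],
     "concepts": ["Inflation", "Monetary policy"], "journal": "Journal Y"},
]

graph_2019 = build_network(snapshot(RECORDS, 2019))
graph_2021 = build_network(snapshot(RECORDS, 2021))
walk = random_walk(graph_2021, ("concept", "Inflation"), 5, random.Random(0))
print(len(graph_2019), len(graph_2021), walk)
```

Comparing the snapshots shows how the neighbourhood of a concept such as "Inflation" grows from one year to the next, which is the kind of change a dynamic representation has to capture.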
With these adaptations of existing AI methods to the library context, the ZBW transfers research into practice. Researching these methods and implementing them in existing ZBW products and services yields valuable insights and leads to better access to relevant information in economics. While working on these topics, the ZBW maintains an exchange with other libraries and research institutions. It aims to contribute the application-oriented experience and findings it has gained, especially in machine learning, to the information science community and to current discourses and debates on information retrieval and artificial intelligence.
The Connect & Collect (CoCo) project coordinates the network of regional competence centres for labour research. The research and development activities aim at a "cloud of labour research" that supports interdisciplinary cooperation, promotes technological and social innovations, and provides structures for sustainable knowledge transfer.
To support the competence centres, the CoCo project is developing infrastructure with AI-supported tools and innovative methods that will enable a new approach to interdisciplinary labour research. A "cloud of labour research" will provide opportunities for cooperation and networking. An important component is a data and knowledge repository that enables the joint use of research data, knowledge, and resources. Also, it will support the transfer of results to industry.
The project will start in March 2021 and run for four years as a cooperation between three Fraunhofer Institutes, the DIE, and the ZBW. The ZBW is the leading technology partner.
- Fraunhofer Institute for Industrial Engineering IAO
Coordinator, networking and R&D on future work, infrastructure design, development of AI modules
- ZBW – Leibniz Information Centre for Economics
Digital information infrastructures, open science, research on incentive systems, infrastructure development
- Fraunhofer Institute for Factory Operation and Automation IFF
Socio-technical knowledge and collaboration systems, conception/development of data and knowledge storage
- Fraunhofer Center for International Management and Knowledge Economy IMW
Objective, business models, concept/development of the cloud of labour research, events for competence centres
- German Institute for Adult Education - Leibniz Centre for Lifelong Learning (DIE)
Lifelong learning, network moderation, continuous evaluation
- Galke, Lukas / Mai, Florian / Schelten, Alan / Brunsch, Dennis / Scherp, Ansgar:
Using Titles vs. Full-text as Source for Automated Semantic Document Annotation
In: K-CAP 2017: Proceedings of the Knowledge Capture Conference, Article No. 20, Austin, TX, USA, December 04-06, 2017. New York, NY: ACM, 2017, doi:10.1145/3148011.3148039
- Mai, Florian / Galke, Lukas / Scherp, Ansgar:
Using Deep Learning for Title-Based Semantic Subject Indexing to Reach Competitive Performance to Full-Text
In: Proceedings of the 18th ACM/IEEE on Joint Conference on Digital Libraries. New York: ACM Press, 2018, pp. 169-178, doi:10.1145/3197026.3197039
- Toepfer, Martin / Seifert, Christin:
Descriptor-Invariant Fusion Architectures for Automatic Subject Indexing
In: Proceedings of Joint Conference on Digital Libraries, 2017. Piscataway: IEEE, 2017, pp. 1-10, doi:10.1109/JCDL.2017.7991557
- Toepfer, Martin / Seifert, Christin:
Fusion architectures for automatic subject indexing under concept drift
In: Int J Digit Libr 21, pp. 169-189, 2020, doi:10.1007/s00799-018-0240-3
- Toepfer, Martin / Seifert, Christin:
Content-Based Quality Estimation for Automatic Subject Indexing of Short Texts Under Precision and Recall Constraints
In: Digital Libraries for Open Knowledge. TPDL: International Conference on Theory and Practice of Digital Libraries, 2018, pp. 3-15, doi:10.1007/978-3-030-00066-0_1
- Mikolov, Tomas / Sutskever, Ilya / Chen, Kai / Corrado, Greg S. / Dean, Jeff:
Distributed Representations of Words and Phrases and their Compositionality
In: Advances in Neural Information Processing Systems 26 (NIPS 2013)