Research Project
Full description
Traditional named entity linking (NEL) tools have largely taken a general-purpose approach, covering entities such as persons, organizations, locations, and events across many contexts. While multimodal entity linking datasets exist (e.g., for disambiguating person names with the help of photographs), there is a need for domain-specific resources that capture the particular challenges of domains such as cultural heritage (e.g., stylistic changes over time and the diversity of social and political contexts). To address this gap, our work presents a novel multimodal entity linking benchmark dataset for the art domain, together with a comprehensive experimental evaluation of contemporary NEL methods. We characterize the dataset with dedicated metrics that assess ambiguity and diversity in the data. We introduce an automated process for generating art datasets that draws on multiple sources (ArtPedia, Wikidata, and Wikimedia Commons) to ensure reliability and comprehensiveness. Furthermore, our paper provides extensive statistical data, outlines best practices for integrating art datasets, and presents a detailed performance analysis of entity linking systems applied to domain-specific datasets. Through our research, we aim to address the lack of datasets for NEL in the art domain, paving the way for more nuanced and contextually rich entity linking methods for art and cultural heritage.
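The description does not say which ambiguity and diversity metrics are used. As an illustration only, the sketch below computes two common stand-ins over a list of annotated mentions: the average number of distinct entities per surface form (ambiguity) and the Shannon entropy of the linked-entity distribution (diversity). The function names, the toy annotations, and the entity IDs `E1` to `E3` are hypothetical placeholders, not taken from the dataset.

```python
import math
from collections import Counter, defaultdict


def mention_ambiguity(pairs):
    """Average number of distinct entities each surface form links to."""
    targets = defaultdict(set)
    for mention, entity in pairs:
        # Case-fold so "Mona Lisa" and "mona lisa" count as one surface form.
        targets[mention.lower()].add(entity)
    return sum(len(entities) for entities in targets.values()) / len(targets)


def entity_entropy(pairs):
    """Shannon entropy (in bits) of the distribution of linked entities."""
    counts = Counter(entity for _, entity in pairs)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Toy annotations: (mention text, placeholder entity ID). The IDs are invented.
pairs = [
    ("The Annunciation", "E1"),
    ("The Annunciation", "E2"),
    ("Annunciation", "E1"),
    ("Mona Lisa", "E3"),
]

print(mention_ambiguity(pairs))
print(entity_entropy(pairs))
```

Here a painting title shared by two artworks raises the ambiguity score above 1, while the entropy grows as mentions spread more evenly over more entities. Both functions assume a non-empty list of annotations.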