Title: Selecting Suitable Configurations for Automated Link Discovery
Citation: Rivero, C.R. and Ruiz Cortés, D. (2020). Selecting Suitable Configurations for Automated Link Discovery. In SAC 2020: 35th Annual ACM Symposium on Applied Computing (pp. 907-914). Brno, Czech Republic: ACM Digital Library.
Date issued: 2020
Date accessioned/available: 2021-02-17
ISBN: 978-1-4503-6866-7
Handle: https://hdl.handle.net/11441/105072
DOI: 10.1145/3341105.3373882

Abstract: Linking individuals in one dataset to the same individuals in existing datasets is a major problem known as link discovery. Existing automated link discovery techniques make users responsible for selecting suitable properties, distances and transformations, a.k.a. configurations, which is challenging for both researchers and practitioners. Furthermore, failing to provide suitable configurations dramatically increases the complexity of link discovery, since many configurations need to be evaluated. Current approaches to help users select proper configurations assume that datasets are not heterogeneous or require the existence of a schema or ontology, which makes them less appealing in the context of Linked Data. In this paper, we present an approach to help users select suitable configurations based solely on data, i.e., no schema or ontology is required. We rely on the concepts of universality and uniqueness, i.e., properties that are present in many individuals of the datasets to link (universality) and that do not have repeated objects (uniqueness). We use the concept of singularity to focus on configurations in which only a few individuals are very similar while the rest are very dissimilar. We evaluate our approach using eight commonly used scenarios, in which, on average, we suggest only 5% of all the possible configurations. Additionally, the selected configurations consistently generate links that achieve high precision and recall with respect to a ground truth. Finally, we provide a number of guidelines to apply our approach in additional scenarios.

Format: application/pdf
Pages: 8
Language: eng
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Keywords: Linked data; Link discovery; Data integration
Type: info:eu-repo/semantics/conferenceObject
Access rights: info:eu-repo/semantics/openAccess
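The universality and uniqueness criteria summarized in the abstract can be illustrated with a minimal Python sketch. It assumes each dataset is a list of (subject, property, object) triples, and it uses hypothetical scores and thresholds (`property_stats`, `candidate_properties`, `min_universality`, `min_uniqueness`). These are illustrative assumptions, not the definitions or the implementation described in the paper.

```python
from collections import defaultdict

# Assumption: a dataset is a flat list of (subject, property, object) triples.
Triple = tuple[str, str, str]


def property_stats(triples: list[Triple]) -> dict[str, tuple[float, float]]:
    """Return {property: (universality, uniqueness)} for one dataset.

    Hypothetical scores (not taken from the paper):
      universality = individuals having the property / all individuals
      uniqueness   = distinct objects of the property / all objects of it
    """
    individuals = {s for s, _, _ in triples}
    subjects_by_prop: dict[str, set[str]] = defaultdict(set)
    objects_by_prop: dict[str, list[str]] = defaultdict(list)
    for s, p, o in triples:
        subjects_by_prop[p].add(s)
        objects_by_prop[p].append(o)

    stats: dict[str, tuple[float, float]] = {}
    for p, subjects in subjects_by_prop.items():
        universality = len(subjects) / len(individuals)
        objects = objects_by_prop[p]
        uniqueness = len(set(objects)) / len(objects)
        stats[p] = (universality, uniqueness)
    return stats


def candidate_properties(
    triples: list[Triple],
    min_universality: float = 0.8,  # hypothetical threshold
    min_uniqueness: float = 0.8,  # hypothetical threshold
) -> list[str]:
    """Keep properties that reach both hypothetical thresholds, sorted by name."""
    return sorted(
        p
        for p, (univ, uniq) in property_stats(triples).items()
        if univ >= min_universality and uniq >= min_uniqueness
    )
```

With a toy dataset in which every individual has a distinct `name`, but all share the same `country` and only one has a `nickname`, only `name` passes both thresholds: `country` is universal but not unique, and `nickname` is unique but not universal. Singularity, which concerns how similarity scores are distributed for a given configuration, is not modelled in this sketch.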