dc.creator: Rivero, Carlos
dc.creator: Ruiz Cortés, David
dc.identifier.citation: Rivero, C.R. and Ruiz Cortés, D. (2020). Selecting Suitable Configurations for Automated Link Discovery. In SAC 2020: 35th Annual ACM Symposium on Applied Computing (pp. 907-914), Brno, Czech Republic: ACM Digital Library.
dc.description.abstract: Linking individuals in one dataset to the same individuals in other existing datasets is a major problem known as link discovery. Existing automated link discovery techniques make users responsible for selecting suitable properties, distances and transformations, a.k.a. configurations, which is challenging for both researchers and practitioners. Furthermore, failing to provide suitable configurations dramatically increases the complexity of link discovery since many configurations need to be evaluated. Current approaches to help users select proper configurations assume datasets are not heterogeneous or require the existence of a schema or ontology, making them less appealing in the context of Linked Data. In this paper, we present an approach to help users select suitable configurations solely based on data, i.e., no schema or ontology is required. We rely on the concepts of universality and uniqueness, i.e., properties that are present in many individuals of the datasets to link (universality) and do not have repeated objects (uniqueness). We use the concept of singularity to focus on configurations in which only a few individuals are very similar while the rest are very dissimilar. We evaluate our approach using eight commonly used scenarios, in which, on average, we only suggest 5% of all the possible configurations. Additionally, selected configurations consistently generate links achieving high precision and recall with respect to a ground truth. Finally, we provide a number of guidelines to apply our approach in additional
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R
dc.publisher: ACM Digital Library
dc.relation.ispartof: SAC 2020: 35th Annual ACM Symposium on Applied Computing (2020), pp. 907-914.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International
dc.subject: Linked data
dc.subject: Link discovery
dc.subject: Data integration
dc.title: Selecting Suitable Configurations for Automated Link Discovery
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
dc.eventtitle: SAC 2020: 35th Annual ACM Symposium on Applied Computing
dc.eventinstitution: Brno, Czech Republic
dc.relation.publicationplace: New York, USA
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España

Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International