dc.creator | Rivero, Carlos R. | es |
dc.creator | Ruiz Cortés, David | es |
dc.date.accessioned | 2021-02-17T11:08:50Z | |
dc.date.available | 2021-02-17T11:08:50Z | |
dc.date.issued | 2020 | |
dc.identifier.citation | Rivero, C.R. and Ruiz Cortés, D. (2020). Selecting Suitable Configurations for Automated Link Discovery. In SAC 2020: 35th Annual ACM Symposium on Applied Computing (907-914), Brno, Czech Republic: ACM Digital Library. | |
dc.identifier.isbn | 978-1-4503-6866-7 | es |
dc.identifier.uri | https://hdl.handle.net/11441/105072 | |
dc.description.abstract | Linking individuals in one dataset to the same individuals in
other existing datasets is a major problem known as link discovery. Existing
automated link discovery techniques make users responsible
for selecting suitable properties, distances and transformations,
a.k.a. configurations, which is challenging for both researchers and
practitioners. Furthermore, failing to provide suitable configurations
dramatically increases the complexity of link discovery since
many configurations need to be evaluated. Current approaches to
help users select proper configurations assume datasets are not
heterogeneous or require the existence of a schema or ontology,
making them less appealing in the context of Linked Data. In this
paper, we present an approach to help users select suitable configurations
solely based on data, i.e., no schema or ontology is
required. We rely on the concepts of universality and uniqueness,
i.e., properties that are present in many individuals of the datasets
to link (universality) and do not have repeated objects (uniqueness).
We use the concept of singularity to focus on configurations in
which only a few individuals are very similar while the rest are
very dissimilar. We evaluate our approach using eight commonly used
scenarios, in which we suggest, on average, only 5% of all
the possible configurations. Additionally, selected configurations
consistently generate links achieving high precision and recall with
respect to a ground truth. Finally, we provide a number of guidelines
to apply our approach in additional scenarios. | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2016-75394-R | es |
dc.format | application/pdf | es |
dc.format.extent | 8 | es |
dc.language.iso | eng | es |
dc.publisher | ACM Digital Library | es |
dc.relation.ispartof | SAC 2020: 35th Annual ACM Symposium on Applied Computing (2020), pp. 907-914. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Linked data | es |
dc.subject | Link discovery | es |
dc.subject | Data integration | es |
dc.title | Selecting Suitable Configurations for Automated Link Discovery | es |
dc.type | info:eu-repo/semantics/conferenceObject | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2016-75394-R | es |
dc.relation.publisherversion | https://dl.acm.org/doi/abs/10.1145/3341105.3373882 | es |
dc.identifier.doi | 10.1145/3341105.3373882 | es |
dc.publication.initialPage | 907 | es |
dc.publication.endPage | 914 | es |
dc.eventtitle | SAC 2020: 35th Annual ACM Symposium on Applied Computing | es |
dc.eventinstitution | Brno, Czech Republic | es |
dc.relation.publicationplace | New York, USA | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). Spain | es |
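The abstract characterises suitable properties by universality (present in many individuals of the datasets to link) and uniqueness (no repeated objects). The Python sketch below illustrates one possible reading of those two notions. The triple representation, the ratio-based scores, and the thresholds in the hypothetical `candidate_properties` helper are assumptions for illustration only, not the definitions or algorithm given in the paper.

```python
from collections import defaultdict

# Assumption: a dataset is a list of (subject, property, object) triples.
Triple = tuple[str, str, str]


def universality(triples: list[Triple]) -> dict[str, float]:
    """Fraction of individuals (distinct subjects) with at least one value for each property."""
    subjects = {s for s, _, _ in triples}
    subjects_with_prop: dict[str, set[str]] = defaultdict(set)
    for s, p, _ in triples:
        subjects_with_prop[p].add(s)
    return {p: len(subs) / len(subjects) for p, subs in subjects_with_prop.items()}


def uniqueness(triples: list[Triple]) -> dict[str, float]:
    """Ratio of distinct objects to all objects per property (1.0 means no repeated objects)."""
    objects: dict[str, list[str]] = defaultdict(list)
    for _, p, o in triples:
        objects[p].append(o)
    return {p: len(set(objs)) / len(objs) for p, objs in objects.items()}


def candidate_properties(
    triples: list[Triple], min_univ: float = 0.8, min_uniq: float = 0.8
) -> list[str]:
    """Hypothetical filter: keep properties that are both universal and unique enough."""
    univ = universality(triples)
    uniq = uniqueness(triples)
    return sorted(p for p in univ if univ[p] >= min_univ and uniq[p] >= min_uniq)


# Toy example data (hypothetical).
triples = [
    ("ex:a", "name", "Alice"),
    ("ex:b", "name", "Bob"),
    ("ex:c", "name", "Carol"),
    ("ex:a", "country", "ES"),
    ("ex:b", "country", "ES"),
]

if __name__ == "__main__":
    print(universality(triples))          # name is on all 3 subjects, country on 2
    print(uniqueness(triples))            # country repeats "ES", so it scores 0.5
    print(candidate_properties(triples))  # only "name" passes both thresholds
```

In this toy data, `name` appears on every individual with distinct values, so it would be a more promising property to compare than `country`, whose repeated values make it poor at telling individuals apart.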