
Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-08T11:12:59Z
dc.date.available: 2022-04-08T11:12:59Z
dc.date.issued: 2022
dc.identifier.citation: Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). A hybrid quantum approach to leveraging data from HTML tables. Knowledge and Information Systems, 64 (2), 441-474.
dc.identifier.issn: 0219-1377 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131991
dc.description.abstract: The Web provides many data that are encoded using HTML tables. This facilitates rendering them, but obfuscates their structure and makes it difficult for automated business processes to leverage them. This has motivated many authors to work on proposals to extract them as automatically as possible. In this article, we present a new unsupervised proposal that uses a hybrid approach in which a standard computer is used to perform pre- and post-processing tasks and a quantum computer is used to perform the core task: guessing whether the cells have labels or values. The problem is addressed using a clustering approach that is known to be NP using standard computers, but our proposal can solve it in polynomial time, which implies a significant performance improvement. It is novel in that it relies on an entropy-preservation metaphor that has proven to work very well on two large collections of real-world tables from Wikipedia and the Dresden Web Table Corpus. Our experiments prove that our proposal can beat the state-of-the-art proposal in terms of both effectiveness and efficiency; the key difference is that our proposal is totally unsupervised, whereas the state-of-the-art proposal is supervised. [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación PID2020-112540RB-C44 [es]
dc.description.sponsorship: Junta de Andalucía P18-RT-1060 [es]
dc.format: application/pdf [es]
dc.format.extent: 34 [es]
dc.language.iso: eng [es]
dc.publisher: Springer [es]
dc.relation.ispartof: Knowledge and Information Systems, 64 (2), 441-474.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: HTML tables [es]
dc.subject: Data extraction [es]
dc.subject: Quantum computing [es]
dc.title: A hybrid quantum approach to leveraging data from HTML tables [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2016-75394-R [es]
dc.relation.projectID: PID2020-112540RB-C44 [es]
dc.relation.projectID: P18-RT-1060 [es]
dc.relation.publisherversion: https://link.springer.com/article/10.1007/s10115-021-01636-7 [es]
dc.identifier.doi: 10.1007/s10115-021-01636-7 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Knowledge and Information Systems [es]
dc.publication.volumen: 64 [es]
dc.publication.issue: 2 [es]
dc.publication.initialPage: 441 [es]
dc.publication.endPage: 474 [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Junta de Andalucía [es]

Files | Size | Format | View | Description
Jiménez2022_Article_AHybridQua ... | 2.168Mb | [PDF] | View/Open |



Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International