Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-07T07:57:01Z
dc.date.available: 2022-04-07T07:57:01Z
dc.date.issued: 2021
dc.identifier.citation: Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2021). A clustering approach to extract data from HTML tables. Information Processing and Management, 58 (6, art. no. 102683)
dc.identifier.issn: 0306-4573 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131911
dc.description.abstract: HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the relationships between their cells is not trivial due to the many different layouts, encodings, and formats available. In this article, we introduce Melva, which is an unsupervised domain-agnostic proposal to extract data from HTML tables without requiring any external knowledge bases. It relies on a clustering approach that helps tell label cells apart from value cells and establish their relationships. We compared Melva to four competitors on more than 3 000 HTML tables from Wikipedia and the Dresden Web Table Corpus. The conclusion is that our proposal is 21.70% better than the best unsupervised competitor and equals the best supervised competitor regarding effectiveness, but it is 99.14% better regarding efficiency. [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación PID2020-112540RB-C44 [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R [es]
dc.description.sponsorship: Junta de Andalucía P18-RT-1060 [es]
dc.format: application/pdf [es]
dc.format.extent: 13 [es]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.relation.ispartof: Information Processing and Management, 58 (6, art. no. 102683)
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: HTML tables [es]
dc.subject: Data extraction [es]
dc.subject: Clustering [es]
dc.subject: Genetic algorithms [es]
dc.title: A clustering approach to extract data from HTML tables [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: PID2020-112540RB-C44 [es]
dc.relation.projectID: TIN2016-75394-R [es]
dc.relation.projectID: P18-RT-1060 [es]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0306457321001680?via%3Dihub [es]
dc.identifier.doi: 10.1016/j.ipm.2021.102683 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Information Processing and Management [es]
dc.publication.volumen: 58 [es]
dc.publication.issue: 6, art. no. 102683 [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]
dc.contributor.funder: Junta de Andalucía [es]

Files
A clustering approach to extract ... (1.440 Mb, PDF)

Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International