dc.creator | Roldán Salvador, Juan Carlos | es |
dc.creator | Jiménez Aguirre, Patricia | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2022-04-08T07:17:04Z | |
dc.date.available | 2022-04-08T07:17:04Z | |
dc.date.issued | 2020 | |
dc.identifier.citation | Roldán Salvador, J.C., Jiménez Aguirre, P. and Corchuelo Gil, R. (2020). On Extracting Data from Tables that are Encoded using HTML. Knowledge-Based Systems, 190 (February 2020, art. no. 105157) | |
dc.identifier.issn | 0950-7051 | es |
dc.identifier.uri | https://hdl.handle.net/11441/131963 | |
dc.description.abstract | Tables are a common means to display data in human-friendly formats. Many
authors have worked on proposals to extract those data back, since doing so has
many interesting applications. In this article, we summarise and compare many
of the proposals to extract data from tables that are encoded using HTML and
that were published between 2000 and 2018. We first present a vocabulary that
homogenises the terminology used in this field; next, we use it to summarise
the proposals; finally, we compare them side by side. Our analysis highlights
several challenges to which no proposal provides a conclusive solution and a
few more that have not been addressed sufficiently; simply put, no proposal
provides a complete solution to the problem, which suggests that this
research field will remain active in the near future. We have also found that
there is no consensus regarding the datasets and the methods used to evaluate
the proposals, which makes it hard to compare the experimental results. | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2013-40848-R | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2016-75394-R | es |
dc.format | application/pdf | es |
dc.format.extent | 43 | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Knowledge-Based Systems, 190 (February 2020, art. no. 105157) | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | HTML documents | es |
dc.subject | Web tables | es |
dc.subject | Table mining | es |
dc.subject | Data extraction | es |
dc.title | On Extracting Data from Tables that are Encoded using HTML | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2013-40848-R | es |
dc.relation.projectID | TIN2016-75394-R | es |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S095070511930509X?via%3Dihub | es |
dc.identifier.doi | 10.1016/j.knosys.2019.105157 | es |
dc.contributor.group | Universidad de Sevilla. TIC258: Data-centric Computing Research Hub | es |
dc.journaltitle | Knowledge-Based Systems | es |
dc.publication.volumen | 190 | es |
dc.publication.issue | February 2020, art. no. 105157 | es |
dc.identifier.sisius | 21888241 | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). Spain | es |