dc.creator | Jiménez Aguirre, Patricia | es |
dc.creator | Roldán Salvador, Juan Carlos | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2022-04-08T10:53:08Z | |
dc.date.available | 2022-04-08T10:53:08Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). A coral-reef approach to extract information from HTML tables. Applied Soft Computing, 115 (January 2022, art. nº107980) | |
dc.identifier.issn | 1568-4946 | es |
dc.identifier.uri | https://hdl.handle.net/11441/131990 | |
dc.description.abstract | This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a
coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a
clustering technique and some custom heuristics that help extract information in a totally unsupervised
manner. Our experimental analysis was performed on a large collection of tables with a variety of
layouts, encoding problems, and formatting alternatives. Coraline could achieve an F1 score as high as
0.90 and took 7.07 CPU seconds per table, which improves on the best supervised proposal by 6.67%
regarding effectiveness and 40.54% regarding efficiency; it also improves on the best unsupervised
proposal by 11.11% regarding effectiveness while it remains very competitive regarding efficiency. | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación PID2020-112540RB-C44 | es |
dc.description.sponsorship | Junta de Andalucía P18-RT-1060 | es |
dc.format | application/pdf | es |
dc.format.extent | 9 | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Applied Soft Computing, 115 (January 2022, art. nº107980) | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | HTML tables | es |
dc.subject | Information extraction | es |
dc.subject | Coral-reef optimisation | es |
dc.subject | Feature selection | es |
dc.subject | Clustering | es |
dc.title | A coral-reef approach to extract information from HTML tables | es |
dc.type | info:eu-repo/semantics/article | es |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | PID2020-112540RB-C44 | es |
dc.relation.projectID | P18-RT-1060 | es |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S1568494621009029?via%3Dihub | es |
dc.identifier.doi | 10.1016/j.asoc.2021.107980 | es |
dc.contributor.group | Universidad de Sevilla. TIC258: Data-centric Computing Research Hub | es |
dc.journaltitle | Applied Soft Computing | es |
dc.publication.volumen | 115 | es |
dc.publication.issue | January 2022, art. nº107980 | es |
dc.contributor.funder | Ministerio de Ciencia e Innovación (MICIN). España | es |
dc.contributor.funder | Junta de Andalucía | es |