Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-08T10:53:08Z
dc.date.available: 2022-04-08T10:53:08Z
dc.date.issued: 2022
dc.identifier.citation: Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). A coral-reef approach to extract information from HTML tables. Applied Soft Computing, 115 (January 2022, art. no. 107980)
dc.identifier.issn: 1568-4946 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131990
dc.description.abstract: This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a clustering technique and some custom heuristics that help extract information in a totally unsupervised manner. Our experimental analysis was performed on a large collection of tables with a variety of layouts, encoding problems, and formatting alternatives. Coraline could achieve an F1 score as high as 0.90 and took 7.07 CPU seconds per table, which improves on the best supervised proposal by 6.67% regarding effectiveness and 40.54% regarding efficiency; it also improves on the best unsupervised proposal by 11.11% regarding effectiveness while it remains very competitive regarding efficiency. [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación PID2020-112540RB-C44 [es]
dc.description.sponsorship: Junta de Andalucía P18-RT-1060 [es]
dc.format: application/pdf [es]
dc.format.extent: 9 [es]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.relation.ispartof: Applied Soft Computing, 115 (January 2022, art. no. 107980)
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: HTML tables [es]
dc.subject: Information extraction [es]
dc.subject: Coral-reef optimisation [es]
dc.subject: Feature selection [es]
dc.subject: Clustering [es]
dc.title: A coral-reef approach to extract information from HTML tables [es]
dc.type: info:eu-repo/semantics/article [es]
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: PID2020-112540RB-C44 [es]
dc.relation.projectID: P18-RT-1060 [es]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S1568494621009029?via%3Dihub [es]
dc.identifier.doi: 10.1016/j.asoc.2021.107980 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Applied Soft Computing [es]
dc.publication.volumen: 115 [es]
dc.publication.issue: January 2022, art. no. 107980 [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Junta de Andalucía [es]

Files
1-s2.0-S1568494621009029-main.pdf (1.686 Mb, PDF)

Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International