Article
A coral-reef approach to extract information from HTML tables
Authors | Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2022 |
Deposit date | 2022-04-08 |
Published in | Applied Soft Computing, 115 |
Abstract | This article presents Coraline, a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a clustering technique and some custom heuristics that help extract information in a totally unsupervised manner. Our experimental analysis was performed on a large collection of tables with a variety of layouts, encoding problems, and formatting alternatives. Coraline achieved an F1 score as high as 0.90 and took 7.07 CPU seconds per table, which improves on the best supervised proposal by 6.67% regarding effectiveness and 40.54% regarding efficiency; it also improves on the best unsupervised proposal by 11.11% regarding effectiveness while remaining very competitive regarding efficiency. |
Funding agencies | Ministerio de Ciencia e Innovación (MICIN), Spain; Junta de Andalucía |
Project identifiers | PID2020-112540RB-C44; P18-RT-1060 |
Citation | Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). A coral-reef approach to extract information from HTML tables. Applied Soft Computing, 115 (January 2022, art. no. 107980) |
Files | Size | Format | Description |
---|---|---|---|
1-s2.0-S1568494621009029-main.pdf | 1.686 MB | PDF | |