Article
A coral-reef approach to extract information from HTML tables
Author/s | Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication Date | 2022 |
Deposit Date | 2022-04-08 |
Published in | Applied Soft Computing, 115 |
Abstract | This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a clustering technique and some custom heuristics that help extract information in a totally unsupervised manner. Our experimental analysis was performed on a large collection of tables with a variety of layouts, encoding problems, and formatting alternatives. Coraline could achieve an F1 score as high as 0.90 and took 7.07 CPU seconds per table, which improves on the best supervised proposal by 6.67% regarding effectiveness and 40.54% regarding efficiency; it also improves on the best unsupervised proposal by 11.11% regarding effectiveness while it remains very competitive regarding efficiency. |
Funding agencies | Ministerio de Ciencia e Innovación (MICIN). España; Junta de Andalucía |
Project ID. | PID2020-112540RB-C44; P18-RT-1060 |
Citation | Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). A coral-reef approach to extract information from HTML tables. Applied Soft Computing, 115 (January 2022, art. no. 107980) |
Files | Size | Format | View | Description |
---|---|---|---|---|
1-s2.0-S1568494621009029-main.pdf | 1.686Mb | PDF | | |