idUS - Search

Showing items 1-2 of 2

Article

TOMATE: A heuristic-based approach to extract data from HTML tables

Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Szekely, Pedro; Corchuelo Gil, Rafael (Elsevier, 2021)

Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics ...

Article

A clustering approach to extract data from HTML tables

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2021)

HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the relationships between their cells is not trivial due to the many different layouts, encodings, and formats ...