idUS - Search

Showing items 1-10 of 11

Article

TOMATE: A heuristic-based approach to extract data from HTML tables

Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Szekely, Pedro; Corchuelo Gil, Rafael (Elsevier, 2021)

Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics ...

Article

ARIEX: Automated ranking of information extractors

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael; Sleiman, Hassan A. (Elsevier, 2016)

Information extractors are used to transform the user-friendly information in a web document into structured information that can be used to feed a knowledge-based system. Researchers are interested in ranking them to ...

Article

A clustering approach to extract data from HTML tables

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2021)

HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the relationships between their cells is not trivial due to the many different layouts, encodings, and formats ...

Article

On Extracting Data from Tables that are Encoded using HTML

Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Elsevier, 2020)

Tables are a common means to display data in human-friendly formats. Many authors have worked on proposals to extract those data back since this has many interesting applications. In this article, we summarise and compare ...

Article

Roller: A novel approach to web information extraction

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Springer, 2016)

The research regarding web information extraction focuses on learning rules to extract some selected information from web documents. Many proposals are ad-hoc and cannot benefit from the advances in machine learning; ...

Article

On Learning Web Information Extraction Rules with TANGO

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Elsevier, 2016)

The research on Enterprise Systems Integration focuses on proposals to support business processes by re-using existing systems. Wrappers help re-use web applications that provide a user interface only. They emulate a ...

Article

On the synthesis of metadata tags for HTML files

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Gallego, Fernando O.; Corchuelo Gil, Rafael (Wiley, 2020)

RDFa, JSON-LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software agents understand them. Unfortunately, there are many HTML files that do not have any metadata tags, which ...

Article

On validating web information extraction proposals

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Elsevier, 2022)

Many people who have to make informed decisions in today’s always-on culture use information extractors to feed their systems with information that comes from human-friendly documents. Unfortunately, many proposals that ...

Article

On exploring data lakes by finding compact, isolated clusters

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2022)

Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer ...

Article

A coral-reef approach to extract information from HTML tables

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2022)

This article presents Coraline, which is a new table-understanding proposal. Its novelty lies in a coral-reef optimisation algorithm that addresses the problem of feature selection in synchrony with a clustering technique ...
