idUS - Search

Showing items 1-10 of 18

Article

TOMATE: A heuristic-based approach to extract data from HTML tables

Roldán Salvador, Juan Carlos; Jiménez Aguirre, Patricia; Szekely, Pedro; Corchuelo Gil, Rafael (Elsevier, 2021)

Extracting data from user-friendly HTML tables is difficult because of their different layouts, formats, and encoding problems. In this article, we present a new proposal that first applies several pre-processing heuristics ...

Conference paper

A Novel Approach to Web Information Extraction

Reina Quintero, Antonia María; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Springer, 2015)

Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. ...

Conference paper

An Unsupervised Technique to Extract Information from Semi-structured Web Pages

Sleiman, Hassan A.; Corchuelo Gil, Rafael (Springer, 2012-11)

We propose a technique that takes two or more web pages generated by the same server-side template and tries to learn a regular expression that represents it and helps extract relevant information from similar pages. Our ...

Article

ARIEX: Automated ranking of information extractors

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael; Sleiman, Hassan A. (Elsevier, 2016)

Information extractors are used to transform the user-friendly information in a web document into structured information that can be used to feed a knowledge-based system. Researchers are interested in ranking them to ...

Article

A clustering approach to extract data from HTML tables

Jiménez Aguirre, Patricia; Roldán Salvador, Juan Carlos; Corchuelo Gil, Rafael (Elsevier, 2021)

HTML tables have become pervasive on the Web. Extracting their data automatically is difficult because finding the relationships between their cells is not trivial due to the many different layouts, encodings, and formats ...

Conference paper

A Novel Approach to Web Information Extraction

Reina Quintero, Antonia María; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Springer International Publishing AG, 2015-06)

Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. ...

Article

A Methodology to Evaluate the Maintainability of Enterprise Application Integration Frameworks

Frantz, Rafael Z.; Corchuelo Gil, Rafael; Roos Frantz, Fabricia (Inderscience Enterprises Ltd, 2015-12)

Consulting companies that specialise in Enterprise Application Integration commonly require adapting existing frameworks to specific domains. Currently, there are many such frameworks available, most of which provide a ...

Conference paper

Feeding Software Agents with Web Information

Jiménez Aguirre, Patricia; Sleiman, Hassan A.; Corchuelo Gil, Rafael (Springer, 2015)

Many software agents require information that is available in web documents. Unfortunately, the existing proposals to learn extraction rules are tightly coupled with the learning component and do not result in resilient ...

Article

Roller: A novel approach to web information extraction

Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael (Springer, 2016)

The research regarding web information extraction focuses on learning rules to extract some selected information from web documents. Many proposals are ad-hoc and cannot benefit from the advances in machine learning; ...

Article

Trinity: On Using Trinary Trees for Unsupervised Web Data Extraction

Sleiman, Hassan A.; Corchuelo Gil, Rafael (IEEE Xplore, 2014-06)

Web data extractors are used to extract data from web documents in order to feed automated processes. In this article, we propose a technique that works on two or more web documents generated by the same server-side template ...
