Presentation
On Extracting Information from Semi-structured Deep Web Documents
Author/s | Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication Date | 2015 |
Deposit Date | 2022-04-08 |
ISBN/ISSN | 978-3-319-19026-6 1865-1348 |
Abstract | Some software agents need information that is provided by some web sites, which is difficult if they lack a query API. Information extractors are intended to extract the information of interest automatically and offer it in a structured format. Unfortunately, most of them rely on ad-hoc techniques, which make them fade away as the Web evolves. In this paper, we present a proposal that relies on an open catalogue of features that allows to adapt it easily; we have also devised an optimisation that allows it to be very efficient. Our experimental results prove that our proposal outperforms other state-of-the-art proposals. |
Funding agencies | Ministerio de Educación y Ciencia (MEC). España; Junta de Andalucía; Ministerio de Ciencia e Innovación (MICIN). España; Ministerio de Economía, Industria y Competitividad (MINECO). España; Ministerio de Economía y Competitividad (MINECO). España |
Project ID. | TIN2007-64119, P07-TIC-2602, P08-TIC-4100, TIN2008-04718-E, TIN2010-21744, TIN2010-09809-E, TIN2010-10811-E, TIN2010-09988-E, TIN2011-15497-E, TIN2013-40848-R |
Citation | Jiménez Aguirre, P. and Corchuelo Gil, R. (2015). On Extracting Information from Semi-structured Deep Web Documents. In BIS 2015: 18th International Conference on Business Information Systems (pp. 140-151), Poznań, Poland: Springer. |