Conference paper
A Novel Approach to Web Information Extraction
Author(s) | Reina Quintero, Antonia María; Jiménez Aguirre, Patricia; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2015 |
Deposit date | 2022-04-11 |
Published in | |
ISBN/ISSN | 978-3-319-19026-6; 1865-1348 |
Abstract | Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. Unfortunately, the information it provides is available in semi-structured human-friendly formats, which makes it difficult to be processed by automated business processes. Classical propositional and ILP machine-learning techniques have been applied for this purpose. However, the former do not have enough expressive power, whereas the latter are more expressive but intractable with large datasets. Propositionalisation was devised as a means to provide propositional techniques with more expressive power, enabling them to exploit structural information in a propositional way that allows them to be efficient. In this paper, we present a proposal to extract information from semi-structured web documents that uses this approach. It leverages a classical propositional machine learning technique and enhances it with the ability to learn from an unbounded context, which helps increase its precision and recall. Our experiments prove that our proposal outperforms other state-of-the-art techniques in the literature. |
Funding agencies | Ministerio de Educación y Ciencia (MEC). España; Junta de Andalucía; Ministerio de Ciencia e Innovación (MICIN). España; Ministerio de Economía, Industria y Competitividad (MINECO). España; Ministerio de Economía y Competitividad (MINECO). España |
Project identifier | TIN2007-64119; P07-TIC-2602; P08-TIC-4100; TIN2008-04718-E; TIN2010-21744; TIN2010-09809-E; TIN2010-10811-E; TIN2010-09988-E; TIN2011-15497-E; TIN2013-40848-R |
Citation | Reina Quintero, A.M., Jiménez Aguirre, P. and Corchuelo Gil, R. (2015). A Novel Approach to Web Information Extraction. In BIS 2015: 18th International Conference on Business Information Systems (152-161), Poznań, Poland: Springer. |
Files | Size | Format | View | Description |
---|---|---|---|---|
ReinaQuintero2015_Chapter_ANov ... | 415.8Kb | [PDF] | View | |