Conference paper
Towards a Method for Unsupervised Web Information Extraction
Author(s) | Sleiman, Hassan A.
Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2012-07 |
Deposit date | 2023-03-30 |
Published in |
|
ISBN/ISSN | 978-3-642-31752-1 (print) 978-3-642-31753-8 (online) |
Abstract | The literature provides a variety of techniques to build the information extractors on which some data integration systems rely. Information extraction techniques are usually based on extraction rules that require maintenance and adaptation if web sources change. We present our preliminary steps towards an unsupervised information extraction technique that searches web documents for shared patterns and fragments them until finding the relevant information that should be extracted. Experimental results on 1230 real web documents demonstrate that our system performs fast and achieves promising results. |
Funding agencies | Ministerio de Ciencia y Tecnología (MCYT), Spain; Junta de Andalucía; Ministerio de Ciencia e Innovación (MICIN), Spain; Ministerio de Economía, Industria y Competitividad |
Project identifier | TIN2007-64119
P07-TIC-2602 P08-TIC-4100 TIN2008-04718-E TIN2010-21744 TIN2010-09809-E TIN2010-10811-E TIN2010-09988-E |
Citation | Sleiman, H.A. and Corchuelo Gil, R. (2012). Towards a Method for Unsupervised Web Information Extraction. In 12th International Conference: Web Engineering (ICWE 2012) (pp. 427-430), Berlin (Germany): Springer. |
Files | Size | Format | View | Description |
---|---|---|---|---|
Towards a method for unsupervised ... | 107.9 KB | [PDF] | View/ | |