Conference paper
A Reference Architecture to Devise Web Information Extractors
Author(s) | Sleiman, Hassan A.
Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2012-06 |
Deposit date | 2023-03-15 |
Published in |
|
ISBN/ISSN | 978-3-642-31068-3 (print) 978-3-642-31069-0 (online) |
Abstract | The Web is the largest repository of human-friendly information. Unfortunately, web information is embedded in formatting tags and is surrounded by irrelevant information. Researchers are working on information extractors that transform this information into structured data for later integration into automated processes. Devising a new information extraction technique requires an array of tasks that are specific to that technique and many tasks that are actually common to all techniques. The lack of a reference architecture in the literature to guide software engineers in the design and implementation of information extractors results in little reuse, and the focus is usually blurred by irrelevant details. In this paper, we present a reference architecture to design and implement rule learners for information extractors. We have implemented a software framework to support our architecture, and we have validated it by means of four case studies and a number of experiments that prove that our proposal helps reduce development costs significantly. |
Funding agencies | Ministerio de Educación y Ciencia (MEC), Spain; Junta de Andalucía; Ministerio de Ciencia e Innovación (MICIN), Spain; Ministerio de Economía, Industria y Competitividad |
Project identifier | TIN2007-64119
P07-TIC-2602 P08-TIC-4100 TIN2008-04718-E TIN2010-21744 TIN2010-09809-E TIN2010-10811-E TIN2010-09988-E |
Citation | Sleiman, H.A. and Corchuelo Gil, R. (2012). A Reference Architecture to Devise Web Information Extractors. In CAiSE 2012: Advanced Information Systems Engineering Workshops (235-248), Gdańsk (Poland): SpringerLink. |
Files | Size | Format | View | Description |
---|---|---|---|---|
A reference architecture to ... | 183.2Kb | [PDF] | View/ | |