Conference paper
On Mining DOM Trees to build Information Extractors
Author(s) | Fernández, Gretel; Sleiman, Hassan A.; Corchuelo Gil, Rafael; Zancan Frantz, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2011 |
Deposit date | 2023-03-31 |
Abstract | The Web is the largest information repository. The information it contains is usually available in human-friendly formats. Companies are interested in using this information, but they need it in structured formats so that they can use it in automated business processes. In the literature, there are many proposals to infer information extractors. They build on machine learning techniques that attempt to infer a pattern in the HTML or XPath sources. To the best of our knowledge, no one has explored using data mining techniques on DOM trees. In this paper, we report on a methodology that builds on data mining CSS features and a few other DOM features. Our results suggest that this methodology is promising. |
Funding agencies | Ministerio de Ciencia y Tecnología (MCYT). España; Junta de Andalucía; Ministerio de Ciencia e Innovación (MICIN). España; Ministerio de Economía, Industria y Competitividad |
Project identifier | TIN2007-64119; P07-TIC-2602; P08-TIC-4100; TIN2008-04718-E; TIN2010-09809-E; TIN2010-10811-E; TIN2010-09988-E |
Citation | Fernández, G., Sleiman, H.A., Corchuelo Gil, R. and Zancan Frantz, R. (2011). On Mining DOM Trees to build Information Extractors. In International Conference on Internet Computing (ICOMP 2011). |
Files | Size | Format | View | Description |
---|---|---|---|---|
On mining dom trees to build ... | 119.4Kb | [PDF] | View | |