Research output repository of the Universidad de Sevilla

A Conceptual Framework for Efficient Web Crawling in Virtual Integration Contexts

Access: Open access

Authors: Hernández Salmerón, Inmaculada Concepción; Sleiman, Hassan A.; Ruiz Cortés, David; Corchuelo Gil, Rafael
Department: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
Date: 2011
Published in: WISM 2011: International Conference on Web Information Systems and Mining (2011), pp. 282-291
ISBN/ISSN: 978-3-642-23981-6
Document type: Conference paper
Abstract: Virtual integration systems require a crawling tool that can navigate to relevant pages on the Web efficiently. Existing crawling proposals are aware of the efficiency problem, but most of them still need to download pages in order to classify them as relevant or not. In this paper, we present a conceptual framework for designing crawlers supported by a web page classifier that relies solely on URLs to determine page relevance. At each step, such a crawler can choose only the URLs that lead to relevant pages; it therefore downloads fewer unnecessary pages and uses less bandwidth, which makes it efficient and suitable for virtual integration systems. Our preliminary experiments show that such a classifier is able to distinguish between links leading to different kinds of pages without prior intervention from the user.
Citation: Hernández Salmerón, I.C., Sleiman, H.A., Ruiz Cortés, D. and Corchuelo Gil, R. (2011). A Conceptual Framework for Efficient Web Crawling in Virtual Integration Contexts. In WISM 2011: International Conference on Web Information Systems and Mining (pp. 282-291), Taiyuan, China: Springer.
Size: 211.6 KB
Format: PDF

URI: http://hdl.handle.net/11441/65832

DOI: 10.1007/978-3-642-23982-3_35
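
The idea summarised in the abstract, deciding from the URL alone whether a link is worth following, can be illustrated with a small sketch. This is not the authors' framework or classifier: the signature function, the `UrlClassifier` class, the `crawl` loop, and the toy `SITE` are all hypothetical names introduced here, and the classifier is seeded with an example URL purely for brevity, whereas the abstract describes a classifier that needs no prior user intervention. Downloading is simulated by a lookup in an in-memory dictionary.

```python
from collections import deque
from urllib.parse import urlparse, parse_qs


def url_signature(url):
    """Reduce a URL to a coarse signature (assumed heuristic, not from the paper):
    path segments with purely numeric ones masked, plus sorted query parameter names."""
    parts = urlparse(url)
    segments = ["<num>" if seg.isdigit() else seg
                for seg in parts.path.split("/") if seg]
    params = sorted(parse_qs(parts.query, keep_blank_values=True))
    signature = "/" + "/".join(segments)
    if params:
        signature += "?" + "&".join(params)
    return signature


class UrlClassifier:
    """Hypothetical URL-only classifier: a URL is relevant if its signature
    matches the signature of any example URL known to be relevant."""

    def __init__(self, relevant_examples):
        self.relevant = {url_signature(u) for u in relevant_examples}

    def is_relevant(self, url):
        return url_signature(url) in self.relevant


def crawl(seed, classifier, fetch_links, max_pages=100):
    """Breadth-first crawl that enqueues a link only if its URL is classified
    as relevant, so irrelevant pages are never downloaded.
    fetch_links(url) stands in for 'download the page and extract its links'."""
    queue = deque([seed])
    seen = {seed}
    downloaded = []
    while queue and len(downloaded) < max_pages:
        url = queue.popleft()
        downloaded.append(url)  # the only place a page is "downloaded"
        for link in fetch_links(url):
            if link not in seen and classifier.is_relevant(link):
                seen.add(link)
                queue.append(link)
    return downloaded


# Toy in-memory site (hypothetical URLs) used instead of real HTTP requests.
SITE = {
    "http://example.org/": [
        "http://example.org/paper/1",
        "http://example.org/about",
        "http://example.org/search?q=x",
    ],
    "http://example.org/paper/1": [
        "http://example.org/paper/2",
        "http://example.org/login",
    ],
    "http://example.org/paper/2": [],
}

classifier = UrlClassifier(["http://example.org/paper/7"])
pages = crawl("http://example.org/", classifier, lambda u: SITE.get(u, []))
```

In this toy run only the seed and the two `/paper/<num>` pages are visited; the `about`, `search`, and `login` links are discarded from their URLs alone, which is the bandwidth saving the abstract refers to.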



This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
