A Tool for Link-Based Web Page Classification

Authors: Hernández Salmerón, Inmaculada Concepción
Rivero, Carlos R.
Ruiz Cortés, David
Corchuelo Gil, Rafael
Department: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
Date: 2011
Published in: CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (2011), pp. 443-452
ISBN/ISSN: 978-3-642-25273-0
Document type: Conference paper
Abstract: Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process is online, so whilst the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory to improve the crawler efficiency. Most crawlers need to download a page to determine its relevance, which results in a high number of irrelevant pages downloaded. In this paper, we propose a classifier that helps crawlers to efficiently navigate through web sites. This classifier is able to determine if a web page is relevant by analysing exclusively its URL, minimising the number of irrelevant pages downloaded, improving crawling efficiency and reducing used bandwidth, making it suitable for virtual integration systems.
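
The idea of judging relevance from the URL alone can be illustrated with a minimal sketch. This is not the classifier proposed in the paper, whose details this record does not describe; it is a simple token-counting scheme chosen for illustration. The function names (url_features, train, is_relevant), the token format, and the example.org URLs used to exercise it are all assumptions.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def url_features(url):
    """Turn a URL into coarse tokens without downloading the page (hypothetical scheme)."""
    parts = urlsplit(url)
    tokens = []
    for segment in parts.path.split("/"):
        if segment:
            # Collapse digit runs so /item/123 and /item/456 share a token.
            tokens.append("path:" + re.sub(r"\d+", "<num>", segment.lower()))
    for name, _value in parse_qsl(parts.query, keep_blank_values=True):
        # Only the parameter name is kept; values vary too much to generalise.
        tokens.append("param:" + name.lower())
    return tokens


def train(labelled_urls):
    """Count, per class, in how many labelled URLs each token appears."""
    counts = {True: Counter(), False: Counter()}
    for url, relevant in labelled_urls:
        counts[relevant].update(set(url_features(url)))
    return counts


def is_relevant(url, counts):
    """Vote with the tokens: positive evidence minus negative evidence."""
    score = 0
    for token in set(url_features(url)):
        score += counts[True][token] - counts[False][token]
    # A URL with no known tokens scores 0 and is treated as not relevant.
    return score > 0
```

In a crawler, a function like is_relevant would be called on each extracted link before fetching it, so that only links predicted to be relevant are downloaded. Treating unseen URL shapes as irrelevant favours bandwidth savings over recall; a real system would likely need a more careful fallback.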
Citation: Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2011). A Tool for Link-Based Web Page Classification. In CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (pp. 443-452), La Laguna, Spain: Springer.
Size: 379.6Kb
Format: PDF

URI: http://hdl.handle.net/11441/65970

DOI: 10.1007/978-3-642-25274-7_45

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
