Scientific Production Repository of the Universidad de Sevilla

A Tool for Link-Based Web Page Classification

 

Access: Open access
Authors: Hernández Salmerón, Inmaculada Concepción; Rivero, Carlos R.; Ruiz Cortés, David; Corchuelo Gil, Rafael
Department: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
Date: 2011
Published in: CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (2011), pp. 443-452
ISBN/ISSN: 978-3-642-25273-0
Document type: Presentation
Abstract: Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process runs online, so the user waits for a response while the system looks for the required information. Downloading as few irrelevant pages as possible is therefore essential to crawler efficiency. Most crawlers must download a page to determine its relevance, so they download many irrelevant pages. In this paper, we propose a classifier that helps crawlers navigate through web sites efficiently. The classifier determines whether a web page is relevant by analysing its URL alone, which minimises the number of irrelevant pages downloaded, improves crawling efficiency, and reduces bandwidth use, making it suitable for virtual integration systems.
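To make the idea in the abstract concrete, the following is a minimal sketch of deciding relevance from a URL alone, before any page is downloaded. It is not the authors' classifier, which this record does not describe. The pattern-counting approach, the `url_pattern` helper, the `UrlRelevanceClassifier` class, and the `example.org` URLs are all assumptions made for illustration.

```python
from collections import Counter
from urllib.parse import urlparse


def url_pattern(url):
    """Reduce a URL to a coarse pattern (hypothetical helper, not from the paper).

    The pattern is the list of path segments, with purely numeric segments
    masked, plus the sorted names of the query parameters.
    """
    parts = urlparse(url)
    segments = [s for s in parts.path.split("/") if s]
    # Mask numeric segments so /item/12 and /item/99 share one pattern.
    masked = ["<num>" if s.isdigit() else s.lower() for s in segments]
    query_keys = sorted(p.split("=", 1)[0] for p in parts.query.split("&") if p)
    return tuple(masked), tuple(query_keys)


class UrlRelevanceClassifier:
    """Illustrative classifier: counts how often each URL pattern was labelled
    relevant or irrelevant, then votes by simple majority."""

    def __init__(self):
        self.relevant = Counter()
        self.irrelevant = Counter()

    def train(self, labelled_urls):
        # labelled_urls: iterable of (url, is_relevant) pairs.
        for url, is_relevant in labelled_urls:
            target = self.relevant if is_relevant else self.irrelevant
            target[url_pattern(url)] += 1

    def is_relevant(self, url):
        # Unseen patterns score 0 on both sides and are treated as irrelevant,
        # so the crawler would skip them without downloading.
        pattern = url_pattern(url)
        return self.relevant[pattern] > self.irrelevant[pattern]


# Usage with made-up training data.
clf = UrlRelevanceClassifier()
clf.train([
    ("http://example.org/item/12?show=full", True),
    ("http://example.org/item/99?show=full", True),
    ("http://example.org/search?q=web", False),
])
for candidate in ("http://example.org/item/7?show=full",
                  "http://example.org/search?q=crawler"):
    print(candidate, clf.is_relevant(candidate))
```

A crawler using a check like this would call `is_relevant` on each extracted link and fetch only those that pass, which is where the bandwidth saving mentioned in the abstract would come from. Treating unseen patterns as irrelevant is a design choice of this sketch; a more cautious crawler might download them instead.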
Cite: Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2011). A Tool for Link-Based Web Page Classification. In CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (443-452), La Laguna, Spain: Springer.
Size: 379.6 KB
Format: PDF

URI: http://hdl.handle.net/11441/65970

DOI: 10.1007/978-3-642-25274-7_45


License: Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
