Conference paper
A Tool for Link-Based Web Page Classification
Authors | Hernández Salmerón, Inmaculada Concepción; Rivero, Carlos R.; Ruiz Cortés, David; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2011 |
Deposit date | 2017-11-13 |
ISBN/ISSN | 978-3-642-25273-0 |
Abstract | Virtual integration systems require a crawler that navigates web sites automatically, looking for relevant information. This process is online, so while the system searches for the required information, the user is waiting for a response. Minimising the number of irrelevant pages downloaded is therefore essential to crawler efficiency. Most crawlers must download a page before they can determine its relevance, so they end up downloading many irrelevant pages. In this paper, we propose a classifier that helps crawlers navigate web sites efficiently. The classifier determines whether a web page is relevant by analysing only its URL; this minimises the number of irrelevant pages downloaded, improves crawling efficiency and reduces bandwidth usage, which makes it suitable for virtual integration systems. |
Project identifiers | TIN2007-64119; P07-TIC-2602; P08-TIC-4100; TIN2008-04718-E; TIN2010-21744; TIN2010-09809-E; TIN2010-10811-E; TIN2010-09988-E |
Citation | Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2011). A Tool for Link-Based Web Page Classification. In CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (pp. 443-452). La Laguna, Spain: Springer. |
Files | Size | Format | Description |
---|---|---|---|
A Tool for Link.pdf | 379.6 KB | PDF | |
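The abstract states that the classifier decides whether a page is relevant from its URL alone, without downloading it. This record does not describe how the authors' classifier works internally, so the following is only a minimal sketch under an assumption: it swaps in a standard technique, a multinomial naive Bayes model over URL tokens, to illustrate what URL-only classification can look like. The names `url_tokens` and `UrlNaiveBayes`, and the `example.com` training URLs, are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse


def url_tokens(url: str) -> list[str]:
    """Split a URL's host, path and query string into lowercase alphanumeric tokens."""
    parts = urlparse(url)
    text = " ".join([parts.netloc, parts.path, parts.query]).lower()
    return [tok for tok in re.split(r"[^a-z0-9]+", text) if tok]


class UrlNaiveBayes:
    """Hypothetical URL-only relevance classifier (multinomial naive Bayes).

    Illustrative stand-in only; it is NOT the classifier described in the paper.
    """

    def __init__(self) -> None:
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def fit(self, urls, labels):
        # Count how often each token appears in relevant / irrelevant URLs.
        for url, relevant in zip(urls, labels):
            self.doc_counts[relevant] += 1
            self.token_counts[relevant].update(url_tokens(url))
        return self

    def _log_score(self, tokens, label) -> float:
        vocab_size = len(set(self.token_counts[True]) | set(self.token_counts[False]))
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing on the class prior avoids log(0) if a class is empty.
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        total_tokens = sum(self.token_counts[label].values())
        for tok in tokens:
            # Laplace smoothing: tokens unseen in this class get a small non-zero probability.
            count = self.token_counts[label][tok]
            score += math.log((count + 1) / (total_tokens + vocab_size))
        return score

    def is_relevant(self, url: str) -> bool:
        # Decide from the URL string alone; the page itself is never downloaded.
        tokens = url_tokens(url)
        return self._log_score(tokens, True) > self._log_score(tokens, False)


# Hypothetical training data: links a crawler might find on a movie site.
training = [
    ("http://www.example.com/movies/title/123", True),
    ("http://www.example.com/movies/title/456", True),
    ("http://www.example.com/help/contact", False),
    ("http://www.example.com/login?next=home", False),
]
clf = UrlNaiveBayes().fit([u for u, _ in training], [r for _, r in training])
print(clf.is_relevant("http://www.example.com/movies/title/789"))  # True
print(clf.is_relevant("http://www.example.com/help/faq"))  # False
```

Splitting on non-alphanumeric characters turns host labels, path segments and query keys into features, so a link such as `/movies/title/789` is scored as relevant because its tokens resemble those of relevant training links. Ties are resolved as "not relevant", which errs on the side of not downloading the page. A real crawler would need labelled example URLs for each target site.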