Scientific output repository of the Universidad de Sevilla

A Statistical Approach to URL-Based Web Page Clustering

Open Access

Authors: Hernández Salmerón, Inmaculada Concepción
Rivero, Carlos R.
Ruiz Cortés, David
Corchuelo Gil, Rafael
Department: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
Date: 2012
Published in: WWW 2012: 21st International Conference on World Wide Web (2012), pp. 525-526
ISBN/ISSN: 978-1-4503-1230-1
Document type: Conference paper
Abstract: Most web page classifiers use features from the page content, which means that a page has to be downloaded to be classified. We propose a technique to cluster web pages by means of their URLs exclusively. In contrast to other proposals, we analyse features that are outside the page; hence, we do not need to download a page to classify it. Also, the technique is unsupervised, requiring little intervention from the user. Furthermore, we do not need to crawl a site extensively to build a classifier for that site, but only a small subset of its pages. We have performed an experiment over 21 highly visited websites to evaluate the performance of our classifier, obtaining good precision and recall results.
Citation: Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2012). A Statistical Approach to URL-Based Web Page Clustering. In WWW 2012: 21st International Conference on World Wide Web (pp. 525-526), Lyon, France: ACM.
Size: 467.7Kb
Format: PDF

URI: http://hdl.handle.net/11441/65918

DOI: 10.1145/2187980.2188109

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
