Conference paper
An Experiment to Test URL Features for Web Page Classification
Author(s) | Hernández Salmerón, Inmaculada Concepción; Rivero, Carlos R.; Ruiz Cortés, David; Arjona Fernández, José Luis |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2012 |
Deposit date | 2017-11-17 |
Published in |
|
ISBN/ISSN | 978-3-642-28794-7; 1865-1348 |
Abstract | Web page classification has been extensively researched, using different types of features that are extracted either from the page content, the page structure, or from other pages that link to that page. Using features from the page itself implies having to download it before its classification. We present an experiment to prove that URL tokens contain enough information to extract features to classify web pages. A classifier based on these features is able to classify a web page without having to download it first, avoiding unnecessary downloads. |
Project identifier | TIN2007-64119; P07-TIC-2602; P08-TIC-4100; TIN2008-04718-E; TIN2010-21744; TIN2010-09809-E; TIN2010-10811-E; TIN2010-09988-E |
Citation | Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Arjona Fernández, J.L. (2012). An Experiment to Test URL Features for Web Page Classification. In PAAMS 2012: 10th International Conference on Practical Applications of Agents and Multi-Agent Systems (pp. 109-116), Salamanca, Spain: Springer. |
Files | Size | Format | View | Description |
---|---|---|---|---|
An Experiment to Test.pdf | 320.8Kb | [PDF] | View | |
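
The abstract's central idea, deriving classification features from URL tokens without downloading the page, can be illustrated with a minimal sketch. This record does not describe the paper's actual tokenization or feature set, so everything below is an assumption made for illustration: the function names `url_tokens` and `token_features` are hypothetical, the split on non-alphanumeric characters is one plausible tokenization, and the bag-of-tokens counts are a stand-in for whatever features the authors used.

```python
import re
from urllib.parse import urlsplit


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase tokens without fetching the page.

    Hypothetical helper: the tokenization rule is an assumption,
    not the method described in the paper.
    """
    parts = urlsplit(url)
    # Host, path, and query are the components most likely to hint at page type.
    raw = " ".join([parts.netloc, parts.path, parts.query])
    # Break on any run of non-alphanumeric characters and drop empty strings.
    return [t.lower() for t in re.split(r"[^A-Za-z0-9]+", raw) if t]


def token_features(url: str) -> dict[str, int]:
    """Bag-of-tokens feature map (token -> occurrence count).

    Hypothetical feature representation chosen for illustration only.
    """
    features: dict[str, int] = {}
    for token in url_tokens(url):
        features[token] = features.get(token, 0) + 1
    return features


if __name__ == "__main__":
    print(url_tokens("http://example.org/products/view?id=42"))
    print(token_features("http://shop.example.org/shop/item"))
```

A feature map of this kind could be passed to any standard classifier. The point of the sketch is only that every input is available from the URL string itself, so no page download is needed before classification.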