Repositorio de producción científica de la Universidad de Sevilla

A Statistical Approach to URL-Based Web Page Clustering


dc.creator Hernández Salmerón, Inmaculada Concepción es
dc.creator Rivero, Carlos R. es
dc.creator Ruiz Cortés, David es
dc.creator Corchuelo Gil, Rafael es
dc.date.accessioned 2017-11-10T10:18:21Z
dc.date.available 2017-11-10T10:18:21Z
dc.date.issued 2012
dc.identifier.citation Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2012). A Statistical Approach to URL-Based Web Page Clustering. In WWW 2012: 21st International Conference on World Wide Web (525-526), Lyon, France: ACM.
dc.identifier.isbn 978-1-4503-1230-1 es
dc.description.abstract Most web page classifiers use features from the page content, which means that a page has to be downloaded to be classified. We propose a technique to cluster web pages by means of their URLs exclusively. In contrast to other proposals, we analyse features that are outside the page; hence, we do not need to download a page to classify it. Also, the technique is non-supervised, requiring little intervention from the user. Furthermore, we do not need to crawl a site extensively to build a classifier for that site, but only a small subset of its pages. We have performed an experiment over 21 highly visited websites to evaluate the performance of our classifier, obtaining good precision and recall results. es
dc.description.sponsorship Junta de Andalucía P08-TIC-4100 es
dc.description.sponsorship Ministerio de Ciencia e Innovación TIN2010-21744 es
dc.format application/pdf es
dc.language.iso eng es
dc.publisher ACM es
dc.relation.ispartof WWW 2012: 21st International Conference on World Wide Web (2012), p 525-526
dc.rights Attribution-NonCommercial-NoDerivatives 4.0 International *
dc.rights.uri *
dc.subject URL Classification es
dc.subject URL Patterns es
dc.subject Web Page Clustering es
dc.title A Statistical Approach to URL-Based Web Page Clustering es
dc.type info:eu-repo/semantics/conferenceObject es
dc.type.version info:eu-repo/semantics/submittedVersion es
dc.rights.accessrights info:eu-repo/semantics/openAccess es
dc.contributor.affiliation Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos es
dc.relation.projectID P08-TIC-4100 es
dc.relation.projectID TIN2010-21744 es
dc.relation.publisherversion es
dc.identifier.doi 10.1145/2187980.2188109 es Universidad de Sevilla. TIC134: Sistemas Informáticos es
idus.format.extent 2 es
dc.publication.initialPage 525 es
dc.publication.endPage 526 es
dc.eventtitle WWW 2012: 21st International Conference on World Wide Web es
dc.eventinstitution Lyon, France es
dc.relation.publicationplace New York, USA es
dc.contributor.funder Junta de Andalucía
dc.contributor.funder Ministerio de Ciencia e Innovación (MICIN). España
Size: 467.7Kb
Format: PDF
