Article
CALA: An unsupervised URL-based web page classification system
dc.creator | Hernández Salmerón, Inmaculada Concepción | es |
dc.creator | Rivero, Carlos R. | es |
dc.creator | Ruiz Cortés, David | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2017-11-22T11:28:27Z | |
dc.date.available | 2017-11-22T11:28:27Z | |
dc.date.issued | 2014 | |
dc.identifier.citation | Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2014). CALA: An unsupervised URL-based web page classification system. Knowledge-Based Systems, 57 (February 2014), 168-180. | |
dc.identifier.issn | 0950-7051 | es |
dc.identifier.uri | http://hdl.handle.net/11441/66444 | |
dc.description.abstract | Unsupervised web page classification refers to the problem of clustering the pages in a web site so that each cluster includes a set of web pages that can be classified using a unique class. The existing proposals to perform web page classification do not fulfill a number of requirements that would make them suitable for enterprise web information integration, namely: to be based on lightweight crawling, so as to avoid interfering with the normal operation of the web site; to be unsupervised, which avoids the need for a training set of pre-classified pages; and to use features from outside the page to be classified, which avoids having to download it. In this article, we propose CALA, a new automated proposal to generate URL-based web page classifiers. Our proposal builds a number of URL patterns that represent the different classes of pages in a web site, so that further pages can be classified by matching their URLs to the patterns. Its salient features are that it fulfills all of the previous requirements and that it has been validated by a number of experiments using real-world, top-visited web sites. Our validation shows that CALA is very effective and efficient in practice. | es |
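The abstract says that, once the URL patterns are built, further pages are classified by matching their URLs to the patterns, so no page download is needed. The sketch that follows illustrates only that matching step, under stated assumptions: the pattern format (a host plus a list of path segments in which `*` stands for any single segment), the helper names `matches` and `classify`, and the example URLs and class labels are all hypothetical and are not taken from the paper. CALA's own pattern language and the way it learns patterns from a lightweight crawl are not shown here.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlparse

# Hypothetical pattern format (an assumption, not CALA's actual syntax):
# (host, path segments), where "*" matches exactly one arbitrary segment
# and any other string must match the corresponding segment literally.
Pattern = Tuple[str, List[str]]


def matches(pattern: Pattern, url: str) -> bool:
    """Return True if the URL's host and path segments fit the pattern."""
    host, segments = pattern
    parsed = urlparse(url)
    if parsed.netloc != host:
        return False
    # Split the path into non-empty segments, e.g. "/book/1234" -> ["book", "1234"].
    path_segments = [s for s in parsed.path.split("/") if s]
    if len(path_segments) != len(segments):
        return False
    return all(p == "*" or p == s for p, s in zip(segments, path_segments))


def classify(url: str, classifier: Dict[str, Pattern]) -> Optional[str]:
    """Return the first class whose pattern matches the URL, or None.

    Only the URL is inspected; the page itself is never downloaded.
    """
    for label, pattern in classifier.items():
        if matches(pattern, url):
            return label
    return None


# Hypothetical classifier with one pattern per class of pages.
classifier: Dict[str, Pattern] = {
    "book": ("www.example.com", ["book", "*"]),
    "author": ("www.example.com", ["author", "*"]),
}

print(classify("http://www.example.com/book/1234", classifier))      # book
print(classify("http://www.example.com/author/jane-doe", classifier))  # author
print(classify("http://www.example.com/help", classifier))           # None
```

Returning `None` for unmatched URLs is one possible design choice for pages that fit no learned class; whether and how CALA handles such pages is not stated in this record.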
dc.description.sponsorship | Ministerio de Educación y Ciencia TIN2007-64119 | es |
dc.description.sponsorship | Junta de Andalucía P07-TIC-2602 | es |
dc.description.sponsorship | Junta de Andalucía P08-TIC-4100 | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2008-04718-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-21744 | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-09809-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-10811-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-09988-E | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2011-15497-E | es |
dc.format | application/pdf | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Knowledge-Based Systems, 57 (February 2014), 168-180. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Web Page Classification | es |
dc.subject | URL Classification | es |
dc.subject | URL Patterns | es |
dc.subject | Enterprise web information integration | es |
dc.subject | Web Page Clustering | es |
dc.title | CALA: An unsupervised URL-based web page classification system | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2007-64119 | es |
dc.relation.projectID | P07-TIC-2602 | es |
dc.relation.projectID | P08-TIC-4100 | es |
dc.relation.projectID | TIN2008-04718-E | es |
dc.relation.projectID | TIN2010-21744 | es |
dc.relation.projectID | TIN2010-09809-E | es |
dc.relation.projectID | TIN2010-10811-E | es |
dc.relation.projectID | TIN2010-09988-E | es |
dc.relation.projectID | TIN2011-15497-E | es |
dc.relation.publisherversion | http://www.sciencedirect.com/science/article/pii/S0950705113003997 | es |
dc.identifier.doi | 10.1016/j.knosys.2013.12.019 | es |
dc.contributor.group | Universidad de Sevilla. TIC134: Sistemas Informáticos | es |
idus.format.extent | 13 | es |
dc.journaltitle | Knowledge-Based Systems | es |
dc.publication.volumen | 57 | es |
dc.publication.issue | February 2014 | es |
dc.publication.initialPage | 168 | es |
dc.publication.endPage | 180 | es |
dc.identifier.sisius | 20649208 | es |