Title: A Tool for Link-Based Web Page Classification

Authors: Hernández Salmerón, Inmaculada Concepción; Rivero, Carlos R.; Ruiz Cortés, David; Corchuelo Gil, Rafael

Record date: 2017-11-13
Publication year: 2011

Citation: Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2011). A Tool for Link-Based Web Page Classification. In CAEPIA 2011: 14th Conference of the Spanish Association for Artificial Intelligence (443-452), La Laguna, Spain: Springer.

ISBN: 978-3-642-25273-0
Handle: http://hdl.handle.net/11441/65970
DOI: https://doi.org/10.1007/978-3-642-25274-7_45

Abstract: Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process is online, so whilst the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory to improve the crawler's efficiency. Most crawlers need to download a page to determine its relevance, which results in a high number of irrelevant pages being downloaded. In this paper, we propose a classifier that helps crawlers to navigate through web sites efficiently. This classifier is able to determine whether a web page is relevant by analysing its URL exclusively, which minimises the number of irrelevant pages downloaded, improves crawling efficiency and reduces bandwidth use, making it suitable for virtual integration systems.

Keywords: Crawling; Web Page Classification; Virtual Integration

Format: application/pdf
Language: eng
License: Attribution-NonCommercial-NoDerivatives 4.0 International (http://creativecommons.org/licenses/by-nc-nd/4.0/)
Type: info:eu-repo/semantics/conferenceObject
Access rights: info:eu-repo/semantics/openAccess
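
Illustrative sketch: the abstract describes deciding whether a page is relevant from its URL alone, before downloading it. The record does not describe the authors' actual technique, so the following Python snippet is only a minimal sketch of the general idea, swapping in a simple naive Bayes model over URL tokens. The names `url_tokens` and `UrlClassifier`, the `example.org` URLs and the training labels are all hypothetical.

```python
import math
import re
from collections import Counter
from urllib.parse import urlsplit


def url_tokens(url):
    """Split a URL's path and query into lowercase tokens.

    Purely numeric tokens collapse to '<num>' so that item identifiers
    such as /product/123 and /product/456 share the same features.
    """
    parts = urlsplit(url)
    raw = re.split(r"[^A-Za-z0-9]+", parts.path + " " + parts.query)
    return ["<num>" if t.isdigit() else t.lower() for t in raw if t]


class UrlClassifier:
    """Naive Bayes over URL tokens (an assumed stand-in, not the paper's method)."""

    def __init__(self):
        self.token_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def train(self, url, relevant):
        """Record one labelled URL; no page content is needed."""
        self.token_counts[relevant].update(url_tokens(url))
        self.doc_counts[relevant] += 1

    def score(self, url, label):
        """Log-probability of the URL under one label, with add-one smoothing."""
        vocab = set(self.token_counts[True]) | set(self.token_counts[False])
        total = sum(self.token_counts[label].values())
        prior = (self.doc_counts[label] + 1) / (sum(self.doc_counts.values()) + 2)
        logp = math.log(prior)
        for tok in url_tokens(url):
            count = self.token_counts[label][tok]
            # The extra +1 in the denominator reserves mass for unseen tokens.
            logp += math.log((count + 1) / (total + len(vocab) + 1))
        return logp

    def is_relevant(self, url):
        """Classify a link without downloading the page it points to."""
        return self.score(url, True) > self.score(url, False)


# Hypothetical training links, labelled by hand for illustration only.
clf = UrlClassifier()
clf.train("http://example.org/product/123", True)
clf.train("http://example.org/product/456?view=full", True)
clf.train("http://example.org/help/faq", False)
clf.train("http://example.org/account/login", False)

for link in ("http://example.org/product/789", "http://example.org/help/contact"):
    print(link, clf.is_relevant(link))
```

A crawler using a classifier of this kind would call `is_relevant` on each extracted link and only fetch those predicted relevant, which is how URL-only classification saves downloads and bandwidth.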