Presentation
An Architecture for Efficient Web Crawling
Author/s | Hernández Salmerón, Inmaculada Concepción; Rivero, Carlos R.; Ruiz Cortés, David; Corchuelo Gil, Rafael |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication Date | 2012 |
Deposit Date | 2017-11-17 |
ISBN/ISSN | 978-3-642-31068-3; 1865-1348 |
Abstract | Virtual Integration systems require a crawling tool able to navigate and reach relevant pages in the Deep Web efficiently. Existing proposals in the crawling area fulfill some of these requirements, but most of them need to download pages in order to classify them as relevant or not. We propose a crawler supported by a web page classifier that uses only a page's URL to determine its relevance. Such a crawler is able to choose, at each step, only the URLs that lead to relevant pages, and therefore reduces the number of unnecessary pages downloaded, minimising bandwidth usage and making it efficient and suitable for virtual integration systems. |
Project IDs | TIN2007-64119; P07-TIC-2602; P08-TIC-4100; TIN2008-04718-E; TIN2010-21744; TIN2010-09809-E; TIN2010-10811-E; TIN2010-09988-E |
Citation | Hernández Salmerón, I.C., Rivero, C.R., Ruiz Cortés, D. and Corchuelo Gil, R. (2012). An Architecture for Efficient Web Crawling. In CAiSE 2012: International Conference on Advanced Information Systems Engineering (pp. 228-234), Gdańsk, Poland: Springer. |
Files | Size | Format |
---|---|---|
An Architecture for Efficien.pdf | 166.5Kb | PDF |
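The abstract's central idea, deciding whether a page is relevant from its URL alone so that irrelevant pages are never downloaded, can be illustrated with a minimal sketch. This is not the authors' classifier. It assumes a multinomial naive Bayes over URL tokens as a swapped-in stand-in, and every name here (`url_tokens`, `UrlClassifier`, `crawl`, `get_links`) is hypothetical. `get_links` is an injected function assumed to return the outgoing links of a downloaded page, so the sketch runs without network access.

```python
import math
import re
from collections import Counter, deque
from typing import Callable, Iterable


def url_tokens(url: str) -> list[str]:
    """Split a URL into lowercase alphanumeric tokens (host parts, path segments, query terms)."""
    return [t for t in re.split(r"[^a-z0-9]+", url.lower()) if t]


class UrlClassifier:
    """Hypothetical stand-in: multinomial naive Bayes over URL tokens (True = relevant)."""

    def __init__(self) -> None:
        self.counts = {True: Counter(), False: Counter()}
        self.docs = {True: 0, False: 0}

    def train(self, examples: Iterable[tuple[str, bool]]) -> None:
        for url, relevant in examples:
            self.counts[relevant].update(url_tokens(url))
            self.docs[relevant] += 1

    def is_relevant(self, url: str) -> bool:
        """Classify a URL without downloading the page it points to."""
        vocab = set(self.counts[True]) | set(self.counts[False])
        total_docs = self.docs[True] + self.docs[False]
        scores = {}
        for label in (True, False):
            # Laplace-smoothed log prior
            score = math.log((self.docs[label] + 1) / (total_docs + 2))
            # Laplace-smoothed token likelihoods; the extra +1 reserves mass for unseen tokens
            denom = sum(self.counts[label].values()) + len(vocab) + 1
            for tok in url_tokens(url):
                score += math.log((self.counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores[True] > scores[False]


def crawl(seed: str, get_links: Callable[[str], list[str]],
          clf: UrlClassifier, max_pages: int = 100) -> list[str]:
    """Breadth-first crawl that enqueues (and later downloads) only URLs classified as relevant."""
    frontier = deque([seed])
    seen = {seed}
    downloaded: list[str] = []
    while frontier and len(downloaded) < max_pages:
        url = frontier.popleft()
        downloaded.append(url)  # stands in for the actual HTTP download
        for link in get_links(url):
            # The relevance decision uses the link's URL only, before any download
            if link not in seen and clf.is_relevant(link):
                seen.add(link)
                frontier.append(link)
    return downloaded
```

For instance, after training on a few labeled URLs such as `http://shop.example/book/1` (relevant) and `http://shop.example/about` (not relevant), a crawl started from a hypothetical search results page follows the `/book/...` links but never downloads `/about`, `/login` or `/help`, which is the kind of bandwidth saving the abstract describes. A real system would replace the naive Bayes stand-in with whatever URL-based classifier it uses.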