dc.creator | Hernández Salmerón, Inmaculada Concepción | es |
dc.creator | Rivero, Carlos R. | es |
dc.creator | Ruiz Cortés, David | es |
dc.date.accessioned | 2021-02-16T12:21:52Z | |
dc.date.available | 2021-02-16T12:21:52Z | |
dc.date.issued | 2019 | |
dc.identifier.citation | Hernández Salmerón, I.C., Rivero, C.R. and Ruiz Cortés, D. (2019). Deep Web crawling: a survey. World Wide Web, 22, 1577-1610. | |
dc.identifier.issn | 1386-145X | es |
dc.identifier.uri | https://hdl.handle.net/11441/105031 | |
dc.description.abstract | Deep Web crawling refers to the problem of traversing the collection of pages in a deep Web site, which are dynamically generated in response to a particular query that is submitted using a search form. To achieve this, crawlers need to be endowed with some features that go beyond merely following links, such as the ability to automatically discover search forms that are entry points to the deep Web, fill in such forms, and follow certain paths to reach the deep Web pages with relevant information. Current surveys that analyse the state of the art in deep Web crawling do not provide a framework that allows comparing the most up-to-date proposals regarding all the different aspects involved in the deep Web crawling process. In this article, we propose a framework that analyses the main features of existing deep Web crawling-related techniques, including the most recent proposals, and provides an overall picture regarding deep Web crawling, including novel features that to the present day had not been analysed by previous surveys. Our main conclusion is that crawler evaluation is an immature research area due to the lack of a standard set of performance measures, or a benchmark or publicly available dataset to evaluate the crawlers. In addition, we conclude that the future work in this area should be focused on devising crawlers to deal with ever-evolving Web technologies and improving the crawling efficiency and scalability, in order to create effective crawlers that can operate in real-world contexts. | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2016-75394-R | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2013-40848-R | es |
dc.format | application/pdf | es |
dc.format.extent | 34 | es |
dc.language.iso | eng | es |
dc.publisher | Springer | es |
dc.relation.ispartof | World Wide Web, 22, 1577-1610. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Deep Web | es |
dc.subject | Web Crawling | es |
dc.subject | Form filling | es |
dc.subject | Query selection | es |
dc.subject | Survey | es |
dc.title | Deep Web crawling: a survey | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2016-75394-R | es |
dc.relation.projectID | TIN2013-40848-R | es |
dc.relation.publisherversion | https://link.springer.com/article/10.1007/s11280-018-0602-1 | es |
dc.identifier.doi | 10.1007/s11280-018-0602-1 | es |
dc.journaltitle | World Wide Web | es |
dc.publication.issue | 22 | es |
dc.publication.initialPage | 1577 | es |
dc.publication.endPage | 1610 | es |
dc.identifier.sisius | 21582564 | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |