
Article

dc.creator: Sleiman, Hassan A. [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2023-03-28T10:43:19Z
dc.date.available: 2023-03-28T10:43:19Z
dc.date.issued: 2013-02
dc.identifier.citation: Sleiman, H.A. and Corchuelo Gil, R. (2013). TEX: An efficient and effective unsupervised Web information extractor. Knowledge-Based Systems, 39, 109-123. https://doi.org/10.1016/j.knosys.2012.10.009.
dc.identifier.issn: 0950-7051 (print) [es]
dc.identifier.issn: 1872-7409 (online) [es]
dc.identifier.uri: https://hdl.handle.net/11441/143639
dc.description.abstract: The World Wide Web is an immense information resource. Web information extraction is the task that transforms human-friendly Web information into structured information that can be consumed by automated business processes. In this article, we propose an unsupervised information extractor that works on two or more web documents generated by the same server-side template. It finds and removes shared token sequences amongst these web documents until finding the relevant information that should be extracted from them. The technique is completely unsupervised and does not require maintenance; it allows working on malformed web documents, and does not require the relevant information to be formatted using repetitive patterns. Our complexity analysis reveals that our proposal is computationally tractable and our empirical study on real-world web documents demonstrates that it performs very fast and has a very high precision and recall. [es]
dc.description.sponsorship: Ministerio de Ciencia y Tecnología TIN2007-64119 [es]
dc.description.sponsorship: Junta de Andalucía P07-TIC-2602 [es]
dc.description.sponsorship: Junta de Andalucía P08-TIC-4100 [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2008-04718-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-21744 [es]
dc.description.sponsorship: Ministerio de Economía, Industria y Competitividad TIN2010-09809-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-10811-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-09988-E [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2011-15497-E [es]
dc.format: application/pdf [es]
dc.format.extent: 15 [es]
dc.language.iso: eng [es]
dc.publisher: ScienceDirect [es]
dc.relation.ispartof: Knowledge-Based Systems, 39, 109-123.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Information extraction [es]
dc.subject: Semi-structured web documents [es]
dc.subject: Malformed documents [es]
dc.subject: Unsupervised technique [es]
dc.subject: Heuristic-based technique [es]
dc.title: TEX: An efficient and effective unsupervised Web information extractor [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/publishedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2007-64119 [es]
dc.relation.projectID: P07-TIC-2602 [es]
dc.relation.projectID: P08-TIC-4100 [es]
dc.relation.projectID: TIN2008-04718-E [es]
dc.relation.projectID: TIN2010-21744 [es]
dc.relation.projectID: TIN2010-09809-E [es]
dc.relation.projectID: TIN2010-10811-E [es]
dc.relation.projectID: TIN2010-09988-E [es]
dc.relation.projectID: TIN2011-15497-E [es]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0950705112002900 [es]
dc.identifier.doi: 10.1016/j.knosys.2012.10.009 [es]
dc.journaltitle: Knowledge-Based Systems [es]
dc.publication.volumen: 39 [es]
dc.publication.initialPage: 109 [es]
dc.publication.endPage: 123 [es]
dc.contributor.funder: Ministerio de Ciencia y Tecnología (MCYT). España [es]
dc.contributor.funder: Junta de Andalucía [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Ministerio de Economía, Industria y Competitividad. España [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]

Files
Tex An efficient and effective ... (1.224 Mb, PDF)



Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International