
Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-08T08:05:32Z
dc.date.available: 2022-04-08T08:05:32Z
dc.date.issued: 2016
dc.identifier.citation: Jiménez Aguirre, P. y Corchuelo Gil, R. (2016). On Learning Web Information Extraction Rules with TANGO. Information Systems, 62 (December 2016), 74-103.
dc.identifier.issn: 0306-4379 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131977
dc.description.abstract: The research on Enterprise Systems Integration focuses on proposals to support business processes by re-using existing systems. Wrappers help re-use web applications that provide a user interface only. They emulate a human user who interacts with them and extracts the information of interest in a structured format. In this article, we present TANGO, which is our proposal to learn rules to extract information from semi-structured web documents with high precision and recall, which is a must in the context of Enterprise Systems Integration. It relies on an open catalogue of features that helps map the input documents into a knowledge base in which every DOM node is represented by means of HTML, DOM, CSS, relational, and user-defined features. Then a procedure with many variation points is used to learn extraction rules from that knowledge base; the variation points include heuristics that range from how to select a condition to how to simplify the resulting rules. We also provide a systematic method to help re-configure our proposal. Our exhaustive experimentation proves that it beats others regarding effectiveness and is efficient enough for practical purposes. Our proposal was devised to be as configurable as possible, which helps adapt it to particular web sites and evolve it when necessary. [es]
dc.description.sponsorship: Ministerio de Educación y Ciencia TIN2007-64119 [es]
dc.description.sponsorship: Junta de Andalucía P07-TIC-2602 [es]
dc.description.sponsorship: Junta de Andalucía P08-TIC-4100 [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2008-04718-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-21744 [es]
dc.description.sponsorship: Ministerio de Economía, Industria y Competitividad TIN2010-09809-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-10811-E [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación TIN2010-09988-E [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2011-15497-E [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2013-40848-R [es]
dc.format: application/pdf [es]
dc.format.extent: 50 [es]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.relation.ispartof: Information Systems, 62 (December 2016), 74-103.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 Internacional [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Web information extraction [es]
dc.subject: Semi-structured documents [es]
dc.subject: Open catalogues of features [es]
dc.subject: Learning rules [es]
dc.subject: Variation points [es]
dc.subject: Configuration method [es]
dc.title: On Learning Web Information Extraction Rules with TANGO [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2007-64119 [es]
dc.relation.projectID: P07-TIC-2602 [es]
dc.relation.projectID: P08-TIC-4100 [es]
dc.relation.projectID: TIN2008-04718-E [es]
dc.relation.projectID: TIN2010-21744 [es]
dc.relation.projectID: TIN2010-09809-E [es]
dc.relation.projectID: TIN2010-10811-E [es]
dc.relation.projectID: TIN2010-09988-E [es]
dc.relation.projectID: TIN2011-15497-E [es]
dc.relation.projectID: TIN2013-40848-R [es]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0306437915300405?via%3Dihub [es]
dc.identifier.doi: 10.1016/j.is.2016.05.003 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Information Systems [es]
dc.publication.volumen: 62 [es]
dc.publication.issue: December 2016 [es]
dc.publication.initialPage: 74 [es]
dc.publication.endPage: 103 [es]
dc.identifier.sisius: 20928471 [es]
dc.contributor.funder: Ministerio de Educación y Ciencia (MEC). España [es]
dc.contributor.funder: Junta de Andalucía [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Ministerio de Economia, Industria y Competitividad (MINECO). España [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]

Files
Name: On_learning_web_information_ex ...
Size: 546.4Kb
Format: PDF


Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International