
Conference paper

dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-07T11:04:44Z
dc.date.available: 2022-04-07T11:04:44Z
dc.date.issued: 2017
dc.identifier.citation: Roldán Salvador, J.C., Jiménez Aguirre, P. and Corchuelo Gil, R. (2017). Extracting Web Information using Representation Patterns. In HotWeb 2017 : 5th ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies (4:1-4:5), San Jose, CA, USA: Association for Computing Machinery (ACM).
dc.identifier.isbn: 978-1-4503-5527-8 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131931
dc.description.abstract: Feeding decision support systems with Web information typically requires sifting through an unwieldy amount of information that is available in human-friendly formats only. Our focus is on a scalable proposal to extract information from semi-structured documents in a structured format, with an emphasis on it being scalable and open. By semi-structured we mean that it must focus on information that is rendered using regular formats, not free text; by scalable, we mean that the system must require a minimum amount of human intervention and it must not be targeted to extracting information from a particular domain or web site; by open, we mean that it must extract as much useful information as possible and not be subject to any pre-defined data model. In the literature, there is only one open but not scalable proposal, since it requires human supervision on a per-domain basis. In this paper, we present a new proposal that relies on a number of heuristics to identify patterns that are typically used to represent the information in a web document. Our experimental results confirm that our proposal is very competitive in terms of effectiveness and efficiency. [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2013-40848-R [es]
dc.format: application/pdf [es]
dc.format.extent: 5 [es]
dc.language.iso: eng [es]
dc.publisher: Association for Computing Machinery (ACM) [es]
dc.relation.ispartof: HotWeb 2017 : 5th ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies (2017), pp. 4:1-4:5.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Web information extraction [es]
dc.subject: open information extraction [es]
dc.subject: Web representation patterns [es]
dc.subject: Semi-structured documents [es]
dc.subject: Scalability [es]
dc.title: Extracting Web Information using Representation Patterns [es]
dc.type: info:eu-repo/semantics/conferenceObject [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2016-75394-R [es]
dc.relation.projectID: TIN2013-40848-R [es]
dc.relation.publisherversion: https://dl.acm.org/doi/10.1145/3132465.3133840 [es]
dc.identifier.doi: 10.1145/3132465.3133840 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.publication.initialPage: 4:1 [es]
dc.publication.endPage: 4:5 [es]
dc.eventtitle: HotWeb 2017 : 5th ACM/IEEE Workshop on Hot Topics in Web Systems and Technologies [es]
dc.eventinstitution: San Jose, CA, USA [es]
dc.relation.publicationplace: New York, USA [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]

Files                                    Size     Format
Extracting web information using ...     294.5Kb  PDF


Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International