Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Gallego, Fernando O. [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-08T09:11:35Z
dc.date.available: 2022-04-08T09:11:35Z
dc.date.issued: 2020
dc.identifier.citation: Jiménez Aguirre, P., Roldán Salvador, J.C., Gallego, F.O. and Corchuelo Gil, R. (2020). On the synthesis of metadata tags for HTML files. Software: Practice and Experience, 50 (12), 2169-2192.
dc.identifier.issn: 0038-0644 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131982
dc.description.abstract: RDFa, JSON-LD, Microdata, and Microformats make it possible to endow the data in HTML files with metadata tags that help software agents understand them. Unfortunately, there are many HTML files that do not have any metadata tags, which has motivated many authors to work on proposals to synthesize them. But they have some problems: the authors either provide an overall picture of their designs without too many details on the techniques behind the scenes or focus on the techniques but do not describe the design of the software systems that support them; many of them cannot deal with data that are encoded using semistructured formats like forms, listings, or tables; and the few proposals that can work on tables can deal with horizontal listings only. In this article, we describe the design of a system that overcomes the previous limitations using a novel embedding approach that has proven to outperform four state-of-the-art techniques on a repository with randomly selected HTML files from 40 different sites. According to our experimental analysis, our proposal can achieve an F1 score that outperforms the others by 10.14%; this difference was confirmed to be statistically significant at the standard confidence level. [es]
dc.description.sponsorship: Junta de Andalucía P18-RT-1060 [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2013-40848-R [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R [es]
dc.format: application/pdf [es]
dc.format.extent: 24 [es]
dc.language.iso: eng [es]
dc.publisher: Wiley [es]
dc.relation.ispartof: Software: Practice and Experience, 50 (12), 2169-2192.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Embedding techniques [es]
dc.subject: HTML files [es]
dc.subject: Metadata tags [es]
dc.title: On the synthesis of metadata tags for HTML files [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/submittedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: P18-RT-1060 [es]
dc.relation.projectID: TIN2013-40848-R [es]
dc.relation.projectID: TIN2016-75394-R [es]
dc.relation.publisherversion: https://onlinelibrary.wiley.com/doi/10.1002/spe.2886 [es]
dc.identifier.doi: 10.1002/spe.2886 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Software: Practice and Experience [es]
dc.publication.volumen: 50 [es]
dc.publication.issue: 12 [es]
dc.publication.initialPage: 2169 [es]
dc.publication.endPage: 2192 [es]
dc.contributor.funder: Junta de Andalucía [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]

Files
On the synthesis of metadata ... (PDF, 3.013 Mb)


Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International