dc.creator | Jiménez Aguirre, Patricia | es |
dc.creator | Roldán Salvador, Juan Carlos | es |
dc.creator | Gallego, Fernando O. | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2022-04-08T09:11:35Z | |
dc.date.available | 2022-04-08T09:11:35Z | |
dc.date.issued | 2020 | |
dc.identifier.citation | Jiménez Aguirre, P., Roldán Salvador, J.C., Gallego, F.O. and Corchuelo Gil, R. (2020). On the synthesis of metadata tags for HTML files. Software: Practice and Experience, 50 (12), 2169-2192. | |
dc.identifier.issn | 0038-0644 | es |
dc.identifier.uri | https://hdl.handle.net/11441/131982 | |
dc.description.abstract | RDFa, JSON-LD, Microdata, and Microformats make it possible to endow the data in
HTML files with metadata tags that help software agents understand them.
Unfortunately, there are many HTML files that do not have any metadata tags,
which has motivated many authors to work on proposals to synthesize them.
But they have some problems: the authors either provide an overall picture of
their designs without too many details on the techniques behind the scenes or
focus on the techniques but do not describe the design of the software systems
that support them; many of them cannot deal with data that are encoded using
semistructured formats like forms, listings, or tables; and the few proposals that
can work on tables can deal with horizontal listings only. In this article, we
describe the design of a system that overcomes the previous limitations using a
novel embedding approach that has proven to outperform four state-of-the-art
techniques on a repository with randomly selected HTML files from 40 different sites. According to our experimental analysis, our proposal can achieve an
F1 score that outperforms the others by 10.14%; this difference was confirmed
to be statistically significant at the standard confidence level. | es |
dc.description.sponsorship | Junta de Andalucía P18-RT-1060 | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2013-40848-R | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2016-75394-R | es |
dc.format | application/pdf | es |
dc.format.extent | 24 | es |
dc.language.iso | eng | es |
dc.publisher | Wiley | es |
dc.relation.ispartof | Software: Practice and Experience, 50 (12), 2169-2192. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Embedding techniques | es |
dc.subject | HTML files | es |
dc.subject | Metadata tags | es |
dc.title | On the synthesis of metadata tags for HTML files | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | P18-RT-1060 | es |
dc.relation.projectID | TIN2013-40848-R | es |
dc.relation.projectID | TIN2016-75394-R | es |
dc.relation.publisherversion | https://onlinelibrary.wiley.com/doi/10.1002/spe.2886 | es |
dc.identifier.doi | 10.1002/spe.2886 | es |
dc.contributor.group | Universidad de Sevilla. TIC258: Data-centric Computing Research Hub | es |
dc.journaltitle | Software: Practice and Experience | es |
dc.publication.volumen | 50 | es |
dc.publication.issue | 12 | es |
dc.publication.initialPage | 2169 | es |
dc.publication.endPage | 2192 | es |
dc.contributor.funder | Junta de Andalucía | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |