dc.creator | Jiménez Aguirre, Patricia | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2022-04-11T07:41:48Z | |
dc.date.available | 2022-04-11T07:41:48Z | |
dc.date.issued | 2022 | |
dc.identifier.citation | Jiménez Aguirre, P. and Corchuelo Gil, R. (2022). On validating web information extraction proposals. Expert Systems with Applications, 199 (August 2022, art. nº 116700) | |
dc.identifier.issn | 0957-4174 | es |
dc.identifier.uri | https://hdl.handle.net/11441/131997 | |
dc.description.abstract | Many people who have to make informed decisions in today’s always-on culture use information extractors to feed their systems with information that comes from human-friendly documents. Unfortunately, many proposals that validate information extractors have deficiencies that make it difficult to perform homogeneous comparisons, confirm or refute performance hypotheses, or draw unbiased conclusions. Consequently, it is very difficult to select the best-performing proposal on a sound basis. The state-of-the-art validation method overcomes many deficiencies in the previous proposals, but still overlooks the following issues: completeness of the validation datasets, that is, whether they provide a complete set of annotations or not; structure of the information, that is, whether they check the structure of the record instances extracted or just the attribute instances; and, finally, how extractions and annotations are matched. The decisions made regarding the previous issues have an impact on the effectiveness results. In this article, we have exhaustively analysed the literature and we have also highlighted the main weaknesses to tackle. We present a guideline and a method to compute the effectiveness, which complements and enhances the state-of-the-art validation method. | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2016-75394-R | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación PID2020-112540RB-C44 | es |
dc.description.sponsorship | Junta de Andalucía P18-RT-1060 | es |
dc.description.sponsorship | Junta de Andalucía US-1381375 | es |
dc.format | application/pdf | es |
dc.format.extent | 9 | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Expert Systems with Applications, 199 (August 2022, art. nº 116700) | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Web information extraction | es |
dc.subject | Validation method | es |
dc.title | On validating web information extraction proposals | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/publishedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2016-75394-R | es |
dc.relation.projectID | PID2020-112540RB-C44 | es |
dc.relation.projectID | P18-RT-1060 | es |
dc.relation.projectID | US-1381375 | es |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S0957417422001798?via%3Dihub | es |
dc.identifier.doi | 10.1016/j.eswa.2022.116700 | es |
dc.contributor.group | Universidad de Sevilla. TIC258: Data-centric Computing Research Hub | es |
dc.journaltitle | Expert Systems with Applications | es |
dc.publication.volumen | 199 | es |
dc.publication.issue | August 2022, art. nº 116700 | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |
dc.contributor.funder | Ministerio de Ciencia e Innovación (MICIN). España | es |
dc.contributor.funder | Junta de Andalucía | es |