dc.creator | Reina Quintero, Antonia María | es |
dc.creator | Jiménez Aguirre, Patricia | es |
dc.creator | Corchuelo Gil, Rafael | es |
dc.date.accessioned | 2022-04-11T07:59:08Z | |
dc.date.available | 2022-04-11T07:59:08Z | |
dc.date.issued | 2015 | |
dc.identifier.citation | Reina Quintero, A.M., Jiménez Aguirre, P. and Corchuelo Gil, R. (2015). A Novel Approach to Web Information Extraction. In BIS 2015: 18th International Conference on Business Information Systems (152-161), Poznań, Poland: Springer. | |
dc.identifier.isbn | 978-3-319-19026-6 | es |
dc.identifier.issn | 1865-1348 | es |
dc.identifier.uri | https://hdl.handle.net/11441/131998 | |
dc.description.abstract | Business Intelligence requires the acquisition and aggregation of key pieces of knowledge from multiple sources in order to provide valuable information to customers. The Web is the largest source of information nowadays. Unfortunately, the information it provides is available in semi-structured human-friendly formats, which makes it difficult to be processed by automated business processes. Classical propositional and ILP machine-learning techniques have been applied for this purpose. However, the former have not enough expressive power, whereas the latter are more expressive but intractable with large datasets. Propositionalisation was devised as a means to provide propositional techniques with more expressive power, enabling them to exploit structural information in a propositional way that allows them to be efficient. In this paper, we present a proposal to extract information from semi-structured web documents that uses this approach. It leverages a classical propositional machine learning technique and enhances it with the ability to learn from an unbounded context, which helps increase its precision and recall. Our experiments prove that our proposal outperforms other state-of-the-art techniques in the literature. | es |
dc.description.sponsorship | Ministerio de Educación y Ciencia TIN2007-64119 | es |
dc.description.sponsorship | Junta de Andalucía P07-TIC-2602 | es |
dc.description.sponsorship | Junta de Andalucía P08-TIC-4100 | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2008-04718-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-21744 | es |
dc.description.sponsorship | Ministerio de Economía, Industria y Competitividad TIN2010-09809-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-10811-E | es |
dc.description.sponsorship | Ministerio de Ciencia e Innovación TIN2010-09988-E | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2011-15497-E | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2013-40848-R | es |
dc.format | application/pdf | es |
dc.format.extent | 10 | es |
dc.language.iso | eng | es |
dc.publisher | Springer | es |
dc.relation.ispartof | BIS 2015 : 18th International Conference on Business Information Systems (2015), pp. 152-161. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.title | A Novel Approach to Web Information Extraction | es |
dc.type | info:eu-repo/semantics/conferenceObject | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2007-64119 | es |
dc.relation.projectID | P07-TIC-2602 | es |
dc.relation.projectID | P08-TIC-4100 | es |
dc.relation.projectID | TIN2008-04718-E | es |
dc.relation.projectID | TIN2010-21744 | es |
dc.relation.projectID | TIN2010-09809-E | es |
dc.relation.projectID | TIN2010-10811-E | es |
dc.relation.projectID | TIN2010-09988-E | es |
dc.relation.projectID | TIN2011-15497-E | es |
dc.relation.projectID | TIN2013-40848-R | es |
dc.relation.publisherversion | https://link.springer.com/chapter/10.1007/978-3-319-19027-3_13 | es |
dc.identifier.doi | 10.1007/978-3-319-19027-3_13 | es |
dc.contributor.group | Universidad de Sevilla. TIC258: Data-centric Computing Research Hub | es |
dc.publication.initialPage | 152 | es |
dc.publication.endPage | 161 | es |
dc.eventtitle | BIS 2015 : 18th International Conference on Business Information Systems | es |
dc.eventinstitution | Poznań, Poland | es |
dc.relation.publicationplace | Cham, Switzerland | es |
dc.contributor.funder | Ministerio de Educación y Ciencia (MEC). España | es |
dc.contributor.funder | Junta de Andalucía | es |
dc.contributor.funder | Ministerio de Ciencia e Innovación (MICIN). España | es |
dc.contributor.funder | Ministerio de Economía, Industria y Competitividad (MINECO). España | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |