Article
Entity reconciliation in big data sources: A systematic mapping study
Author(s) | González Enríquez, José; Domínguez Mayo, Francisco José; Escalona Cuaresma, María José; Ross, M.; Staples, G. |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2017 |
Deposit date | 2018-01-19 |
Published in | Expert Systems with Applications, 80 (September 2017), 14-27 |
Awards | Premio Mensual Publicación Científica Destacada de la US. Escuela Técnica Superior de Ingeniería Informática |
Abstract | The entity reconciliation (ER) problem aroused much interest as a research topic in today’s Big Data era, full of big and open heterogeneous data sources. This problem poses when relevant information on a topic needs to be obtained using methods based on: (i) identifying records that represent the same real-world entity, and (ii) identifying those records that are similar but do not correspond to the same real-world entity. ER is an operational intelligence process, whereby organizations can unify different and heterogeneous data sources in order to relate possible matches of non-obvious entities. Besides, the complexity that the heterogeneity of data sources involves, the large number of records and differences among languages, for instance, must be added. This paper describes a Systematic Mapping Study (SMS) of journal articles, conferences and workshops published from 2010 to 2017 to solve the problem described before, first trying to understand the state-of-the-art, and then identifying any gaps in current research. Eleven digital libraries were analyzed following a systematic, semiautomatic and rigorous process that has resulted in 61 primary studies. They represent a great variety of intelligent proposals that aim to solve ER. The conclusion obtained is that most of the research is based on the operational phase as opposed to the design phase, and most studies have been tested on real-world data sources, where a lot of them are heterogeneous, but just a few apply to industry. There is a clear trend in research techniques based on clustering/blocking and graphs, although the level of automation of the proposals is hardly ever mentioned in the research work. |
Project identifier | TIN2013-46928-C3-3-R; TIN2016-76956-C3-2-R; TIN2015-71938-REDT |
Citation | González Enríquez, J., Domínguez Mayo, F.J., Escalona Cuaresma, M.J., Ross, M. and Staples, G. (2017). Entity reconciliation in big data sources: A systematic mapping study. Expert Systems with Applications, 80 (September 2017), 14-27. |
Files | Size | Format | View | Description |
---|---|---|---|---|
EnriquezEtAl2017.pdf | 1.420Mb | [PDF] | View/ | |