
Article

dc.creator: Jiménez Aguirre, Patricia [es]
dc.creator: Roldán Salvador, Juan Carlos [es]
dc.creator: Corchuelo Gil, Rafael [es]
dc.date.accessioned: 2022-04-11T07:31:38Z
dc.date.available: 2022-04-11T07:31:38Z
dc.date.issued: 2022
dc.identifier.citation: Jiménez Aguirre, P., Roldán Salvador, J.C. and Corchuelo Gil, R. (2022). On exploring data lakes by finding compact, isolated clusters. Information Sciences, 591 (April 2022), 103-127.
dc.identifier.issn: 0020-0255 [es]
dc.identifier.uri: https://hdl.handle.net/11441/131996
dc.description.abstract: Data engineers are very interested in data lake technologies due to the incredible abundance of datasets. They typically use clustering to understand the structure of the datasets before applying other methods to infer knowledge from them. This article presents the first proposal that explores how to use a meta-heuristic to address the problem of multi-way single-subspace automatic clustering, which is very appropriate in the context of data lakes. It was confronted with five strong competitors that combine the state-of-the-art attribute selection proposal with three classical single-way clustering proposals, a recent quantum-inspired one, and a recent deep-learning one. The evaluation focused on exploring their ability to find compact and isolated clusterings as well as the extent to which such clusterings can be considered good classifications. The statistical analyses conducted on the experimental results prove that it ranks first regarding effectiveness using six standard coefficients and that it is very efficient in terms of CPU time, not to mention that it did not result in any degraded clusterings or timeouts. Summing up: this proposal contributes to the array of techniques that data engineers can use to explore their data lakes. [es]
dc.description.sponsorship: Ministerio de Economía y Competitividad TIN2016-75394-R [es]
dc.description.sponsorship: Ministerio de Ciencia e Innovación PID2020-112540RB-C44 [es]
dc.description.sponsorship: Junta de Andalucía P18-RT-1060 [es]
dc.description.sponsorship: Junta de Andalucía US-1381375 [es]
dc.format: application/pdf [es]
dc.format.extent: 25 [es]
dc.language.iso: eng [es]
dc.publisher: Elsevier [es]
dc.relation.ispartof: Information Sciences, 591 (April 2022), 103-127.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Data lakes [es]
dc.subject: Clustering [es]
dc.subject: Meta-heuristics [es]
dc.subject: Genetic algorithms [es]
dc.title: On exploring data lakes by finding compact, isolated clusters [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/publishedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2016-75394-R [es]
dc.relation.projectID: PID2020-112540RB-C44 [es]
dc.relation.projectID: P18-RT-1060 [es]
dc.relation.projectID: US-1381375 [es]
dc.relation.publisherversion: https://www.sciencedirect.com/science/article/pii/S0020025521012664?via%3Dihub [es]
dc.identifier.doi: 10.1016/j.ins.2021.12.045 [es]
dc.contributor.group: Universidad de Sevilla. TIC258: Data-centric Computing Research Hub [es]
dc.journaltitle: Information Sciences [es]
dc.publication.volumen: 591 [es]
dc.publication.issue: April 2022 [es]
dc.publication.initialPage: 103 [es]
dc.publication.endPage: 127 [es]
dc.contributor.funder: Ministerio de Economía y Competitividad (MINECO). España [es]
dc.contributor.funder: Ministerio de Ciencia e Innovación (MICIN). España [es]
dc.contributor.funder: Junta de Andalucía [es]

Files: 1-s2.0-S0020025521012664-main.pdf (7.329Mb, PDF)


Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International