Article
An approach to validity indices for clustering techniques in Big Data
Authors | Luna Romera, José María; García Gutiérrez, Jorge; Martínez Ballesteros, María del Mar; Riquelme Santos, José Cristóbal |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2018 |
Deposit date | 2022-04-12 |
Published in | Progress in Artificial Intelligence, 7 (2), 81-94 |
Abstract | Clustering analysis is one of the most widely used Machine Learning techniques for discovering groups among data objects. Some clustering methods require the number of clusters into which the data is going to be partitioned. Several cluster validity indices exist that help approximate the optimal number of clusters of a dataset. However, such indices are not suitable for Big Data because of their size limitations and runtime costs. This paper presents two clustering validity indices that handle large amounts of data in low computational time. Our indices are based on redefinitions of traditional indices that simplify the intra-cluster distance calculation. Two types of tests have been carried out over 28 synthetic datasets to analyze the performance of the proposed indices. First, we test the indices with small and medium-sized datasets to verify that their effectiveness is similar to that of the traditional ones. Subsequently, tests on datasets of up to 11 million records and 20 features have been executed to check their efficiency. The results show that, using the Apache Spark framework, both indices can handle Big Data in a very low computational time with an effectiveness similar to that of the traditional indices. |
Funding agencies | Ministerio de Economía y Competitividad (MINECO). Spain |
Project identifier | TIN2014-55894-C2-1-R |
Citation | Luna Romera, J.M., García Gutiérrez, J., Martínez Ballesteros, M.d.M. and Riquelme Santos, J.C. (2018). An approach to validity indices for clustering techniques in Big Data. Progress in Artificial Intelligence, 7 (2), 81-94. |
Files | Size | Format | View | Description |
---|---|---|---|---|
An approach to validity indices ... | 1.531Mb | [PDF] | View | |