Conference paper
An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark
Authors | Luna Romera, José María; Martínez Ballesteros, María del Mar; García Gutiérrez, Jorge; Riquelme Santos, José Cristóbal |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2016 |
Deposit date | 2022-04-12 |
Published in | |
ISBN/ISSN | 978-3-319-44635-6 / 0302-9743 |
Abstract | The K-Means and Bisecting K-Means clustering algorithms need to be given the optimal number of clusters into which the dataset should be divided. The Spark implementations of these algorithms include a method for estimating this number. Unfortunately, this measure lacks precision because it considers only a sum of intra-cluster distances, which can mislead the results. Moreover, it has not been well validated in previous research on clustering indices. We therefore introduce a new Spark implementation of the Silhouette and Dunn indices, both of which have been tested in previous work. The results show the potential of Silhouette and Dunn for dealing with Big Data. |
Funding agencies | Ministerio de Economía y Competitividad (MINECO). Spain |
Project identifier | TIN2014-55894-C2-1-R |
Citation | Luna Romera, J.M., Martínez Ballesteros, M.d.M., García Gutiérrez, J. and Riquelme Santos, J.C. (2016). An Approach to Silhouette and Dunn Clustering Indices Applied to Big Data in Spark. In CAEPIA 2016: 17th Conference of the Spanish Association for Artificial Intelligence (pp. 160-169), Salamanca, Spain: Springer. |
Files | Size | Format | View | Description |
---|---|---|---|---|
An approach to silhouette and ... | 798.7Kb | [PDF] | View/ | |
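
The abstract contrasts a measure based only on a sum of intra-cluster distances with the Silhouette and Dunn indices. The following is a minimal single-machine sketch of that contrast in plain Python. It is not the authors' Spark implementation. The function names (`wssse`, `silhouette`, `dunn`), the toy dataset and the use of a sum of squared distances to the centroid as a stand-in for the intra-cluster measure are all assumptions made for illustration.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def centroid(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(c) / n for c in zip(*points))


def wssse(clusters):
    """Sum of squared distances from each point to its cluster centroid.
    Assumed here as the intra-cluster measure; it tends to shrink as k grows."""
    total = 0.0
    for pts in clusters:
        c = centroid(pts)
        total += sum(dist(p, c) ** 2 for p in pts)
    return total


def silhouette(clusters):
    """Mean silhouette over all points; needs at least two clusters."""
    scores = []
    for i, own in enumerate(clusters):
        for p in own:
            if len(own) == 1:
                scores.append(0.0)  # usual convention for singleton clusters
                continue
            # a: mean distance to the other members of the same cluster
            a = sum(dist(p, q) for q in own) / (len(own) - 1)
            # b: smallest mean distance to the members of any other cluster
            b = min(sum(dist(p, q) for q in other) / len(other)
                    for j, other in enumerate(clusters) if j != i)
            scores.append(0.0 if max(a, b) == 0 else (b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn(clusters):
    """Smallest inter-cluster distance divided by the largest cluster diameter.
    Assumes at least one cluster holds two distinct points."""
    min_separation = min(dist(p, q)
                         for a, b in combinations(clusters, 2)
                         for p in a for q in b)
    max_diameter = max(dist(p, q)
                       for pts in clusters
                       for p, q in combinations(pts, 2))
    return min_separation / max_diameter


# Toy data (hypothetical): two well-separated unit squares.
two_clusters = [
    [(0, 0), (0, 1), (1, 0), (1, 1)],
    [(10, 10), (10, 11), (11, 10), (11, 11)],
]
# The same points over-split into four clusters.
four_clusters = [
    [(0, 0), (0, 1)], [(1, 0), (1, 1)],
    [(10, 10), (10, 11)], [(11, 10), (11, 11)],
]

for name, part in [("k=2", two_clusters), ("k=4", four_clusters)]:
    print(name, round(wssse(part), 3), round(silhouette(part), 3),
          round(dunn(part), 3))
```

On this toy data the sum of squared distances is lower for the over-split partition, so on its own it would favour k=4, while both Silhouette and Dunn score the natural k=2 partition higher. The pairwise distance computations are quadratic in the number of points, which is why a distributed implementation such as the one the paper describes is needed for large datasets.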