Scientific output repository of the Universidad de Sevilla

Preliminary Comparison of Techniques for Dealing with Imbalance in Software Defect Prediction

Authors: Rodríguez, Daniel
Herraiz, Israel
Harrison, Rachel
Dolado, Javier
Riquelme Santos, José Cristóbal
Department: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
Date: 2014
Published in: 18th International Conference on Evaluation and Assessment in Software Engineering, EASE'14 (2014), 43-1-43-10
ISBN/ISSN: 978-1-4503-2476-2
Document type: Conference paper
Abstract: Imbalanced data is a common problem in data mining when dealing with classification problems, where samples of one class vastly outnumber those of other classes. In this situation, many data mining algorithms generate poor models as they try to optimize the overall accuracy and perform badly in classes with very few samples. Software Engineering data in general, and defect prediction datasets in particular, are no exception. In this paper, we compare different approaches, namely sampling, cost-sensitive, ensemble and hybrid approaches, to the problem of defect prediction with different datasets preprocessed differently. We have used the well-known NASA datasets curated by Shepperd et al. There are differences in the results depending on the characteristics of the dataset and the evaluation metrics, especially if duplicates and inconsistencies are removed as a preprocessing step.
Citation: Rodríguez, D., Herraiz, I., Harrison, R., Dolado, J. and Riquelme Santos, J.C. (2014). Preliminary Comparison of Techniques for Dealing with Imbalance in Software Defect Prediction. In 18th International Conference on Evaluation and Assessment in Software Engineering, EASE'14 (43-1-43-10), London: ACM.
Size: 166.8Kb
Format: PDF

URI: http://hdl.handle.net/11441/42731

DOI: http://dx.doi.org/10.1145/2601248.2601294


This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License
