Presentation
Finding Defective Modules from Highly Unbalanced Datasets
Author/s | Riquelme Santos, José Cristóbal; Ruiz Sánchez, Roberto; Rodríguez García, Daniel; Moreno, J. |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication Date | 2008-10 |
Deposit Date | 2023-05-04 |
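Published in | Apoyo a la Decisión en Ingeniería del Software (ADIS'08), 67-74. Gijón, España: Sociedad de Ingeniería de Software y Tecnologías de Desarrollo de Software (SISTEDES) |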
ISBN/ISSN | 1988-3455 |
Abstract | Many software engineering datasets are highly unbalanced, i.e., the instances of one class greatly outnumber the instances of the other class. In this work, we analyse two balancing techniques with two common classification algorithms using five open public datasets from the PROMISE repository in order to find defective modules. The results show that although balancing techniques may not improve the percentage of correctly classified instances, they do improve the AUC measure, i.e., they better classify the instances that belong to the minority class. |
Funding agencies | Ministerio de Ciencia Y Tecnología (MCYT). España |
Project ID. | TIN2007-68084-C02-00 |
Citation | Riquelme Santos, J.C., Ruiz Sánchez, R., Rodríguez García, D. and Moreno, J. (2008). Finding Defective Modules from Highly Unbalanced Datasets. In Apoyo a la Decisión en Ingeniería del Software (ADIS'08) (67-74), Gijón, España: Sociedad de Ingeniería de Software y Tecnologías de Desarrollo de Software (SISTEDES). |
Files | Size | Format | View | Description |
---|---|---|---|---|
Finding defective modules from ... | 106.4Kb | [PDF] | View/ | |
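
The abstract contrasts the percentage of correctly classified instances with the AUC measure. The sketch below illustrates why the two can disagree on an unbalanced dataset. It is not the authors' experiment: the labels, scores, and the 0.5 decision threshold are hypothetical toy values, and no balancing technique or PROMISE dataset is involved.

```python
def accuracy(labels, predictions):
    """Fraction of instances whose predicted class equals the true class."""
    return sum(l == p for l, p in zip(labels, predictions)) / len(labels)


def auc(labels, scores):
    """AUC as the probability that a random positive instance is scored
    above a random negative one (ties count as half a win)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 95 non-defective modules (0), 5 defective ones (1).
labels = [0] * 95 + [1] * 5

# Classifier A (assumed): always predicts the majority class.
scores_a = [0.0] * 100
preds_a = [0] * 100

# Classifier B (assumed): ranks most defective modules highest,
# but also flags ten non-defective modules by mistake.
scores_b = [0.2] * 85 + [0.7] * 10 + [0.8] * 4 + [0.3] * 1
preds_b = [1 if s >= 0.5 else 0 for s in scores_b]

acc_a, auc_a = accuracy(labels, preds_a), auc(labels, scores_a)
acc_b, auc_b = accuracy(labels, preds_b), auc(labels, scores_b)

print(f"A: accuracy={acc_a:.2f}, AUC={auc_a:.3f}")
print(f"B: accuracy={acc_b:.2f}, AUC={auc_b:.3f}")
```

In this toy setting, classifier A has the higher accuracy because it never errs on the majority class, yet it cannot separate defective modules at all. Classifier B loses some accuracy but ranks the minority class far better, which is what AUC rewards.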