dc.creator | Tallón Ballesteros, Antonio Javier | es |
dc.creator | Riquelme Santos, José Cristóbal | es |
dc.date.accessioned | 2016-06-27T08:53:00Z | |
dc.date.available | 2016-06-27T08:53:00Z | |
dc.date.issued | 2015 | |
dc.identifier.isbn | 978-3-319-18832-4 | es |
dc.identifier.issn | 0302-9743 | es |
dc.identifier.uri | http://hdl.handle.net/11441/42752 | |
dc.description.abstract | This paper presents a novel procedure that applies sequentially two data preparation techniques of a different nature: data cleansing and feature selection. For the former, we experimented with a partial removal of outliers via the inter-quartile range; for the latter, we selected relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are reported on seven difficult binary and multi-class data sets, that is, data sets with a test error rate of at least 10%, in terms of accuracy, for the C4.5 or 1-nearest neighbour classifiers without any prior data pre-processing. Non-parametric statistical tests show that combining the two data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better, in terms of the ROC measure, than applying the data cleansing approach alone. Last but not least, a weak learner such as PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, under the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure. | es |
dc.description.sponsorship | MICYT TIN2007-68084-C02-02 | |
dc.description.sponsorship | MICYT TIN2011-28956-C02-02 | |
dc.description.sponsorship | Junta de Andalucía P11-TIC-7528 | |
dc.format | application/pdf | es |
dc.language.iso | eng | es |
dc.publisher | Springer | es |
dc.relation.ispartof | Bioinspired Computation in Artificial Systems : International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2015, Elche, Spain, June 1-5, 2015, Proceedings, Part II. Lecture Notes in Computer Science, v. 9108 | es |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Data cleansing | es |
dc.subject | Feature selection | es |
dc.subject | Classification | es |
dc.subject | Outlier detection | es |
dc.subject | Inter-quartile range | es |
dc.title | Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach | es |
dc.type | info:eu-repo/semantics/bookPart | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/acceptedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2007-68084-C02-02 | es |
dc.relation.projectID | TIN2011-28956-C02-02 | es |
dc.relation.projectID | P11-TIC-7528 | es |
dc.identifier.doi | http://dx.doi.org/10.1007/978-3-319-18833-1_39 | es |
idus.format.extent | 10 | es |
dc.publication.initialPage | 369 | es |
dc.publication.endPage | 378 | es |
dc.relation.publicationplace | Switzerland | es |
dc.identifier.idus | https://idus.us.es/xmlui/handle/11441/42752 | |