Chapter of Book

dc.creator: Tallón Ballesteros, Antonio Javier [es]
dc.creator: Riquelme Santos, José Cristóbal [es]
dc.date.accessioned: 2016-06-27T08:53:00Z
dc.date.available: 2016-06-27T08:53:00Z
dc.date.issued: 2015
dc.identifier.isbn: 978-3-319-18832-4 [es]
dc.identifier.issn: 0302-9743 [es]
dc.identifier.uri: http://hdl.handle.net/11441/42752
dc.description.abstract: This paper presents a novel procedure that applies, in sequence, two data preparation techniques of different natures: data cleansing and feature selection. For the former we have experimented with a partial removal of outliers via the inter-quartile range, whereas for the latter we have chosen relevant attributes with two widespread feature subset selectors, CFS (Correlation-based Feature Selection) and CNS (Consistency-based Feature Selection), which are founded on correlation and consistency measures, respectively. Empirical results are outlined on seven difficult binary and multi-class data sets, that is, data sets with a test error rate of at least 10%, according to accuracy, with the C4.5 or 1-nearest neighbour classifiers without any kind of prior data pre-processing. Non-parametric statistical tests show that combining the two aforementioned data preparation strategies, using a correlation measure for feature selection with the C4.5 algorithm, is significantly better, as measured by the ROC measure, than applying the data cleansing approach alone. Last but not least, a weak learner like PART achieved promising results with the new proposal based on a consistency measure and is able to compete with the best configuration of C4.5. To sum up, with the new approach and for the ROC measure, the PART classifier with a consistency metric behaves slightly better than C4.5 with a correlation measure. [es]
dc.description.sponsorship: MICYT TIN2007-68084-C02-02
dc.description.sponsorship: MICYT TIN2011-28956-C02-02
dc.description.sponsorship: Junta de Andalucía P11-TIC-7528
dc.format: application/pdf [es]
dc.language.iso: eng [es]
dc.publisher: Springer [es]
dc.relation.ispartof: Bioinspired Computation in Artificial Systems: International Work-Conference on the Interplay Between Natural and Artificial Computation, IWINAC 2015, Elche, Spain, June 1-5, 2015, Proceedings, Part II. Lecture Notes in Computer Science, v.9108 [es]
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: Data cleansing [es]
dc.subject: Feature selection [es]
dc.subject: classification [es]
dc.subject: outlier detection [es]
dc.subject: inter-quartile range [es]
dc.title: Data Cleansing Meets Feature Selection: A Supervised Machine Learning Approach [es]
dc.type: info:eu-repo/semantics/bookPart [es]
dc.type.version: info:eu-repo/semantics/acceptedVersion [es]
dc.rights.accessrights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos [es]
dc.relation.projectID: TIN2007-68084-C02-02 [es]
dc.relation.projectID: TIN2011-28956-C02-02 [es]
dc.relation.projectID: P11-TIC-7528 [es]
dc.identifier.doi: http://dx.doi.org/10.1007/978-3-319-18833-1_39 [es]
idus.format.extent: 10 [es]
dc.publication.initialPage: 369 [es]
dc.publication.endPage: 378 [es]
dc.relation.publicationplace: Switzerland [es]
dc.identifier.idus: https://idus.us.es/xmlui/handle/11441/42752
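
Illustrative note on the abstract: the two-stage idea it describes (outlier removal via the inter-quartile range, then feature selection) can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the authors' procedure. The fence factor k = 1.5 is the common Tukey convention and is not given in this record; the sketch drops every flagged row, whereas the abstract speaks of a partial removal whose details the record does not specify; and the correlation ranking is a simplified stand-in, not CFS itself. The function names and the toy data are hypothetical.

```python
import numpy as np

def iqr_inlier_mask(X, k=1.5):
    """Boolean mask of rows whose every feature lies inside the IQR fences.

    k=1.5 is an assumed (Tukey) fence factor, not taken from the record.
    """
    q1 = np.percentile(X, 25, axis=0)
    q3 = np.percentile(X, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return np.all((X >= lower) & (X <= upper), axis=1)

def rank_by_class_correlation(X, y):
    """Rank feature indices by absolute Pearson correlation with the label.

    A simplified stand-in for a correlation-based selector; it is NOT CFS,
    which also accounts for redundancy between features.
    """
    scores = np.array(
        [abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(X.shape[1])]
    )
    return np.argsort(-scores), scores

# Hypothetical toy data: the last row has an extreme value in feature 0.
X = np.array([[1.0, 5.0], [2.0, 3.0], [3.0, 6.0],
              [4.0, 4.0], [5.0, 5.0], [100.0, 4.0]])
y = np.array([0, 0, 0, 1, 1, 1])

# Stage 1: data cleansing (drop rows flagged as outliers).
mask = iqr_inlier_mask(X)
X_clean, y_clean = X[mask], y[mask]

# Stage 2: feature selection on the cleansed data.
order, scores = rank_by_class_correlation(X_clean, y_clean)
print("kept rows:", mask.tolist())
print("feature ranking:", order.tolist())
```

In this toy case the cleansing stage runs first so that the extreme row does not distort the correlation scores used in the second stage, which mirrors the sequential order named in the abstract.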

Files: Data cleansing.pdf (239.3 Kb, PDF)

Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International