dc.description.abstract | The objective of this PhD dissertation is the development of new models for Supervised
Classification and Benchmarking, making use of Mathematical Optimization and Statistical
tools. In particular, we address the fusion of instruments from both disciplines,
with the aim of extracting knowledge from data. In this way, we obtain innovative
methodologies that outperform existing ones, bridging theoretical Mathematics
with real-life problems.
The works developed in this thesis focus on two fundamental methodologies
in Data Science: support vector machines (SVM) and Benchmarking. Regarding
the first, the SVM classifier is based on the search for the separating hyperplane of
maximum margin and is formulated as a convex quadratic problem. In the Benchmarking
context, the goal is to compute the efficiencies of different entities through a non-parametric
deterministic approach. In this thesis we focus on Data Envelopment Analysis
(DEA), which consists of a Linear Programming formulation.
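As an illustration of the first methodology, the maximum-margin classifier described above can be fit on a toy dataset with an off-the-shelf solver; the sketch below uses scikit-learn (an assumption of this example, not the software of the thesis), whose linear SVC solves exactly the convex quadratic problem mentioned.

```python
# Illustrative sketch (not the thesis code): a maximum-margin linear SVM
# fit on a tiny, made-up, linearly separable toy dataset.
from sklearn.svm import SVC

X = [[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]]  # two points per class
y = [0, 0, 1, 1]

clf = SVC(kernel="linear", C=1.0)  # solves the convex quadratic SVM problem
clf.fit(X, y)

# the separating hyperplane is w.x + b = 0
print("w =", clf.coef_, "b =", clf.intercept_)
print(clf.predict([[0.5, 0.5], [2.5, 2.5]]))
```

Points near the class-0 cluster are labeled 0 and points near the class-1 cluster are labeled 1, since the learned hyperplane lies between the two groups.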
This dissertation is structured as follows. In Chapter 1 we briefly present the
different challenges this thesis addresses, as well as their state of the art. In the same
vein, the formulations used as base models are presented, together with the
notation used throughout the chapters of this thesis.
In Chapter 2, we tackle the problem of constructing a version of the SVM
that controls misclassification errors. To do this, we incorporate new performance
constraints into the SVM formulation, imposing upper bounds on the misclassification
errors. The resulting formulation is a convex quadratic problem with linear constraints.
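A minimal sketch of how such performance constraints might enter the standard soft-margin SVM primal is given below; the bound on the slacks $\xi_i$ is purely illustrative, and the symbols $I_k$ and $\mu_k$ are assumptions of this sketch, not the precise constraints derived in the chapter:

\[
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\|w\|^2 + C\sum_{i=1}^{n}\xi_i \\
\text{s.t.}\quad & y_i\,(w^\top x_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1,\dots,n, \\
& \sum_{i \in I_k} \xi_i \le \mu_k\,|I_k|, \qquad k \in \{-1,+1\},
\end{aligned}
\]

where $I_k$ indexes the individuals of class $k$ and $\mu_k$ is a user-chosen tolerance on the aggregate classification error of that class. Bounding the slacks keeps the problem a convex quadratic program with linear constraints.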
Chapter 3 continues with the SVM as the basis, and sets out the problem of providing
not only a hard-labeling for each of the individuals belonging to the dataset, but a
class probability estimation. Furthermore, confidence intervals for both the score values
and the posterior class probabilities will be provided. In addition, as in the previous
chapter, we will carry the obtained results to the field in which misclassified errors are
considered. With such a purpose, we have to solve either a quadratic convex problem
or a quadratic convex problem with linear constraints and integer variables, and always
taking advantage of the parameter tuning of the SVM, that is usually wasted.
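To illustrate the general idea of turning SVM scores into posterior class probabilities, the sketch below applies Platt-style sigmoid calibration as implemented in scikit-learn; this is a standard technique shown for context only, not the estimation procedure derived in the chapter, and the dataset is synthetic.

```python
# Illustrative sketch: class probability estimates from SVM scores via
# sigmoid (Platt-style) calibration. Not the thesis methodology.
from sklearn.svm import LinearSVC
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=200, n_features=5, random_state=0)

base = LinearSVC()  # produces hard scores (signed distances) only
calibrated = CalibratedClassifierCV(base, method="sigmoid", cv=3)
calibrated.fit(X, y)

probs = calibrated.predict_proba(X[:4])  # posterior class probabilities
print(probs)  # one row per individual, one column per class
```

Each row of `probs` sums to one, so the hard label can be recovered as the column of maximum probability while the probability itself quantifies the confidence of the assignment.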
Building on the results of Chapter 2, in Chapter 4 we address the problem of feature selection, again taking into account the misclassification errors. In order to build this
technique, the feature selection is embedded in the classifier model. The process is
divided into two steps. In the first step, feature selection is performed while, at
the same time, the data are separated via a hyperplane or linear classifier, considering the
performance constraints. In the second step, we build the maximum-margin classifier
(SVM) using the features selected in the first step, again taking into account
the same performance constraints.
In Chapter 5, we move to the problem of Benchmarking, where the practices of
different entities are compared through the products or services they provide. This is
done with the aim of making changes or improvements in each of them. Concretely,
in this chapter we propose a Mixed Integer Linear Programming formulation based on
Data Envelopment Analysis (DEA), with the aim of performing feature selection, improving
the interpretability and comprehension of the obtained model and efficiencies.
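For context, the classical input-oriented CCR DEA model on which such extensions build can be solved as a plain linear program; the sketch below does so with SciPy on made-up data (the integer variables for feature selection proposed in the chapter are not reproduced here).

```python
# Illustrative sketch: classical input-oriented CCR DEA efficiencies via
# linear programming. Toy data; not the MILP formulation of the thesis.
import numpy as np
from scipy.optimize import linprog

# 4 decision-making units (DMUs), 2 inputs, 1 output (made-up values)
X = np.array([[2.0, 3.0], [4.0, 1.0], [4.0, 4.0], [5.0, 5.0]])  # inputs
Y = np.array([[1.0], [1.0], [1.0], [1.0]])                      # outputs

def ccr_efficiency(j0):
    """Efficiency of DMU j0: min theta s.t. a convex combination of all
    DMUs uses at most theta times j0's inputs and at least j0's outputs."""
    n, m = X.shape
    s = Y.shape[1]
    c = np.r_[1.0, np.zeros(n)]                 # variables [theta, lambda_1..n]
    A_in = np.hstack([-X[j0].reshape(m, 1), X.T])   # sum lam*x - theta*x_j0 <= 0
    A_out = np.hstack([np.zeros((s, 1)), -Y.T])     # -sum lam*y <= -y_j0
    res = linprog(c, A_ub=np.vstack([A_in, A_out]),
                  b_ub=np.r_[np.zeros(m), -Y[j0]])  # theta, lambda >= 0 by default
    return res.fun

for j in range(4):
    print("DMU", j, "efficiency:", round(ccr_efficiency(j), 3))
```

Units on the efficient frontier obtain a score of 1, while dominated units obtain a score strictly below 1 that indicates how much their inputs could be proportionally reduced.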
Finally, in Chapter 6 we collect the conclusions of this thesis as well as future lines
of research. | es |