dc.creator | Carrizosa Priego, Emilio José | es |
dc.creator | Hvas Mortensen, Laust | es |
dc.creator | Romero Morales, María Dolores | es |
dc.creator | Sillero Denamiel, María Remedios | es |
dc.date.accessioned | 2022-06-30T07:08:46Z | |
dc.date.available | 2022-06-30T07:08:46Z | |
dc.date.issued | 2022-05-04 | |
dc.identifier.citation | Carrizosa Priego, E.J., Hvas Mortensen, L., Romero Morales, M.D. and Sillero Denamiel, M.R. (2022). The tree based linear regression model for hierarchical categorical variables. Expert Systems with Applications, 203, 117423-1-117423-13. | |
dc.identifier.issn | 0957-4174 | es |
dc.identifier.uri | https://hdl.handle.net/11441/134809 | |
dc.description.abstract | Many real-life applications consider nominal categorical predictor variables that have a hierarchical structure, e.g. economic activity data in Official Statistics. In this paper, we focus on linear regression models built in the presence of this type of nominal categorical predictor variables, and study the consolidation of their categories to have a better tradeoff between interpretability and fit of the model to the data. We propose the so-called Tree based Linear Regression (TLR) model that optimizes both the accuracy of the reduced linear regression model and its complexity, measured as a cost function of the level of granularity of the representation of the hierarchical categorical variables. We show that finding non-dominated outcomes for this problem boils down to solving Mixed Integer Convex Quadratic Problems with Linear Constraints, and small to medium size instances can be tackled using off-the-shelf solvers. We illustrate our approach in two real-world datasets, as well as a synthetic one, where our methodology finds a much less complex model with a very mild worsening of the accuracy. | es |
dc.format | application/pdf | es |
dc.format.extent | 13 p. | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Expert Systems with Applications, 203, 117423-1-117423-13. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Hierarchical categorical variables | es |
dc.subject | Linear regression models | es |
dc.subject | Accuracy vs. model complexity | es |
dc.subject | Mixed integer convex quadratic problem with linear constraints | es |
dc.title | The tree based linear regression model for hierarchical categorical variables | es |
dc.type | info:eu-repo/semantics/article | es |
dc.type.version | info:eu-repo/semantics/publishedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Estadística e Investigación Operativa | es |
dc.relation.publisherversion | doi.org/10.1016/j.eswa.2022.117423 | es |
dc.identifier.doi | 10.1016/j.eswa.2022.117423 | es |
dc.contributor.group | Universidad de Sevilla. FQM329: Optimizacion | es |
dc.journaltitle | Expert Systems with Applications | es |
dc.publication.volumen | 203 | es |
dc.publication.initialPage | 117423-1 | es |
dc.publication.endPage | 117423-13 | es |