Book
Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data
Author/s | Perera Lago, Javier; Toscano Durán, Víctor; Paluzo Hidalgo, Eduardo; Narteni, Sara; Rucco, Matteo |
Department | Universidad de Sevilla. Departamento de Matemática Aplicada I (ETSII) |
Publication Date | 2024 |
Deposit Date | 2024-07-12 |
Published in | |
ISBN/ISSN | 978-3-031-63802-2 |
Abstract | Machine learning algorithms are fundamental components of novel data-informed Artificial Intelligence architecture. In this domain, the imperative role of representative datasets is a cornerstone in shaping the trajectory of artificial intelligence (AI) development. Representative datasets are needed to train machine learning components properly. Proper training has multiple impacts: it reduces the final model’s complexity, power, and uncertainties. In this paper, we investigate the reliability of the ε-representativeness method to assess the dataset similarity from a theoretical perspective for decision trees. We decided to focus on the family of decision trees because it includes a wide variety of models known to be explainable. Thus, in this paper, we provide a result guaranteeing that if two datasets are related by ε-representativeness, i.e., both of them have points closer than ε, then the predictions by the classic decision tree are similar. Experimentally, we have also tested that ε-representativeness presents a significant correlation with the ordering of the feature importance. Moreover, we extend the results experimentally in the context of unseen vehicle collision data for XGBoost, a machine-learning component widely adopted for dealing with tabular data. |
Citation | Perera Lago, J., Toscano Durán, V., Paluzo Hidalgo, E., Narteni, S. and Rucco, M. (2024). Application of the Representative Measure Approach to Assess the Reliability of Decision Trees in Dealing with Unseen Vehicle Collision Data. https://doi.org/10.1007/978-3-031-63803-9_21. |
Files | Size | Format | View | Description |
---|---|---|---|---|
Application of ...pdf | 473.3Kb | [PDF] | View/ | |