
Article

dc.creator: Estepa Alonso, Rafael María [es]
dc.creator: Díaz Verdejo, Jesús [es]
dc.creator: Estepa Alonso, Antonio José [es]
dc.creator: Madinabeitia Luque, Germán [es]
dc.date.accessioned: 2021-04-19T11:03:07Z
dc.date.available: 2021-04-19T11:03:07Z
dc.date.issued: 2020
dc.identifier.citation: Estepa Alonso, R.M., Díaz Verdejo, J., Estepa Alonso, A.J. and Madinabeitia Luque, G. (2020). How much training data is enough? A case study for HTTP anomaly-based intrusion detection. IEEE Access, 4, 44410-44425.
dc.identifier.issn: 2169-3536 [es]
dc.identifier.uri: https://hdl.handle.net/11441/107307
dc.description.abstract: Most anomaly-based intrusion detectors rely on models that learn from a training dataset whose quality is crucial to their performance. Although the properties of suitable datasets have been formulated, the influence of dataset size on the performance of an anomaly-based detector has received scarce attention so far. In this work, we investigate the optimal size of a training dataset. This size should be large enough that the training data is representative of normal behavior; beyond that point, collecting more data may waste time and computational resources, not to mention increase the risk of overtraining. In this spirit, we provide a method to find out when the amount of data collected in the production environment is representative of normal behavior, in the context of a detector of HTTP URI attacks based on 1-grammar. Our approach is founded on a set of indicators related to the statistical properties of the data. These indicators are calculated periodically during data collection, producing time series that stabilize once more training data is not expected to translate into better system performance, which indicates that data collection can be stopped. We present a case study with real-life datasets collected at the University of Seville (Spain) and a public dataset from the University of Saskatchewan. Applying our method to these datasets showed that more than 42% of one trace and almost 20% of another were unnecessarily collected, thereby showing that the proposed method can be an efficient approach for collecting training data in the production environment. [es]
dc.format: application/pdf [es]
dc.format.extent: 16 p. [es]
dc.language.iso: eng [es]
dc.publisher: Institute of Electrical and Electronics Engineers [es]
dc.relation.ispartof: IEEE Access, 4, 44410-44425.
dc.rights: Attribution-NonCommercial-NoDerivatives 4.0 International [*]
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/ [*]
dc.subject: anomaly-based intrusion detection [es]
dc.subject: dataset assessment [es]
dc.subject: training [es]
dc.title: How much training data is enough? A case study for HTTP anomaly-based intrusion detection [es]
dc.type: info:eu-repo/semantics/article [es]
dcterms.identifier: https://ror.org/03yxnpp24
dc.type.version: info:eu-repo/semantics/publishedVersion [es]
dc.rights.accessRights: info:eu-repo/semantics/openAccess [es]
dc.contributor.affiliation: Universidad de Sevilla. Departamento de Ingeniería Telemática [es]
dc.relation.publisherversion: https://ieeexplore.ieee.org/document/9019687 [es]
dc.identifier.doi: 10.1109/ACCESS.2020.2977591 [es]
dc.contributor.group: Universidad de Sevilla. PI-1669/22/2017: Sistema Integral para Vigilancia y Auditoría de Ciberseguridad Corporativa (SIVA) [es]
dc.contributor.group: Universidad de Sevilla. PI-1786/22/2018: Sistema de Ciberprotección para servidores web de la Universidad de Sevilla (CiberwebUS) [es]
dc.contributor.group: Universidad de Sevilla. PI-1736/22/2017: Detección Temprana de Ataques de Ciberseguridad en Servidores Web de la biblioteca de la US [es]
dc.journaltitle: IEEE Access [es]
dc.publication.volumen: 4 [es]
dc.publication.initialPage: 44410 [es]
dc.publication.endPage: 44425 [es]

Files
File: How Mach Training Data.pdf
Size: 5.780 MB
Format: PDF



Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International