Dataset
Labeled HTTP requests dataset: Dataset Biblio-US17
Author(s) | Díaz Verdejo, Jesús; Estepa Alonso, Rafael María; Estepa Alonso, Antonio José; Muñoz Calle, Francisco Javier; Madinabeitia Luque, Germán |
Data compiler | Estepa Alonso, Rafael María |
Data manager | Díaz Verdejo, Jesús; Muñoz Calle, Francisco Javier |
Contact | Muñoz Calle, Francisco Javier |
Department | Universidad de Sevilla. Departamento de Ingeniería Telemática |
Language (ISO) | English (eng) |
Release date | 2023-07 |
Deposit date | 2023-07-28 |
Creation date | 2023 |
Version | v.1 |
Abstract | This dataset contains a set of anonymized and labeled HTTP requests (selected fields) taken from the logs of a real production web server at the library of the University of Seville over 6.5 months in 2017. The dataset has been sanitized using the supervised methodology proposed in: Díaz-Verdejo, Jesús E.; Estepa, Antonio; Estepa, Rafael; Madinabeitia, German; Muñoz-Calle, Javier, "A methodology for conducting efficient sanitization of HTTP training datasets", Future Generation Computer Systems, vol. 109, pp. 67–82, 2020. https://doi.org/10.1016/j.future.2020.03.033. |
Contents | The dataset is organized in a tree structure (subdirectories), each containing a different type of file or set. As provided, 5 sets of files and two partitioning schemes are considered. The partition files are not directly provided but can be generated from the files using the provided script. The following sets of files (subdirectories) are included:
- RAW files: Initial records (obtained after preprocessing and anonymization of real captured files).
- LABEL files: Labels assigned during analysis.
- CLEAN files: Records considered clean after sanitization. This is the full dataset to be used as normal traffic.
- SID files: Information about the SIDs triggered by the SIDS tools used.
- ATTACK files: Records classified as attacks (only LVL1, i.e. indubitable, attacks).
Records in each set are organized in daily bins (files) named biblio-2017-<mm>-<dd>.<ext>, where <mm> is the month number, <dd> the day, and <ext> an extension indicating the type of content:
- .raw for RAW files
- .lbl for LABEL files
- .cl for CLEAN files
- .sid for SID files
- .att for ATTACK files |
Funding agencies | European Commission (EC). Fondo Europeo de Desarrollo Regional (FEDER); Ministerio de Ciencia e Innovación; Junta de Andalucía (Consejería de Transformación Económica, Industria, Conocimiento y Universidades); Universidad de Sevilla |
Project identifier | PI-1736/22/2017; A-TIC-224-UGR20; PID2020-115199RB-I00; PYC20-RE-087-USE |
Associated publication | Díaz-Verdejo, Jesús E.; Estepa, Rafael; Estepa, Antonio; Muñoz-Calle, Javier; Madinabeitia, German; "Biblio-US17: A large real and labeled URI dataset for website modelling towards anomaly-based intrusion detection systems". [DOI pending publication] |
Dataset type | Databases |
Citation | Díaz Verdejo, J., Estepa Alonso, R.M., ..., Madinabeitia Luque, G. (2023). Labeled HTTP requests dataset: Dataset Biblio-US17. idUS. Depósito de Investigación de la Universidad de Sevilla. https://hdl.handle.net/11441/148254. |
Files | Size | Format | View | Description |
---|---|---|---|---|
Biblio-US17.tar.gz | 1.139 GB | [application/gzip] | Restricted access. Request via the form. | Labeled HTTP requests dataset |
README.en | 13.38 KB | [Text file] | View | |
README_en.txt | 13.66 KB | [Text file] | View | |
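
The daily-bin naming scheme described in the contents field (biblio-2017-<mm>-<dd>.<ext>, with one extension per file set) can be handled with a small helper. The following is a minimal Python sketch, not part of the dataset or its provided script: the function names `parse_bin_name` and `bin_name` and the sample path are hypothetical, and only the name pattern and the five extensions are taken from this record.

```python
import re
from datetime import date
from pathlib import PurePosixPath

# Extension -> file set, as listed in the contents field of this record.
EXT_TO_SET = {
    "raw": "RAW",
    "lbl": "LABEL",
    "cl": "CLEAN",
    "sid": "SID",
    "att": "ATTACK",
}

# Daily bin pattern: biblio-2017-<mm>-<dd>.<ext>
_NAME_RE = re.compile(r"^biblio-(2017)-(\d{2})-(\d{2})\.(raw|lbl|cl|sid|att)$")


def parse_bin_name(path):
    """Return (set_name, day) for a daily bin file, or None if the name does not match.

    Hypothetical helper: only the final path component is inspected, so the
    subdirectory layout (not specified in this record) does not matter.
    """
    match = _NAME_RE.match(PurePosixPath(path).name)
    if match is None:
        return None
    year, month, day, ext = match.groups()
    try:
        when = date(int(year), int(month), int(day))
    except ValueError:  # impossible calendar date, e.g. month 13
        return None
    return EXT_TO_SET[ext], when


def bin_name(day, set_name):
    """Build the file name for a given day and file set (inverse of parse_bin_name)."""
    set_to_ext = {name: ext for ext, name in EXT_TO_SET.items()}
    return f"biblio-{day:%Y-%m-%d}.{set_to_ext[set_name]}"


if __name__ == "__main__":
    # The directory name "some_dir" is an assumption used for illustration only.
    print(parse_bin_name("some_dir/biblio-2017-03-15.cl"))
    print(bin_name(date(2017, 3, 15), "ATTACK"))
```

Parsing the date rather than only matching digits rejects names that fit the pattern but are not real calendar days, which helps when pairing the RAW, LABEL, CLEAN, SID, and ATTACK files for the same day.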