Informática (Computer Science)
Permanent URI for this community: https://hdl.handle.net/11441/33181
Browsing Informática by Author "Alaiz Rodríguez, Rocío"
Showing 1 - 2 of 2
Conference paper: Automating cybersecurity TTP classification based on unstructured attack descriptions (Universidad de Sevilla. Escuela Técnica Superior de Ingeniería Informática, 2024)
Castaño, Felipe; Gil Lerchundi, Amaia; Orduna Urrutia, Raúl; Fidalgo Fernández, Eduardo; Alaiz Rodríguez, Rocío
CTI sources help SOCs share important information about incidents and attacks. Unstructured text processing is gaining importance, since incident-related information is present in a wide range of sources. The datasets in the literature contain texts that are too short or too few samples per class. We therefore proposed a method to build a dataset semi-automatically from CTI sources. As a result, we present a new dataset of unstructured CTI descriptions called Weakness, Attack, Vulnerabilities, and Events 27k (WAVE-27K). WAVE-27K includes information on 27 different MITRE techniques and 7 tactics, containing 22539 samples associated with a single technique and 5262 samples related to two or more techniques. WAVE-27K is the largest dataset compared to those in the literature. We trained a BERT-based model using WAVE-27K, obtaining a 97.00% micro F1-score, which could validate that the information included in WAVE-27K is of sufficient quality for training machine learning models.

Conference paper: Spam hierarchical clustering for campaigns spotting and topic-based classification [Poster] (Universidad de Sevilla. Escuela Técnica Superior de Ingeniería Informática, 2024)
Jáñez Martino, Francisco; Carofilis, Andrés; Alaiz Rodríguez, Rocío; González Castro, Víctor; Fidalgo, Eduardo; Alegre, Enrique
This article focuses on the creation of multi-class classification systems for spam email in cybersecurity organizations to prevent cyber-attacks and spam campaigns. We introduce two new subsets: SPEMC-15K-E and SPEMC-15K-S, comprising 14479 and 14992 spam emails, in English and Spanish, respectively. These are divided into eleven classes, defined using agglomerative hierarchical clustering. We evaluated sixteen pipelines, combining text representation techniques (TF-IDF, Bag of Words, Word2Vec, and BERT) and classifiers (Support Vector Machine, Naïve Bayes, Random Forest, and Logistic Regression). TF-IDF with Logistic Regression (LR) achieved the best results for English, with an F1-score of 0.953 and 94.6% accuracy. Similarly, TF-IDF with Naïve Bayes achieved the best results for Spanish, with an F1-score of 0.945 and 98.5% accuracy. Finally, TF-IDF with LR had the shortest processing time, completing the classification in an average of 2 ms and 2.2 ms per email in English and Spanish, respectively.
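
Both abstracts report F1-scores, and the WAVE-27K abstract specifies a micro-averaged F1 on data where a sample may carry more than one technique. As a rough illustration of how micro-averaging pools counts across all labels before computing the score, here is a minimal Python sketch. The function name `micro_f1` and the toy label sets are hypothetical; they are not taken from either paper or dataset, and the authors' actual evaluation code is not shown in this listing.

```python
from typing import Iterable, Set


def micro_f1(y_true: Iterable[Set[str]], y_pred: Iterable[Set[str]]) -> float:
    """Micro-averaged F1 for multi-label predictions.

    True positives, false positives and false negatives are summed over
    every sample and label first; F1 is then computed once from the totals.
    """
    tp = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        tp += len(truth & pred)   # labels predicted and correct
        fp += len(pred - truth)   # labels predicted but wrong
        fn += len(truth - pred)   # labels missed
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Toy example: technique IDs used only as placeholder labels (assumption).
y_true = [{"T1059"}, {"T1566", "T1204"}, {"T1027"}]
y_pred = [{"T1059"}, {"T1566"}, {"T1027", "T1059"}]

score = micro_f1(y_true, y_pred)
print(f"micro F1 = {score:.2f}")  # 3 TP, 1 FP, 1 FN -> 0.75
```

Because counts are pooled, frequent labels weigh more than rare ones in the final score, which is worth keeping in mind when classes are imbalanced.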