dc.contributor.editor | Varela Vaca, Ángel Jesús | es |
dc.contributor.editor | Ceballos Guerrero, Rafael | es |
dc.contributor.editor | Reina Quintero, Antonia María | es |
dc.creator | Jáñez Martino, Francisco | es |
dc.creator | Carofilis, Andrés | es |
dc.creator | Alaiz Rodríguez, Rocío | es |
dc.creator | González Castro, Víctor | es |
dc.creator | Fidalgo, Eduardo | es |
dc.creator | Alegre, Enrique | es |
dc.date.accessioned | 2024-08-27T10:58:40Z | |
dc.date.available | 2024-08-27T10:58:40Z | |
dc.date.issued | 2024 | |
dc.identifier.citation | Jáñez Martino, F., Carofilis, A., Alaiz Rodríguez, R., González Castro, V., Fidalgo, E. and Alegre, E. (2024). Spam hierarchical clustering for campaigns spotting and topic-based classification [Póster]. In Jornadas Nacionales de Investigación en Ciberseguridad (JNIC) (9ª.2024. Sevilla) (490-491), Sevilla: Universidad de Sevilla. Escuela Técnica Superior de Ingeniería Informática. | |
dc.identifier.isbn | 978-84-09-62140-8 | es |
dc.identifier.uri | https://hdl.handle.net/11441/162068 | |
dc.description.abstract | This article focuses on the creation of multi-classification systems for spam email in cybersecurity organizations to prevent cyber-attacks and spam campaigns. We introduce two new subsets, SPEMC-15K-E and SPEMC-15K-S, comprising 14479 and 14992 spam emails in English and Spanish, respectively. These are divided into eleven classes, defined using agglomerative hierarchical clustering. We evaluated sixteen pipelines, combining text representation techniques (TF-IDF, Bag of Words, Word2Vec, and BERT) and classifiers (Support Vector Machine, Naïve Bayes, Random Forest, and Logistic Regression). TF-IDF with Logistic Regression (LR) achieved the best results for English, with an F1-score of 0.953 and 94.6% accuracy. Similarly, TF-IDF with Naïve Bayes achieved the best results for Spanish, with an F1-score of 0.945 and 98.5% accuracy. Finally, TF-IDF with LR had the shortest processing time, classifying each email in an average of 2 ms in English and 2.2 ms in Spanish. | es |
dc.format | application/pdf | es |
dc.format.extent | 2 | es |
dc.language.iso | eng | es |
dc.publisher | Universidad de Sevilla. Escuela Técnica Superior de Ingeniería Informática | es |
dc.relation.ispartof | Jornadas Nacionales de Investigación en Ciberseguridad (JNIC) (9ª.2024. Sevilla) (2024), pp. 490-491. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Spam detection | es |
dc.subject | Multi-classification | es |
dc.subject | Image based spam | es |
dc.subject | Text classification | es |
dc.title | Spam hierarchical clustering for campaigns spotting and topic-based classification [Póster] | es |
dc.type | info:eu-repo/semantics/conferenceObject | es |
dc.type.version | info:eu-repo/semantics/publishedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.publication.initialPage | 490 | es |
dc.publication.endPage | 491 | es |
dc.eventtitle | Jornadas Nacionales de Investigación en Ciberseguridad (JNIC) (9ª.2024. Sevilla) | es |
dc.eventinstitution | Sevilla | es |
dc.relation.publicationplace | Sevilla | es |