Article
PolaritySpam: Propagating Content-based Information Through a Web-Graph to Detect Web Spam
Author(s) | Ortega Rodríguez, Francisco Javier; Troyano Jiménez, José Antonio; Cruz Mata, Fermín; García Vallejo, Carlos Antonio |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2012 |
Deposit date | 2022-03-11 |
Published in | International Journal of Innovative Computing, Information and Control, 8 (4), 2915-2928 |
Abstract | Spam web pages have become a problem for Information Retrieval systems because of the negative effects they can have on search results. In this work we tackle the problem of detecting these pages with a propagation algorithm that takes a web graph as input and chooses a set of spam and non-spam web pages whose spam likelihood is then spread over the rest of the network. We thus take advantage of the links between pages to obtain a ranking of pages according to their relevance and their spam likelihood. Our intuition is to give a high reputation to pages related to relevant ones, and a high spam likelihood to pages linked to spam web pages. The novelty of our approach is that the content of the web pages is used to compute an a priori estimate of each page's spam likelihood, and this information is then propagated. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) a web page is, according to its textual content and its relations in the graph. The experimental results show that our method outperforms other techniques for spam detection. |
Funding agencies | Ministerio de Educación y Ciencia (MEC), Spain |
Project identifier | HUM2007-66607-C04-04 |
Citation | Ortega Rodríguez, F.J., Troyano Jiménez, J.A., Cruz Mata, F. and García Vallejo, C.A. (2012). PolaritySpam: Propagating Content-based Information Through a Web-Graph to Detect Web Spam. International Journal of Innovative Computing, Information and Control, 8 (4), 2915-2928. |
Files | Size | Format | View | Description |
---|---|---|---|---|
POLARITYSPAM PROPAGATING CONTE ... | 262.3Kb | | View | |
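
The abstract describes propagating a content-based a priori spam estimate through the web graph and computing two scores per node. The Python sketch below illustrates that general idea only; it is not the authors' published algorithm. It assumes a personalized-PageRank-style update, forward propagation along out-links for both scores, and a damping factor of 0.85. The function names (`propagate`, `two_scores`), the toy page names, and the prior values are all hypothetical.

```python
def propagate(graph, prior, damping=0.85, iterations=50):
    """Spread a prior over a directed graph, personalized-PageRank style.

    graph: dict mapping each node to the list of nodes it links to
           (every linked node must also be a key of the dict).
    prior: dict mapping some nodes to a non-negative a priori weight.
    """
    nodes = list(graph)
    total = sum(prior.get(n, 0.0) for n in nodes)
    if total == 0:
        return {n: 0.0 for n in nodes}
    # Normalize the prior so that it sums to 1.
    e = {n: prior.get(n, 0.0) / total for n in nodes}
    score = dict(e)
    for _ in range(iterations):
        # Each node keeps a fraction of its own prior...
        nxt = {n: (1.0 - damping) * e[n] for n in nodes}
        # ...and receives an equal share of the score of every page linking to it.
        for u in nodes:
            out = graph[u]
            if not out:
                continue  # dangling node: its mass is dropped in this sketch
            share = damping * score[u] / len(out)
            for v in out:
                nxt[v] += share
        score = nxt
    return score


def two_scores(graph, good_prior, spam_prior):
    """Return a (good, spam) score pair for every node (assumed formulation)."""
    good = propagate(graph, good_prior)
    spam = propagate(graph, spam_prior)
    return {n: (good[n], spam[n]) for n in graph}


# Toy graph with hypothetical page names: node -> pages it links to.
graph = {
    "news": ["blog"],
    "blog": ["news"],
    "farm": ["pills"],
    "pills": ["farm"],
}
# Hypothetical content-based priors; a real system would derive them from page text.
good_prior = {"news": 1.0}
spam_prior = {"farm": 1.0}
scores = two_scores(graph, good_prior, spam_prior)
```

In this toy graph, "blog" is linked only from a page with a good prior and "pills" only from a page with a spam prior, so each ends up with a higher score on the corresponding side. How the two scores are combined into a final ranking is left open here, since the abstract does not specify it.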