Conference paper
Combining Textual Content and Hyperlinks in Web Spam Detection
Authors | Ortega Rodríguez, Francisco Javier; MacDonald, Craig; Troyano Jiménez, José Antonio; Cruz Mata, Fermín; Enríquez de Salamanca Ros, Fernando |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication date | 2011 |
Deposit date | 2020-07-17 |
Published in | |
ISBN/ISSN | 978-3-642-22326-6 / 0302-9743 |
Abstract | In this work, we tackle the problem of spam detection on the Web. Spam web pages have become a problem for Web search engines, due to the negative effects that this phenomenon can cause in their retrieval results. Our approach is based on a random-walk algorithm that obtains a ranking of pages according to their relevance and their spam likelihood. We introduce the novelty of taking into account the content of the web pages to characterize the web graph and to obtain an a priori estimation of the spam likelihood of the web pages. Our graph-based algorithm computes two scores for each node in the graph. Intuitively, these values represent how bad or good (spam-like or not) a web page is, according to its textual content and the relations in the graph. Our experiments show that our proposed technique outperforms other link-based techniques for spam detection. |
Funding agencies | Ministerio de Educación y Ciencia (MEC). Spain |
Project identifier | HUM2007-66607-C04-04 |
Citation | Ortega Rodríguez, F.J., MacDonald, C., Troyano Jiménez, J.A., Cruz Mata, F. and Enríquez de Salamanca Ros, F. (2011). Combining Textual Content and Hyperlinks in Web Spam Detection. In NLDB 2011: 16th International Conference on Applications of Natural Language to Information Systems (pp. 266-269), Alicante, Spain: Springer. |
Files | Size | Format | View | Description |
---|---|---|---|---|
Combining Textual Content and ... | 87.12 KB | PDF | | |
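
The abstract describes a random walk that assigns each page two scores, one "good" and one "bad", starting from an a priori spam likelihood derived from page content. This record does not give the paper's formulas, so the following is only a minimal sketch of that general idea and not the authors' algorithm: it runs a standard personalized PageRank twice, once with a teleport vector built from the spam priors and once from their complements. The function names, the toy graph, the prior values, the damping factor, and the choice to propagate both scores along outgoing links are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def normalise(weights: Dict[str, float]) -> Dict[str, float]:
    """Scale non-negative weights so they sum to 1 (uniform if all are zero)."""
    total = sum(weights.values())
    if total == 0:
        return {page: 1.0 / len(weights) for page in weights}
    return {page: w / total for page, w in weights.items()}


def personalized_pagerank(
    links: Dict[str, List[str]],
    teleport: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 50,
) -> Dict[str, float]:
    """Power iteration for PageRank with a non-uniform teleport vector."""
    nodes = list(teleport)
    score = dict(teleport)
    for _ in range(iterations):
        nxt = {page: (1.0 - damping) * teleport[page] for page in nodes}
        for src in nodes:
            targets = links.get(src, [])
            if targets:
                share = damping * score[src] / len(targets)
                for dst in targets:
                    nxt[dst] += share
            else:
                # Dangling page: hand its mass back via the teleport vector.
                for page in nodes:
                    nxt[page] += damping * score[src] * teleport[page]
        score = nxt
    return score


def two_scores(
    links: Dict[str, List[str]],
    spam_prior: Dict[str, float],
) -> Dict[str, Tuple[float, float]]:
    """Return (good, bad) scores per page from content-based spam priors.

    Assumption: pages without a prior get a neutral value of 0.5.
    """
    nodes = set(links) | {t for targets in links.values() for t in targets}
    prior = {page: spam_prior.get(page, 0.5) for page in nodes}
    bad = personalized_pagerank(links, normalise(prior))
    good = personalized_pagerank(
        links, normalise({page: 1.0 - p for page, p in prior.items()})
    )
    return {page: (good[page], bad[page]) for page in nodes}


# Hypothetical toy graph: two legitimate pages and two spam pages,
# with one spam page linking into the legitimate cluster.
links = {"a": ["b"], "b": ["a"], "s1": ["s2"], "s2": ["s1", "a"]}
spam_prior = {"a": 0.1, "b": 0.1, "s1": 0.9, "s2": 0.9}
scores = two_scores(links, spam_prior)
for page in sorted(scores):
    good, bad = scores[page]
    print(f"{page}: good={good:.3f} bad={bad:.3f}")
```

In this toy setup a page can be ranked by comparing its two scores, for example by their difference. How the paper actually combines the scores, and how it derives the priors from textual content, is not stated in this record.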