dc.creator | Ortega Rodríguez, Francisco Javier | es |
dc.creator | Troyano Jiménez, José Antonio | es |
dc.creator | Cruz Mata, Fermín | es |
dc.creator | García Vallejo, Carlos Antonio | es |
dc.date.accessioned | 2022-03-11T08:04:56Z | |
dc.date.available | 2022-03-11T08:04:56Z | |
dc.date.issued | 2012 | |
dc.identifier.citation | Ortega Rodríguez, F.J., Troyano Jiménez, J.A., Cruz Mata, F. and García Vallejo, C.A. (2012). PolaritySpam: Propagating Content-based Information Through a Web-Graph to Detect Web Spam. International Journal of Innovative Computing, Information and Control, 8 (4), 2915-2928. | |
dc.identifier.issn | 1349-4198 | es |
dc.identifier.uri | https://hdl.handle.net/11441/130681 | |
dc.description.abstract | Spam web pages have become a problem for Information Retrieval systems
because of the negative effects they can have on their results. In this work
we tackle the problem of detecting these pages with a propagation algorithm that, taking
a web graph as input, chooses a set of spam and non-spam web pages in order to spread
their spam likelihood over the rest of the network. We thus take advantage of the links
between pages to obtain a ranking of pages according to their relevance and their spam
likelihood. The intuition is to give a high reputation to pages related to
relevant ones, and a high spam likelihood to pages linked to spam web pages.
The novelty of our approach is to include the content of the web pages in the computation of
an a priori estimate of the spam likelihood of the pages, and to propagate this information.
Our graph-based algorithm computes two scores for each node in the graph. Intuitively,
these values represent how bad or good (spam-like or not) a web page is, according to its
textual content and its relations in the graph. The experimental results show that our
method outperforms other techniques for spam detection. | es |
dc.description.sponsorship | Ministerio de Educación y Ciencia HUM2007-66607-C04-04 | es |
dc.format | application/pdf | es |
dc.format.extent | 14 | es |
dc.language.iso | eng | es |
dc.publisher | ICIC International | es |
dc.relation.ispartof | International Journal of Innovative Computing, Information and Control, 8 (4), 2915-2928. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Information retrieval | es |
dc.subject | Web spam detection | es |
dc.subject | Graph algorithms | es |
dc.subject | PageRank | es |
dc.subject | Web search | es |
dc.title | PolaritySpam: Propagating Content-based Information Through a Web-Graph to Detect Web Spam | es |
dc.type | info:eu-repo/semantics/article | es |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | HUM2007-66607-C04-04 | es |
dc.relation.publisherversion | http://www.ijicic.org/contents.htm | es |
dc.contributor.group | Universidad de Sevilla. TIC134: Sistemas Informáticos | es |
dc.journaltitle | International Journal of Innovative Computing, Information and Control | es |
dc.publication.volumen | 8 | es |
dc.publication.issue | 4 | es |
dc.publication.initialPage | 2915 | es |
dc.publication.endPage | 2928 | es |
dc.identifier.sisius | 20031169 | es |
dc.contributor.funder | Ministerio de Educación y Ciencia (MEC). España | es |