dc.creator | Ortega Rodríguez, Francisco Javier | es |
dc.creator | MacDonald, Craig | es |
dc.creator | Troyano Jiménez, José Antonio | es |
dc.creator | Cruz Mata, Fermín | es |
dc.date.accessioned | 2020-08-05T09:22:03Z | |
dc.date.available | 2020-08-05T09:22:03Z | |
dc.date.issued | 2010 | |
dc.identifier.citation | Ortega Rodríguez, F.J., MacDonald, C., Troyano Jiménez, J.A. and Cruz Mata, F. (2010). Spam detection with a content-based random-walk algorithm. In SMUC 2010: 2nd international workshop on Search and mining user-generated contents (45-52), Toronto, ON, Canada: ACM Digital Library. | |
dc.identifier.isbn | 978-1-4503-0386-6 | es |
dc.identifier.uri | https://hdl.handle.net/11441/100111 | |
dc.description.abstract | In this work we tackle the problem of spam detection on the
Web. Spam web pages have become a problem for Web search
engines, due to the negative effects that this phenomenon can
cause in their retrieval results. Our approach is based on a
random-walk algorithm that obtains a ranking of pages
according to their relevance and their spam likelihood. We
introduce the novelty of taking into account the content of the
web pages to characterize the web graph and to obtain an a
priori estimation of the spam likelihood of the web pages. Our
graph-based algorithm computes two scores for each node in the
graph. Intuitively, these values represent how bad or good
(spam-like or not) a web page is, according to its textual content
and the relations in the graph. Our experiments show that our
proposed technique outperforms other link-based techniques
for spam detection. | es |
dc.description.sponsorship | Ministerio de Educación y Ciencia HUM2007-66607-C04-04 | es |
dc.format | application/pdf | es |
dc.format.extent | 7 | es |
dc.language.iso | eng | es |
dc.publisher | ACM Digital Library | es |
dc.relation.ispartof | SMUC 2010: 2nd international workshop on Search and mining user-generated contents (2010), p 45-52 | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Information Retrieval | es |
dc.subject | Web spam detection | es |
dc.subject | Graph algorithms | es |
dc.subject | PageRank | es |
dc.subject | Web search | es |
dc.title | Spam detection with a content-based random-walk algorithm | es |
dc.type | info:eu-repo/semantics/conferenceObject | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | HUM2007-66607-C04-04 | es |
dc.relation.publisherversion | https://dl.acm.org/doi/10.1145/1871985.1871994 | es |
dc.identifier.doi | 10.1145/1871985.1871994 | es |
dc.publication.initialPage | 45 | es |
dc.publication.endPage | 52 | es |
dc.eventtitle | SMUC 2010: 2nd international workshop on Search and mining user-generated contents | es |
dc.eventinstitution | Toronto, ON, Canada | es |
dc.relation.publicationplace | New York, USA | es |
dc.contributor.funder | Ministerio de Educación y Ciencia (MEC). España | es |