Detecting malicious tweets in trending topics using a statistical analysis of language

Martínez-Romo, Juan and Araujo, Lourdes (2013). Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Systems with Applications 40(8), 2013, pp. 2992–3000. ISSN: 0957-4174. DOI:10.1016/j.eswa.2012.12.015

Files
Name Description MIME type Size
Detecting_malicious.pdf Full text (open access) application/pdf 642.72KB

Title Detecting malicious tweets in trending topics using a statistical analysis of language
Author(s) Martínez-Romo, Juan
Araujo, Lourdes
Subject(s) Computer Science
Abstract Twitter spam detection is a recent area of research in which most previous work has focused on the identification of malicious user accounts and on honeypot-based approaches. In this paper, however, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. We present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34 K trending topics and 20 million tweets. We then proposed a reduced set of features that are hard for spammers to manipulate. In addition, we developed a machine learning system with some orthogonal features that can be combined with other feature sets in order to analyze emergent characteristics of spam in social networks. We also conducted an extensive evaluation, which shows that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to Twitter spam detection in trending topics in real time, mainly because it analyzes tweets instead of user accounts.
Keywords spam detection
social network
statistical natural language processing
machine learning
Publisher Elsevier
Date 2013-06-01
Format application/pdf
Identifier http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-MA2VICMR-1010
bibliuned:DptoLSI-ETSI-MA2VICMR-1010
DOI - identifier 10.1016/j.eswa.2012.12.015
ISSN - identifier 0957-4174
Published in journal Expert Systems with Applications 40(8), 2013, pp. 2992–3000. ISSN: 0957-4174. DOI:10.1016/j.eswa.2012.12.015
Language eng
Publication version publishedVersion
Related project: info:eu-repo/grantAgreement/S2009/TIC-1542
Resource type Article
Access rights and license http://creativecommons.org/licenses/by-nc-nd/4.0
info:eu-repo/semantics/openAccess
Access type Open access

Created: Fri, 31 Oct 2014, 14:19:27 CET