Detecting malicious tweets in trending topics using a statistical analysis of language

Martínez-Romo, Juan; Araujo, Lourdes

doi:10.1016/j.eswa.2012.12.015

Martínez-Romo, Juan and Araujo, Lourdes (2013). Detecting malicious tweets in trending topics using a statistical analysis of language. Expert Systems with Applications 40(8), pp. 2992–3000. ISSN: 0957-4174. DOI: 10.1016/j.eswa.2012.12.015

Files
Name	Description	MIME type	Size
Detecting_malicious.pdf	Full text (open access)	application/pdf	642.72 KB

Title	Detecting malicious tweets in trending topics using a statistical analysis of language
Author(s)	Martínez-Romo, Juan; Araujo, Lourdes
Subject(s)	Computer Science
Abstract	Twitter spam detection is a recent area of research in which most previous work has focused on the identification of malicious user accounts and on honeypot-based approaches. In this paper, by contrast, we present a methodology based on two new aspects: the detection of spam tweets in isolation, without prior information about the user; and the application of a statistical analysis of language to detect spam in trending topics. Trending topics capture the emerging Internet trends and topics of discussion that are on everybody's lips. This growing microblogging phenomenon therefore allows spammers to disseminate malicious tweets quickly and massively. We present the first work that tries to detect spam tweets in real time using language as the primary tool. We first collected and labeled a large dataset with 34 K trending topics and 20 million tweets. We then propose a reduced set of features that spammers can hardly manipulate. In addition, we have developed a machine learning system with some orthogonal features that can be combined with other feature sets in order to analyze emergent characteristics of spam in social networks. We have also conducted an extensive evaluation showing that our system obtains an F-measure at the same level as the best state-of-the-art systems based on the detection of spam accounts. Thus, our system can be applied to Twitter spam detection in trending topics in real time, mainly because it analyzes tweets instead of user accounts.
Keywords	spam detection; social network; statistical natural language processing; machine learning
Publisher	Elsevier
Date	2013-06-01
Format	application/pdf
Identifier	http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-MA2VICMR-1010; bibliuned:DptoLSI-ETSI-MA2VICMR-1010
DOI - identifier	10.1016/j.eswa.2012.12.015
ISSN - identifier	0957-4174
Published in journal	Expert Systems with Applications 40(8), 2013, pp. 2992–3000. ISSN: 0957-4174. DOI: 10.1016/j.eswa.2012.12.015
Language	eng
Publication version	publishedVersion
Related project:	info:eu-repo/grantAgreement/S2009/TIC-1542
Resource type	Article
Access rights and license	http://creativecommons.org/licenses/by-nc-nd/4.0; info:eu-repo/semantics/openAccess
Access type	Open access

Document type:	Journal article
Collections:	Grupo de Procesamiento del Lenguaje Natural y Recuperación de Información. Proyecto MA2VICMR-CM; Article set; Funded projects set; OpenAIRE set

Access statistics:	602 visits, 2850 downloads
Created:	Fri, 31 Oct 2014, 15:19:27 CET
