Can deep learning techniques improve classification performance of vandalism detection in Wikipedia?

Martinez-Rico, Juan R.; Martinez-Romo, Juan; Araujo, Lourdes

doi:https://doi.org/10.1016/j.engappai.2018.11.012

Martinez-Rico, Juan R., Martinez-Romo, Juan and Araujo, Lourdes (2019). Can deep learning techniques improve classification performance of vandalism detection in Wikipedia? Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259.

Files
Name	Description	MIME type	Size
Martinez-Romo_Juan_EAAI2019.pdf	Martinez-Romo_Juan_EAAI2019.pdf	application/pdf	1.69MB

Title	Can deep learning techniques improve classification performance of vandalism detection in Wikipedia?
Author(s)	Martinez-Rico, Juan R.; Martinez-Romo, Juan; Araujo, Lourdes
Subject(s)	Computer Engineering
Abstract	Wikipedia is a free encyclopedia created as an international collaborative project. One of its peculiarities is that any user can edit its contents almost without restriction, which has given rise to a phenomenon known as vandalism. Vandalism is any deliberate attempt to damage the integrity of the encyclopedia. To address this problem, several automatic detection systems and associated features have been developed in recent years. This work implements one such system, which uses three sets of new features based on different techniques. Specifically, we study the applicability of a leading technology, deep learning, to the problem of vandalism detection. The first set is obtained by expanding a list of vandal terms, taking advantage of the semantic-similarity relations captured by word embeddings and deep neural networks. For the second set, deep learning techniques, specifically Stacked Denoising Autoencoders (SDA), are applied to reduce the dimensionality of a bag-of-words model obtained from a set of edits taken from Wikipedia. The last set uses graph-based ranking algorithms to generate a list of vandal terms from a vandalism corpus extracted from Wikipedia. These three sets of new features are evaluated separately as well as together to study their complementarity, improving on state-of-the-art results. The system was evaluated on a corpus extracted from Wikipedia (WP_Vandal) as well as on PAN-WVC-2010, a corpus used in a vandalism detection competition held at the CLEF conference.
Keywords	Vandalism; Wikipedia; Natural language processing; Deep learning; Word embedding
Publisher	Elsevier
Date	2019
Format	application/pdf
Identifier	bibliuned:DptoLSI-ETSI-Articulos-Jmartinez-0002 http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-Articulos-Jmartinez-0002
DOI - identifier	https://doi.org/10.1016/j.engappai.2018.11.012
ISSN - identifier	0952-1976
Journal name	Engineering Applications of Artificial Intelligence
Volume number	78
First page	248
Last page	259
Published in journal	Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259
Language	eng
Publication version	publishedVersion
Resource type	Article
Access rights and license	http://creativecommons.org/licenses/by-nc-nd/4.0 info:eu-repo/semantics/openAccess
Access type	Open access
Additional notes	The registered version of this article, first published in Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259, 2019, is available online at the publisher's website: Elsevier, https://doi.org/10.1016/j.engappai.2018.11.012
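
The abstract describes the first feature set as a list of vandal terms expanded through semantic-similarity relations in word embeddings. The sketch below illustrates that general idea only; it is not the authors' implementation. The toy three-dimensional vectors, the seed word, the function names `cosine` and `expand_terms`, and the 0.95 similarity threshold are all assumptions made for illustration.

```python
import math

# Toy 3-dimensional embeddings, invented for illustration only.
# A real system would load vectors trained on a large corpus.
EMBEDDINGS = {
    "stupid": [0.9, 0.1, 0.0],
    "dumb": [0.85, 0.15, 0.05],
    "idiot": [0.8, 0.2, 0.1],
    "history": [0.0, 0.9, 0.3],
    "river": [0.1, 0.2, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_terms(seeds, embeddings, threshold=0.95):
    """Add every vocabulary word whose similarity to a seed term
    reaches the (hypothetical) threshold."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in embeddings:
            continue  # seed has no vector, so it cannot be expanded
        for word, vector in embeddings.items():
            if word not in expanded and cosine(embeddings[seed], vector) >= threshold:
                expanded.add(word)
    return expanded


if __name__ == "__main__":
    print(sorted(expand_terms(["stupid"], EMBEDDINGS)))
```

With these toy vectors, the seed "stupid" pulls in "dumb" and "idiot" and leaves "history" and "river" out. The threshold controls the trade-off between coverage and noise in the expanded list.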

Document type:	Journal article
Collections:	Departamento de Lenguajes y Sistemas Informáticos. E.T.S.I Informática (UNED). Artículos; Set de openaire

Created:	Tue, 30 Apr 2024, 18:08:41 CET
