Can deep learning techniques improve classification performance of vandalism detection in Wikipedia?

Martinez-Rico, Juan R., Martinez-Romo, Juan and Araujo, Lourdes. (2019) Can deep learning techniques improve classification performance of vandalism detection in Wikipedia? Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259

Files
Name Description MIME type Size
Martinez-Romo_Juan_EAAI2019.pdf Martinez-Romo_Juan_EAAI2019.pdf application/pdf 1.69MB

Title Can deep learning techniques improve classification performance of vandalism detection in Wikipedia?
Author(s) Martinez-Rico, Juan R.
Martinez-Romo, Juan
Araujo, Lourdes
Subject(s) Computer Engineering
Abstract Wikipedia is a free encyclopedia created as an international collaborative project. One of its peculiarities is that any user can edit its contents almost without restrictions, which has given rise to a phenomenon known as vandalism. Vandalism is any deliberate attempt to damage the integrity of the encyclopedia. To address this problem, several automatic detection systems and associated features have been developed in recent years. This work implements one such system, which uses three sets of new features based on different techniques. Specifically, we study the applicability of a leading technology such as deep learning to the problem of vandalism detection. The first set is obtained by expanding a list of vandal terms, taking advantage of the semantic-similarity relations captured by word embeddings and deep neural networks. The second set applies deep learning techniques, specifically Stacked Denoising Autoencoders (SDA), to reduce the dimensionality of a bag-of-words model obtained from a set of edits taken from Wikipedia. The last set uses graph-based ranking algorithms to generate a list of vandal terms from a vandalism corpus extracted from Wikipedia. These three sets of new features are evaluated separately as well as together to study their complementarity, and they improve on state-of-the-art results. The system has been evaluated on a corpus extracted from Wikipedia (WP_Vandal) as well as on another corpus, PAN-WVC-2010, which was used in a vandalism detection competition held at the CLEF conference.
Keywords Vandalism
Wikipedia
Natural language processing
Deep learning
Word embedding
Publisher(s) Elsevier
Date 2019
Format application/pdf
Identifier bibliuned:DptoLSI-ETSI-Articulos-Jmartinez-0002
http://e-spacio.uned.es/fez/view/bibliuned:DptoLSI-ETSI-Articulos-Jmartinez-0002
DOI - identifier https://doi.org/10.1016/j.engappai.2018.11.012
ISSN - identifier 0952-1976
Journal name Engineering Applications of Artificial Intelligence
Volume number 78
First page 248
Last page 259
Published in journal Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259
Language eng
Publication version publishedVersion
Resource type Article
Access rights and license http://creativecommons.org/licenses/by-nc-nd/4.0
info:eu-repo/semantics/openAccess
Access type Open access
Additional notes The registered version of this article, first published in Engineering Applications of Artificial Intelligence, vol. 78, pp. 248-259, 2019, is available online at the publisher's website: Elsevier, https://doi.org/10.1016/j.engappai.2018.11.012
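
Illustrative sketch (not taken from the article): the abstract describes a first feature set built by expanding a list of vandal terms through semantic similarity in a word-embedding space. The Python fragment below is one minimal way such an expansion could look, assuming cosine similarity against the seed terms and a fixed threshold. The names `expand_terms` and `TOY_EMBEDDINGS`, the threshold value, and the three-dimensional toy vectors are all hypothetical; the authors' actual embeddings, similarity measure, and expansion procedure are not specified in this record.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def expand_terms(seeds, embeddings, threshold=0.8):
    """Return the seed terms plus every vocabulary word whose embedding
    is at least `threshold`-similar to some seed term.

    Assumption: candidates are compared against the original seeds only,
    so the list does not grow transitively.
    """
    known_seeds = [s for s in seeds if s in embeddings]
    expanded = set(seeds)
    for word, vec in embeddings.items():
        if word in expanded:
            continue
        if any(cosine(vec, embeddings[s]) >= threshold for s in known_seeds):
            expanded.add(word)
    return expanded


# Hypothetical toy vectors; real embeddings would have hundreds of dimensions.
TOY_EMBEDDINGS = {
    "stupid": [0.90, 0.10, 0.00],
    "dumb": [0.85, 0.15, 0.05],
    "idiot": [0.80, 0.20, 0.10],
    "history": [0.00, 0.90, 0.40],
    "century": [0.05, 0.85, 0.50],
}

if __name__ == "__main__":
    print(sorted(expand_terms(["stupid"], TOY_EMBEDDINGS)))
```

In a classifier, the expanded list could then feed a simple feature such as the count of listed terms inserted by an edit; that use is likewise an assumption here rather than a detail stated in the record.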

Created: Tue, 30 Apr 2024, 18:08:41 CET