Publication:
Can deep learning techniques improve classification performance of vandalism detection in Wikipedia?

dc.contributor.authorMartinez-Rico, Juan R.
dc.contributor.authorMartínez Romo, Juan
dc.contributor.authorAraujo Serna, M. Lourdes
dc.date.accessioned2024-06-11T15:15:28Z
dc.date.available2024-06-11T15:15:28Z
dc.date.issued2019
dc.description.abstractWikipedia is a free encyclopedia created as an international collaborative project. One of its peculiarities is that any user can edit its contents almost without restrictions, which has given rise to a phenomenon known as vandalism. Vandalism is any deliberate attempt to damage the integrity of the encyclopedia. To address this problem, several automatic detection systems and associated features have been developed in recent years. This work implements one of these systems, which uses three sets of new features based on different techniques. Specifically, we study the applicability of a leading technology such as deep learning to the problem of vandalism detection. The first set is obtained by expanding a list of vandal terms, taking advantage of the semantic-similarity relations present in word embeddings and deep neural networks. For the second set, deep learning techniques, specifically Stacked Denoising Autoencoders (SDA), are applied to reduce the dimensionality of a bag-of-words model obtained from a set of edits taken from Wikipedia. The last set uses graph-based ranking algorithms to generate a list of vandal terms from a vandalism corpus extracted from Wikipedia. These three sets of new features are evaluated separately as well as together to study their complementarity, improving on state-of-the-art results. The system has been evaluated on a corpus extracted from Wikipedia (WP_Vandal) as well as on another corpus, PAN-WVC-2010, which was used in a vandalism detection competition held at the CLEF conference.en
dc.description.versionpublished version
dc.identifier.doihttps://doi.org/10.1016/j.engappai.2018.11.012
dc.identifier.issn0952-1976
dc.identifier.urihttps://hdl.handle.net/20.500.14468/22411
dc.journal.titleEngineering Applications of Artificial Intelligence
dc.journal.volume78
dc.language.isoen
dc.publisherElsevier
dc.relation.centerE.T.S. de Ingeniería Informática
dc.relation.departmentLenguajes y Sistemas Informáticos
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttps://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject.keywordsVandalism
dc.subject.keywordsWikipedia
dc.subject.keywordsNatural language processing
dc.subject.keywordsDeep learning
dc.subject.keywordsWord embedding
dc.titleCan deep learning techniques improve classification performance of vandalism detection in Wikipedia?es
dc.typejournal articleen
dc.typeartículoes
dspace.entity.typePublication
relation.isAuthorOfPublication91b7e317-2a30-494f-98e9-3a0e026747b1
relation.isAuthorOfPublication77c4023e-4374-442a-9dfb-b9d4b609c31e
relation.isAuthorOfPublication.latestForDiscovery91b7e317-2a30-494f-98e9-3a0e026747b1
Files
Original bundle
Name:
Martinez-Romo_Juan_EAAI2019.pdf
Size:
1.69 MB
Format:
Adobe Portable Document Format