Fake News Detection using news content and user engagement

Pérez Madre, Mario. (2021). Fake News Detection using news content and user engagement. Master Thesis, Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial

Files
Name Description MIME type Size
PerezMadre_MarioTFM.pdf PerezMadre_MarioTFM.pdf application/pdf 2.05MB

Title Fake News Detection using news content and user engagement
Author(s) Pérez Madre, Mario
Abstract Fake news is purposefully designed to mislead, and its success depends mostly on its readers [ÖG17]. Due to the nature of social networks, fake news can propagate quickly, potentially causing great damage to society. Moreover, sociological phenomena like echo chambers or polarization [Sil+16], and psychological factors like confirmation bias [Del+16] or overconfidence about not being fooled by fake news, create the perfect playground for misinformation [Tac+17] [Del+18]. Platforms have recently adopted measures to discourage users from sharing content without reading it first, but these measures are still not fully enforced and are easy to bypass. An automatic fake news detection system that blocks possibly misleading information or warns users about it will be needed in the near future, especially given the high volume of information shared through these websites. Several models have been proposed to detect fake news by analyzing linguistic features [HA17] [Pot+17] [BS19] [KGN21], but these features are often not enough [Shu+18] to distinguish fake news from real news. Research is now focusing on adding user engagement information to existing content-based models [RSL17] [Del+18] [SML19]. Some proposed systems use only information from these engagements [Tac+17]. Part of the research has focused on creating training datasets [SW18] [Shu+18] for this task, crawling news from fact-checking websites and fetching user engagements through the public APIs offered by social networks. In this work, we develop and test different fake news detection systems using information from news articles and user engagements in social networks. Two different architectures are used. The main one is based on Deep Learning and can process both news content and user engagements.
The second one is based on well-known algorithms like logistic regression, SVM, random forest, LightGBM or XGBoost; it can only process news content and serves as a performance baseline for our Deep Learning models. We use the FakeNewsNet dataset [Shu+18], which contains real and fake news from two fact-checking sources. For each news piece, the dataset contains the scraped article, the tweets and retweets related to it, and the profiles of the users involved in those tweets, including each user's timeline, followers and followees, although we do not use all of this information. Our work starts with an exploratory data analysis of the training set, where we highlight the main characteristics of the dataset. Then, we carry out a series of tests on both architectures, using each set of news, different subsets of features, and various text representation techniques. Additionally, we perform an ablation test on the Deep Learning architecture to understand how individual features behave and how they complement each other. Our results clearly show that our architecture is able to capture much information from user engagements, and that including user interactions gives better results than models that use only information from news articles. Our main contribution is a Deep Learning architecture capable of handling varying-length sequences of engagements for each piece of news, extracting all the information from them without padding or truncating them to fixed-size sequences. We take advantage of recent innovations in frameworks like TensorFlow for processing non-tabular data, which allow us to include unaggregated features directly and minimize the preprocessing required before the input data is fed to the model.
This architecture can perform complex summarizations, such as a trainable recurrent layer that takes a sequence of user engagements in the order in which they were published and outputs a vector that summarizes the whole sequence.
Additional notes Master's thesis, University Master's Degree in Data Engineering and Science (Máster Universitario en Ingeniería y Ciencia de Datos). UNED
Subject(s) Computer Engineering
Publisher(s) Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial
Supervisor/Advisor Rodrigo Yuste, Álvaro
Date 2021-07-01
Format application/pdf
Identifier bibliuned:master-ETSInformatica-ICD-Mperez
Language eng
Publication version acceptedVersion
Access level and license http://creativecommons.org/licenses/by-nc-nd/4.0
Resource type master Thesis
Access type Open access

Access statistics: 133 views, 74 downloads
Created: Fri, 29 Oct 2021, 19:37:18 CET