Publication:
Fake News Detection using news content and user engagement

Date
2021-07-01
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial
Abstract
Fake news are purposefully designed to be misleading, and their success depends mostly on their readers [ÖG17]. Due to the nature of social networks, fake news can be propagated quickly, potentially causing great damage to society. Moreover, sociological phenomena like echo chambers or polarization [Sil+16], and psychological factors like confirmation bias [Del+16] or overconfidence in one's own ability not to be fooled by fake news, create the perfect playground for misinformation [Tac+17] [Del+18]. Platforms have recently adopted measures to discourage users from sharing content without reading it first, but these measures are still not fully enforced and are easy to bypass. An automatic fake news detection system that blocks or warns users about possibly misleading information will be needed in the near future, especially given the high volume of information shared through these websites. Several models have been proposed to detect fake news by analyzing linguistic features [HA17] [Pot+17] [BS19] [KGN21], but these are often not enough [Shu+18] to distinguish fake news from real ones. Research is now focusing on adding user engagement information to existing content-based models [RSL17] [Del+18] [SML19]. Some systems have been proposed that use only information from these engagements [Tac+17]. Part of the research has focused on creating training datasets [SW18] [Shu+18] for this task, crawling news from fact-checking websites and fetching user engagements using the public APIs offered by social networks. In this work, we develop and test different fake news detection systems using information from news articles and user engagements in social networks. Two different architectures are used. The main one is based on Deep Learning and can process both news content and user engagements.
The second one is based on well-known algorithms like logistic regression, SVM, random forest, LightGBM or XGBoost; it can only process news content and is used as a performance baseline for our Deep Learning models. We use the FakeNewsNet dataset [Shu+18], which contains real and fake news from two fact-checking sources. For each news piece, the dataset contains the scraped news article, the tweets and retweets related to it, and the profiles of the users involved in those tweets, including each user's timeline, followers and followees, although not all of this information will be used. Our work starts with an exploratory data analysis of the training set, where we highlight the main characteristics of the dataset. Then, we carry out a series of tests on both architectures, taking news from each set, with different subsets of features and with various textual representation techniques. Additionally, we perform an ablation test on the Deep Learning architecture to understand how individual features behave and how they complement each other. Our results clearly show that our architecture is able to capture much information from user engagements, and that including user interactions gives better results than models using only information from news articles. With this work, our main contribution is a Deep Learning architecture capable of handling varying-length sequences of engagements for each piece of news, while extracting all the information from them without padding or truncating to fixed-size sequences. We take advantage of recent innovations in frameworks like TensorFlow to process non-tabular data, which allows us to directly include unaggregated features, minimizing the preprocessing required before the input data is fed to the model.
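As a hedged sketch of what a content-only baseline of this kind might look like (the toy corpus and the specific choice of TF-IDF features with logistic regression are illustrative assumptions, not the exact setup used in this work):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus: article texts with fake/real labels (1 = fake).
texts = [
    "shocking miracle cure doctors hate",
    "local council approves new budget",
    "celebrity secretly replaced by clone",
    "university publishes annual research report",
]
labels = [1, 0, 1, 0]

# TF-IDF features over the article text, fed to a linear classifier.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(texts, labels)
preds = baseline.predict(texts)
```

Swapping `LogisticRegression()` for an SVM, random forest, or gradient-boosting classifier keeps the same pipeline shape, which is what makes these models convenient baselines.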
This architecture can perform complex summarizations: for example, a trainable recurrent layer takes the sequence of user engagements in the order they were published and outputs a vector that summarizes the whole engagement sequence.
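A minimal sketch of the padding-free idea using TensorFlow ragged tensors (the feature values are made up, and a simple mean over the engagement axis stands in here for the trainable recurrent layer described above):

```python
import tensorflow as tf

# Two news pieces with 3 and 1 engagements respectively; each engagement
# is a 4-dimensional feature vector (hypothetical features). No padding
# or truncation is needed to put them in one batch.
engagements = tf.ragged.constant([
    [[0.1, 0.2, 0.0, 1.0],
     [0.3, 0.1, 1.0, 0.0],
     [0.2, 0.4, 0.0, 0.0]],
    [[0.5, 0.0, 1.0, 1.0]],
], ragged_rank=1)

# Collapse each variable-length sequence into one fixed-size vector.
# The real model replaces this mean pooling with a recurrent layer that
# reads the engagements in publication order.
summaries = tf.reduce_mean(engagements, axis=1)  # shape (2, 4)
```

Because the ragged batch keeps one row per news piece, the resulting `(2, 4)` summary tensor can be concatenated with content features and fed to downstream dense layers.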
Center
Faculties and Schools::E.T.S. de Ingeniería Informática
Department
Inteligencia Artificial