Publication:
Fake News Detection using news content and user engagement

Date
2021-07-01
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial
Abstract
Fake news are purposefully designed to be misleading, and their success depends mostly on their readers [ÖG17]. Due to the nature of social networks, fake news can be propagated quickly, potentially causing great damage to society. Moreover, sociological phenomena like echo chambers or polarization [Sil+16], and psychological factors like confirmation bias [Del+16] or overconfidence in one's own ability not to be fooled by fake news, create the perfect playground for misinformation [Tac+17] [Del+18]. Platforms have recently adopted measures to discourage users from sharing content without reading it first, but these measures are still not fully enforced and are easy to bypass. An automatic fake news detection system that blocks or warns users about possibly misleading information will be needed in the near future, especially given the high volume of information shared through these websites. Several models have been proposed to detect fake news by analyzing linguistic features [HA17] [Pot+17] [BS19] [KGN21], but these are often not enough [Shu+18] to distinguish fake news from real ones. Research is now focusing on adding user engagement information to existing content-based models [RSL17] [Del+18] [SML19]. Some systems have been proposed that use only information from these engagements [Tac+17]. Part of the research has focused on creating training datasets [SW18] [Shu+18] for this task, crawling news from fact-checking websites and fetching user engagements using the public APIs offered by social networks. In this work, we develop and test different fake news detection systems using information from news articles and user engagements in social networks. Two different architectures are used. The main one is based on Deep Learning and can process both news content and user engagements.
The second one is based on well-known algorithms like logistic regression, SVM, random forest, LightGBM or XGBoost; it can only process news content and is used as a performance baseline for our Deep Learning models. We use the FakeNewsNet dataset [Shu+18], which contains real and fake news from two fact-checking sources. For each news piece, the dataset contains the scraped news article, the tweets and retweets related to it, and the profiles of the users involved in those tweets, including each user's timeline, followers and followees, although not all of this information will be used. Our work starts with an exploratory data analysis of the training set, where we highlight the main characteristics of the dataset. Then, we carry out a series of tests on both architectures, taking news from each set, with different subsets of features and with various textual representation techniques. Additionally, we perform an ablation test on the Deep Learning architecture to understand how individual features behave and how they complement each other. Our results clearly show that our architecture is able to capture much information from user engagements, and that including user interactions gives better results than models using only information from news articles. With this work, our main contribution is a Deep Learning architecture capable of handling varying-length sequences of engagements for each piece of news, while extracting all the information from them without padding or truncating to fixed-size sequences. We take advantage of recent innovations in frameworks like TensorFlow to process non-tabular data, which allows us to directly include unaggregated features, minimizing the preprocessing required before the input data is fed to the model.
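As a hedged sketch of what a content-only baseline of this kind might look like (the toy corpus and the specific choice of TF-IDF features with logistic regression are illustrative assumptions, not the exact setup used in this work):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy stand-in corpus: article texts with fake/real labels (1 = fake).
texts = [
    "shocking miracle cure doctors hate",
    "local council approves new budget",
    "celebrity secretly replaced by clone",
    "university publishes annual research report",
]
labels = [1, 0, 1, 0]

# TF-IDF features over the article text, fed to a linear classifier.
baseline = make_pipeline(TfidfVectorizer(), LogisticRegression())
baseline.fit(texts, labels)
preds = baseline.predict(texts)
```

Swapping `LogisticRegression()` for an SVM, random forest, or gradient-boosting classifier keeps the same pipeline shape, which is what makes these models convenient baselines.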
This architecture can perform complex summarizations: for example, a trainable recurrent layer takes the sequence of user engagements in the order they were published and outputs a vector that summarizes the whole engagement sequence.
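A minimal sketch of the padding-free idea using TensorFlow ragged tensors (the feature values are made up, and a simple mean over the engagement axis stands in here for the trainable recurrent layer described above):

```python
import tensorflow as tf

# Two news pieces with 3 and 1 engagements respectively; each engagement
# is a 4-dimensional feature vector (hypothetical features). No padding
# or truncation is needed to put them in one batch.
engagements = tf.ragged.constant([
    [[0.1, 0.2, 0.0, 1.0],
     [0.3, 0.1, 1.0, 0.0],
     [0.2, 0.4, 0.0, 0.0]],
    [[0.5, 0.0, 1.0, 1.0]],
], ragged_rank=1)

# Collapse each variable-length sequence into one fixed-size vector.
# The real model replaces this mean pooling with a recurrent layer that
# reads the engagements in publication order.
summaries = tf.reduce_mean(engagements, axis=1)  # shape (2, 4)
```

Because the ragged batch keeps one row per news piece, the resulting `(2, 4)` summary tensor can be concatenated with content features and fed to downstream dense layers.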
Center
Faculties and Schools::E.T.S. de Ingeniería Informática
Department
Inteligencia Artificial