Feature engineering for sentiment analysis in e-health forums

Carrillo de Albornoz, Jorge, Rodríguez-Vidal, Javier and Plaza, Laura. (2018) Feature engineering for sentiment analysis in e-health forums. Plos One 13 (11): e0207996

Files
Name Description MIME type Size
Rodriguez_Vidal_Javier_Feature_Sentiment_Anal.pdf Rodriguez Vidal_Javier_Feature Sentiment Anal.pdf application/pdf 1.37 MB

Title Feature engineering for sentiment analysis in e-health forums
Author(s) Carrillo de Albornoz, Jorge
Rodríguez-Vidal, Javier
Plaza, Laura
Subject(s) Computer science
Abstract Introduction Exploiting information in health-related social media services is of great interest for patients, researchers and medical companies. The challenge is, however, to provide easy, quick and relevant access to the vast amount of information that is available. One step towards facilitating information access to online health data is opinion mining. Even though the classification of patient opinions into positive and negative has been previously tackled, most works make use of machine learning methods and bags of words. Our first contribution is an extensive evaluation of different features, including lexical, syntactic, semantic, network-based, sentiment-based and word-embedding features, to represent patient-authored texts for polarity classification. The second contribution of this work is the study of polar facts (i.e. objective information with polar connotations). Traditionally, the presence of polar facts has been neglected and research in polarity classification has been restricted to opinionated texts. We demonstrate the existence and importance of polar facts for the polarity classification of health information.
Material and methods We annotate a set of more than 3500 posts to online health forums on breast cancer, Crohn's disease and different allergies. Each sentence in a post is manually labeled as “experience”, “fact” or “opinion”, and as “positive”, “negative” or “neutral”. Using this data, we train different machine learning algorithms and compare traditional bag-of-words representations with word embeddings in combination with lexical, syntactic, semantic, network-based and emotional properties of texts to automatically classify patient-authored content into positive, negative and neutral. In addition, we experiment with a combination of textual and semantic representations by generating concept embeddings using the UMLS Metathesaurus.
Results We reach two main results: first, we find that it is possible to predict polarity of patient-authored contents with a very high accuracy (≈ 70 percent) using word embeddings, and that this considerably outperforms more traditional representations like bags of words; and second, when dealing with medical information, negative and positive facts (i.e. objective information) are nearly as frequent as negative and positive opinions and experiences (i.e. subjective information), and their importance for polarity classification is crucial.
Publisher(s) Public Library of Science
Date 2018-11-29
Format application/pdf
Identifier bibliuned:DptoLSI-ETSI-GPLNyRI-Jrodriguez-0004
DOI - identifier https://doi.org/10.1371/journal.pone.0207996
ISSN - identifier 1932-6203
Journal name Plos One
Volume number 13
Issue number 11
Published in journal Plos One 13 (11): e0207996
Language eng
Publication version acceptedVersion
Resource type Article
Access rights and license http://creativecommons.org/licenses/by-nc-nd/4.0
Access type Open access
Additional notes This is an Accepted Manuscript of an article published by Public Library of Science in "Plos One 13 (11): e0207996", available at: https://doi.org/10.1371/journal.pone.0207996

Created: Fri, 26 Jan 2024, 21:39:36 CET