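Publication: Un Método para la Detección de Controversia en Textos y su Aplicación al Caso de Comentarios sobre Fármacos en Foros de Salud (A Method for Detecting Controversy in Texts and Its Application to Comments on Drugs in Health Forums)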
Date
2020-09-01
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos
Abstract
Controversy, as a social and linguistic phenomenon, consists of repeated discussion or debate among individuals with opposing positions. Today it is especially visible thanks to the conditions of a hyperconnected society, which have made it possible to record and amplify online user interaction, often anonymous, as well as the creation and consumption of content never seen before. Analyzing the properties and characteristics of this phenomenon can yield insights into the topic under dispute: a better understanding of why it is controversial, how it is perceived in the community, and whether controversy behaves equivalently across domains. It can also facilitate the development of tools that improve users' access to and consumption of information, among other aspects of interest. However, because controversy is subtle and context-dependent, defining and detecting it remains an open problem. In this work, we study the problem of controversy detection in texts and identify the challenges faced by existing state-of-the-art methodologies. Among these challenges are the lack of an explicit, widely accepted and applied definition, and the lack of a detection methodology that is correspondingly broad and independent of domain and use case. To address these challenges, we propose a broad, domain-independent definition of controversy and a technical approach for detecting it, which we implement and evaluate in a specific case study: user comments in medical forums (the Drug Review Dataset corpus).
The proposal is based, first, on the novel formal application of argument detection as a basis for controversy detection and, second, on other aspects present in the state of the art, such as the formation of opinion groups and the confrontation between these groups over the topic of controversy. Based on this definition, we have developed a modular detection system consisting of an argument detector, an argument clustering component, a polarity classifier, and a controversy estimator of our own design. For the argument detector, the argument classification results exceed those reported in the state of the art for the same problem and configuration. Finally, we evaluated the Drug Review Dataset case, comparing the results with a manual annotation of the same dataset carried out by three different annotators. The results are promising: the system detects controversy correctly at its extremes and provides a series of details that support its explainability.
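As a rough illustration of how the four modules named in the abstract could fit together, the sketch below chains an argument detector, a grouping step, a polarity classifier, and a controversy estimator. It is not the system described in this work: the cue words, the word lexicons, the use of the reviewed topic as a stand-in for argument clustering, and the balance-based score are all assumptions made for the example, since the abstract does not specify any of them.

```python
from collections import defaultdict
from dataclasses import dataclass

# Every constant and formula here is an illustrative assumption,
# not something taken from the publication.
ARGUMENT_CUES = ("because", "since", "therefore")
POSITIVE = {"effective", "helped", "works", "better"}
NEGATIVE = {"useless", "worse", "nausea", "pain"}


@dataclass
class Comment:
    topic: str  # hypothetical field, e.g. the drug being reviewed
    text: str


def is_argument(text: str) -> bool:
    """Stage 1 (stand-in): flag comments that contain a reasoning cue."""
    lowered = text.lower()
    return any(cue in lowered for cue in ARGUMENT_CUES)


def polarity(text: str) -> int:
    """Stage 3 (stand-in): return +1, -1 or 0 from a tiny word lexicon."""
    words = set(text.lower().replace(".", " ").replace(",", " ").split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    return (score > 0) - (score < 0)


def controversy(comments: list[Comment]) -> dict[str, float]:
    """Stages 2 and 4 (stand-ins): group arguments by topic, score balance.

    The score is 1.0 when positive and negative arguments are evenly
    split, and 0.0 when all polarised arguments agree or none exist.
    """
    counts = defaultdict(lambda: [0, 0])  # topic -> [positive, negative]
    for comment in comments:
        if not is_argument(comment.text):
            continue  # non-argumentative comments are ignored
        entry = counts[comment.topic]
        sign = polarity(comment.text)
        if sign > 0:
            entry[0] += 1
        elif sign < 0:
            entry[1] += 1
    scores = {}
    for topic, (pos, neg) in counts.items():
        total = pos + neg
        scores[topic] = 0.0 if total == 0 else 1.0 - abs(pos - neg) / total
    return scores
```

Calling `controversy` on a list of `Comment` objects returns one score per topic. In a real system each stand-in would be replaced by a trained model, and the estimator would likely weigh more signals than a simple positive/negative balance.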
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Lenguajes y Sistemas Informáticos