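Publication: Combinación de LLMs basados en Transformers con información socio-demográfica para detectar contenido sexista en redes sociales (Combining Transformer-based LLMs with socio-demographic information to detect sexist content on social media)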
Date
2024-02
Access rights
Attribution-NonCommercial-NoDerivatives 4.0 International
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos
Abstract
This work was carried out within the 2023 edition of EXIST, a series of scientific events and shared tasks aimed at identifying sexism on social media platforms. Its scope ranges from detecting overt misogyny to identifying subtle and implicit sexist behavior. This third edition of the shared task is held as a lab at the CLEF 2023 conference. In addition to the central theme of sexism identification, this edition approaches the tasks from the learning with disagreements perspective: each instance in the provided dataset carries six labels, derived from annotations by annotators belonging to six different cohorts (defined by gender and age). The work first reviews the state of the art in detecting online toxicity in general and sexism in particular, together with the most common strategies for handling disagreement between annotators. Following this analysis, three proposals are presented for task 1 and three for task 2; with these, second position is reached on the soft-soft metric in the monolingual Spanish setting and third position in the bilingual setting. The proposal is also the only one to build a system on the socio-demographic information of the annotators, creating one model per cohort to compute the final probability distribution. All of this is documented in a scientific paper submitted to the competition. Finally, conclusions are drawn from the results, and future lines of research are proposed for both sexism detection and the handling of tasks with disagreement.
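The two ideas named in the abstract, soft labels derived from six annotations and a final distribution computed from one model per cohort, can be illustrated with a minimal sketch. It is not the system described in the work: the abstract does not say how the per-cohort outputs are combined, so plain averaging is assumed here, and the function names, the `YES`/`NO` class names, and the example probabilities are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def soft_label(annotations: Sequence[str], classes: Sequence[str]) -> Dict[str, float]:
    """Convert the individual annotations of one instance into a
    probability distribution over the classes (relative frequencies)."""
    counts = Counter(annotations)
    total = len(annotations)
    return {c: counts[c] / total for c in classes}


def combine_cohorts(cohort_probs: List[Dict[str, float]]) -> Dict[str, float]:
    """Combine the distributions predicted by the per-cohort models.

    ASSUMPTION: an unweighted mean is used; the source does not state
    the aggregation rule.
    """
    classes = list(cohort_probs[0].keys())
    n = len(cohort_probs)
    return {c: sum(p[c] for p in cohort_probs) / n for c in classes}


if __name__ == "__main__":
    classes = ["YES", "NO"]  # hypothetical class names

    # Six annotations for one instance, one per cohort (made-up values).
    gold = soft_label(["YES", "YES", "NO", "YES", "NO", "YES"], classes)
    print("gold soft label:", gold)

    # Made-up outputs of six hypothetical per-cohort models.
    p_yes = [0.9, 0.8, 0.6, 0.7, 0.4, 0.2]
    preds = [{"YES": p, "NO": 1.0 - p} for p in p_yes]
    print("combined prediction:", combine_cohorts(preds))
```

Keeping one distribution per cohort until the last step makes it possible to inspect where the cohorts disagree before they are merged into a single soft prediction.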
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Lenguajes y Sistemas Informáticos