Publication: Reconocimiento de entidades y extracción de relaciones en texto biomédico (Entity recognition and relation extraction in biomedical text)
Date
2023
Access rights
Attribution-NonCommercial-NoDerivatives 4.0 International
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Lenguajes y Sistemas Informáticos
Abstract
The analysis of medical documents remains a considerable challenge today. Much of the available information is unstructured and therefore difficult to analyze. For this reason, one of the tasks of Natural Language Processing is to extract the Named Entities in texts, together with certain relations between them, which greatly facilitates the subsequent analysis of biomedical data. Advanced models exist for these tasks, based on language models trained on large amounts of data. This work proposes combining transformer models with machine learning models such as SVM or Neural Networks for entity extraction, either by using models previously trained on large amounts of Spanish biomedical text, or by training a model starting from these pre-trained models. The latter approach outperformed other state-of-the-art systems on the entity recognition task. For relation extraction, these transformer models trained on Spanish biomedical text were also combined with Neural Network models, together with data augmentation techniques such as SMOTE-NC and dimensionality reduction techniques such as LDA, yielding systems comparable to the state of the art.
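The abstract names SMOTE-NC as the data augmentation technique used for relation extraction. As an illustration only, the following minimal sketch implements the SMOTE-NC oversampling rule as described in the original SMOTE paper (Chawla et al., 2002): continuous features are interpolated towards a random minority-class neighbour, categorical features take the most frequent value among the k nearest neighbours, and each categorical mismatch adds the median standard deviation of the continuous features to the distance. It is not the code used in this work; the function `smote_nc`, its parameters and the list-of-rows data layout are hypothetical. In practice, the imbalanced-learn library offers this technique as its `SMOTENC` class.

```python
import math
import random
import statistics
from collections import Counter


def smote_nc(minority, cat_idx, n_new, k=5, seed=0):
    """Hypothetical helper: create n_new synthetic minority-class rows.

    minority: list of rows (lists) from the minority class only.
    cat_idx: set of column indices holding categorical values.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    num_idx = [j for j in range(len(minority[0])) if j not in cat_idx]

    # Median of the standard deviations of the continuous columns: the
    # distance penalty added for every categorical value that differs.
    stds = [statistics.pstdev([row[j] for row in minority]) for j in num_idx]
    med = statistics.median(stds) if stds else 1.0

    def dist(a, b):
        sq = sum((a[j] - b[j]) ** 2 for j in num_idx)
        sq += sum(med ** 2 for j in cat_idx if a[j] != b[j])
        return math.sqrt(sq)

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest minority-class neighbours of the chosen sample (itself excluded).
        others = [r for idx, r in enumerate(minority) if idx != i]
        neighbours = sorted(others, key=lambda r: dist(base, r))[:k]
        partner = rng.choice(neighbours)
        gap = rng.random()
        new = list(base)
        # Continuous columns: interpolate towards one random neighbour, as in SMOTE.
        for j in num_idx:
            new[j] = base[j] + gap * (partner[j] - base[j])
        # Categorical columns: most frequent value among the k neighbours.
        for j in cat_idx:
            new[j] = Counter(r[j] for r in neighbours).most_common(1)[0][0]
        synthetic.append(new)
    return synthetic
```

For example, `smote_nc([[1.0, 2.0, "a"], [1.5, 2.5, "a"], [2.0, 3.0, "b"]], cat_idx={2}, n_new=2, k=2)` returns two new rows whose numeric values lie between existing minority samples and whose third column is a value already seen among the neighbours. Only the minority class is passed in, so the synthetic rows are appended to the training set with that class label.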
Centre
E.T.S. de Ingeniería Informática
Department
Lenguajes y Sistemas Informáticos