Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models

Villaplana Moreno, Aitana; Martínez Unanue, Raquel; Montalvo Herranz, Soto

Date

2023

Access rights

info:eu-repo/semantics/openAccess

Publisher

MDPI

Abstract

Named Entity Recognition (NER) is an important task for extracting relevant information from biomedical texts. Pre-trained language models have recently made great progress on this task, particularly in English. However, the performance of pre-trained models in the Spanish biomedical domain has not been evaluated in an experimental framework designed specifically for the task. We present an approach to named entity recognition in Spanish medical texts that uses pre-trained models from the Spanish biomedical domain. We also apply data augmentation techniques to improve the identification of the less frequent entities in the dataset. The domain-specific models improved the recognition of named entities in the domain, outperforming all the systems evaluated in the eHealth-KD challenge 2021. Language models from the biomedical domain seem to be more effective at characterizing the specific terminology involved in this task, where most entities are of the "concept" type and cover a large number of medical concepts. Of the data augmentation techniques, only back translation improved the results, and only slightly. The most frequent entity types in the dataset are identified best. Although the domain-specific language models outperformed most of the other models, the multilingual general-purpose model mBERT obtained competitive results.

Description

The registered version of this article, first published in "Electronics, 12, 2023", is available online at the publisher's website: MDPI, https://doi.org/10.3390/electronics12234872

Keywords

biomedical natural language processing, Spanish biomedical entity recognition, pre-trained language models, data augmentation

Citation

Villaplana, A., Martínez, R., & Montalvo, S. (2023). Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models. Electronics, 12(23), 4872. https://doi.org/10.3390/electronics12234872

Center

E.T.S. de Ingeniería Informática

Department

Lenguajes y Sistemas Informáticos
