Person:
Martínez Unanue, Raquel

ORCID
0000-0003-1838-632X
Surname
Martínez Unanue
Given name
Raquel

Search results

Showing 1 - 3 of 3
  • Publication
    A data driven approach for person name disambiguation in web search results
    (2014-08-23) Víctor Fresno, Víctor; Montalvo, Soto; Delgado Muñoz, Agustín Daniel; Martínez Unanue, Raquel
    This paper presents an unsupervised approach for the task of clustering the results of a search engine when the query is a person name shared by different individuals. We propose an algorithm that calculates the number of clusters and establishes the groups of web pages according to the different individuals without the need for any training data or predefined thresholds, which the successful state-of-the-art systems rely on. In addition, most of those systems do not deal with social media web pages, so their performance could fail in a real scenario. In this paper we also propose a heuristic method for the treatment of social networking profiles. Our approach is evaluated on four gold standard collections for this task, obtaining highly competitive results, comparable to those obtained by some approaches with supervision.
  • Publication
    Early diagnosis of HIV cases by means of text mining and machine learning models on clinical notes
    (ELSEVIER, 2024) Morales Sánchez, Rodrigo; Montalvo Herranz, Soto; Riaño Martínez, Adrián; Martínez Unanue, Raquel; Velasco Arribas, Maria; https://orcid.org/0000-0001-8158-7939; https://orcid.org/0009-0004-8755-255X; https://orcid.org/0000-0001-6554-2095
    Undiagnosed and untreated human immunodeficiency virus (HIV) infection increases morbidity in the HIV-positive person and allows onward transmission of the virus. Minimizing missed opportunities for HIV diagnosis when a patient visits a healthcare facility is essential in restraining the epidemic and working toward its eventual elimination. Most state-of-the-art proposals employ machine learning (ML) methods and structured data to enhance HIV diagnoses; however, there is a dearth of recent proposals utilizing unstructured textual data from Electronic Health Records (EHRs). In this work, we propose to use only the unstructured text of the clinical notes as evidence for the classification of patients as suspected or not suspected. For this purpose, we first compile a dataset of real clinical notes from a hospital with patients classified as suspects and non-suspects of having HIV. Then, we evaluate the effectiveness of two types of classification models to identify patients suspected of being infected with the virus: classical ML algorithms and two Large Language Models (LLMs) from the biomedical domain in Spanish. The results show that both LLMs outperform classical ML algorithms in the two settings we explore: one dataset version is balanced, containing an equal number of suspicious and non-suspicious patients, while the other is unbalanced, reflecting the real distribution of patients in the hospital. We obtain F score figures of 94.7 with both LLMs in the unbalanced setting, while in the balanced one, the RoBERTa model outperforms the other with an F score of 95.7. The findings indicate that leveraging unstructured text with LLMs in the biomedical domain yields promising outcomes in diminishing missed opportunities for HIV diagnosis. A tool based on our system could assist a doctor in deciding whether a patient in consultation should undergo a serological test.
  • Publication
    Improving Medical Entity Recognition in Spanish by Means of Biomedical Language Models
    (MDPI, 2023) Villaplana, Aitana; Martínez Unanue, Raquel; Montalvo Herranz, Soto; https://orcid.org/0000-0001-8158-7939
    Named Entity Recognition (NER) is an important task used to extract relevant information from biomedical texts. Recently, pre-trained language models have made great progress in this task, particularly in English. However, the performance of pre-trained models in the Spanish biomedical domain has not been evaluated in an experimentation framework designed specifically for the task. We present an approach for named entity recognition in Spanish medical texts that makes use of pre-trained models from the Spanish biomedical domain. We also use data augmentation techniques to improve the identification of less frequent entities in the dataset. The domain-specific models have improved the recognition of named entities in the domain, beating all the systems that were evaluated in the eHealth-KD challenge 2021. Language models from the biomedical domain seem to be more effective in characterizing the specific terminology involved in this task of named entity recognition, where most entities correspond to the "concept" type involving a great number of medical concepts. Regarding data augmentation, only back translation has slightly improved the results. Clearly, the most frequent types of entities in the dataset are better identified. Although the domain-specific language models have outperformed most of the other models, the multilingual generalist model mBERT obtained competitive results.