Publication:
Early diagnosis of HIV cases by means of text mining and machine learning models on clinical notes

dc.contributor.authorMorales Sánchez, Rodrigo
dc.contributor.authorMontalvo Herranz, Soto
dc.contributor.authorRiaño Martínez, Adrián
dc.contributor.authorMartínez Unanue, Raquel
dc.contributor.authorVelasco Arribas, María
dc.contributor.orcidhttps://orcid.org/0000-0001-8158-7939
dc.contributor.orcidhttps://orcid.org/0009-0004-8755-255X
dc.contributor.orcidhttps://orcid.org/0000-0001-6554-2095
dc.date.accessioned2025-02-06T12:25:30Z
dc.date.available2025-02-06T12:25:30Z
dc.date.issued2024
dc.descriptionThe registered version of this article, first published in "Computers in Biology and Medicine, Volume 179, 2024", is available online at the publisher's website: Elsevier, https://doi.org/10.1016/j.compbiomed.2024.108830
dc.description.abstractUndiagnosed and untreated human immunodeficiency virus (HIV) infection increases morbidity in the HIV-positive person and allows onward transmission of the virus. Minimizing missed opportunities for HIV diagnosis when a patient visits a healthcare facility is essential in restraining the epidemic and working toward its eventual elimination. Most state-of-the-art proposals employ machine learning (ML) methods and structured data to enhance HIV diagnoses; however, there is a dearth of recent proposals utilizing unstructured textual data from Electronic Health Records (EHRs). In this work, we propose to use only the unstructured text of the clinical notes as evidence for the classification of patients as suspected or not suspected. For this purpose, we first compile a dataset of real clinical notes from a hospital with patients classified as suspects and non-suspects of having HIV. Then, we evaluate the effectiveness of two types of classification models to identify patients suspected of being infected with the virus: classical ML algorithms and two Large Language Models (LLMs) from the biomedical domain in Spanish. The results show that both LLMs outperform classical ML algorithms in the two settings we explore: one dataset version is balanced, containing an equal number of suspected and non-suspected patients, while the other is unbalanced, reflecting the real distribution of patients in the hospital. We obtain an F-score of 94.7 with both LLMs in the unbalanced setting, while in the balanced one, the RoBERTa model outperforms the other with an F-score of 95.7. The findings indicate that leveraging unstructured text with LLMs in the biomedical domain yields promising outcomes in diminishing missed opportunities for HIV diagnosis. A tool based on our system could assist a doctor in deciding whether a patient in consultation should undergo a serological test.en
dc.description.versionpublished version
dc.identifier.citationRodrigo Morales-Sánchez, Soto Montalvo, Adrián Riaño, Raquel Martínez, María Velasco, Early diagnosis of HIV cases by means of text mining and machine learning models on clinical notes, Computers in Biology and Medicine, Volume 179, 2024, 108830, ISSN 0010-4825, https://doi.org/10.1016/j.compbiomed.2024.108830
dc.identifier.doihttps://doi.org/10.1016/j.compbiomed.2024.108830
dc.identifier.issn0010-4825
dc.identifier.urihttps://hdl.handle.net/20.500.14468/25841
dc.journal.titleComputers in Biology and Medicine
dc.journal.volume179
dc.language.isoen
dc.page.initial108830
dc.publisherELSEVIER
dc.relation.centerFacultades y escuelas::E.T.S. de Ingeniería Informática
dc.relation.departmentLenguajes y Sistemas Informáticos
dc.rightsinfo:eu-repo/semantics/openAccess
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject12 Matemáticas::1203 Ciencia de los ordenadores ::1203.17 Informática
dc.subject.keywordsHIVen
dc.subject.keywordsText miningen
dc.subject.keywordsAutomated screeningen
dc.subject.keywordsElectronic Health Records (EHRs)en
dc.subject.keywordsLarge Language Models (LLMs)en
dc.subject.keywordsMachine Learning (ML)en
dc.titleEarly diagnosis of HIV cases by means of text mining and machine learning models on clinical notesen
dc.typeartículoes
dc.typejournal articleen
dspace.entity.typePublication
relation.isAuthorOfPublication6592a98f-f932-49bb-a7c4-1061510a3da7
relation.isAuthorOfPublication085ba044-ea75-4751-ab01-512f39c160a7
relation.isAuthorOfPublication.latestForDiscovery6592a98f-f932-49bb-a7c4-1061510a3da7
Files
Original bundle
Name:
MoralesSanchez_Rodrigo_EarlyDiagnosisHIV.pdf
Size:
1.8 MB
Format:
Adobe Portable Document Format
License bundle
Name:
license.txt
Size:
3.62 KB
Format:
Item-specific license agreed to upon submission