Publication:
ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification Problem

dc.contributor.author: Almagro, Mario
dc.contributor.author: Martínez Unanue, Raquel
dc.contributor.author: Fresno Fernández, Víctor Diego
dc.contributor.author: Montalvo, Soto
dc.date.accessioned: 2025-12-02T13:59:40Z
dc.date.available: 2025-12-02T13:59:40Z
dc.date.issued: 2020-06-08
dc.description: The registered version of this article, first published in "IEEE Access, 8, (2020), 100073-100083", is available online at the publisher's website: Institute of Electrical and Electronics Engineers, DOI: 10.1109/ACCESS.2020.2997241
dc.description.abstract [en]: Medical coding is used to identify and standardize clinical concepts in the records collected from healthcare services. The tenth revision of the International Classification of Diseases (ICD-10) is the most widely used coding system, with more than 11,000 different diagnoses, and it affects research, reporting, and funding. Unfortunately, ICD-10 code sets tend to follow biased, unbalanced, and scattered distributions. These distribution attributes, along with high lexical variability, severely restrict performance when coded clinical records are used to infer code sets for uncoded records. To improve that inference, we explore a combination of example-based methods optimized to capture codes with different appearance frequencies in data sets.
Materials and Methods: The exploration was carried out on Spanish hospital discharge reports coded by experts, excluding all sentences that contain no biomedical concept. Representations based on semantic and lexical features are explored, using both global and label-specific attributes. Algorithms based on binary outputs, groups of subsets, and extreme classification are then compared. Each method suggests a list of codes together with their confidence values (certainty probabilities).
Results: Each method behaves differently across the code-frequency spectrum: binary classifiers seem to maximize the capture of more popular codes, while extreme classifiers promote infrequent ones. To exploit these differences, ensemble approaches are proposed that weight every output code according to the method, its confidence value, and its appearance frequency. The rule-based combination reaches a Precision at 10 (P@10) of 46%, a 15% improvement over the best individual proposal.
Conclusion: Combining methods by weighting each code according to its training frequency and each method's performance can achieve better overall Precision scores on extreme distributions such as ICD-10 coding.
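The abstract reports results as Precision at 10 (P@10) and describes a rule-based ensemble that weights each suggested code by method, confidence value, and training frequency. The sketch below illustrates both ideas under stated assumptions; it is not the authors' implementation. The function names, the split of each method's weight into a "frequent" and a "rare" weight, the rare-code threshold, and all example codes, confidences, and frequencies are hypothetical, chosen only for illustration. P@k here divides by k, a common convention.

```python
from collections import defaultdict


def precision_at_k(ranked_codes, gold_codes, k=10):
    """P@k: fraction of the top-k suggested codes that are in the gold code set.

    Divides by k even when fewer than k codes are suggested (a common convention).
    """
    hits = sum(1 for code in ranked_codes[:k] if code in gold_codes)
    return hits / k


def combine_suggestions(method_outputs, method_weights, train_freq, rare_threshold):
    """Hypothetical rule-based fusion of per-method (code, confidence) lists.

    method_weights maps each method to a (frequent_weight, rare_weight) pair; a code
    counts as rare when its training frequency is below rare_threshold. Each code's
    final score is the sum over methods of weight * confidence.
    Returns codes sorted by descending score (ties broken alphabetically).
    """
    scores = defaultdict(float)
    for method, suggestions in method_outputs.items():
        frequent_w, rare_w = method_weights[method]
        for code, confidence in suggestions:
            is_rare = train_freq.get(code, 0) < rare_threshold
            scores[code] += (rare_w if is_rare else frequent_w) * confidence
    return sorted(scores, key=lambda code: (-scores[code], code))


# Illustrative data only: invented confidences and training frequencies.
outputs = {
    "binary": [("I10", 0.9), ("E11.9", 0.8), ("R69", 0.3)],
    "extreme": [("Q87.1", 0.7), ("I10", 0.4)],
}
# Hypothetical weights: trust binary classifiers on frequent codes,
# extreme classifiers on rare ones.
weights = {"binary": (1.0, 0.2), "extreme": (0.3, 1.0)}
train_freq = {"I10": 500, "E11.9": 300, "R69": 50, "Q87.1": 2}

ranking = combine_suggestions(outputs, weights, train_freq, rare_threshold=10)
gold = {"I10", "Q87.1"}
print(ranking)                              # ['I10', 'E11.9', 'Q87.1', 'R69']
print(precision_at_k(ranking, gold, k=10))  # 0.2
```

Splitting each method's weight by code frequency is one simple way to let an extreme classifier contribute rare codes (here Q87.1) without displacing the frequent codes a binary classifier ranks well; the paper's actual weighting rules may differ.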
dc.description.version: published version
dc.identifier.citation: M. Almagro, R. M. Unanue, V. Fresno and S. Montalvo, "ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification Problem," in IEEE Access, vol. 8, pp. 100073-100083, 2020, doi: 10.1109/ACCESS.2020.2997241.
dc.identifier.doi: 10.1109/ACCESS.2020.2997241
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/20.500.14468/30982
dc.journal.title: IEEE Access
dc.journal.volume: 8
dc.language.iso: en
dc.page.final: 100083
dc.page.initial: 100073
dc.publisher: Institute of Electrical and Electronics Engineers
dc.relation.center: E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.es
dc.subject: 1203.17 Computer Science
dc.subject: 1203.18 Information Systems, Design and Components
dc.subject: 3304.99 Other (specify)
dc.subject.keywords [en]: Extreme classification
dc.subject.keywords [en]: XMTC
dc.subject.keywords [en]: ICD-10 coding
dc.subject.keywords [en]: Text mining
dc.title [en]: ICD-10 Coding of Spanish Electronic Discharge Summaries: An Extreme Classification Problem
dc.type: article
dc.type [en]: journal article
dspace.entity.type: Publication
relation.isAuthorOfPublication: 085ba044-ea75-4751-ab01-512f39c160a7
relation.isAuthorOfPublication: 80cd3492-0ff8-4c8e-a904-2858623c7fc1
relation.isAuthorOfPublication.latestForDiscovery: 085ba044-ea75-4751-ab01-512f39c160a7
Files
Original bundle
Name: (Almagro et al, 2022) ICD-10_Coding_of_Spanis_VICTOR DIEGO FRESNO.pdf
Size: 7.29 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.62 KB
Format: Item-specific license agreed to upon submission
Description: