Publication:
Effect of data leakage in brain MRI classification using 2D convolutional neural networks

dc.contributor.author: Yagis, Ekin
dc.contributor.author: Workalemahu Atnafu, Selamawet
dc.contributor.author: García Seco de Herrera, Alba
dc.contributor.author: Marzi, Chiara
dc.contributor.author: Scheda, Riccardo
dc.contributor.author: Giannelli, Marco
dc.contributor.author: Tessa, Carlo
dc.contributor.author: Citi, Luca
dc.contributor.author: Diciotti, Stefano
dc.date.accessioned: 2025-03-27T09:16:12Z
dc.date.available: 2025-03-27T09:16:12Z
dc.date.issued: 2021-11-19
dc.description: The registered version of this article, first published in Scientific Reports 11, No. 1 (2021), is available online at the publisher's website: https://doi.org/10.1038/S41598-021-01681-W.
dc.description.abstract: In recent years, 2D convolutional neural networks (CNNs) have been extensively used to diagnose neurological diseases from magnetic resonance imaging (MRI) data due to their potential to discern subtle and intricate patterns. Despite the high performances reported in numerous studies, developing CNN models with good generalization abilities is still a challenging task due to possible data leakage introduced during cross-validation (CV). In this study, we quantitatively assessed the effect of a data leakage caused by 3D MRI data splitting based on a 2D slice-level using three 2D CNN models to classify patients with Alzheimer’s disease (AD) and Parkinson’s disease (PD). Our experiments showed that slice-level CV erroneously boosted the average slice level accuracy on the test set by 30% on Open Access Series of Imaging Studies (OASIS), 29% on Alzheimer’s Disease Neuroimaging Initiative (ADNI), 48% on Parkinson’s Progression Markers Initiative (PPMI) and 55% on a local de-novo PD Versilia dataset. Further tests on a randomly labeled OASIS-derived dataset produced about 96% of (erroneous) accuracy (slice-level split) and 50% accuracy (subject-level split), as expected from a randomized experiment. Overall, the extent of the effect of an erroneous slice-based CV is severe, especially for small datasets. [en]
dc.description.version: published version
dc.identifier.citation: Yagis, E., Atnafu, S.W., García Seco de Herrera, A. et al. Effect of data leakage in brain MRI classification using 2D convolutional neural networks. Sci Rep 11, 22544 (2021). https://doi.org/10.1038/s41598-021-01681-w
dc.identifier.doi: https://doi.org/10.1038/s41598-021-01681-w
dc.identifier.issn: 2045-2322
dc.identifier.uri: https://hdl.handle.net/20.500.14468/26373
dc.journal.issue: 1
dc.journal.title: Scientific Reports
dc.journal.volume: 11
dc.language.iso: en
dc.publisher: Nature Research
dc.relation.center: E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject: 12 Matemáticas::1203 Ciencia de los ordenadores::1203.17 Informática
dc.title: Effect of data leakage in brain MRI classification using 2D convolutional neural networks [en]
dc.type: artículo [es]
dc.type: journal article [en]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 33e1cf81-6a46-4cc6-828f-1c0f2a7e7497
relation.isAuthorOfPublication.latestForDiscovery: 33e1cf81-6a46-4cc6-828f-1c0f2a7e7497
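
The abstract contrasts splitting 3D MRI data at the slice level with splitting at the subject level. The following is a minimal sketch of that difference on a toy dataset. It is not the authors' code. The function names, the toy sizes (20 subjects, 10 slices each), and the 20% test fraction are all assumptions made for illustration. It shows only how a slice-level split lets the same subject appear on both sides of the split, while a subject-level split does not.

```python
import random


def make_slices(n_subjects, slices_per_subject):
    """Build a toy dataset as (subject_id, slice_index) pairs (hypothetical data)."""
    return [(s, k) for s in range(n_subjects) for k in range(slices_per_subject)]


def slice_level_split(slices, test_fraction, seed=0):
    """Shuffle individual slices and split them, ignoring which subject they come from."""
    rng = random.Random(seed)
    shuffled = list(slices)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def subject_level_split(slices, test_fraction, seed=0):
    """Shuffle subjects and assign all slices of a subject to the same side."""
    rng = random.Random(seed)
    subjects = sorted({s for s, _ in slices})
    rng.shuffle(subjects)
    n_test = max(1, int(len(subjects) * test_fraction))
    test_subjects = set(subjects[:n_test])
    train = [x for x in slices if x[0] not in test_subjects]
    test = [x for x in slices if x[0] in test_subjects]
    return train, test


def shared_subjects(train, test):
    """Return the subjects that contribute slices to both train and test."""
    return {s for s, _ in train} & {s for s, _ in test}


slices = make_slices(n_subjects=20, slices_per_subject=10)
leaky_train, leaky_test = slice_level_split(slices, test_fraction=0.2)
clean_train, clean_test = subject_level_split(slices, test_fraction=0.2)

print("subjects in both sets (slice-level):", len(shared_subjects(leaky_train, leaky_test)))
print("subjects in both sets (subject-level):", len(shared_subjects(clean_train, clean_test)))
```

In a cross-validation setting the same idea applies per fold: the grouping key is the subject, so every slice derived from one 3D volume stays in a single fold.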
Files
Original bundle
Name: GarciaSecoDeHerrera_Alba_DataLeakage.pdf
Size: 2.11 MB
Format: Adobe Portable Document Format