Publication:
Thermographic Breast Cancer Detection. Deep Learning with a Small Dataset

Date
2020-03-06
Access rights
info:eu-repo/semantics/openAccess
Publisher
Universidad Nacional de Educación a Distancia (España). Escuela Técnica Superior de Ingeniería Informática. Departamento de Inteligencia Artificial.
Abstract
According to the World Health Organization (WHO), breast carcinoma is the most prevalent cancer among women, with 2.1 million new diagnoses every year. Given the risk of death associated with metastasis in the late stages of the disease, early detection is the best strategy for reducing mortality. Among the numerous tests that can be used in breast cancer screening, thermography is non-invasive, painless, and free of ionizing radiation. The research group within which I have done this research is interested in applying artificial intelligence to the analysis of thermographic images for breast cancer screening. Given that the project this group intends to carry out in collaboration with HM Hospitales has not yet begun, in this master's thesis we have used the Database for Mastology Research (DMR), developed at the Visual Lab of the Universidade Federal Fluminense, in Brazil, which is the only publicly available dataset of breast thermograms. It contains 216 patients, with up to 25 images per patient. It has been studied in dozens of research works, most of them using statistical feature extraction and machine learning algorithms for classification. Unfortunately, this database has important flaws, such as two different patients having exactly the same image (pixel by pixel), which have not been mentioned in previous works. For this reason we devoted a significant effort to cleaning the dataset, which reduced it to only 188 images. We then tried several deep learning models for image classification. We first built from scratch several convolutional neural networks (CNNs), each consisting of n pairs of convolutional-maxpool layers, a flatten layer, and n dense layers, for different values of n. All the CNNs gave poor results: the highest accuracy, obtained for n = 4, was 75%, and the largest area under the ROC curve (AUC), obtained for n = 5, was 0.70.
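The family of networks built from scratch can be sketched in Keras as follows. This is a minimal illustration of the "n conv-maxpool pairs, flatten, n dense layers" structure described above; the filter count (32), kernel size (3×3), hidden width (64), and input resolution are illustrative assumptions, since the abstract does not state those hyperparameters.

```python
import tensorflow as tf


def build_cnn(n, input_shape=(128, 128, 1)):
    """CNN with n conv-maxpool pairs, a flatten layer, and n dense layers
    (n - 1 hidden layers plus a sigmoid output for the binary decision)."""
    model = tf.keras.Sequential([tf.keras.Input(shape=input_shape)])
    for _ in range(n):
        model.add(tf.keras.layers.Conv2D(32, (3, 3), activation="relu",
                                         padding="same"))
        model.add(tf.keras.layers.MaxPooling2D((2, 2)))
    model.add(tf.keras.layers.Flatten())
    for _ in range(n - 1):
        model.add(tf.keras.layers.Dense(64, activation="relu"))
    model.add(tf.keras.layers.Dense(1, activation="sigmoid"))
    return model
```

With this builder, the architectures compared in the study correspond simply to different calls such as `build_cnn(4)` or `build_cnn(5)`.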
We also took into account that a false positive, which may cause anxiety and discomfort to the patient and lead to a biopsy, is not as serious as a false negative, which may delay the detection of cancer, thus requiring more aggressive and expensive treatments and drastically reducing the survival rate. After consulting a radiologist at the HM Montepríncipe hospital, we estimated that the relative cost of a false negative is at least 20 times higher than that of a false positive, and defined a metric in which a false negative weighs the same as 20 false positives. In our study, the CNN with n = 5 had by far the smallest weighted error, so we selected this network as the reference for the next phases of the study. In a second group of experiments we used three of the most popular pre-trained CNNs available in Keras: VGG16, VGG19, and ResNet50, and optimized their parameters for our dataset; this process is usually called transfer learning. Contrary to other results published in the literature, all these re-trained CNNs performed worse than the best network built from scratch, i.e., the one with n = 5. Finally, we built several hybrid models by replacing the top m layers of that CNN with either a support vector machine (SVM) or a sum-product network (SPN), for different values of m. Again, the performance was lower than that of the best pure CNN. The conclusion is that when a dataset contains a relatively small number of images, large CNNs tend to overfit, leading to poor AUCs, in contrast with large datasets, for which very deep networks usually perform much better than shallow ones.
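The cost-sensitive metric described above, in which one false negative counts as much as 20 false positives, can be written as a simple weighted error. The 20:1 cost ratio comes from the study; the normalization by the weighted total below is an illustrative assumption, since the abstract fixes only the ratio.

```python
def weighted_error(fp, fn, n_neg, n_pos, fn_weight=20):
    """Cost-weighted error rate: a false negative (missed cancer) counts
    as fn_weight false positives. fp/fn are the error counts; n_neg/n_pos
    are the numbers of healthy and sick cases in the test set."""
    return (fp + fn_weight * fn) / (n_neg + fn_weight * n_pos)
```

Under this metric a classifier that misses a single cancer case is penalized as heavily as one that raises twenty false alarms, which is why a network with a mediocre plain accuracy can still be the best choice.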
An additional reason why transfer learning did not work in our study is that the above-mentioned networks were trained on color images, whereas in a thermogram each pixel represents not a red-green-blue (RGB) color but a temperature; for this reason, in our case the networks built from scratch (at least some of them) performed better than the re-trained CNNs.
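The transfer-learning setup discussed above can be sketched as follows, including the channel mismatch just mentioned: VGG16 expects three input channels, so the single temperature channel of a thermogram has to be replicated (or otherwise mapped) to three. Two assumptions keep the sketch self-contained: `weights=None` avoids downloading the ImageNet weights (in the actual experiments one would pass `weights="imagenet"`), and the input resolution and head width are illustrative.

```python
import tensorflow as tf

# Pre-trained convolutional base, frozen so only the new head is trained.
base = tf.keras.applications.VGG16(weights=None, include_top=False,
                                   input_shape=(224, 224, 3))
base.trainable = False

# A thermogram has one temperature channel; replicate it to the three
# channels the RGB-trained network expects.
inp = tf.keras.Input(shape=(224, 224, 1))
rgb = tf.keras.layers.Concatenate(axis=-1)([inp, inp, inp])

# New trainable classification head on top of the frozen features.
x = tf.keras.layers.Flatten()(base(rgb))
x = tf.keras.layers.Dense(64, activation="relu")(x)
out = tf.keras.layers.Dense(1, activation="sigmoid")(x)
model = tf.keras.Model(inp, out)
```

The channel replication makes the shapes compatible, but, as noted above, it does not make the pre-trained RGB filters meaningful for temperature maps, which is consistent with the re-trained networks underperforming the ones built from scratch.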
Center
E.T.S. de Ingeniería Informática
Department
Inteligencia Artificial