Publication:
Training deep neural networks: a static load balancing approach

Date
2020-03-02
Access rights
info:eu-repo/semantics/openAccess
Publisher
Springer
Abstract
Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, where a replica of the full model is assigned to each computational resource and processes non-overlapping subsets of the data known as batches. At the end of each batch, the replicas combine their computed gradients to update their local copies of the parameters. However, in current heterogeneous platforms, differences in the performance of the resources assigned to the replicas induce waiting times when gradients are combined synchronously, degrading overall performance. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: training in each replica is computed with a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, thereby minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that the training time decreases substantially with respect to unbalanced training while the classification accuracy is kept constant. This is illustrated on heterogeneous computing platforms made up of CPUs and GPUs with different performance.
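The proportional batch-size assignment described in the abstract can be illustrated with a short sketch. The helper below is hypothetical (it is not taken from the paper's code): it splits a global batch size across replicas in proportion to their measured relative computing capacity, e.g. samples per second benchmarked on each device, so faster devices receive larger batches.

```python
def balanced_batch_sizes(global_batch_size, capacities):
    """Split a global batch size across replicas proportionally to
    their relative computing capacity (e.g. benchmarked samples/s).

    Hypothetical helper illustrating the static load-balancing idea:
    faster devices get larger batches so all replicas finish each
    training step at roughly the same time.
    """
    total = sum(capacities)
    # Proportional share for each replica, rounded down.
    sizes = [int(global_batch_size * c / total) for c in capacities]
    # Hand the rounding remainder to the fastest replicas.
    remainder = global_batch_size - sum(sizes)
    for i in sorted(range(len(capacities)), key=lambda i: -capacities[i])[:remainder]:
        sizes[i] += 1
    return sizes

# Example: one GPU roughly 4x faster than each of two CPUs,
# sharing a global batch of 256 samples.
print(balanced_batch_sizes(256, [4.0, 1.0, 1.0]))  # -> [171, 43, 42]
```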
Description
The registered version of this article, first published in "Journal of Supercomputing 76", is available online at the publisher's website: Springer, https://doi.org/10.1007/s11227-020-03200-6
Keywords
Deep learning, High-performance computing, Distributed training, Heterogeneous platforms
Citation
Moreno-Álvarez, S., Haut, J.M., Paoletti, M.E. et al. Training deep neural networks: a static load balancing approach. J Supercomput 76, 9739–9754 (2020). https://doi.org/10.1007/s11227-020-03200-6
Center
Faculties and schools::E.T.S. de Ingeniería Informática
Department
Lenguajes y Sistemas Informáticos