Publication:
Training deep neural networks: a static load balancing approach

dc.contributor.author: Moreno Álvarez, Sergio
dc.contributor.author: Haut, Juan Mario
dc.contributor.author: Paoletti, Mercedes Eugenia
dc.contributor.author: Rico Gallego, Juan Antonio
dc.contributor.author: Díaz Martín, Juan Carlos
dc.contributor.author: Plaza, Javier
dc.contributor.orcid: https://orcid.org/0000-0003-1030-3729
dc.contributor.orcid: https://orcid.org/0000-0002-4264-7473
dc.contributor.orcid: https://orcid.org/0000-0002-8435-3844
dc.contributor.orcid: https://orcid.org/0000-0002-8908-1606
dc.date.accessioned: 2024-11-15T11:09:07Z
dc.date.available: 2024-11-15T11:09:07Z
dc.date.issued: 2020-03-02
dc.description: The registered version of this article, first published in “Journal of Supercomputing 76”, is available online at the publisher's website: Springer, https://doi.org/10.1007/s11227-020-03200-6
dc.description.abstract: Deep neural networks are currently trained under data-parallel setups on high-performance computing (HPC) platforms, so that a replica of the full model is assigned to each computational resource and trained on non-overlapping subsets of the data known as batches. Replicas combine the computed gradients to update their local copies at the end of each batch. However, differences in the performance of the resources assigned to replicas in current heterogeneous platforms induce waiting times when gradients are combined synchronously, leading to an overall performance degradation. Although asynchronous communication of gradients has been proposed as an alternative, it suffers from the so-called staleness problem: the training in each replica is computed using a stale version of the parameters, which negatively impacts the accuracy of the resulting model. In this work, we study the application of well-known HPC static load balancing techniques to the distributed training of deep models. Our approach assigns a different batch size to each replica, proportional to its relative computing capacity, hence minimizing the staleness problem. Our experimental results (obtained in the context of a remotely sensed hyperspectral image processing application) show that, while the classification accuracy is kept constant, the training time substantially decreases with respect to unbalanced training. This is illustrated using heterogeneous computing platforms made up of CPUs and GPUs with different performance.
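Note on the balancing rule described in the abstract: below is a minimal Python sketch of how a global batch size could be split across replicas in proportion to their relative computing capacity, so that faster devices receive larger batches. This is an illustration only, not the authors' implementation; the function name, device count, and throughput figures are assumptions.

def proportional_batch_sizes(global_batch, throughputs):
    """Split global_batch across replicas in proportion to each replica's
    measured throughput (samples/s), returning integer batch sizes that
    sum exactly to global_batch."""
    total = sum(throughputs)
    shares = [global_batch * t / total for t in throughputs]  # ideal fractional shares
    sizes = [int(s) for s in shares]                           # round down first
    remainder = global_batch - sum(sizes)
    # Hand the leftover samples to the replicas with the largest fractional parts.
    order = sorted(range(len(shares)), key=lambda i: shares[i] - sizes[i], reverse=True)
    for i in order[:remainder]:
        sizes[i] += 1
    return sizes

# Hypothetical heterogeneous platform: two GPUs and two CPUs benchmarked at
# 300, 220, 60 and 40 samples/s; a global batch of 512 is split as [248, 182, 49, 33].
print(proportional_batch_sizes(512, [300.0, 220.0, 60.0, 40.0]))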
dc.description.version: final version
dc.identifier.citation: Moreno-Álvarez, S., Haut, J.M., Paoletti, M.E. et al. Training deep neural networks: a static load balancing approach. J Supercomput 76, 9739–9754 (2020). https://doi.org/10.1007/s11227-020-03200-6
dc.identifier.doi: https://doi.org/10.1007/s11227-020-03200-6
dc.identifier.issn: 0920-8542
dc.identifier.uri: https://hdl.handle.net/20.500.14468/24388
dc.journal.title: Journal of Supercomputing
dc.journal.volume: 76
dc.language.iso: en
dc.page.final: 9754
dc.page.initial: 9739
dc.publisher: Springer
dc.relation.center: Facultades y escuelas::E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by-nc-nd/4.0/deed.es
dc.subject: 12 Matemáticas::1203 Ciencia de los ordenadores::1203.17 Informática
dc.subject.keywords: Deep learning
dc.subject.keywords: High-performance computing
dc.subject.keywords: Distributed training
dc.subject.keywords: Heterogeneous platforms
dc.title: Training deep neural networks: a static load balancing approach
dc.type: article
dc.type: journal article
dspace.entity.type: Publication
relation.isAuthorOfPublication: 3482d7bc-e120-48a3-812e-cc4b25a6d2fe
relation.isAuthorOfPublication.latestForDiscovery: 3482d7bc-e120-48a3-812e-cc4b25a6d2fe
Files
Original bundle
Name: MorenoAlvarez_Sergio_2020TrainingDeepNeuralNe.pdf
Size: 1.5 MB
Format: Adobe Portable Document Format