Publication:
Enhancing Distributed Neural Network Training Through Node-Based Communications

dc.contributor.author: Moreno Álvarez, Sergio
dc.contributor.author: Paoletti, Mercedes Eugenia
dc.contributor.author: Cavallaro, Gabriele
dc.contributor.author: Haut, Juan M.
dc.contributor.orcid: https://orcid.org/0000-0003-1030-3729
dc.contributor.orcid: https://orcid.org/0000-0002-3239-9904
dc.contributor.orcid: https://orcid.org/0000-0001-6701-961X
dc.date.accessioned: 2024-11-20T08:07:09Z
dc.date.available: 2024-11-20T08:07:09Z
dc.date.issued: 2023
dc.description: The registered version of this article, first published in "IEEE Transactions on Neural Networks and Learning Systems, 2023", is available online at the publisher's website: IEEE, https://doi.org/10.1109/TNNLS.2023.3309735
dc.description.abstract: The amount of data needed to effectively train modern deep neural architectures has grown significantly, leading to increased computational requirements. These intensive computations are tackled by combining latest-generation computing resources, such as accelerators, with classic processing units. Nevertheless, gradient communication remains the major bottleneck, limiting efficiency despite the runtime improvements obtained through data parallelism strategies. Data parallelism involves all processes in a global exchange of a potentially large amount of data, which may prevent the desired speedup and introduce noticeable delays or bottlenecks. As a result, communication latency poses a significant challenge that profoundly impacts performance on distributed platforms. This research presents node-based optimization steps that significantly reduce the gradient exchange between model replicas whilst ensuring model convergence. The proposal serves as a versatile communication scheme, suitable for integration into a wide range of general-purpose deep neural network (DNN) algorithms. The optimization takes into account the specific location of each replica within the platform. To demonstrate its effectiveness, different neural network approaches and datasets with disjoint properties are used. In addition, multiple types of applications are considered to demonstrate the robustness and versatility of the proposal. The experimental results show a global training time reduction whilst slightly improving accuracy. Code: https://github.com/mhaut/eDNNcomm (en)
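The abstract's node-aware idea, averaging gradients within each node first so that only one message per node crosses the inter-node network, can be sketched as a minimal NumPy simulation. The function name and the node-assignment map below are illustrative assumptions, not taken from the released eDNNcomm code:

```python
import numpy as np

def node_based_average(grads, node_of):
    """Two-level gradient averaging for data-parallel training.

    grads   -- one gradient array per replica (process)
    node_of -- node_of[rank] gives the node hosting that replica

    Stage 1 averages over fast intra-node links; stage 2 then needs
    only one (weighted) message per node instead of one per replica.
    """
    # Stage 1: group replicas by node and average locally.
    per_node = {}
    for rank, g in enumerate(grads):
        per_node.setdefault(node_of[rank], []).append(g)
    node_means = {n: np.mean(gs, axis=0) for n, gs in per_node.items()}
    counts = {n: len(gs) for n, gs in per_node.items()}

    # Stage 2: weight each node's mean by its replica count so the
    # result equals the exact global mean over all replicas.
    total = sum(counts.values())
    return sum(node_means[n] * counts[n] for n in node_means) / total

# Four replicas spread over two nodes: the global exchange involves
# two node-level messages instead of four replica-level ones.
grads = [np.array([1., 2.]), np.array([3., 4.]),
         np.array([5., 6.]), np.array([7., 8.])]
result = node_based_average(grads, node_of=[0, 0, 1, 1])
```

With this weighting, the two-stage result matches a flat all-replica average exactly, so convergence behaviour is unchanged while inter-node traffic shrinks with the number of replicas per node.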
dc.description.version: published version
dc.identifier.citation: S. Moreno-Álvarez, M. E. Paoletti, G. Cavallaro and J. M. Haut, "Enhancing Distributed Neural Network Training Through Node-Based Communications," in IEEE Transactions on Neural Networks and Learning Systems, doi: 10.1109/TNNLS.2023.3309735
dc.identifier.doi: https://doi.org/10.1109/TNNLS.2023.3309735
dc.identifier.issn: 2162-237X | eISSN 2162-2388
dc.identifier.uri: https://hdl.handle.net/20.500.14468/24438
dc.journal.title: IEEE Transactions on Neural Networks and Learning Systems
dc.language.iso: en
dc.page.final: 15
dc.page.initial: 1
dc.publisher: IEEE
dc.relation.center: Faculties and Schools::E.T.S. de Ingeniería Informática
dc.relation.department: Lenguajes y Sistemas Informáticos
dc.rights: info:eu-repo/semantics/openAccess
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.es
dc.subject: 12 Mathematics::1203 Computer Science::1203.17 Informatics
dc.subject.keywords: Training (en)
dc.subject.keywords: Computational modeling (en)
dc.subject.keywords: Data models (en)
dc.subject.keywords: Distributed databases (en)
dc.subject.keywords: Parallel processing (en)
dc.subject.keywords: Costs (en)
dc.subject.keywords: Optimization (en)
dc.title: Enhancing Distributed Neural Network Training Through Node-Based Communications (en)
dc.type: article (es)
dc.type: journal article (en)
dspace.entity.type: Publication
relation.isAuthorOfPublication: 3482d7bc-e120-48a3-812e-cc4b25a6d2fe
relation.isAuthorOfPublication.latestForDiscovery: 3482d7bc-e120-48a3-812e-cc4b25a6d2fe
Files
Original bundle
Name: MorenoAlvarez_Sergio_2023EnhancingDistributed.pdf
Size: 3.03 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 3.62 KB
Format: Item-specific license agreed to upon submission