Generalization Analysis of Asynchronous SGD Variants

Asynchronous Stochastic Gradient Descent (ASGD) improves training efficiency by letting parallel workers update model parameters asynchronously, at the cost of introducing staleness into the updates.

ASGD Diagram
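To make the staleness mechanism concrete, here is a minimal sketch of ASGD simulated serially on a toy least-squares problem. The objective, data, and `max_staleness` parameter are illustrative assumptions and are not taken from the project code; each update applies a gradient computed on a parameter copy from up to `max_staleness` steps in the past.

```python
# Minimal ASGD sketch (illustrative; not the project's implementation).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise.
n_samples, n_features = 1000, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def minibatch_grad(w, batch_size=32):
    """Stochastic gradient of the mean-squared error on a random mini-batch."""
    idx = rng.integers(0, n_samples, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return 2.0 / batch_size * Xb.T @ (Xb @ w - yb)

def async_sgd(num_steps=2000, lr=0.01, max_staleness=8):
    """Serial simulation of ASGD: each step applies a gradient that was
    computed on a parameter copy from up to `max_staleness` steps ago."""
    w = np.zeros(n_features)
    history = [w.copy()]  # past iterates, used to look up stale parameter copies
    for _ in range(num_steps):
        staleness = rng.integers(0, max_staleness + 1)
        stale_w = history[max(0, len(history) - 1 - staleness)]
        w = w - lr * minibatch_grad(stale_w)  # current params updated with a stale gradient
        history.append(w.copy())
    return w

w_async = async_sgd()
print("parameter error with stale updates:", np.linalg.norm(w_async - w_true))
```

Setting `max_staleness=0` in this sketch recovers ordinary SGD, which makes it easy to compare convergence with and without stale updates.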

While the convergence of ASGD algorithms is well established, their impact on generalization remains less explored.

Our study shows that asynchronous SGD methods achieve convergence comparable to standard SGD and equal or better generalization, despite the staleness in their updates.

Project Report

Generalization of Asynchronous SGD Variants.pdf

Project Repository

GitHub Repository

Code Documentation