Generalization Analysis of Asynchronous SGD Variants

Asynchronous Stochastic Gradient Descent (ASGD) improves training efficiency by letting parallel workers update model parameters asynchronously, at the cost of introducing staleness into the updates.

ASGD Diagram
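To make the staleness mechanism concrete, here is a minimal sketch of ASGD simulated serially on a toy least-squares problem. The objective, data, and `max_staleness` parameter are illustrative assumptions and are not taken from the project code; each update applies a gradient computed on a parameter copy from up to `max_staleness` steps in the past.

```python
# Minimal ASGD sketch (illustrative; not the project's implementation).
import numpy as np

rng = np.random.default_rng(0)

# Synthetic linear-regression data: y = X @ w_true + noise.
n_samples, n_features = 1000, 10
X = rng.normal(size=(n_samples, n_features))
w_true = rng.normal(size=n_features)
y = X @ w_true + 0.1 * rng.normal(size=n_samples)

def minibatch_grad(w, batch_size=32):
    """Stochastic gradient of the mean-squared error on a random mini-batch."""
    idx = rng.integers(0, n_samples, size=batch_size)
    Xb, yb = X[idx], y[idx]
    return 2.0 / batch_size * Xb.T @ (Xb @ w - yb)

def async_sgd(num_steps=2000, lr=0.01, max_staleness=8):
    """Serial simulation of ASGD: each step applies a gradient that was
    computed on a parameter copy from up to `max_staleness` steps ago."""
    w = np.zeros(n_features)
    history = [w.copy()]  # past iterates, used to look up stale parameter copies
    for _ in range(num_steps):
        staleness = rng.integers(0, max_staleness + 1)
        stale_w = history[max(0, len(history) - 1 - staleness)]
        w = w - lr * minibatch_grad(stale_w)  # current params updated with a stale gradient
        history.append(w.copy())
    return w

w_async = async_sgd()
print("parameter error with stale updates:", np.linalg.norm(w_async - w_true))
```

Setting `max_staleness=0` in this sketch recovers ordinary SGD, which makes it easy to compare convergence with and without stale updates.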

While the convergence of ASGD algorithms is well established, their impact on generalization remains less explored.

Our study shows that asynchronous SGD methods achieve convergence comparable to standard SGD and equal or better generalization, despite the staleness in their updates.

Project Report

Generalization of Asynchronous SGD Variants.pdf

Project Repository

GitHub Repository

Code Documentation