Science Cast

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

Arseniy Andreyev, April 16, 2026 3:04am

Voice is AI-generated

Connected to paper. This paper is a preprint and has not been certified by peer review.

Momentum Further Constrains Sharpness at the Edge of Stochastic Stability

arXiv, April 15, 2026 12:00am

Authors

Arseniy Andreyev, Advikar Ananthkumar, Marc Walden, Tomaso Poggio, Pierfrancesco Beneventano

Abstract

Recent work suggests that (stochastic) gradient descent self-organizes near an instability boundary, shaping both optimization and the solutions found. Momentum and mini-batch gradients are widely used in practical deep learning optimization, but it remains unclear whether they operate in a comparable regime of instability. We demonstrate that SGD with momentum exhibits an Edge of Stochastic Stability (EoSS)-like regime with batch-size-dependent behavior that cannot be explained by a single momentum-adjusted stability threshold. Batch Sharpness (the expected directional mini-batch curvature) stabilizes in two distinct regimes: at small batch sizes it converges to a lower plateau $2(1-\beta)/\eta$, reflecting amplification of stochastic fluctuations by momentum and favoring flatter regions than vanilla SGD; at large batch sizes it converges to a higher plateau $2(1+\beta)/\eta$, where momentum recovers its classical stabilizing effect and favors sharper regions consistent with full-batch dynamics. We further show that this aligns with linear stability thresholds and discuss the implications for hyperparameter tuning and coupling.
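To make the two plateaus concrete, the sketch below evaluates $2(1-\beta)/\eta$ and $2(1+\beta)/\eta$ for one illustrative pair of hyperparameters and numerically checks the classical full-batch threshold on a one-dimensional quadratic. It is not taken from the paper. It assumes the heavy-ball form of momentum, $x_{t+1} = x_t - \eta \lambda x_t + \beta (x_t - x_{t-1})$, and the function names and the values $\eta = 0.01$, $\beta = 0.9$ are hypothetical choices made for illustration. The check covers only the large-batch (deterministic) threshold; the small-batch plateau depends on gradient noise and is simply evaluated from the formula in the abstract.

```python
import numpy as np


def plateaus(lr, beta):
    """Plateau values quoted in the abstract: (small-batch, large-batch)."""
    return 2.0 * (1.0 - beta) / lr, 2.0 * (1.0 + beta) / lr


def heavy_ball_spectral_radius(curvature, lr, beta):
    """Spectral radius of heavy-ball iterations on f(x) = curvature * x**2 / 2.

    Assumed update (heavy-ball form, an assumption of this sketch):
        x_{t+1} = (1 + beta - lr * curvature) * x_t - beta * x_{t-1}
    The state (x_t, x_{t-1}) evolves under the 2x2 companion matrix below;
    the iteration is linearly stable when the spectral radius is below 1.
    """
    companion = np.array([
        [1.0 + beta - lr * curvature, -beta],
        [1.0, 0.0],
    ])
    return max(abs(np.linalg.eigvals(companion)))


# Hypothetical hyperparameters, chosen only for illustration.
lr, beta = 0.01, 0.9
low, high = plateaus(lr, beta)

# Curvature just below / just above the higher plateau 2(1 + beta) / lr.
rho_below = heavy_ball_spectral_radius(0.99 * high, lr, beta)
rho_above = heavy_ball_spectral_radius(1.01 * high, lr, beta)

print(f"small-batch plateau 2(1-beta)/lr = {low:.1f}")
print(f"large-batch plateau 2(1+beta)/lr = {high:.1f}")
print(f"spectral radius just below / above: {rho_below:.3f} / {rho_above:.3f}")
```

Under these assumptions the spectral radius crosses 1 as the curvature passes $2(1+\beta)/\eta$, which matches the classical stabilizing effect of momentum that the abstract attributes to the large-batch regime.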
