The Stochastic Control View of Attention Gradients

Backward attention gradients as advantage-based policy gradients

TBD
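
As a minimal sketch of the identity the heading above points to, assume a single query attending over n keys with scores s, weights p = softmax(s), output o = sum_j p_j v_j, and upstream gradient g = dL/do. The chain rule through the softmax then gives dL/ds_j = p_j (r_j - sum_k p_k r_k) with r_j = g . v_j, which has the same form as an exact policy gradient with an expected-reward baseline. The function names and the reading of r_j as a "reward" are illustrative assumptions, not taken from the text, and the code only checks the algebra numerically.

```python
import numpy as np


def softmax(s):
    """Numerically stable softmax over a 1-D score vector."""
    z = np.exp(s - s.max())
    return z / z.sum()


def score_grad_backprop(s, V, g):
    """dL/ds by ordinary backprop through o = softmax(s) @ V, with g = dL/do."""
    p = softmax(s)
    dL_dp = V @ g                         # dL/dp_j = g . v_j
    jac = np.diag(p) - np.outer(p, p)     # softmax Jacobian dp/ds (symmetric)
    return jac @ dL_dp


def score_grad_advantage(s, V, g):
    """Same gradient written as p_j * (r_j - baseline).

    Assumption for illustration: r_j = g . v_j plays the role of a per-key
    "reward" and the baseline is its expectation under the attention
    distribution p. Under loss minimization the update follows -dL/ds,
    so the sign of the reward flips accordingly.
    """
    p = softmax(s)
    r = V @ g              # per-key reward
    baseline = p @ r       # expected reward under p
    return p * (r - baseline)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    s = rng.normal(size=5)         # attention scores for one query
    V = rng.normal(size=(5, 3))    # value vectors, one row per key
    g = rng.normal(size=3)         # upstream gradient dL/do
    diff = score_grad_backprop(s, V, g) - score_grad_advantage(s, V, g)
    print("max abs difference:", np.abs(diff).max())
```

Both functions compute the same vector. The second form makes two properties easy to see: keys whose reward exceeds the attention-weighted average have their scores pushed in one direction and the rest in the other, and the gradient entries sum to zero because the baseline is the mean of r under p.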