Recurrent Transformers

TBD

TBD

https://lilianweng.github.io/posts/2021-09-25-train-large/

Bottleneck: data. Solution: improve token efficiency (learn more from each training token).