GPU Parallelism
Training Large Models on Multiple GPUs
https://lilianweng.github.io/posts/2021-09-25-train-large/
Data bottleneck; solution: improve token efficiency