What Is High-Quality Data?

Training Large Models on Multiple GPUs

Zhilin Yang's talk: how to balance SFT and RL, and reward hacking.