What is High-Quality Data

01 October 2026

Zhilin Yang's speech, balancing SFT and RL, and reward hacking.