Mean Flow
One-step, ultra-fast generation for images and videos
Diffusion models (Song et al., 2021) and flow matching (Lipman et al., 2023) have become the dominant paradigm for image and video generation. Although iterative refinement eases the optimization problem, it makes sampling slow. A common acceleration strategy is distillation (Salimans & Ho, 2022; Luo et al., 2023; Yin et al., 2024), but it requires a pretrained teacher and offers limited flexibility.
Consistency Models
Consistency models (Song et al., 2023; Song & Dhariwal, 2024) address this by learning a direct mapping from noisy samples to clean data, enforcing self-consistency across noise levels. This enables training from scratch and supports one-step generation without iterative refinement or teacher distillation.
However, consistency models remain heuristic: the objective constrains the model's outputs to agree with one another rather than regressing them onto a ground-truth field, which often leads to training instability and requires curriculum learning over the time discretization.
Mean Flows
MeanFlow (Geng et al., 2025) avoids both distillation and ad-hoc curriculum learning by modeling the average velocity field directly, providing a principled and stable one-step training framework from scratch.
Denote $\mathrm{x \sim p_{\text{data}}}$ and $\mathrm{\epsilon \sim p_{\text{prior}}}$. Consider the linear interpolation $\mathrm{z_t = (1 - t)x + t\epsilon}$. Differentiating w.r.t. $\mathrm{t}$ gives $\mathrm{\nu_t \equiv z_t' = \epsilon - x}$, where $\mathrm{\nu_t \equiv \nu_t(z_t \mid x)}$ is the conditional velocity. The authors introduce the elegant concept of the average velocity:
\[\begin{align} \mathrm{u(z_t, r, t) \triangleq \frac{1}{t - r} \int_{r}^{t} v(z_{\tau}, \tau) \, d\tau.}\label{def_mf} \end{align}\]In the limit $\mathrm{r \to t}$, the average velocity reduces to the instantaneous velocity $\mathrm{v(z_t, t)}$, recovering standard flow matching.
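This reduction follows from the fundamental theorem of calculus:
\[\begin{align} \mathrm{\lim_{r \to t} u(z_t, r, t) = \lim_{r \to t} \frac{1}{t - r} \int_{r}^{t} v(z_{\tau}, \tau)\, d\tau = v(z_t, t).}\notag \end{align}\]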
Taking the total derivative w.r.t. $\mathrm{t}$ on both sides of \eqref{def_mf}, with $\mathrm{r}$ held fixed and $\mathrm{z_t}$ evolving along the ODE, we have
\[\begin{align} \mathrm{(\text{LHS})\quad \dfrac{d z_t}{d t}\partial_z u+\partial_t u} &= \mathrm{\frac{\partial}{\partial t} \left( \frac{1}{t-r} \int_{r}^{t} v(z_{\tau}, \tau)\, d\tau \right)} \notag \qquad (\text{RHS}) \\[6pt] \mathrm{\nu_t\partial_z u+\partial_t u} &= \mathrm{\frac{1}{t-r} \cdot v(z_t, t) \;-\; \frac{1}{(t-r)^2} \int_{r}^{t} v(z_{\tau}, \tau)\, d\tau} \notag \\[6pt] \mathrm{\nu_t\partial_z u+\partial_t u} &= \mathrm{\frac{v(z_t, t)(t-r) - \int_{r}^{t} v(z_{\tau}, \tau)\, d\tau}{(t-r)^2}} \notag \\[6pt] \mathrm{\nu_t\partial_z u + 0 \cdot \partial_r u +1 \cdot \partial_t u} &= \mathrm{\frac{v(z_t, t) - u(z_t, r, t)}{t-r}}. \label{eq_} \end{align}\]The entire LHS, i.e. the directional derivative $\mathrm{\nu_t\partial_z u + 0 \cdot \partial_r u + 1 \cdot \partial_t u}$, can be computed with a single Jacobian–vector product (JVP) of $\mathrm{u}$ along the tangent vector $\mathrm{[\nu_t, 0, 1]}$ in the $\mathrm{(z, r, t)}$ coordinates. Let $\mathrm{u_{\theta}}$ parameterize the average velocity $\mathrm{u}$. The training loss becomes:
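To make this concrete, here is a minimal, self-contained JVP sketch; the function `toy_u` is a made-up stand-in for the network $\mathrm{u_{\theta}}$, and the shapes are arbitrary.
import jax
import jax.numpy as jnp

# Toy stand-in for u_theta(z, r, t): any differentiable function of (z, r, t).
def toy_u(z, r, t):
    return jnp.tanh(z * (t - r)[:, None] + t[:, None])

x = jax.random.normal(jax.random.PRNGKey(0), (4, 8))    # "data"
eps = jax.random.normal(jax.random.PRNGKey(1), (4, 8))  # "prior" noise
t = jnp.full((4,), 0.7)
r = jnp.full((4,), 0.3)

z_t = (1.0 - t)[:, None] * x + t[:, None] * eps
v = eps - x                                              # conditional velocity nu_t

# One JVP along the tangent (nu_t, 0, 1) returns both u(z_t, r, t) and
# the directional derivative nu_t * d_z u + 0 * d_r u + 1 * d_t u.
u, du_dt = jax.jvp(toy_u, (z_t, r, t), (v, jnp.zeros_like(r), jnp.ones_like(t)))
print(u.shape, du_dt.shape)                              # both (4, 8)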
\[\begin{align} \mathrm{E_{x, \epsilon, r, t}[\|u_{\theta}(z_t, r, t) - sg\big[v(z_t, t) - (t-r)(\nu_t\partial_z u + 0 \cdot \partial_r u +1 \cdot \partial_t u)\big]\|^2_2]},\notag \end{align}\]where $\mathrm{sg\big[\cdot\big]}$ denotes the stop-gradient operator. In practice (as in the code below), the intractable marginal velocity $\mathrm{v(z_t, t)}$ is replaced by the conditional velocity $\mathrm{\nu_t = \epsilon - x}$, both in the regression target and as the JVP tangent.
After optimizing $\mathrm{u_{\theta}}$, integrating the probability flow ODE $\mathrm{z_t' = v(z_t, t)}$ from $\mathrm{t=1}$ down to $\mathrm{t=0}$ collapses to a single network evaluation, by the definition of the average velocity in \eqref{def_mf}:
\[\begin{align} \mathrm{z_0 = z_1 - \int_0^1 v(z_t, t)\, dt = z_1 - u_{\theta}(z_1, 0, 1)}.\notag \end{align}\]
Core Code Snippet
# -----------------------------------------------------------------
# Instantaneous velocity
t, r = self.sample_tr(bz)
e = jax.random.normal(self.make_rng('gen'), x.shape, dtype=self.dtype)
z_t = (1 - t) * x + t * e
v = e - x
# Guided velocity
v_g = self.guidance_fn(v, z_t, t, labels, train=False) if self.guidance_eq else v
# Cond dropout (dropout class labels)
y_inp, v_g = self.cond_drop(v, v_g, labels)
# -----------------------------------------------------------------
# Compute u_tr (average velocity) and du_dt using jvp
def u_fn(z_t, t, r):
    return self.u_fn(z_t, t, t - r, y=y_inp, train=train)
dt_dt = jnp.ones_like(t)
dr_dt = jnp.zeros_like(t)
u, du_dt = jax.jvp(u_fn, (z_t, t, r), (v_g, dt_dt, dr_dt))
# -----------------------------------------------------------------
# Compute loss
u_tgt = v_g - jnp.clip(t - r, a_min=0.0, a_max=1.0) * du_dt
u_tgt = jax.lax.stop_gradient(u_tgt)
loss = (u - u_tgt) ** 2
loss = jnp.sum(loss, axis=(1, 2, 3)) # sum over pixels
# Adaptive weighting: divide each per-sample loss by sg[(loss + eps)^p]
adp_wt = (loss + self.norm_eps) ** self.norm_p
loss = loss / jax.lax.stop_gradient(adp_wt)
# -----------------------------------------------------------------
loss = loss.mean() # mean over batch
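As a companion to the training snippet, a minimal one-step sampling sketch is given below. This is an illustrative reconstruction rather than the repository's sampler: `u_theta` stands for the trained average-velocity network, and its `(z, r, t, labels)` calling convention is assumed here.
import jax
import jax.numpy as jnp

# -----------------------------------------------------------------
# One-step sampling: z_0 = z_1 - u_theta(z_1, r=0, t=1)
def sample_one_step(u_theta, rng, shape, labels=None):
    z1 = jax.random.normal(rng, shape)      # draw z_1 from the prior
    r = jnp.zeros((shape[0],))              # r = 0
    t = jnp.ones((shape[0],))               # t = 1
    return z1 - u_theta(z1, r, t, labels)
# e.g. x_gen = sample_one_step(u_theta, jax.random.PRNGKey(0), (16, 32, 32, 3), labels)
# -----------------------------------------------------------------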
- Song, Y., Sohl-Dickstein, J., Kingma, D. P., Kumar, A., Ermon, S., & Poole, B. (2021). Score-Based Generative Modeling through Stochastic Differential Equations. ICLR.
- Lipman, Y., Chen, R. T. Q., Ben-Hamu, H., Nickel, M., & Le, M. (2023). Flow Matching for Generative Modeling. ICLR.
- Salimans, T., & Ho, J. (2022). Progressive Distillation for Fast Sampling of Diffusion Models. ICLR.
- Luo, W., Hu, T., Zhang, S., Sun, J., Li, Z., & Zhang, Z. (2023). Diff-Instruct: A Universal Approach for Transferring Knowledge From Pre-trained Diffusion Models. NeurIPS.
- Yin, T., Gharbi, M., Zhang, R., Shechtman, E., Durand, F., Freeman, W. T., & Park, T. (2024). One-Step Diffusion with Distribution Matching Distillation. CVPR.
- Song, Y., Dhariwal, P., Chen, M., & Sutskever, I. (2023). Consistency Models. ICML.
- Song, Y., & Dhariwal, P. (2024). Improved Techniques for Training Consistency Models. ICLR.
- Geng, Z., Deng, M., Bai, X., Kolter, J. Z., & He, K. (2025). Mean Flows for One-step Generative Modeling. NeurIPS.