Blog

What I cannot create, I do not understand.

Richard Feynman

Mixture of Experts: Essentials

Sparse routing for scalable Transformers

Sequence GAN Explained

Adversarial generation of discrete sequences

Notes on Training NanoChat

A practical walkthrough for building a tiny ChatGPT

Flow Matching: A Minimal Guide

Learning vector fields in continuous and discrete spaces

Attention = Optimal Transport? Yes.

The backward pass is policy gradient

Inside the Transformers

Understanding the model architecture behind LLMs

Generalized Reparametrization Tricks

Backpropagation through continuous and discrete random variables

Generative Transformer with Enigmatic Orders

Image synthesis via transformers

Discrete Diffusion Models: The Modern BERT

Diffusion language models for images, languages, and general state spaces

Policy Gradients and Stochastic Control

Thoughts on diffusion model alignment

Mean Flow: A Brief Introduction

One-step, ultra-fast generation for images and videos

Feynman–Kac Formula Without the Mystery

A popular tool in finance and stochastic optimal control

Sequential Monte Carlo: A Quick Guide

A general framework for modeling nonlinear state-space models

Why Tweedie's Formula Matters

A simple empirical Bayesian posterior estimate

A Transport View of Filtering

Interpreting Bayes’ law via optimal transport for filtering problems

Hutchinson Estimator, Explained

An unbiased Monte Carlo sampler for implicit trace estimation

Transformer Filter

Can a Transformer represent a Kalman filter?

Random Fourier Features

A Monte Carlo sampler for radial basis function kernels and positional embedding

Kalman Filter: The Core Ideas

A standard template for linear state-space models

Autoregressive Flow

The pioneering normalizing flows within generative models.

Rectified Flow and Beyond

Does a straighter flow always yield more efficient transport?

Implicit Ridge Regularization

The optimal penalty can be zero or negative for real-world high-dimensional data.

The Triangle of Flow, Diffusion, and PDE

Connections between probability flows, diffusions, and PDEs.

Coupling by Reflection (II)

A general coupling technique for characterizing a broad range of diffusions.

Girsanov and MLE

An application of the Girsanov theorem to parameter estimation.

Schrödinger Bridge Problem

A framework that unifies ODE, PDE, SDE, stochastic control, optimal transport, and fluid dynamics

Understanding Hamiltonian Monte Carlo

An elegant sampler that utilizes Hamiltonian dynamics to propose new states in simulations.

Couplings and Monte Carlo Methods (I)

A family of techniques to understand the convergence of random variables.

Lyapunov Function for Poincaré Inequality

An inequality that unifies ODE, PDE, SDE, functional analysis, and Riemannian geometry.

Replica Exchange and Variance Reduction

Running multiple MCMCs at different temperatures to explore the solution space thoroughly.

Dynamic Importance Sampling and Beyond

Negative learning rates help escape local traps.