Blog

Mixture of Experts: Essentials

23 December 2025

Sparse routing for scalable Transformers

Sequence GAN Explained

30 November 2025

Adversarial generation of discrete sequences

Notes on Training NanoChat

15 November 2025

A practical walkthrough for building a tiny ChatGPT

Flow Matching: A Minimal Guide

10 November 2025

Learning vector fields in continuous and discrete spaces

Attention = Optimal Transport? Yes.

26 October 2025

The backward pass is policy gradient

Inside the Transformers

01 September 2025

Understanding the model architecture behind LLMs

Generalized Reparametrization Tricks

09 August 2025

Backpropagation through continuous and discrete random variables

Generative Transformer with Enigmatic Orders

03 August 2025

Image synthesis via transformers

Discrete Diffusion Models: The Modern BERT

18 July 2025

Diffusion language models for images, languages, and general state spaces

Policy Gradients and Stochastic Control

07 June 2025

Thoughts on diffusion model alignment

Mean Flow: A Brief Introduction

31 May 2025

Ultra-fast one-step generation for images and videos

Feynman–Kac Formula Without the Mystery

15 March 2025

A popular tool in finance and stochastic optimal control

Sequential Monte Carlo: A Quick Guide

08 February 2025

A general framework for nonlinear state-space models

Why Tweedie Formula Matters

05 January 2025

A simple empirical Bayesian posterior estimate

A Transport View of Filtering

11 November 2024

Interpreting Bayes’ law via optimal transport for filtering problems

Hutchinson Estimator, Explained

27 July 2024

An unbiased Monte Carlo estimator for implicit trace estimation

Transformer Filter

15 April 2024

Can a Transformer represent a Kalman filter?

Random Fourier Features

10 April 2024

A Monte Carlo sampler for radial basis function kernels and positional embeddings

Kalman Filter: The Core Ideas

02 April 2024

A standard template for linear state-space models

Autoregressive Flow

16 March 2024

The pioneering normalizing flows within generative models.

Rectified Flow and Beyond

09 March 2024

Does a straighter flow always yield more efficient transport?

Implicit Ridge Regularization

07 March 2024

The optimal penalty can be zero or negative for real-world high dimensional data.

The Triangle of Flow, Diffusion, and PDE

01 July 2023

Connections between probability flows, diffusions, and PDEs.

Coupling by Reflection (II)

20 May 2023

A general coupling technique for characterizing a broad range of diffusions.

Girsanov and MLE

20 March 2023

An application of Girsanov's theorem in parameter estimation.

Schrödinger Bridge Problem

19 June 2022

A framework that unifies ODE, PDE, SDE, stochastic control, optimal transport, and fluid dynamics

Understanding Hamiltonian Monte Carlo

01 November 2021

An elegant sampler that utilizes Hamiltonian dynamics to propose new states in simulations.

Couplings and Monte Carlo Methods (I)

01 August 2021

A family of techniques to understand the convergence of random variables.

Lyapunov Function for Poincaré Inequality

01 June 2021

An inequality that unifies ODE, PDE, SDE, functional analysis, and Riemannian geometry.

Replica Exchange and Variance Reduction

01 May 2021

Running multiple MCMC chains at different temperatures to explore the solution space thoroughly.

Dynamic Importance Sampling and Beyond

05 November 2020

Negative learning rates help escape local traps.