Language Models

Scaling transformers for long futures (TBC)

This is an ongoing blog where I explore and improve my understanding of language models.

I’ll keep sharing interesting topics I discover about language models over time.

TBD

BPE tokenizer

BERT

RoBERTa

ALBERT

DistilBERT

DeBERTa

T5

GPT

Scalability via KV caching

Scaling laws

The FLOPs Calculus of Language Model Training