Language Models
Scaling transformers for long futures (TBC)
This is an ongoing blog where I explore and improve my understanding of language models.
I’ll keep sharing interesting topics I discover about language models over time.
TBD
BPE tokenization
BERT
RoBERTa
ALBERT
DistilBERT
DeBERTa
T5
GPT
Scalability via KV caching