LLMs From Scratch: Day 0

May 20, 2026

Inspired by recent posts from Vlad Feinberg and others, I will be implementing an LLM from scratch. Roughly, this means working through the literature, starting with early papers like Attention Is All You Need and then moving on to more advanced techniques like Rotary Embeddings, training playbooks, etc.

The plan is to implement almost everything from scratch in JAX: matmuls, scaled dot-product attention, multi-head attention, encoders, decoders, positional encodings, and finally a full transformer model. The first objective is to recreate the results from Attention Is All You Need, and then move on to more complicated architectures and tasks. Everything will be posted to GitHub, and I suppose you'll have to take my word for it that the code is written by me, by hand, without any help from AI.

See you on Day 1.