Recent Posts

June Papers: Mamba-2 & Matmul-free Models

14 minute read

Improving transformers is no longer “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...

A transformer walk-through, with Gemma

36 minute read

Transformer-based LLMs seem mysterious, but they don’t need to be. In this post, we’ll walk through a modern transformer LLM, Google’s Gemma, providing bare-bon...