Posts by Category

papers-of-the-month

July Papers: All About Scaling

17 minute read

Scaling continues to be a super hot topic of research and our selection of papers for this month all tackle different angles of how to scale models efficient...

June Papers: Mamba-2 & Matmul-free Models

14 minute read

Improving transformers is no longer “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...

posts

Scale-preserving nonlinearities for u-μP

5 minute read

My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does...

A transformer walk-through, with Gemma

36 minute read

Transformer-based LLMs seem mysterious, but they don’t need to be. In this post, we’ll walk through a modern transformer LLM, Google’s Gemma, providing bare-bon...
