Recent Posts

Scale-preserving nonlinearities for u-μP

5 minute read

My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does...

July Papers: All About Scaling

17 minute read

Scaling continues to be a super hot topic of research and our selection of papers for this month all tackle different angles of how to scale models efficient...

June Papers: Mamba-2 & Matmul-free Models

14 minute read

Improving transformers is now not “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...