Scale-preserving nonlinearities for u-μP
My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does...
Scaling continues to be a super hot topic of research and our selection of papers for this month all tackle different angles of how to scale models efficient...
Improving transformers is no longer “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...