Sparser llamas run faster — speed up LLM inference with SparQ Attention
Improving transformers is no longer “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...
May is always an eventful time of year for ML researchers, with final ICML paper decisions and ICLR taking place in early May, and NeurIPS submission deadlin...
For our April selection of AI research papers, there is a clear common thread: efficient LLM inference. But as it happens, ML researchers are showing there a...
Transformer-based LLMs can seem mysterious, but they don’t need to be. In this post, we’ll walk through a modern transformer LLM, Google’s Gemma, providing bare-bon...