October Papers: Improving image generation & making LLMs think
This month brought us some exciting developments in improving image-generating models, as well as some interesting insights into how to make large language m...
We’re pleased to share four papers from different domains: LLM self-correction, FP8 training, generative crystals and optimisation. They are united, somewhat...
If there’s one thing you can count on from Graphcore Research, it’s tireless enthusiasm for effective compute utilisation! Our favourite papers from August i...
Scaling continues to be a super hot topic of research and our selection of papers for this month all tackle different angles of how to scale models efficient...
Improving transformers is now not “just one area” of machine learning research. This is illustrated by the breadth of papers we got excited about this month,...
May is always an eventful time of year for ML researchers, with final ICML paper decisions and ICLR taking place in early May, and NeurIPS submission deadlin...
For our April selection of AI research papers, there is a clear common thread: efficient LLM inference. But as it happens, ML researchers are showing there a...
March was a fruitful month for AI research, with plenty of papers for us to choose from. A trend in the work we’ve selected is the pushing of previously publ...
Improving LLM inference is a key research topic at the moment, and something we’re particularly interested in at Graphcore because of its hardware implicatio...
For the research community, 2023 was dominated by large transformers and the associated challenges with training, tuning and deploying them. This trend has c...
The last month saw impressive developments in the space of efficient transformers and applied ML, from materials discovery to chip design.
We are pleased to announce that we have open positions for Research Scientists and Engineers to join our team.
With the rapid advances in the capabilities of large language models (LLMs), there is an increasing need for efficient inference platforms that would enable ...
My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does...
Transformer-based LLMs seem mysterious, but they don’t need to be. In this post, we’ll walk through a modern transformer LLM, Google’s Gemma, providing bare-bon...
TL;DR: Scaled dot product attention isn’t properly scaled, and that’s a good thing!
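For context, the scaling in question is the familiar $1/\sqrt{d_k}$ factor in standard scaled dot-product attention; the post's argument about why this isn't the "proper" scaling, and why that turns out to be useful, is in the full article:

$$
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}}\right) V
$$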