Transformer-Squared: Self-Adaptive LLMs
The key idea
With their V3 and R1 models, DeepSeek sets a new state-of-the-art in open-weight models and trades benchmark to benchmark with the best models from Anthropic...
Vision-Language Models (VLMs) allow LLMs to “see”, but how do they work? In this post, we’ll walk through the model changes needed to turn an LLM into a VLM ...
With the rapid advances in the capabilities of large language models (LLMs), there is an increasing need for efficient inference platforms that would enable ...
Transformer-based LLMs seem mysterious, but they don’t need to. In this post, we’ll walk through a modern transformer LLM, Google’s Gemma, providing bare-bon...
My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does...
TL;DR: Scaled dot product attention isn’t properly scaled, and that’s a good thing!
We are pleased to announce that we have open positions for Research Scientists and Engineers to join our team.