Articles

UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG

Knowledge graphs are an efficient and easily verifiable repository of factual information, and using knowledge graph queries as a tool for LLMs to improve the factuality of their output is a promising direction. But have you ever wondered how to make query execution work for knowledge graph RAG? "No!"/"Boring!" Let us guess: the queries were flawed, the knowledge graphs incomplete, the results simply suboptimal. What if we told you that we have discovered a secret... recipe?

Why Graph Topology Matters: Insights from Applications in Drug Discovery

Knowledge Graphs in Drug Discovery

Repurposing existing drugs to treat diseases beyond those they were originally designed for is one way to identify new treatment opportunities. But how do we identify which drugs might affect a given disease? This and similar questions in drug discovery, which require identifying new links between known entities, can be addressed with the help of Knowledge Graphs (KGs): graph-structured repositories of information that represent facts as (head, relation, tail) triples, connecting entities head and tail with an edge that categorises their relationship. In the biomedical domain, entities can represent drugs and diseases, but also genes, pathways, side effects, etc. KG edges represent interactions like (disease A, associates, gene B), (gene X, upregulates, gene Y) and many more.
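To make the triple representation concrete, here is a minimal sketch in Python; the entity and relation names are illustrative, not drawn from a real biomedical KG. It stores facts as (head, relation, tail) triples and answers a naive repurposing-style query over them.

```python
# Hypothetical toy KG: facts stored as (head, relation, tail) triples.
triples = [
    ("disease A", "associates", "gene B"),
    ("gene X", "upregulates", "gene Y"),
    ("drug D", "targets", "gene B"),
]

def tails(head: str, relation: str) -> set[str]:
    """All tail entities linked to `head` by `relation`."""
    return {t for h, r, t in triples if h == head and r == relation}

# Which genes are associated with disease A, and which drugs target them?
genes = tails("disease A", "associates")
candidate_drugs = {h for h, r, t in triples if r == "targets" and t in genes}
print(candidate_drugs)  # {'drug D'}
```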

Optimal Formats and the Cube Root of the PDF

Your boss emails you a point in 128-billion-dimensional space. "Llama 3.1 8B," the message reads. "A not-so-large language model in bfloat16. But it's too big. Trim the fat (ASAP)." You open up your toolbox: quantisation, sparsity, distillation.

Quantisation comes first, and brings two problems. First, you must choose a space for the model to sit in that is smaller than the original 128 billion bits. Second, you need to find a good point in that space. In our recent work on optimal formats for weight quantisation, we've had a crack at the first problem.

In this post, we'll learn how to construct optimal formats for known scalar distributions via the "cube root rule". We'll start with a recap of an existing format that claims optimality for the normal distribution. Then we'll explore the cube root rule — a non-intuitive result from the 1950s — and use it to build our own quantisation formats for scaled normal, Laplace and Student's t distributions.
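As a taster, here is an illustrative sketch of the construction (our own code, assuming the high-resolution regime and mean-squared-error distortion, not the post's implementation): the cube root rule says the optimal density of quantisation levels is proportional to pdf(x)^(1/3), so we can place n levels at equally spaced quantiles of the normalised cube-root density.

```python
import numpy as np
from scipy import stats

def cube_root_levels(dist, n_levels: int) -> np.ndarray:
    """Quantisation levels with density proportional to dist.pdf(x)**(1/3)."""
    grid = np.linspace(-12, 12, 100_001)
    density = dist.pdf(grid) ** (1 / 3)
    cdf = np.cumsum(density)
    cdf /= cdf[-1]  # CDF of the normalised cube-root density
    quantiles = (np.arange(n_levels) + 0.5) / n_levels
    return np.interp(quantiles, cdf, grid)

print(cube_root_levels(stats.norm, 16))     # 16 levels for N(0, 1)
print(cube_root_levels(stats.laplace, 16))  # ... for Laplace
print(cube_root_levels(stats.t(df=5), 16))  # ... for Student's t
```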

Llama 3.2 Vision — A Deep Dive

Vision-Language Models (VLMs) allow LLMs to "see", but how do they work? In this post, we'll walk through the model changes needed to turn an LLM into a VLM for inference. To understand the LLM starting point, please see A transformer walk-through, with Gemma, as we will assume familiarity with that content here.

Problem — Text generation, conditioned on an image: take an RGB image (below) and a short string prompt "What colour shirt is the person to the left of the laptop wearing?", then use an already-trained VLM (Llama-3.2-11B-Vision-Instruct by Meta) to generate an answer to the prompt.

[Image: four people looking at a laptop]
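For readers who want to run the model before dissecting it, a minimal inference sketch using the Hugging Face transformers library (version 4.45 or later; the image path is a placeholder) might look like the following. The post itself digs into what happens inside this call.

```python
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("four_people_laptop.jpg")  # placeholder image path
messages = [{"role": "user", "content": [
    {"type": "image"},
    {"type": "text",
     "text": "What colour shirt is the person to the left of the laptop wearing?"},
]}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(processor.decode(output[0], skip_special_tokens=True))
```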

Graphcore Research is hiring!

We are pleased to announce that we have open positions for Research Scientists and Engineers to join our team.

Our role within Graphcore is to help define what the next generation of AI compute systems should look like. Specialised hardware has been the key driver of the progress of AI over the last decade, and we believe that hardware-aware AI algorithms and AI-aware hardware developments will continue to be critical to the advancement of this exciting field.

Scale-preserving nonlinearities for u-μP

My colleagues and I always get excited when, every once in a while, deep learning research throws up a fun little maths problem. Our recent work on u-μP does just this, and in a reasonably systematic way, since we need to work out how to compensate for changes in scale (standard deviation) through deep learning ops. In this post and the accompanying notebook, we explore this problem.
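To give a flavour of the problem (an illustrative sketch, not code from the u-μP library): applying relu to a unit-normal input shrinks the output standard deviation to sqrt(1/2 - 1/(2*pi)), roughly 0.58, so a scale-preserving version divides by that factor.

```python
import torch

# Empirically estimate the output std of relu under N(0, 1) input.
x = torch.randn(1_000_000)
relu_scale = torch.relu(x).std()  # ≈ 0.58, i.e. sqrt(1/2 - 1/(2*pi))

def scaled_relu(t: torch.Tensor) -> torch.Tensor:
    """relu rescaled so a unit-std input gives (approximately) unit-std output."""
    return torch.relu(t) / relu_scale

print(scaled_relu(torch.randn(1_000_000)).std())  # ≈ 1.0
```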

A transformer walk-through, with Gemma

Transformer-based LLMs seem mysterious, but they don't need to be. In this post, we'll walk through a modern transformer LLM, Google's Gemma, providing bare-bones PyTorch code and some intuition for why each step is there. If you're a programmer and casual ML enthusiast, this is written for you.
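If you'd like a reference point before diving in, here is a hypothetical starting point using the Hugging Face transformers API (the post itself builds up the forward pass in bare-bones PyTorch rather than calling a library; the checkpoint name is an assumption).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "google/gemma-2b-it"  # an instruction-tuned Gemma checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

inputs = tokenizer("The capital of France is", return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```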