<?xml version="1.0" encoding="UTF-8" ?> <?xml-stylesheet type="text/xsl" href="rss.xsl"?><rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/"> <channel> <title>Graphcore Research</title><link>https://graphcore-research.github.io/</link><atom:link href="https://graphcore-research.github.io/feed_rss_updated.xml" rel="self" type="application/rss+xml" /> <language>en</language> <pubDate>Tue, 12 May 2026 16:41:38 -0000</pubDate> <lastBuildDate>Tue, 12 May 2026 16:41:38 -0000</lastBuildDate> <ttl>1440</ttl> <generator>MkDocs RSS plugin - v1.19.0</generator> <item> <title>1-bit Wonderful Weights for LLMs</title> <author>Douglas Orr</author> <category>Articles</category> <category>LLMs</category> <category>number-formats</category> <category>quantisation</category> <description>&lt;p&gt;Would you rather use 1 million $\times$ 16-bit weights, 4 million $\times$ 4-bit weights, or even 16 million $\times$ 1-bit weights?&lt;/p&gt; &lt;p&gt;In joint work between Aleph Alpha Research and Graphcore, we asked this question of LLMs — the answer encouraged us to embrace the wonder ✨ of 1-bit weights, which can outperform 4-bit and 16-bit weights on a fixed weight memory budget.&lt;/p&gt; &lt;p&gt;&lt;img alt=&#34;1-bit weights rule!&#34; src=&#34;img/1bitwonderful.jpg&#34;&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2026-03-11-1-bit-wonder/</link> <pubDate>Wed, 11 Mar 2026 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2026-03-11-1-bit-wonder/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2026-03-11-1-bit-wonder.png" type="image/png" length="32449" /> </item> <item> <title>February Papers: Thinking Depth, Latent 
Actions, Quantization and Riemannian Flows</title> <author>Luke Hudlass-Galley</author> <author>Tom Cashman</author> <author>Douglas Orr</author> <author>Benoit Gaujac</author> <category>Papers of the Month</category> <category>flow map learning</category> <category>llms</category> <category>long-context</category> <category>promoter DNA design</category> <category>protein backbone generation</category> <category>quantisation</category> <category>reasoning</category> <category>reinforcement-learning</category> <category>riemannian manifold generative modeling</category> <category>transformers</category> <category>video-generation</category> <description>&lt;p&gt;The stream of papers never ends; even so, in February our team found four we&#39;d like to share:&lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;Think Deep, Not Just Long: Measuring LLM Reasoning Effort via Deep-Thinking Tokens&lt;/em&gt; investigates how many layers are actually needed for each token during autoregressive LM rollouts.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;Factored Latent Action World Models&lt;/em&gt; takes videos that contain multiple objects, and instead of encoding them into one latent state for the &lt;em&gt;whole scene&lt;/em&gt;, employs one latent state &lt;em&gt;per object&lt;/em&gt;.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;LATMiX: Learnable Affine Transformations for Microscaling Quantization of LLMs&lt;/em&gt; generalises Hadamard transforms to better handle outliers when block-quantizing LLMs.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;Riemannian Mean Flow&lt;/em&gt; extends &lt;a href=&#34;https://arxiv.org/pdf/2505.13447&#34;&gt;MeanFlow&lt;/a&gt; for generating proteins within the corresponding structured spaces, e.g. 
the space of all residue positions and orientations.&lt;/p&gt; &lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2026-03-03-potm/</link> <pubDate>Tue, 03 Mar 2026 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2026-03-03-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2026-03-03-potm.png" type="image/png" length="37541" /> </item> <item> <title>UltRAG: a Universal Simple Scalable Recipe for Knowledge Graph RAG</title> <author>Dobrik Georgiev</author> <category>Articles</category> <category>RAG</category> <category>graph foundational models</category> <category>knowledge-graphs</category> <description>&lt;p&gt;Knowledge graphs are an efficient and easily verifiable repository of factual information, and using knowledge graph queries as a tool for LLMs to improve the factuality of their output is a promising direction. But have you ever wondered how to make query execution work for knowledge graph RAG? &#34;No!&#34;/&#34;Boring!&#34; Let us guess &amp;mdash; queries were flawed, knowledge graphs incomplete, &lt;strong&gt;results were simply suboptimal&lt;/strong&gt;. What if we told you that we have discovered a secret... 
&lt;em&gt;recipe&lt;/em&gt;.&lt;/p&gt; &lt;p&gt;&lt;img alt=&#34;&#34; src=&#34;images/wedabest.png&#34;&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2026-02-20-ultrag/</link> <pubDate>Fri, 20 Feb 2026 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2026-02-20-ultrag/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2026-02-20-ultrag.png" type="image/png" length="38174" /> </item> <item> <title>January Papers: Conditional Memories for LMs, Audio-Visual FMs, and Batch Size Schedulers</title> <author>Luke Prince</author> <author>Benoit Gaujac</author> <author>Callum McLean</author> <category>LLM</category> <category>Papers of the Month</category> <category>audio-visual generation</category> <category>diffusion</category> <category>diffusion transformer</category> <category>efficient-inference</category> <category>llm</category> <category>memory</category> <category>pretraining</category> <category>sparsity</category> <category>training dynamics</category> <description>&lt;p&gt;Welcome to the first edition of our Paper of the Month newsletter for 2026! &lt;/p&gt; &lt;p&gt;This month, our team went through 21 different papers to find the most insightful new pieces of literature that we think have the potential to leave a mark. From this selection, three papers stood out in particular: &lt;/p&gt; &lt;ul&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;Conditional Memory via Scalable Lookup: A New Axis of Sparsity for Large Language Models.&lt;/em&gt; Cheng et al. introduce a simple, scalable memory-augmentation for large language models to offload the cost of simple knowledge-based retrieval to embedding lookups.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;LTX-2: Efficient Joint Audio-Visual Foundation Model.&lt;/em&gt; HaCohen et al. 
propose a joint text-conditioned audio-visual generation framework built using modality-specific VAEs, a refined text-conditioning module, and an asymmetric dual-stream diffusion transformer.&lt;/p&gt; &lt;/li&gt; &lt;li&gt; &lt;p&gt;&lt;em&gt;How to Set the Batch Size for Large-Scale Pre-training?&lt;/em&gt; Zhou et al. discuss how to identify the optimal batch size for large-scale pretraining, and find that dynamically increasing the batch size through time can improve performance.&lt;/p&gt; &lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2026-02-10-potm/</link> <pubDate>Tue, 10 Feb 2026 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2026-02-10-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2026-02-10-potm.png" type="image/png" length="37816" /> </item> <item> <title>December Papers: MoE, Fact-storing and Byteifying Language Models</title> <author>Seán Comerford</author> <author>Johanna Vielhaben</author> <author>Luka Ribar</author> <category>LLMs</category> <category>Papers of the Month</category> <category>byte-level</category> <category>efficient-training</category> <category>mixture-of-experts</category> <category>sparsity</category> <category>transformers</category> <description>&lt;p&gt;Despite the holiday season and the busy NeurIPS period, December closed the year with a set of insightful papers. 
Our team reviewed the following three papers:&lt;/p&gt; &lt;!-- SonicMoE: Accelerating MoE with IO and Tile-aware Optimizations --&gt; &lt;ul&gt; &lt;li&gt;First up, &lt;a href=&#34;#sonicmoe-accelerating-moe-with-io-and-tile-aware-optimizations&#34;&gt;SonicMoE&lt;/a&gt; tackles issues of fine-grained and sparse MoEs using hardware-aware optimizations to restore efficiency.&lt;/li&gt; &lt;/ul&gt; &lt;!-- Constructing Efficient Fact-Storing MLPs for Transformers --&gt; &lt;ul&gt; &lt;li&gt;Next, &lt;a href=&#34;#constructing-efficient-fact-storing-mlps-for-transformers&#34;&gt;Constructing Efficient Fact-Storing MLPs for Transformers&lt;/a&gt; shows how MLP layers can be explicitly constructed as key–value stores to achieve high facts-per-parameter efficiency. &lt;/li&gt; &lt;/ul&gt; &lt;!-- Bolmo: Byteifying the Next Generation of Language Models --&gt; &lt;ul&gt; &lt;li&gt;Finally, &lt;a href=&#34;#bolmo-byteifying-the-next-generation-of-language-models&#34;&gt;Bolmo&lt;/a&gt; presents a method for &#34;byteifying&#34; existing subword-level language models that improves character-level understanding while achieving comparable performance to subword-level models.&lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2026-01-13-potm/</link> <pubDate>Tue, 13 Jan 2026 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2026-01-13-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2026-01-13-potm.png" type="image/png" length="38645" /> </item> <item> <title>November Papers: Perspectives on efficiency</title> <author>Tom Cashman</author> <author>Sylvain Viguier</author> <author>Paul Balança</author> <category>Papers of the Month</category> <category>efficiency</category> <category>efficient-training</category> <category>inference</category> <category>llm</category> 
<category>mup</category> <category>optimisation</category> <category>optimization</category> <category>power</category> <category>quantization</category> <category>scaling-laws</category> <description>&lt;p&gt;November is back to a favourite topic of ours: efficiency. We reviewed three of our favourite papers looking at LLM efficiency from different angles:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;First up, &lt;a href=&#34;#how-to-scale-second-order-optimization&#34;&gt;How to Scale Second-Order Optimization&lt;/a&gt; looks at the optimal tuning of second-order optimizers such as Muon. &lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#intelligence-per-watt-measuring-intelligence-efficiency-of-local-ai&#34;&gt;Intelligence per Watt&lt;/a&gt; discusses our favourite metric for large language models, energy efficiency, and how to take advantage of edge AI inference.&lt;/li&gt; &lt;li&gt;Finally, &lt;a href=&#34;#int-vs-fp-a-comprehensive-study-of-fine-grained-low-bit-quantization-formats&#34;&gt;Int vs FP&lt;/a&gt; contributes to a long-standing topic in quantization: integer vs floating-point (block) formats.&lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2025-12-08-potm/</link> <pubDate>Mon, 08 Dec 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-12-08-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-12-08-potm.png" type="image/png" length="37306" /> </item> <item> <title>Why Graph Topology Matters: Insights from Applications in Drug Discovery</title> <author>Daniel Justus</author> <category>Articles</category> <category>graph-learning</category> <category>knowledge-graphs</category> <category>life-sciences</category> <description>&lt;h2&gt;Knowledge Graphs in Drug Discovery&lt;/h2&gt; &lt;p&gt;Repurposing existing drugs to treat diseases beyond what they were 
originally designed for can be a way to identify new disease treatment opportunities. But how do we identify which drugs might affect a given disease? This and similar questions in drug discovery, which require identifying new links between known entities, can be addressed with the help of &lt;strong&gt;Knowledge Graphs (KGs)&lt;/strong&gt;, graph-structured repositories of information that represent facts as &lt;em&gt;(head, relation, tail)&lt;/em&gt; triples, connecting entities &lt;em&gt;head&lt;/em&gt; and &lt;em&gt;tail&lt;/em&gt; with an edge that categorizes their relationship. In the biomedical domain, entities can represent drugs and diseases, but also genes, pathways, side effects, etc. KG edges represent interactions like (disease A, associates, gene B), (gene X, upregulates, gene Y) and many more.&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-11-05-the-role-of-graph-topology/</link> <pubDate>Wed, 05 Nov 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-11-05-the-role-of-graph-topology/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-11-05-the-role-of-graph-topology.png" type="image/png" length="46376" /> </item> <item> <title>October Papers: Fast and Smart Language Models</title> <author>Douglas Orr</author> <author>Benoit Gaujac</author> <author>Sam Olesker-Taylor</author> <author>Kheeran Naidu</author> <category>GNNs</category> <category>LLMs</category> <category>Papers of the Month</category> <category>diffusion</category> <category>efficient-inference</category> <category>graph-learning</category> <category>quantisation</category> <category>reasoning</category> <category>reinforcement-learning</category> <description>&lt;p&gt;October was packed with insights into making language models faster and smarter. 
We reviewed four of our favorite papers for you in detail:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;First up, &lt;a href=&#34;#learning-grouped-lattice-vector-quantizers-for-low-bit-llm-compression&#34;&gt;Grouped Lattice Vector Quantisation&lt;/a&gt; introduces a novel technique for a fine-grained post-training quantisation of LLMs, retaining good performance even at low bit widths.&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#planned-diffusion&#34;&gt;Planned Diffusion&lt;/a&gt; combines autoregressive planning with text diffusion, achieving low-latency text generation.&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#rethinking-thinking-tokens-llms-as-improvement-operators&#34;&gt;Rethinking Thinking&lt;/a&gt; addresses the problem of long reasoning chains by distilling intermediate results into a bounded workspace for faster answers.&lt;/li&gt; &lt;li&gt;Finally, &lt;a href=&#34;#when-structure-doesnt-help-llms-do-not-read-text-attributed-graphs-as-effectively-as-we-expected&#34;&gt;When Structure Doesn’t Help&lt;/a&gt; compares techniques for encoding graphs for consumption by LLMs with surprising results.&lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2025-11-04-potm/</link> <pubDate>Tue, 04 Nov 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-11-04-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-11-04-potm.png" type="image/png" length="38731" /> </item> <item> <title>September Papers: The L in ML Stands for LLMs</title> <author>Sam Olesker-Taylor</author> <author>Dobrik Georgiev</author> <author>Douglas Orr</author> <author>Luke Hudlass-Galley</author> <category>LLMs</category> <category>Papers of the Month</category> <category>RAG</category> <category>efficient-inference</category> <category>fine-tuning</category> <category>reasoning</category> 
<category>reinforcement-learning</category> <category>self-improvement</category> <description>&lt;p&gt;For September, the research team reviewed a whopping 22 papers! Needless to say, competition was fierce, and only four made the final cut for this month’s edition, which is LLM-themed: &lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href=&#34;#flowrl-matching-reward-distributions-for-llm-reasoning&#34;&gt;FlowRL&lt;/a&gt; uses GFlowNets to train LLMs on full reward distributions, promoting diverse reasoning paths instead of just reward maximization. &lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#soft-tokens-hard-truths&#34;&gt;Soft Tokens, Hard Truths&lt;/a&gt; proposes using continuous “soft” tokens with injected noise to enable reinforcement learning fine-tuning of LLM reasoning. &lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#set-block-decoding-is-a-language-model-inference-accelerator&#34;&gt;Set Block Decoding&lt;/a&gt; accelerates LLM inference by generating multiple tokens in parallel using non-causal attention and iterative entropy-based sampling. &lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#metacognitive-reuse-turning-recurring-llm-reasoning-into-concise-behaviors&#34;&gt;Metacognitive Reuse&lt;/a&gt; enables LLMs to extract and reuse concise reasoning “behaviors” to improve efficiency and reduce repeated computation. 
&lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2025-10-07-potm/</link> <pubDate>Tue, 07 Oct 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-10-07-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-10-07-potm.png" type="image/png" length="36217" /> </item> <item> <title>August Papers: Optimal Dataset Mixtures, Stable Molecule Generation, and Agentic Hypergraph RAG</title> <author>Michael Pearce</author> <author>Yeman Brhane Hagos</author> <author>Kheeran Naidu</author> <category>LLMs</category> <category>Papers of the Month</category> <category>active-learning</category> <category>dataset</category> <category>diffusion</category> <category>efficient-training</category> <category>fine-tuning</category> <category>ligand</category> <category>molecule-generation</category> <category>optimisation</category> <category>reinforcement-learning</category> <category>retrieval-augmented-generation</category> <description>&lt;p&gt;August, even with its heat waves and holidays, left no shortage of exciting research. 
Our top papers for this month are the following:&lt;/p&gt; &lt;ul&gt; &lt;li&gt;&lt;a href=&#34;#admire-bayesopt-accelerated-data-mixture-re-weighting-for-language-models-with-bayesian-optimization&#34;&gt;ADMIRE-BayesOpt&lt;/a&gt; investigates how to weight different data sources when they are mixed into a single training dataset, showing that the search for the optimal mixture can be automated using multi-fidelity Bayesian optimization;&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#guiding-diffusion-models-with-reinforcement-learning-for-stable-molecule-generation&#34;&gt;Stable Molecule Generation&lt;/a&gt; uses a force-field-based reward function to fine-tune pre-trained 3D molecule generation diffusion models, with the goal of sampling physically stable and valid molecules;&lt;/li&gt; &lt;li&gt;&lt;a href=&#34;#graph-r1-towards-agentic-graphrag-framework-via-end-to-end-reinforcement-learning&#34;&gt;Graph-R1&lt;/a&gt; takes an agentic RAG approach with a knowledge hypergraph to effectively represent and retrieve information from a corpus of documents.&lt;/li&gt; &lt;/ul&gt;</description> <link>https://graphcore-research.github.io/2025-09-08-potm/</link> <pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-09-08-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-09-08-potm.png" type="image/png" length="41999" /> </item> <item> <title>July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation</title> <author>Tom Cashman</author> <author>Luka Ribar</author> <author>Paul Balança</author> <category>LLMs</category> <category>Papers of the Month</category> <category>dataset</category> <category>distillation</category> <category>efficient-inference</category> <category>mixture-of-experts</category> <description>&lt;p&gt;As July brought tennis at Wimbledon, so too did the ML world serve up a volley of research. 
This month, we took an eagle-eyed approach—or, perhaps, &lt;em&gt;Hawk Eye&lt;/em&gt;d approach—to three papers.&lt;/p&gt; &lt;p&gt;In our first paper, &lt;a href=&#34;#subliminal-learning-language-models-transmit-behavioral-traits-via-hidden-signals-in-data&#34;&gt;Subliminal Learning&lt;/a&gt; addresses the question, &#34;Can we control or filter the distillation training data so that a student learns desirable properties but avoids picking up undesirable traits?&#34; The authors conclude that the student learns &lt;em&gt;all&lt;/em&gt; the teacher&#39;s traits, whether they&#39;re desirable or not!&lt;/p&gt; &lt;p&gt;Next, &lt;a href=&#34;#mixture-of-recursions-learning-dynamic-recursive-depths-for-adaptive-token-level-computation&#34;&gt;Mixture of Recursions&lt;/a&gt; brings a twist to token-level computation: instead of fixed-depth processing, the model learns to recurse adaptively, allocating compute per token dynamically and efficiently—like a rally whose length depends on the importance of the point.&lt;/p&gt; &lt;p&gt;Last up is &lt;a href=&#34;#datarater-meta-learned-dataset-curation&#34;&gt;DataRater&lt;/a&gt;, where the problem of dataset quality is addressed. 
A &#39;rater&#39; is meta-learned to curate training data without manual filtering—an ace for data-centric AI.&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-08-01-potm/</link> <pubDate>Fri, 01 Aug 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-08-01-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-08-01-potm.png" type="image/png" length="37059" /> </item> <item> <title>June Papers: Gradient Norms, LLM Reasoning and Video Generation</title> <author>Callum McLean</author> <author>Sam Olesker-Taylor</author> <author>Michael Pearce</author> <category>LLMs</category> <category>Papers of the Month</category> <category>diffusion</category> <category>efficient-inference</category> <category>fine-tuning</category> <category>image-generation</category> <category>reasoning</category> <category>reinforcement-learning</category> <category>training-dynamics</category> <category>video-generation</category> <description>&lt;p&gt;This June not only brought us very hot and sunny days (at least here in the UK), but also an excellent selection of new and exciting ML research! 
Out of the many good candidates, this month we selected three papers, covering quite a lot of different ground.&lt;/p&gt; &lt;p&gt;In the first paper, &lt;a href=&#34;#why-gradients-rapidly-increase-near-the-end-of-training&#34;&gt;Why Gradients Rapidly Increase Near the End of Training&lt;/a&gt;, a researcher from FAIR explores the puzzling phenomenon of increasing gradient magnitudes during training, offering an elegant mathematical explanation and a simple remedy.&lt;/p&gt; &lt;p&gt;Next, in &lt;a href=&#34;#prorl-prolonged-reinforcement-learning-expands-reasoning-boundaries-in-large-language-models&#34;&gt;ProRL&lt;/a&gt;, NVIDIA researchers dive into the evolving topic of large language model reasoning, showing how prolonged reinforcement learning can indeed introduce novel reasoning abilities.&lt;/p&gt; &lt;p&gt;Finally, we look at &lt;a href=&#34;#autoregressive-adversarial-post-training-for-real-time-interactive-video-generation&#34;&gt;AAPT&lt;/a&gt;, a fresh approach from the ByteDance Seed team that turns pre-trained offline diffusion models into real-time video generators via adversarial post-training.&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-07-01-potm/</link> <pubDate>Tue, 01 Jul 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-07-01-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-07-01-potm.png" type="image/png" length="44579" /> </item> <item> <title>Optimal Formats and the Cube Root of the PDF</title> <author>Douglas Orr</author> <category>Articles</category> <category>efficient-inference</category> <category>number-formats</category> <category>quantisation</category> <description>&lt;p&gt;Your boss emails you a point in 128-billion-dimensional space. &#34;Llama 3.1 8B,&#34; the message reads. 
&#34;A not-so-large language model in &lt;code&gt;bfloat16&lt;/code&gt;. But it&#39;s too big. Trim the fat (ASAP).&#34; You open up your toolbox: quantisation, sparsity, distillation.&lt;/p&gt; &lt;p&gt;Quantisation comes first, with two problems. First, you must choose a space smaller than a 128-billion-dimensional binary number for the model to sit in. Second, you need to find a good point in that space. In our recent work on &lt;a href=&#34;https://arxiv.org/abs/2505.12988&#34;&gt;optimal formats for weight quantisation&lt;/a&gt;, we&#39;ve had a crack at the first question.&lt;/p&gt; &lt;p&gt;In this post, we&#39;ll learn how to construct optimal formats for known scalar distributions via the &#34;cube root rule&#34;. We&#39;ll start with a recap of an existing format that claims optimality for the normal distribution. Then we&#39;ll explore the cube root rule — a non-intuitive result from the 1950s — and use it to build our own quantisation formats for scaled normal, Laplace and Student&#39;s t distributions.&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-06-11-cube-root-formats/</link> <pubDate>Wed, 11 Jun 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-06-11-cube-root-formats/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-06-11-cube-root-formats.png" type="image/png" length="33830" /> </item> <item> <title>May Papers: Parallel scaling, Evolving code, Understanding LLM reasoning</title> <author>Tom Pollak</author> <author>Robert Hu</author> <author>Luke Hudlass-Galley</author> <author>Sam Olesker-Taylor</author> <category>AGI</category> <category>Evolutionary Algorithms</category> <category>LLMs</category> <category>Papers of the Month</category> <category>RAG</category> <category>efficient-inference</category> <category>fine-tuning</category> 
<category>reasoning</category> <category>reinforcement-learning</category> <category>scaling-laws</category> <category>test-time-compute</category> <description>&lt;p&gt;Hurtling past the NeurIPS submission deadline into the summer months, we switch from huddling around server rooms to keep warm to babysitting experiments whilst basking in the sun. We&#39;ve had a bumper month of papers to sift through and once again we offer summaries of a few of our favourites.&lt;/p&gt; &lt;p&gt;First, &lt;a href=&#34;#parallel-scaling-laws-for-language-models&#34;&gt;Parallel Scaling Laws for Language Models&lt;/a&gt; proposes a novel method of scaling compute for language models, inspired by classifier-free guidance, which finetunes a model to run multiple forward passes with different learned vector prefixes. We also looked into &lt;a href=&#34;#alphaevolve-a-coding-agent-for-scientific-and-algorithmic-discovery&#34;&gt;AlphaEvolve&lt;/a&gt;, an evolutionary algorithm from Google DeepMind that generates and refines prompts for Gemini to advance the state-of-the-art in algorithm design. &lt;/p&gt; &lt;p&gt;Since it has been a particularly exciting month for contributions on LLM reasoning, we picked two papers to dive deeper into. In &lt;a href=&#34;#soft-thinking-unlocking-the-reasoning-potential-of-llms-in-continuous-concept-space&#34;&gt;Soft Thinking&lt;/a&gt;, the authors attempt to improve on prior work by sampling continuous token embeddings rather than discrete tokens during reasoning phases of text generation. Finally, in &lt;a href=&#34;#spurious-rewards-rethinking-training-signals-in-rlvr&#34;&gt;Spurious Rewards&lt;/a&gt;, the authors find that even rewarding random answers can improve reasoning ability, potentially forcing us to reconsider how we understand post-training techniques to improve the use of test-time compute. 
&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-06-02-potm/</link> <pubDate>Mon, 02 Jun 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-06-02-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-06-02-potm.png" type="image/png" length="36197" /> </item> <item> <title>April Papers: Motion Prompting, Mamba Reasoning and Modeling Rewards</title> <author>Alex Cunha</author> <author>Arianna Saracino</author> <author>Kheeran Naidu</author> <category>LLMs</category> <category>Papers of the Month</category> <category>diffusion</category> <category>generative-models</category> <category>inference-time-compute</category> <category>mamba</category> <category>reasoning</category> <category>reinforcement-learning</category> <category>reward-modeling</category> <category>test-time-compute</category> <description>&lt;p&gt;April has been a busy month for the AI research community, with ICLR (the first of the &#34;big three&#34; AI conferences of the year) taking place in Singapore. We&#39;re pleased to share summaries of a few of our favourite papers we&#39;ve seen this month.&lt;/p&gt; &lt;p&gt;First up, &lt;a href=&#34;#motion-prompting-controlling-video-generation-with-motion-trajectories&#34;&gt;Motion Prompting&lt;/a&gt; introduces flexible spatio-temporal trajectories, or &#34;motion prompts&#34;, as a powerful new way to control nuanced dynamic actions and motion in video generation, overcoming the limitations of text prompts. 
This is followed by &lt;a href=&#34;#inference-time-scaling-for-generalist-reward-modeling&#34;&gt;Inference-Time Scaling for Generalist Reward Modeling&lt;/a&gt;, which presents Self-Principled Critique Tuning (SPCT), a method that powers DeepSeek-GRM—a generalist reward model capable of generating adaptive, high-quality rewards and achieving strong performance gains through scalable inference-time compute. Finally, &lt;a href=&#34;#m1-towards-scalable-test-time-compute-with-mamba-reasoning-models&#34;&gt;M1&lt;/a&gt; looks at using a Mamba-based architecture to tackle reasoning problems, as a more computationally-efficient approach when compared to transformers with chains-of-thought.&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-05-07-potm/</link> <pubDate>Wed, 07 May 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-05-07-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-05-07-potm.png" type="image/png" length="41273" /> </item> <item> <title>March Papers: De-Norming, Skill-Scaling, Over-Training and Drug-Generating</title> <author>Luke Prince</author> <author>Alberto Cattaneo</author> <author>Douglas Orr</author> <author>Yeman Brhane Hagos</author> <category>GNNs</category> <category>LLMs</category> <category>Papers of the Month</category> <category>activation-functions</category> <category>drug-design</category> <category>fine-tuning</category> <category>flow-matching</category> <category>normalisation</category> <category>scaling-laws</category> <category>transformers</category> <description>&lt;p&gt;We&#39;ve enjoyed March, bringing improving weather and many excellent ML papers to keep us busy. 
As usual, we&#39;re here to share summaries of four of our favourites.&lt;/p&gt; &lt;p&gt;First, Meta share their work that successfully removes the need for &lt;code&gt;LayerNorm&lt;/code&gt; in transformers, replacing it with a reduction-free $\tanh$ (&lt;a href=&#34;#transformers-without-normalisation&#34;&gt;de-norming&lt;/a&gt;). This is followed by two papers on scaling - studying the different scaling laws for skill-based vs knowledge-based downstream tasks (&lt;a href=&#34;#compute-optimal-scaling-of-skills-knowledge-vs-reasoning&#34;&gt;skill-scaling&lt;/a&gt;), and whether pretraining can go on too long, making downstream performance worse (&lt;a href=&#34;#overtrained-language-models-are-harder-to-fine-tune&#34;&gt;over-training&lt;/a&gt;). Finally, EPFL share a flow-matching GNN model for generating small molecules for drug design (&lt;a href=&#34;#multi-domain-distribution-learning-for-de-novo-drug-design&#34;&gt;drug-generating&lt;/a&gt;).&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-04-06-potm/</link> <pubDate>Sun, 06 Apr 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-04-06-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-04-06-potm.png" type="image/png" length="37187" /> </item> <item> <title>February Papers: Learning to Scale</title> <author>Luke Prince</author> <author>Luka Ribar</author> <author>Paul Balança</author> <author>Luke Hudlass-Galley</author> <category>LLMs</category> <category>Papers of the Month</category> <category>efficient-inference</category> <category>quantisation</category> <category>reasoning</category> <category>scaling-laws</category> <category>speculative-decoding</category> <category>transformers</category> <description>&lt;p&gt;Welcome to Papers of the Month! 
This time around, our monthly selection of ML papers revolves around the central theme of &lt;em&gt;scale&lt;/em&gt; – and learning how to scale efficiently. Scaling-laws for LLMs, multi-scale quantisation training and scaling test-time compute: it&#39;s a rich buffet!&lt;/p&gt; &lt;p&gt;The first paper, &lt;strong&gt;Distillation Scaling Laws&lt;/strong&gt;, presents a thorough study of distillation for Language Models, with the aim of estimating how student performance scales as a function of model size and amount of distillation data used -- offering very useful insights, in an era where distillation pre-training of LLMs is becoming more and more widespread to improve &#34;capability per watt&#34;.&lt;/p&gt; &lt;p&gt;The problem of computational efficiency and cost reduction is also at the heart of &lt;strong&gt;Matryoshka Quantisation&lt;/strong&gt;, DeepMind&#39;s solution for training a quantised model that can then be easily served at different lower numerical precisions, by leveraging the nested structure of integer data types. And if you are a quantisation geek like we are, make sure to also read our summary of &lt;strong&gt;ParetoQ&lt;/strong&gt;, a new unified framework to investigate the scaling laws that govern the trade-off between quantised model size and accuracy in extremely low-bit regimes.&lt;/p&gt; &lt;p&gt;Finally, we jump from training scaling laws to &lt;strong&gt;scaling up test-time compute&lt;/strong&gt;, with a paper that introduces a recurrent block in LLMs at test-time to allow the model to perform iterative reasoning in latent space, without verbalizing its intermediate thoughts, to improve its performance.&lt;/p&gt; &lt;p&gt;&lt;em&gt;We hope you enjoy this month&#39;s papers as much as we did! 
If you have thoughts or questions, please reach out to us at &lt;a href=&#34;https://x.com/GCResearchTeam&#34;&gt;@GCResearchTeam&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-02-27-potm/</link> <pubDate>Thu, 27 Feb 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-02-27-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-02-27-potm.png" type="image/png" length="32136" /> </item> <item> <title>January Papers: More Like &#34;Reas-anuary Papers&#34;</title> <author>Alex Cunha</author> <author>Luka Ribar</author> <author>Paul Balança</author> <author>Alexandre Payot</author> <category>LLMs</category> <category>Papers of the Month</category> <category>efficient-inference</category> <category>efficient-training</category> <category>fine-tuning</category> <category>inference</category> <category>long-context</category> <category>memory</category> <category>quantisation</category> <category>reasoning</category> <category>reinforcement learning</category> <description>&lt;p&gt;New year, new Papers of the Month! Kicking off 2025, it&#39;s apparent that reasoning and test-time compute are the hot topics on the block, with much research investigating how to best use these new methods to improve LLM capabilities.&lt;/p&gt; &lt;p&gt;We start with &lt;strong&gt;Titans&lt;/strong&gt;, which introduces a memory module to architectures that can be updated during inference. 
This results in a hybrid between attention mechanisms and recurrent models, and unlocks the ability to handle very long sequences.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Evolving Deeper LLM Thinking&lt;/strong&gt; explores evolutionary search strategies to scale test-time compute, outperforming other inference strategies in natural language planning tasks.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Transformer-Squared&lt;/strong&gt; is a novel approach that adapts LLMs for new tasks by selectively adjusting the singular components of their weight matrices, helping broaden LLMs&#39; abilities to handle diverse tasks with fewer parameters and greater efficiency.&lt;/p&gt; &lt;p&gt;Finally, we look at two recent models from DeepSeek: &lt;strong&gt;DeepSeek-V3&lt;/strong&gt; and &lt;strong&gt;DeepSeek-R1&lt;/strong&gt;. Given this double-release is packed with so much information, today we&#39;ll only cover the high-level details on the innovations described in the papers and their impact on efficiency and model performance — we will release a new blog post soon with a deep-dive into DeepSeek&#39;s recent publications.&lt;/p&gt; &lt;p&gt;&lt;em&gt;We hope you enjoy this month&#39;s papers as much as we did! 
If you have thoughts or questions, please reach out to us at &lt;a href=&#34;https://x.com/GCResearchTeam&#34;&gt;@GCResearchTeam&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2025-01-31-potm/</link> <pubDate>Fri, 31 Jan 2025 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2025-01-31-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2025-01-31-potm.png" type="image/png" length="38890" /> </item> <item> <title>Llama 3.2 Vision — A Deep Dive</title> <author>Douglas Orr</author> <category>Articles</category> <category>LLMs</category> <category>VLMs</category> <category>transformers</category> <description>&lt;p&gt;Vision-Language Models (VLMs) allow LLMs to &#34;see&#34;, but how do they work? In this post, we&#39;ll walk through the model changes needed to turn an LLM into a VLM for inference. 
To understand the LLM starting point, please see &lt;a href=&#34;/posts/2024/04-gemma/gemma.md&#34;&gt;A transformer walk-through with Gemma&lt;/a&gt;, as we will assume familiarity with that content here.&lt;/p&gt; &lt;p&gt;&lt;strong&gt;Problem&lt;/strong&gt; — Text generation, &lt;em&gt;conditioned on an image&lt;/em&gt;: take an RGB image (below) and a short string prompt &lt;em&gt;&#34;What colour shirt is the person to the left of the laptop wearing?&#34;&lt;/em&gt;, then use an already-trained VLM (&lt;a href=&#34;https://huggingface.co/meta-llama/Llama-3.2-11B-Vision-Instruct&#34;&gt;Llama-3.2-11B-Vision-Instruct&lt;/a&gt; by Meta) to generate an answer to the prompt.&lt;/p&gt; &lt;p&gt;&lt;img alt=&#34;Image of four people looking at a laptop&#34; src=&#34;./image.png&#34;&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2024-12-30-llama-vision/</link> <pubDate>Mon, 30 Dec 2024 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2024-12-30-llama-vision/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2024-12-30-llama-vision.png" type="image/png" length="29377" /> </item> <item> <title>December Papers: Spend Your FLOPs Wisely</title> <author>Luka Ribar</author> <author>Luke Prince</author> <author>Douglas Orr</author> <author>Alexandre Payot</author> <category>LLMs</category> <category>Papers of the Month</category> <category>byte-level</category> <category>diffusion</category> <category>efficient-training</category> <category>embedding-models</category> <category>generative-models</category> <category>language-models</category> <category>reasoning</category> <category>sparsity</category> <category>synthetic data</category> <category>training</category> <category>transformers</category> <description>&lt;p&gt;Welcome to Papers of the Month — Graphcore Research&#39;s effort to 
bring you our pick of the most interesting ML papers. In December we noted a collection of papers which took innovative approaches to allocating compute (FLOPs) to input data.&lt;/p&gt; &lt;p&gt;We start with the Byte Latent Transformer. This modifies the standard transformer to operate on &lt;em&gt;patches&lt;/em&gt;, which comprise a variable number of input bytes, as determined by an entropy metric. As a result, compute is dynamically allocated towards &#34;harder input data&#34;. This has some similarities with the Concept Model architecture, which also uses a flexible intermediate representation. The model performs autoregressive sentence generation in this modality-agnostic space, rather than token space. &lt;/p&gt; &lt;p&gt;The Memory Layers architecture allows extra parameters to be added to a model without increasing FLOPs. Decoupling these resources gives model designers more control (e.g. for co-design, to fit their hardware resources) and potentially facilitates more effective models in general.&lt;/p&gt; &lt;p&gt;Finally, the Phi-4 paper presents a rather different FLOPs angle: spending compute in the data-generation process to create higher quality data, leading to &#34;student&#34; models that (in some domains) outperform their &#34;teachers&#34;.&lt;/p&gt; &lt;p&gt;&lt;em&gt;We hope you enjoy this month&#39;s papers as much as we did! 
If you have thoughts or questions, please reach out to us at &lt;a href=&#34;https://x.com/GCResearchTeam&#34;&gt;@GCResearchTeam&lt;/a&gt;.&lt;/em&gt;&lt;/p&gt;</description> <link>https://graphcore-research.github.io/2024-12-30-potm/</link> <pubDate>Mon, 30 Dec 2024 00:00:00 +0000</pubDate> <source url="https://graphcore-research.github.io/feed_rss_updated.xml">Graphcore Research</source><guid isPermaLink="true">https://graphcore-research.github.io/2024-12-30-potm/</guid> <enclosure url="https://graphcore-research.github.io/assets/images/social/2024-12-30-potm.png" type="image/png" length="39964" /> </item> </channel> </rss>