Tom Cashman

Research Team Lead

November Papers: Perspectives on efficiency

November is back to a favourite topic of ours: efficiency. We reviewed three of our favourite papers, each looking at LLM efficiency from a different angle:

  • First up, How to Scale Second-Order Optimization looks at the optimal tuning of second-order optimizers such as Muon.
  • Intelligence per Watt discusses our favourite metric for large language models: energy efficiency, and how to take advantage of edge AI inference.
  • Finally, Int vs FP contributes to a long-standing debate in quantization: integer versus floating-point (block) formats. A toy sketch of what a block format looks like follows this list.
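
To make the "block" part of that last item concrete, here is a minimal sketch of symmetric per-block INT8 quantization, where every block of values shares one floating-point scale. This is our own toy illustration, not the paper's code, and the block size of 32 is an arbitrary choice; a floating-point block format would instead store each element as a small float under a shared block scale.

```python
import numpy as np

def quantize_int8_block(x: np.ndarray, block_size: int = 32):
    """Toy symmetric per-block INT8 quantization: each block shares one FP scale."""
    blocks = x.reshape(-1, block_size)
    # One scale per block, chosen so the largest value in the block maps to 127.
    scale = np.maximum(np.abs(blocks).max(axis=1, keepdims=True), 1e-8) / 127.0
    q = np.clip(np.round(blocks / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: np.ndarray) -> np.ndarray:
    return q.astype(np.float32) * scale

x = np.random.randn(4, 32).astype(np.float32)
q, scale = quantize_int8_block(x.ravel())
err = np.abs(dequantize(q, scale).reshape(x.shape) - x).mean()
print(f"mean absolute quantization error: {err:.5f}")
```

The interesting trade-off the paper examines is how those bits are spent per element once the block scale is shared; the sketch above only shows the integer side of that comparison.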

July Papers: Subliminal Learning, Mixture of Recursions and Dataset Curation

As July brought tennis at Wimbledon, so too did the ML world serve up a volley of research. This month, we took an eagle-eyed approach—or, perhaps, a Hawk-Eyed one—to three papers.

In our first paper, Subliminal Learning addresses the question, "Can we control or filter the distillation training data so that a student learns desirable properties but avoids picking up undesirable traits?" The authors conclude that the student learns all the teacher's traits, whether they're desirable or not!

Next, Mixture of Recursions brings a twist to token-level computation: instead of fixed-depth processing, the model learns to recurse adaptively, allocating compute per token dynamically and efficiently—like a rally whose length depends on the importance of the point.
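
To make that idea concrete, here is a minimal PyTorch sketch under our own assumptions, not the paper's implementation: a single shared block is reused up to a maximum depth, and a per-token router decides which tokens take another pass. The class name, the sigmoid router, and the 0.5 threshold are placeholders; a real implementation would also skip the computation for halted tokens rather than just gating their updates, which is where the efficiency gain comes from.

```python
import torch
import torch.nn as nn

class ToyMixtureOfRecursions(nn.Module):
    """Toy sketch: one shared block reused up to `max_depth` times, with a
    per-token router deciding whether each token takes another recursion step."""

    def __init__(self, d_model: int = 64, max_depth: int = 4):
        super().__init__()
        self.shared_block = nn.TransformerEncoderLayer(
            d_model, nhead=4, dim_feedforward=4 * d_model, batch_first=True
        )
        self.router = nn.Linear(d_model, 1)  # per-token "continue?" score
        self.max_depth = max_depth

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq, d_model)
        active = torch.ones(x.shape[:2], dtype=torch.bool, device=x.device)
        for _ in range(self.max_depth):
            if not active.any():
                break
            refined = self.shared_block(x)                     # one more pass of the shared weights
            gate = torch.sigmoid(self.router(x)).squeeze(-1)   # (batch, seq) in [0, 1]
            mix = torch.where(active, gate, torch.zeros_like(gate)).unsqueeze(-1)
            x = mix * refined + (1.0 - mix) * x                # only still-active tokens get updated
            active = active & (gate > 0.5)                     # tokens below threshold stop recursing
        return x

tokens = torch.randn(2, 8, 64)
out = ToyMixtureOfRecursions()(tokens)
print(out.shape)  # torch.Size([2, 8, 64])
```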

Last up is DataRater, which tackles the problem of dataset quality: a 'rater' is meta-learned to curate training data without manual filtering—an ace for data-centric AI.