October Papers: Fast and Smart Language Models
October was packed with insights into making language models faster and smarter. We reviewed four of our favorite papers for you in detail:
- First up, Grouped Lattice Vector Quantisation introduces a novel technique for fine-grained post-training quantisation of LLMs that retains good performance even at low bit widths (see the generic sketch after this list).
- Planned Diffusion combines autoregressive planning with text diffusion, achieving low-latency text generation.
- Rethinking Thinking addresses the problem of long reasoning chains by distilling intermediate results into a bounded workspace for faster answers.
- Finally, When Structure Doesn’t Help compares techniques for encoding graphs as input for LLMs, with surprising results.
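To make the quantisation item a little more concrete, here is a minimal, generic sketch of group-wise post-training quantisation that rounds scaled weight groups to the nearest point of the plain integer lattice. The group size, bit width, and scaling scheme are illustrative assumptions, not the construction from the paper.

```python
import numpy as np

def quantise_groups(weights, group_size=8, n_bits=4):
    """Generic group-wise quantisation sketch using the integer lattice.

    Each group of weights is scaled into the representable range and rounded
    to the nearest integer lattice point. This illustrates the general idea of
    fine-grained post-training quantisation, not the paper's specific lattice.
    """
    flat = weights.reshape(-1, group_size)            # split weights into groups
    q_max = 2 ** (n_bits - 1) - 1                     # e.g. 7 for signed 4-bit codes
    scales = np.abs(flat).max(axis=1, keepdims=True) / q_max
    scales[scales == 0] = 1.0                         # avoid division by zero
    codes = np.clip(np.round(flat / scales), -q_max - 1, q_max)  # nearest lattice point
    dequantised = (codes * scales).reshape(weights.shape)
    return codes.astype(np.int8), scales, dequantised

# Example: quantise a small random weight matrix and measure the reconstruction error.
w = np.random.randn(16, 16).astype(np.float32)
codes, scales, w_hat = quantise_groups(w, group_size=8, n_bits=4)
print("mean squared error:", np.mean((w - w_hat) ** 2))
```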