December Papers: MoE, Fact-storing and Byteifying Language Models
Despite the holiday season and the busy NeurIPS period, December closed the year with a set of insightful papers. Our team reviewed the following three papers:
- First up, SonicMoE tackles the inefficiencies of fine-grained, sparse MoEs, using hardware-aware optimizations to restore efficiency.
- Next, Constructing Efficient Fact-Storing MLPs for Transformers shows how MLP layers can be explicitly constructed as key–value stores to achieve high facts-per-parameter efficiency (a minimal sketch of the key–value view follows this list).
- Finally, Bolmo presents a method for "byteifying" existing subword-level language models, improving character-level understanding while matching the performance of the original subword models.
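
The key–value view of MLPs referenced above is simple enough to sketch: if each row of the first weight matrix holds a key and each column of the second holds the corresponding value, a single forward pass acts as a fact lookup. Below is a minimal NumPy sketch of this idea; the weight layout, threshold, and dimensions are illustrative assumptions, not the paper's actual construction.

```python
import numpy as np

# Sketch: storing (key, value) pairs in a two-layer MLP.
# Row i of W_in holds key k_i; column i of W_out holds value v_i.
# A thresholded ReLU selects the matching key, and the second layer
# reads out the corresponding value.

rng = np.random.default_rng(0)
d, n_facts = 64, 8

keys = rng.standard_normal((n_facts, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)   # unit-norm keys
values = rng.standard_normal((n_facts, d))

W_in = keys                    # (n_facts, d): each row detects one key
W_out = values.T               # (d, n_facts): each column stores one value
threshold = 0.8                # assumed margin separating matches from non-matches

def mlp_lookup(x):
    h = np.maximum(W_in @ x - threshold, 0.0)   # ReLU "key match" scores
    return W_out @ h                            # weighted sum of stored values

# Querying with a stored key recovers (a scaled copy of) its value.
out = mlp_lookup(keys[3])
cos = out @ values[3] / (np.linalg.norm(out) * np.linalg.norm(values[3]))
print(f"cosine similarity with stored value: {cos:.3f}")  # ~1.0
```

Here the lookup works because random unit keys in 64 dimensions are nearly orthogonal, so the threshold zeroes out all but the matching key; the interesting question the paper addresses is how many such facts can be packed per parameter.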