December Papers: FP8 Training & Simpler Transformers
The last month saw impressive developments in the space of efficient transformers and applied ML, from materials discovery to chip design.
TL;DR: Scaled dot product attention isn’t properly scaled, and that’s a good thing!
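One way to see what "isn't properly scaled" can mean: the 1/sqrt(d_head) factor gives the attention logits unit variance when queries and keys have unit variance. The softmax then takes a weighted average over many values, so the attention output ends up with variance well below 1. The sketch below is a minimal pure-Python illustration under an idealised assumption: queries, keys, and values are i.i.d. standard normal. It is not the paper's experimental setup, and the helper names (`softmax`, `attention_stats`) are hypothetical.

```python
import math
import random
import statistics


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attention_stats(seq_len=128, d_head=64, seed=0):
    """Return (logit variance, output variance) for one attention head.

    Assumes i.i.d. N(0, 1) queries, keys and values (an idealised setup).
    """
    rng = random.Random(seed)

    def gauss_vec(n):
        return [rng.gauss(0.0, 1.0) for _ in range(n)]

    q = [gauss_vec(d_head) for _ in range(seq_len)]
    k = [gauss_vec(d_head) for _ in range(seq_len)]
    v = [gauss_vec(d_head) for _ in range(seq_len)]

    scale = 1.0 / math.sqrt(d_head)  # the standard "scaled" in SDPA
    all_logits, all_outputs = [], []
    for qi in q:
        # Scaled dot products against every key: roughly unit variance.
        logits = [scale * sum(a * b for a, b in zip(qi, kj)) for kj in k]
        all_logits.extend(logits)
        w = softmax(logits)
        # Each output component is a convex combination of value entries,
        # so its variance is about sum(w_j ** 2), which is far below 1.
        out = [sum(w[j] * v[j][c] for j in range(seq_len)) for c in range(d_head)]
        all_outputs.extend(out)

    return statistics.pvariance(all_logits), statistics.pvariance(all_outputs)


logit_var, out_var = attention_stats()
print(f"logit variance ~ {logit_var:.2f}, output variance ~ {out_var:.3f}")
```

In this toy setting the logits come out close to unit variance while the output variance shrinks roughly in proportion to 1/seq_len. Whether and why that shrinkage is beneficial is the paper's argument and is not something this sketch demonstrates.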