Summary: Scalable MatMul-free Language Modeling
ATLAN TEAM
Authors: Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, Jason K. Eshraghian
Abstract: This paper presents a method to eliminate matrix multiplication (MatMul) operations from large language models (LLMs), significantly reducing computational cost and memory usage. The proposed models perform comparably to state-of-the-art Transformers while using less memory, particularly during inference. The approach includes a GPU-efficient implementation and a custom FPGA hardware solution, highlighting the efficiency and scalability of MatMul-free models at billion-parameter scales.
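To build intuition for how a MatMul can be eliminated, the sketch below shows one commonly cited ingredient of such approaches: when weights are constrained to the ternary set {-1, 0, +1}, a dense matrix product reduces to additions and subtractions. This is an illustrative toy example only, not the authors' implementation; the function name `ternary_matmul_free` and the NumPy setup are hypothetical.

```python
# Illustrative sketch: with ternary weights {-1, 0, +1}, a "MatMul" needs
# no multiplications at all, only selective additions and subtractions.
# This is a toy demonstration, not the paper's GPU/FPGA implementation.
import numpy as np

def ternary_matmul_free(x: np.ndarray, w_ternary: np.ndarray) -> np.ndarray:
    """Compute the equivalent of x @ w_ternary using only add/subtract.

    x         : (batch, d_in) activations
    w_ternary : (d_in, d_out) weights restricted to {-1, 0, +1}
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        plus = x[:, w_ternary[:, j] == 1].sum(axis=1)    # add where weight is +1
        minus = x[:, w_ternary[:, j] == -1].sum(axis=1)   # subtract where weight is -1
        out[:, j] = plus - minus                          # zero weights contribute nothing
    return out

# Quick check against an ordinary MatMul
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.integers(-1, 2, size=(8, 3)).astype(np.float32)   # ternary weight matrix
assert np.allclose(ternary_matmul_free(x, w), x @ w, atol=1e-5)
```

In practice the savings come from hardware-friendly accumulation of this kind rather than from a Python loop; the paper's contribution is making such MatMul-free computation efficient on GPUs and FPGAs at scale.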
For detailed insights, see the full paper.