Summary: Scalable MatMul-free Language Modeling
ATLAN TEAM
Authors: Rui-Jie Zhu, Yu Zhang, Ethan Sifferman, Tyler Sheaves, Yiqiao Wang, Dustin Richmond, Peng Zhou, Jason K. Eshraghian
Abstract: This paper presents a method to eliminate matrix multiplication (MatMul) operations from large language models (LLMs), significantly reducing computational cost and memory usage. The proposed models perform comparably to state-of-the-art Transformers while using less memory, particularly during inference. The approach includes a GPU-efficient implementation and a custom FPGA hardware solution, highlighting the efficiency and scalability of MatMul-free models at billion-parameter scales.
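To build intuition for how a MatMul can be eliminated, the sketch below shows one commonly cited ingredient of such approaches: when weights are constrained to the ternary set {-1, 0, +1}, a dense matrix product reduces to additions and subtractions. This is an illustrative toy example only, not the authors' implementation; the function name `ternary_matmul_free` and the NumPy setup are hypothetical.

```python
# Illustrative sketch: with ternary weights {-1, 0, +1}, a "MatMul" needs
# no multiplications at all, only selective additions and subtractions.
# This is a toy demonstration, not the paper's GPU/FPGA implementation.
import numpy as np

def ternary_matmul_free(x: np.ndarray, w_ternary: np.ndarray) -> np.ndarray:
    """Compute the equivalent of x @ w_ternary using only add/subtract.

    x         : (batch, d_in) activations
    w_ternary : (d_in, d_out) weights restricted to {-1, 0, +1}
    """
    out = np.zeros((x.shape[0], w_ternary.shape[1]), dtype=x.dtype)
    for j in range(w_ternary.shape[1]):
        plus = x[:, w_ternary[:, j] == 1].sum(axis=1)    # add where weight is +1
        minus = x[:, w_ternary[:, j] == -1].sum(axis=1)   # subtract where weight is -1
        out[:, j] = plus - minus                          # zero weights contribute nothing
    return out

# Quick check against an ordinary MatMul
rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8)).astype(np.float32)
w = rng.integers(-1, 2, size=(8, 3)).astype(np.float32)   # ternary weight matrix
assert np.allclose(ternary_matmul_free(x, w), x @ w, atol=1e-5)
```

In practice the savings come from hardware-friendly accumulation of this kind rather than from a Python loop; the paper's contribution is making such MatMul-free computation efficient on GPUs and FPGAs at scale.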
For detailed insights, see the full paper.