
Yury POLYANSKIY (MIT) – Optimal Quantization for LLMs and Matrix Multiplication
Statistical Seminar: Every Monday at 2:00 pm.
Time: 2:00 pm – 3:00 pm
Date: 12th May
Place: 3001
Abstract:
The main building block of large language models is matrix multiplication, which is often bottlenecked by the speed of loading matrices from memory. A number of recent quantization algorithms (SmoothQuant, GPTQ, QuIP, SpinQuant, etc.) address this issue by storing matrices in lower precision. We derive the optimal asymptotic information-theoretic tradeoff between the accuracy of the matrix product and the compression rate (number of bits per matrix entry). We also show that a non-asymptotic version of our construction (based on nested Gosset lattices and Conway-Sloane decoding), which we call NestQuant, reduces perplexity deterioration almost three-fold compared to state-of-the-art algorithms (as measured on Llama-2 and Llama-3 models with 8B to 70B parameters). Based on joint work with Or Ordentlich (HUJI), Eitan Porat and Semyon Savkin (MIT EECS).
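To make the accuracy-versus-rate tradeoff concrete, here is a minimal sketch of the baseline the talk improves on: round-to-nearest uniform quantization of a weight matrix at b bits per entry, with the resulting matrix-product error measured against full precision. This is an illustrative toy, not the nested-lattice NestQuant scheme from the talk; the matrix sizes and the uniform quantizer are assumptions made for the example.

```python
import numpy as np

def uniform_quantize(W, bits):
    """Round-to-nearest uniform quantizer at `bits` bits per entry
    (a simple baseline, not the lattice construction from the talk)."""
    levels = 2 ** bits
    lo, hi = W.min(), W.max()
    scale = (hi - lo) / (levels - 1)
    codes = np.round((W - lo) / scale)   # integer codes in [0, levels - 1]
    return codes * scale + lo            # dequantized matrix

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 64))        # stand-in "weight" matrix
x = rng.standard_normal((64, 8))         # stand-in activations

for bits in (2, 4, 8):
    Wq = uniform_quantize(W, bits)
    err = np.linalg.norm(W @ x - Wq @ x) / np.linalg.norm(W @ x)
    print(f"{bits} bits/entry: relative matmul error {err:.4f}")
```

The relative error shrinks as the bit budget grows; schemes like those named in the abstract aim to push this tradeoff curve closer to the information-theoretic optimum at low bit rates.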
Organizers:
Anna KORBA (CREST), Karim LOUNICI (CMAP), Jaouad MOURTADA (CREST)
Sponsors:
CREST-CMAP