
Yury POLYANSKIY (MIT) – Optimal Quantization for LLMs and Matrix Multiplication

May 12 @ 2:00 pm

Statistical Seminar: Every Monday at 2:00 pm.
Time: 2:00 pm – 3:00 pm
Date: 12th May
Place: 3001

 


Abstract:

The main building block of large language models is matrix multiplication, which is often bottlenecked by the speed of loading these matrices from memory. A number of recent quantization algorithms (SmoothQuant, GPTQ, QuIP, SpinQuant, etc.) address this issue by storing matrices in lower precision. We derive the optimal asymptotic information-theoretic tradeoff between the accuracy of the matrix product and the compression rate (number of bits per matrix entry). We also show that a non-asymptotic version of our construction (based on nested Gosset lattices and Conway-Sloane decoding), which we call NestQuant, reduces perplexity deterioration almost three-fold compared to state-of-the-art algorithms (as measured on Llama-2 and Llama-3 with 8B to 70B parameters). Based on joint work with Or Ordentlich (HUJI), Eitan Porat and Semyon Savkin (MIT EECS).
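To make the rate/accuracy tradeoff concrete, here is a minimal sketch that quantizes two random Gaussian matrices at a given number of bits per entry and measures the relative error of their product. It uses plain uniform scalar quantization with a single scale per matrix, which is an assumption chosen for simplicity; it is not NestQuant and involves no lattices. All function names (`quantize`, `product_error`, and so on) are hypothetical.

```python
import random


def quantize(matrix, bits):
    """Uniform scalar quantization with one scale per matrix (assumed scheme).

    Requires bits >= 2 so that the symmetric grid has at least one nonzero level.
    """
    levels = 2 ** (bits - 1) - 1  # symmetric signed grid, e.g. 7 for 4 bits
    max_abs = max(abs(x) for row in matrix for x in row)
    scale = max_abs / levels if max_abs > 0 else 1.0
    # Snap every entry to the nearest multiple of `scale`.
    return [[round(x / scale) * scale for x in row] for row in matrix]


def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def relative_error(exact, approx):
    """Frobenius-norm error of `approx` relative to `exact`."""
    num = sum((e - p) ** 2
              for row_e, row_p in zip(exact, approx)
              for e, p in zip(row_e, row_p))
    den = sum(e ** 2 for row in exact for e in row)
    return (num / den) ** 0.5


def product_error(bits, n=32, seed=0):
    """Relative error of the product when both factors are quantized."""
    rng = random.Random(seed)
    a = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    b = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
    exact = matmul(a, b)
    approx = matmul(quantize(a, bits), quantize(b, bits))
    return relative_error(exact, approx)
```

Calling `product_error(bits)` for, say, `bits` in `(2, 4, 8)` shows the error of the product shrinking as the rate grows; the talk concerns how small that error can be made at a given rate, which this naive baseline does not attempt to achieve.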

 

 

Organizers:

Anna KORBA (CREST), Karim LOUNICI (CMAP), Jaouad MOURTADA (CREST)

Sponsors:
CREST-CMAP