LLM Quantization Book

TurboQuant: Reducing LLM Memory Usage With Vector Quantization

Large language models (LLMs) aren’t actually giant computer brains. Instead, they are massive vector spaces in which the probabilities of tokens occurring in a specific order are encoded. Billions of ...

InfoWorld

What is model quantization? Smaller, faster LLMs

Reducing the precision of model weights can make deep neural networks run faster in less GPU memory, while preserving model accuracy. If ever there were a salient example of a counter-intuitive ...

Hackaday

An LLM For The Raspberry Pi

Microsoft’s latest Phi4 LLM has 14 billion parameters that require about 11 GB of storage. Can you run it on a Raspberry Pi? Get serious. However, the Phi4-mini ...

