The latest Gemma 4 models use a training trick to slash their on-device memory footprint

Android Authority, 5 June 2026

Following Google's launch of the laptop-grade Gemma 4 12B model earlier this week, the company is releasing new Gemma 4 model checkpoints built with quantization-aware training. Quantization stores a model's weights at lower numerical precision, which reduces the memory needed to run lightweight models. The standard method, post-training quantization (PTQ), compresses the model after training is complete, but it can degrade output quality. According to Google's blog post, the latest Gemma 4 versions instead use quantization-aware training (QAT), which reduces quality loss and speeds up decoding.
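The difference between the two approaches is easiest to see in the quantize-then-dequantize step itself. The Python sketch below is a generic illustration, not Google's implementation: it assumes simple symmetric 4-bit per-tensor quantization, and the weight values are made up. PTQ applies this rounding once to finished weights, while QAT runs the forward pass through the same "fake quantization" during training so the model learns to tolerate the rounding error.

```python
def quantize(weights, bits=4):
    """Symmetric per-tensor quantization: map floats to signed integers."""
    qmax = 2 ** (bits - 1) - 1  # largest code, e.g. 7 for 4-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the valid range.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q, scale):
    """Map integer codes back to approximate float weights."""
    return [v * scale for v in q]


def fake_quantize(weights, bits=4):
    """Quantize and immediately dequantize ("fake quantization").

    In QAT the forward pass goes through this step so the training loss
    reflects the rounding error. Gradients are usually passed straight
    through the rounding (straight-through estimator); that backward-pass
    detail is not modeled in this sketch.
    """
    q, scale = quantize(weights, bits)
    return dequantize(q, scale)


def mean_abs_error(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


# PTQ: weights are frozen after training and rounded once.
trained = [0.82, -0.31, 0.05, -0.77, 0.44, 0.19]  # made-up weights
ptq = fake_quantize(trained, bits=4)
print("PTQ mean abs error:", mean_abs_error(trained, ptq))
```

The usual intuition is that because QAT exposes the model to this error while it is still learning, the weights can settle where rounding hurts less, whereas PTQ has to accept whatever error the one-time rounding introduces.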

Google says that building quantization into the training process produces checkpoints that perform better than models compressed with PTQ. The compressed models run well on phones and laptops thanks to a custom mobile quantization schema, which uses pre-calculated settings, 2-bit compression in certain parts of the model, and compression of the vocabulary list and the model's short-term memory. For users, the result is a smaller model that consumes less system memory.
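Google's post, as summarized here, does not spell out how its mobile schema stores 2-bit weights. As a rough, hypothetical illustration of what 2-bit compression means in general, the Python sketch below maps each weight in a small group to one of four levels (defined by a per-group offset and step) and packs four weights into each byte. The group size, the level scheme, and the values are all assumptions for illustration.

```python
def quantize_2bit(group):
    """Asymmetric 2-bit quantization of one group of weights (4 levels)."""
    lo, hi = min(group), max(group)
    step = (hi - lo) / 3 if hi > lo else 1.0
    # Each weight becomes a code 0..3 meaning lo + code * step.
    codes = [min(3, max(0, round((w - lo) / step))) for w in group]
    return codes, lo, step


def pack_2bit(codes):
    """Pack four 2-bit codes into each byte."""
    out = bytearray((len(codes) + 3) // 4)
    for i, c in enumerate(codes):
        out[i // 4] |= (c & 0b11) << (2 * (i % 4))
    return bytes(out)


def unpack_2bit(packed, n):
    """Recover n 2-bit codes from the packed bytes."""
    return [(packed[i // 4] >> (2 * (i % 4))) & 0b11 for i in range(n)]


def dequantize_2bit(codes, lo, step):
    return [lo + c * step for c in codes]


group = [-0.9, -0.3, 0.1, 0.6, 0.2, -0.5, 0.8, 0.0]  # hypothetical weights
codes, lo, step = quantize_2bit(group)
packed = pack_2bit(codes)
restored = dequantize_2bit(unpack_2bit(packed, len(group)), lo, step)
# The per-group offset and step add a little overhead, ignored here.
print(len(packed), "bytes instead of", len(group) * 2, "bytes in bfloat16")
```

Eight weights fit in two bytes instead of sixteen, which is where the memory savings come from; the trade-off is that each weight can only take one of four values, so schemes like this are typically applied selectively, consistent with Google limiting 2-bit compression to certain parts of the model.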

QAT-optimized checkpoints are available in multiple sizes, including Gemma 4 E2B, Gemma 4 E4B, Gemma 4 12B, Gemma 4 26B A4B, and Gemma 4 31B. The smallest versions, such as the text-only Gemma 4 E2B model, require less than a gigabyte of memory to run. With such modest resource requirements, these small checkpoints are well suited to phones.

Google also shared approximate memory requirements for loading the new Gemma 4 QAT models at each size.
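Google's figures are not reproduced here, but a back-of-the-envelope estimate shows why lower bit widths matter: weight memory is roughly the parameter count times the bits per weight, divided by 8. The sketch below uses a hypothetical 2-billion-parameter model, not an official Gemma 4 parameter count, and counts weights only; activations, the model's short-term memory, and runtime overhead push real requirements higher.

```python
def weight_memory_gib(num_params, bits_per_weight):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical parameter count, for illustration only.
params = 2_000_000_000
for label, bits in [("bfloat16", 16), ("8-bit", 8), ("4-bit", 4), ("2-bit", 2)]:
    print(f"{label:>8}: {weight_memory_gib(params, bits):.2f} GiB")
```

Under these assumptions, moving from 16-bit to 4-bit weights cuts the estimate by a factor of four, from roughly 3.7 GiB to under 1 GiB.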

Gemma 4 QAT models are available for download in four formats: unquantized QAT checkpoints, GPT-Generated Unified Format (GGUF), mobile-optimized, and Compressed Tensors. According to Google, these models preserve "similar quality to bfloat16 while dramatically reducing the memory requirements to load the model."

After downloading the Gemma 4 QAT model weights, users can run the checkpoints on their phones, laptops, or desktops. The mobile and desktop models are available on Hugging Face as well as in LM Studio.
