Google’s latest on-device AI model is custom-made for your laptop

Android Authority

4 Jun 2026

Back in April, Google released its mobile-friendly Gemma E2B and E4B models, bringing on-device multimodal AI to Android and iOS devices. It also released the high-end 26B Mixture of Experts (MoE) and 31B Dense models for higher-end devices with dedicated AI GPUs. Now, the company is launching another Gemma model that sits between those two tiers.

Google today announced the Gemma 4 12B model aimed at bringing on-device AI capabilities to laptops. It offers multimodal features and is the first mid-sized model from Google to support native audio input.

The company claims that its 12B model delivers performance similar to the 26B MoE model in benchmarks, while being small enough to run on normal consumer laptops with 16GB of RAM.
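To put the 16GB figure in context, here is a rough back-of-the-envelope sketch of how much memory the weights of a 12-billion-parameter model would need at different precisions. The bit widths are illustrative assumptions; the article does not say which precision Google has in mind, and the estimate ignores activations, caches, and everything else running on the laptop.

```python
# Back-of-the-envelope estimate of weight memory for a 12B-parameter model.
# The bit widths below are illustrative assumptions, not published Gemma formats.

PARAMS = 12_000_000_000  # "12B" taken at face value
RAM_GB = 16              # laptop RAM figure quoted in the article


def weight_memory_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the memory needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    gb = weight_memory_gb(PARAMS, bits)
    verdict = "fits" if gb < RAM_GB else "does not fit"
    print(f"{bits:>2}-bit weights: {gb:.0f} GB, {verdict} in {RAM_GB} GB of RAM")
```

Under these assumptions, 16-bit weights alone would need 24 GB, while 8-bit and 4-bit weights would need 12 GB and 6 GB respectively, which suggests why any memory saved elsewhere in the architecture matters on a 16GB machine.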

To achieve this, the company had to support multimodal inputs without adding latency or memory usage. Gemma 4 12B uses an encoder-free architecture, which avoids the memory cost of the separate encoders that most multimodal AI models rely on.

For vision, it uses a lightweight module built on “single matrix multiplication, positional embedding, and normalizations,” allowing image data to be passed to the LLM without an encoder in the middle.
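As a rough illustration of what such a module could look like, the sketch below splits an image into patches, applies one matrix multiplication, adds a positional embedding, and normalizes the result. Everything here is an assumption made for illustration: the patch size, the dimensions, the order of operations, the use of layer normalization as a stand-in for the unspecified normalizations, and the helper names (`patchify`, `embed_image`). It is not Google's implementation.

```python
import math
import random

# Hypothetical sizes chosen for readability, not Gemma's real dimensions.
PATCH = 2      # patch side length in pixels
D_MODEL = 4    # width of the language model's token embeddings


def patchify(image, patch):
    """Split an H x W grayscale image (list of rows) into flattened patches."""
    patches = []
    for top in range(0, len(image), patch):
        for left in range(0, len(image[0]), patch):
            patches.append([image[top + r][left + c]
                            for r in range(patch) for c in range(patch)])
    return patches


def matmul(rows, weight):
    """Multiply a list of row vectors by a weight matrix (list of rows)."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*weight)]
            for row in rows]


def layer_norm(vec, eps=1e-6):
    """Normalize one vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]


def embed_image(image, weight, pos_emb):
    """One matmul, then positional embeddings, then normalization."""
    projected = matmul(patchify(image, PATCH), weight)
    return [layer_norm([p + e for p, e in zip(tok, pos)])
            for tok, pos in zip(projected, pos_emb)]


rng = random.Random(0)
image = [[rng.random() for _ in range(4)] for _ in range(4)]  # 4x4 image
weight = [[rng.gauss(0, 1) for _ in range(D_MODEL)]
          for _ in range(PATCH * PATCH)]
pos_emb = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(4)]

tokens = embed_image(image, weight, pos_emb)
print(len(tokens), len(tokens[0]))  # number of patch tokens, width of each
```

The point of the sketch is the cost profile: the only learned parameters are one projection matrix and one table of positional embeddings, far fewer than a multi-layer vision encoder would carry.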

It also does away with encoding for audio inputs entirely. Google says it was able to project the raw audio signal directly into the same dimensional space as text tokens.
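One way to picture this is a minimal sketch in which the raw waveform is cut into fixed-size frames and each frame is mapped by a single linear projection to a vector as wide as a text-token embedding. The frame length, embedding width, random weights, and the helper names `frame_waveform` and `project` are all hypothetical; the article does not describe how Google actually performs the projection.

```python
import math
import random

# Hypothetical sizes for illustration; the article gives no real dimensions.
FRAME = 8      # raw samples grouped into one audio "token"
D_MODEL = 4    # width of the text-token embedding space


def frame_waveform(samples, frame):
    """Cut a 1-D waveform into non-overlapping frames, dropping any remainder."""
    usable = len(samples) - len(samples) % frame
    return [samples[i:i + frame] for i in range(0, usable, frame)]


def project(frames, weight):
    """Linearly map each frame of raw samples to a D_MODEL-wide vector."""
    return [[sum(s * w for s, w in zip(f, col)) for col in zip(*weight)]
            for f in frames]


# A 440 Hz sine wave sampled at 16 kHz stands in for microphone input.
waveform = [math.sin(2 * math.pi * 440 * n / 16_000) for n in range(36)]

rng = random.Random(0)
weight = [[rng.gauss(0, 1) for _ in range(D_MODEL)] for _ in range(FRAME)]

audio_tokens = project(frame_waveform(waveform, FRAME), weight)
print(len(audio_tokens), len(audio_tokens[0]))  # frame count, vector width
```

Because each projected frame has the same width as a text embedding, the resulting vectors could be placed in the same input sequence as text tokens with no separate audio encoder in between.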

In other words, Gemma 4 12B can handle multimodal inputs just like the other Gemma models, but without the overhead of encoding them. This should translate into much better performance on laptops, with no need for dedicated AI hardware.