Radio
Now Playing
Quickyla Radio โ€” Click to play
Open โ†’
3 min left
Back to News

DiffusionGemma is Googleโ€™s fastest AI yet, but it comes with a big trade-off

Affiliate links on Android Authority may earn us a commission. Learn more. Google has released DiffusionGemma, an experimental AI model that takes a very different approach to how most chatbots generate text today. Instead of writing one word after another in a strict sequence,

DiffusionGemma is Googleโ€™s fastest AI yet, but it comes with a big trade-off
Android Authority โ€” 10 June 2026
Text:
10 0 0

Affiliate links on Android Authority may earn us a commission. Learn more.

Google has released DiffusionGemma, an experimental AI model that takes a very different approach to how most chatbots generate text today. Instead of writing one word after another in a strict sequence, it generates a whole block of text at once and then keeps refining it until it becomes readable. The idea is to push for speed and hardware efficiency, even if it means giving up some polish in the final output.

This new AI model is open-sourced under the Apache 2.0 license and is aimed at developers and researchers rather than everyday users. To understand why this matters, it helps to look at how most large language models work. Systems like Googleโ€™s Gemma 4 generate text step by step, one token at a time. Each new word depends on what came before it, which makes the process inherently sequential and harder to speed up.

DiffusionGemma, on the other hand, starts with a full canvas of random tokens, essentially noisy, unreadable text, and then repeatedly cleans it up in multiple passes. With each pass, the output becomes more structured and coherent until it settles into a final response. A simple way to picture it is that traditional models write, while DiffusionGemma drafts and edits everything at once.

That shift has a direct impact on performance. Per Googleโ€™s claims, DiffusionGemma can be up to four times faster than standard autoregressive models in low-concurrency scenarios, where a single user or process uses the GPU. On high-end hardware, the numbers are even more aggressive. The company asserts more than 1,000 tokens per second on an NVIDIA H100 and over 700 tokens per second on an RTX 5090.

Under the hood, DiffusionGemma is a 26-billion-parameter Mixture-of-Experts model, but it does not activate all of that at once. Only about 3.8 billion parameters are used during inference, helping keep compute requirements manageable. Google says this makes it possible to run the model on high-end consumer GPUs when quantized, with a memory footprint of around 18GB VRAM.

Where things get more interesting is how the model actually generates text. It can produce up to 256 tokens in parallel in a single step, and each token can attend to every other token in the block. That gives the model a global view of the output instead of a strictly linear one.

This makes it better suited for structured or rule-based tasks. For example, it can help fill in missing sections of code, complete structured formats like JSON, work through logic-heavy problems such as Sudoku-style puzzles, or handle mathematical patterns where consistency across the whole output matters more than sentence-by-sentence flow. Because it sees the entire block at once, it can also correct contradictions within the same generation cycle, rather than waiting for a later token to fix them.

Advertisement
React:
Sponsored

More to Read

Cash App made a magic wand for contactless payments
๐Ÿ’ป Technology
Cash App made a magic wand for contactless payments
The Verge ยท 9 days ago
Meta is reportedly developing an AI pendant
๐Ÿ’ป Technology
Meta is reportedly developing an AI pendant
TechCrunch ยท 14 days ago
Coders are refusing to work without AIย โ€”ย and that could comโ€ฆ
๐Ÿ’ป Technology
Coders are refusing to work without AIย โ€”ย and that could come back to bite them
TechCrunch ยท 15 days ago
'Astonishing': James Webb telescope spots the most chemicalโ€ฆ
๐Ÿ”ฌ Science
'Astonishing': James Webb telescope spots the most chemically primitive galaxy in the ancโ€ฆ
Live Science ยท 13 days ago
CBS News insiders worry how 60 Minutes will endure after fiโ€ฆ
๐Ÿ’ฐ Business
CBS News insiders worry how 60 Minutes will endure after firings: โ€˜What are they going toโ€ฆ
Guardian Business ยท 9 days ago
Sam Altman says OpenAI's top token spender uses 100 billionโ€ฆ
๐Ÿ“ˆ Markets & Finance
Sam Altman says OpenAI's top token spender uses 100 billion tokens a month โ€” and they're โ€ฆ
Business Insider Mkt ยท 10 days ago
Full view