Google unveils DiffusionGemma, an AI model that breaks free of left-to-right processing

It can also save users money. Technology analyst Carmi Levy noted that existing pay-per-token monetization models “penalize the use of less than optimally efficient AI solutions.”

But DiffusionGemma “could herald a new generation of task-defined, efficient solutions that can enable expanded compute capacity without draining the operations budget,” he said.

A contrast to left-to-right processing

Built on Google’s Gemma 4 family and its Gemini Diffusion research, DiffusionGemma is a 26B mixture-of-experts (MoE) model designed to maximize text output generation.

It essentially shifts how models use hardware, giving processors a larger hunk of work each cycle so it can draft full 256-token paragraphs in sequence. This allows the model to generate text up to 4x faster on GPUs, Google claims. It activates only 3.8B parameters during inference, and, when quantized, can fit within 18GB VRAM on high-end consumer GPUs like Nvidia RTX 5090.

AWS Tunes Up Graviton5 For Agentic AI, Boosts Bang For The Buck Bigtime

Critical UpdraftPlus Flaw Puts 3 Million WordPress Sites At Risk

“Don’t just grab random stuff off the internet”: What Chainguard found in 52,000 open-source packages

Paxton’s win over Cornyn sets up high-stakes Texas clash with Talarico

Global Resources Outlook 2024 | UNEP

Texas Democrat Talarico claims voting laws are rigged ahead of Paxton race