On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.

VentureBeat

9 June 2026

On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple's third-generation foundation models, announced at WWDC26, break that constraint by moving the weight set off DRAM entirely.

The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple's Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud.

The on-device architecture is Apple's own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM. "Instead of forcing the entire model into DRAM, the full model is stored in flash memory," Apple's resear
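To make the DRAM constraint concrete, here is a rough back-of-the-envelope sketch of how much storage the weights of a 20-billion-parameter model need. The parameter count comes from the article; the bit widths are illustrative assumptions, since the article does not say what precision AFM 3 Core Advanced uses, and the function name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return the storage needed for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_weight / 8 / 1e9


# Parameter count the article gives for AFM 3 Core Advanced.
PARAMS = 20_000_000_000

# Assumed bit widths for illustration only; the source does not state the precision.
for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_footprint_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions the weights alone come to 40 GB, 20 GB, and 10 GB respectively, before counting activations or any other working memory. That is the scale of footprint that makes keeping the full weight set resident in DRAM the binding constraint the article describes.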

This report comes from VentureBeat. Full coverage and background context are available at the original source.