On-device AI agents hit a hard memory limit. Apple's new architecture routes around it.
On-device AI models have stayed small because the entire weight set has to live in DRAM, capping practical parameter counts well below what server-side deployments use. Enterprise architects evaluating agentic workloads have had to choose between capable cloud-dependent models and limited on-device ones. Apple's third-generation foundation models, announced at WWDC26, break that constraint by moving the weight set off DRAM entirely.

The AFM 3 family was developed in collaboration with Google and spans five models: two on-device and three server-based, all running within Apple's Private Cloud Compute boundary. The server-side models, including AFM 3 Cloud Pro for agentic tool use and complex reasoning, run on Nvidia GPUs in Google Cloud.

The on-device architecture is Apple's own. AFM 3 Core Advanced is a 20-billion-parameter model that stores weights in NAND flash rather than DRAM. "Instead of forcing the entire model into DRAM, the full model is stored in flash memory," Apple's resear
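
To see why keeping every weight in DRAM limits on-device model size, a rough back-of-the-envelope calculation helps. The sketch below estimates the raw weight footprint of a 20-billion-parameter model at a few bit widths. The parameter count comes from the article; the bit widths are illustrative assumptions, since the excerpt does not say what precision AFM 3 Core Advanced uses, and the helper name `weight_footprint_gb` is hypothetical.

```python
def weight_footprint_gb(num_params: int, bits_per_weight: int) -> float:
    """Return raw weight storage in gigabytes (1 GB = 10**9 bytes).

    Counts weights only; activations, caches, and runtime overhead are ignored.
    """
    return num_params * bits_per_weight / 8 / 1e9


# Parameter count reported in the article for AFM 3 Core Advanced.
PARAMS = 20_000_000_000

# Assumed bit widths for illustration only; not figures from Apple.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_footprint_gb(PARAMS, bits):.0f} GB")
```

Under these assumptions the weights alone come to 40 GB at 16 bits, 20 GB at 8 bits, and 10 GB at 4 bits, before any working memory is counted. That is the kind of budget that makes holding a full model of this size in DRAM difficult on a phone or laptop, and it is the pressure that storing weights in flash is meant to relieve.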
This report comes from VentureBeat.

