Gemma 4 26B A4B now available on Workers AI

Last week, Google introduced Gemma 4, described as its most capable family of open models to date. Cloudflare has now announced a partnership with Google to bring @cf/google/gemma-4-26b-a4b-it to Workers AI.
Gemma 4 just landed on the edge on Workers AI!
💎 MoE model with 26B and 4B active, for fast inference
💎 Tool calling, reasoning, vision capabilities. Generates code and is multilingual
💎 256k context window and Chat Completions compatible API
💎 Perfect for building fast…

— Cloudflare Developers (@CloudflareDev) April 4, 2026
Gemma 4 26B A4B available on Workers AI
Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model built on Gemini 3 research, with 26B total parameters and only 4B active per forward pass. Because only a small subset of parameters is activated during inference, the model runs nearly as fast as a 4B-parameter dense model while delivering the quality of a much larger one.
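A back-of-envelope sketch makes the efficiency argument concrete. The parameter counts come from the announcement; the framing that per-token compute scales with active parameters is the standard MoE reasoning, not a Cloudflare benchmark:

```typescript
// Rough illustration: fraction of weights used per forward pass in Gemma 4 26B A4B.
const totalParams = 26e9;  // 26B total parameters
const activeParams = 4e9;  // 4B active per token
const activeFraction = activeParams / totalParams;

console.log(`${(activeFraction * 100).toFixed(1)}% of weights active per pass`);
// → 15.4% of weights active per pass
// Per-token compute scales roughly with active parameters, which is why
// inference cost sits closer to a 4B dense model than a 26B one.
```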
Key capabilities of Gemma 4
- Mixture-of-Experts architecture with 8 active experts out of 128 total (plus 1 shared expert), delivering frontier-level performance at a fraction of the compute cost of dense models.
- 256,000 token context window for retaining full conversation history, tool definitions, and long documents across extended sessions.
- Built-in thinking mode that lets the model reason step-by-step before answering, improving accuracy on complex tasks.
- Vision understanding for object detection, document and PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), and handwriting recognition, with support for variable aspect ratios and resolutions.
- Function calling with native support for structured tool use, enabling agentic workflows and multi-step planning.
- Multilingual with out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
- Coding for code generation, completion, and correction.
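Since the article notes a Chat Completions-compatible API, the function-calling capability can be sketched with an OpenAI-style `tools` payload. This is a hedged illustration assuming the endpoint accepts the standard tools schema; the `get_weather` tool and its fields are hypothetical:

```typescript
// Hypothetical tool definition in the OpenAI tools schema. Given such a
// payload, the model may answer with a tool_calls entry instead of plain text.
const payload = {
  model: "@cf/google/gemma-4-26b-a4b-it",
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool name
        description: "Look up current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
};

// This payload would be POSTed to the Chat Completions-compatible endpoint.
console.log(JSON.stringify(payload, null, 2));
```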
How to access Gemma 4
Gemma 4 26B A4B can be accessed through the Workers AI binding (env.AI.run()), or over HTTP via the REST API at /run and the OpenAI-compatible /v1/chat/completions endpoint.
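A minimal Worker sketch using the AI binding, assuming a binding named AI is configured in wrangler.toml; the prompt, the Env shape, and the response handling are illustrative rather than a definitive implementation:

```typescript
// Sketch of a Worker that proxies a chat request to Gemma 4 via the AI binding.
// The Env interface mirrors only the slice of the binding surface used here.
export interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/google/gemma-4-26b-a4b-it";

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Explain Mixture-of-Experts in one sentence." },
      ],
    });
    return Response.json(result); // JSON body containing the model's reply
  },
};

export default worker;
```

The binding itself is declared in wrangler.toml with an `[ai]` block (`binding = "AI"`), after which `env.AI` is available inside the Worker.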