Gemma 4 26B A4B now available on Workers AI

Last week, Google introduced Gemma 4, described as its most capable family of open models to date. Cloudflare has now announced a partnership with Google to bring @cf/google/gemma-4-26b-a4b-it to Workers AI.
Gemma 4 just landed on the edge on Workers AI!
💎 MoE model with 26B and 4B active, for fast inference
💎 Tool calling, reasoning, vision capabilities. Generates code and is multilingual
💎 256k context window and Chat Completions compatible API
💎 Perfect for building fast…

— Cloudflare Developers (@CloudflareDev) April 4, 2026
Gemma 4 26B A4B available on Workers AI
Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model built on Gemini 3 research, with 26B total parameters and only 4B active per forward pass. Because only a small subset of parameters is activated during inference, the model runs nearly as fast as a 4B-parameter dense model while delivering the quality of a much larger one.
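A back-of-envelope sketch makes the efficiency argument concrete. The parameter counts come from the announcement; the framing that per-token compute scales with active parameters is the standard MoE reasoning, not a Cloudflare benchmark:

```typescript
// Rough illustration: fraction of weights used per forward pass in Gemma 4 26B A4B.
const totalParams = 26e9;  // 26B total parameters
const activeParams = 4e9;  // 4B active per token
const activeFraction = activeParams / totalParams;

console.log(`${(activeFraction * 100).toFixed(1)}% of weights active per pass`);
// → 15.4% of weights active per pass
// Per-token compute scales roughly with active parameters, which is why
// inference cost sits closer to a 4B dense model than a 26B one.
```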
Key capabilities of Gemma 4
- Mixture-of-Experts architecture with 8 active experts out of 128 total (plus 1 shared expert), delivering frontier-level performance at a fraction of the compute cost of dense models.
- 256,000 token context window for retaining full conversation history, tool definitions, and long documents across extended sessions.
- Built-in thinking mode that lets the model reason step-by-step before answering, improving accuracy on complex tasks.
- Vision understanding for object detection, document and PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), and handwriting recognition, with support for variable aspect ratios and resolutions.
- Function calling with native support for structured tool use, enabling agentic workflows and multi-step planning.
- Multilingual with out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
- Coding for code generation, completion, and correction.
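Since the article notes a Chat Completions-compatible API, the function-calling capability can be sketched with an OpenAI-style `tools` payload. This is a hedged illustration assuming the endpoint accepts the standard tools schema; the `get_weather` tool and its fields are hypothetical:

```typescript
// Hypothetical tool definition in the OpenAI tools schema. Given such a
// payload, the model may answer with a tool_calls entry instead of plain text.
const payload = {
  model: "@cf/google/gemma-4-26b-a4b-it",
  messages: [{ role: "user", content: "What's the weather in Lisbon?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_weather", // hypothetical tool name
        description: "Look up current weather for a city",
        parameters: {
          type: "object",
          properties: { city: { type: "string" } },
          required: ["city"],
        },
      },
    },
  ],
};

// This payload would be POSTed to the Chat Completions-compatible endpoint.
console.log(JSON.stringify(payload, null, 2));
```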
How to access Gemma 4
Gemma 4 26B A4B can be accessed through the Workers AI binding (env.AI.run()), or over HTTP via the REST API at /run and the OpenAI-compatible /v1/chat/completions endpoint.
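A minimal Worker sketch using the AI binding, assuming a binding named AI is configured in wrangler.toml; the prompt, the Env shape, and the response handling are illustrative rather than a definitive implementation:

```typescript
// Sketch of a Worker that proxies a chat request to Gemma 4 via the AI binding.
// The Env interface mirrors only the slice of the binding surface used here.
export interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/google/gemma-4-26b-a4b-it";

const worker = {
  async fetch(_request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, {
      messages: [
        { role: "system", content: "You are a concise assistant." },
        { role: "user", content: "Explain Mixture-of-Experts in one sentence." },
      ],
    });
    return Response.json(result); // JSON body containing the model's reply
  },
};

export default worker;
```

The binding itself is declared in wrangler.toml with an `[ai]` block (`binding = "AI"`), after which `env.AI` is available inside the Worker.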