
Gemma 4 26B A4B now available on Workers AI

Last week, Google introduced Gemma 4, which it describes as its most capable family of open models. Cloudflare has now announced a partnership with Google to bring @cf/google/gemma-4-26b-a4b-it to Workers AI.


Gemma 4 26B A4B is a Mixture-of-Experts (MoE) model built from Gemini 3 research, with 26B total parameters and only 4B active per forward pass. By activating only a small subset of parameters during inference, the model runs nearly as fast as a 4B-parameter model while delivering the quality of a much larger one.

Key capabilities of Gemma 4

  • Mixture-of-Experts architecture with 8 active experts out of 128 total (plus 1 shared expert), delivering frontier-level performance at a fraction of the compute cost of dense models.
  • 256,000 token context window for retaining full conversation history, tool definitions, and long documents across extended sessions.
  • Built-in thinking mode that lets the model reason step-by-step before answering, improving accuracy on complex tasks.
  • Vision understanding for object detection, document and PDF parsing, screen and UI understanding, chart comprehension, OCR (including multilingual), and handwriting recognition, with support for variable aspect ratios and resolutions.
  • Function calling with native support for structured tool use, enabling agentic workflows and multi-step planning.
  • Multilingual with out-of-the-box support for 35+ languages, pre-trained on 140+ languages.
  • Coding for code generation, completion, and correction.
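To make the MoE numbers above concrete: for each token, a router scores all 128 experts and only the top 8 (plus the shared expert) actually run. The sketch below shows generic top-k routing in TypeScript; it is an illustration of the technique, not Gemma's actual implementation, and the function name is made up for this example.

```typescript
// Illustrative top-k expert routing for a Mixture-of-Experts layer.
// The counts (128 experts, top 8) mirror the Gemma 4 description above;
// the routing logic itself is a generic simplification.

function topKExperts(routerLogits: number[], k: number): number[] {
  // Pick the indices of the k experts with the highest router scores.
  return routerLogits
    .map((logit, index) => ({ logit, index }))
    .sort((a, b) => b.logit - a.logit)
    .slice(0, k)
    .map((e) => e.index);
}

// Example: 128 router scores, keep the top 8. A shared expert would be
// always-on and bypass this selection entirely.
const logits = Array.from({ length: 128 }, () => Math.random());
const active = topKExperts(logits, 8);
console.log(active.length); // 8 routed experts for this token
```

Because only those 8 experts' weights (roughly 4B parameters' worth) participate in the forward pass, per-token compute stays close to that of a small dense model.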

How to access Gemma 4?

Use Gemma 4 26B A4B through the Workers AI binding (env.AI.run()), the REST API at /run or /v1/chat/completions, or the OpenAI-compatible endpoint.
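A minimal Worker using the AI binding might look like the sketch below. The model ID comes from the announcement; the chat-style `messages` input follows the standard Workers AI text-generation format, and the request-building helper is a hypothetical name introduced for this example.

```typescript
// Sketch of calling Gemma 4 26B A4B from a Cloudflare Worker via env.AI.run().
// Assumes an `AI` binding is configured in wrangler.toml/wrangler.jsonc.

export interface Env {
  AI: { run(model: string, input: unknown): Promise<unknown> };
}

const MODEL = "@cf/google/gemma-4-26b-a4b-it";

// Hypothetical helper: builds the chat-style input object the binding expects.
function buildChatInput(prompt: string) {
  return {
    messages: [
      { role: "system", content: "You are a concise assistant." },
      { role: "user", content: prompt },
    ],
  };
}

export default {
  async fetch(request: Request, env: Env): Promise<Response> {
    const result = await env.AI.run(MODEL, buildChatInput("Explain MoE in one sentence."));
    return Response.json(result);
  },
};
```

The same `messages` payload works against the REST API and the OpenAI-compatible endpoint, so existing OpenAI SDK clients can be pointed at the model by swapping the base URL and model name.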

