Microsoft introduces its latest AI inference accelerator, Maia 200, in Azure

Yesterday, Microsoft officially introduced Maia 200, an AI inference accelerator built on a 3nm process with native FP8/FP4 tensor cores, all within a 750W SoC TDP envelope. The chip pairs a redesigned memory system, with 216GB of HBM3e at 7 TB/s, with 272MB of on-chip SRAM. Microsoft says it is the most efficient inference system the company has ever deployed, with 30% better performance per dollar than the latest-generation hardware in its fleet today.

Maia 200 joins Microsoft’s portfolio of CPUs, GPUs, and custom accelerators, giving customers more options to run advanced AI workloads faster and more cost-effectively on Azure.

Maia 200

This AI inference accelerator is said to serve multiple models, including the latest GPT-5.2 models from OpenAI, bringing a performance-per-dollar advantage to Microsoft Foundry and Microsoft 365 Copilot. Microsoft also revealed that the Microsoft Superintelligence team will use the accelerator for synthetic data generation and reinforcement learning to improve next-generation in-house models.

Each Maia 200 chip contains over 140 billion transistors and is tailored for large-scale AI workloads. The memory subsystem is centred on narrow-precision datatypes, a specialised DMA engine, on-die SRAM, and a specialised NoC fabric for high-bandwidth data movement, all of which increase token throughput.
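Narrow-precision datatypes matter because FP8 halves, and FP4 quarters, the bytes per parameter relative to FP16, which directly reduces HBM traffic on a bandwidth-bound inference chip. Microsoft has not published Maia-specific code, but the idea can be sketched with PyTorch's standard FP8 dtype (torch.float8_e4m3fn); the tensor shape below is illustrative only.

```python
import torch

# Illustrative only: the same weight matrix at FP16 vs FP8 precision.
# (The shape is arbitrary; Maia 200's actual kernels are not public.)
w_fp16 = torch.randn(4096, 4096, dtype=torch.float16)
w_fp8 = w_fp16.to(torch.float8_e4m3fn)  # one of PyTorch's FP8 formats

bytes_fp16 = w_fp16.numel() * w_fp16.element_size()
bytes_fp8 = w_fp8.numel() * w_fp8.element_size()
print(f"FP16: {bytes_fp16 / 2**20:.0f} MiB, FP8: {bytes_fp8 / 2**20:.0f} MiB")

# Half the bytes per parameter means half the memory traffic per token,
# which is why native FP8/FP4 tensor cores raise inference throughput.
```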

At the system level, Maia 200 introduces a novel two-tier scale-up network design built on standard Ethernet. Each accelerator exposes 2.8 TB/s of dedicated, bidirectional scale-up bandwidth and supports predictable, high-performance collective operations across clusters of up to 6,144 accelerators.

The unified fabric of Maia 200 simplifies programming, improves workload flexibility, and reduces stranded capacity while maintaining consistent performance and cost efficiency at cloud scale.
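Microsoft has not published Maia-specific networking code, but frameworks typically drive a scale-up fabric like this through standard collective APIs. As a generic illustration, the torch.distributed sketch below performs an all-reduce across however many workers it is launched with (for example via torchrun); the "gloo" backend and tensor contents are placeholders, not Maia's actual stack.

```python
import torch
import torch.distributed as dist

# Generic collective-operation sketch.
# Launch with: torchrun --nproc_per_node=4 allreduce.py
# "gloo" is a stand-in backend; an accelerator fabric registers its own.
dist.init_process_group(backend="gloo")
rank = dist.get_rank()

# Each worker contributes its own tensor; all_reduce sums them in place.
# This is the collective pattern used to combine partial results across chips.
t = torch.full((4,), float(rank))
dist.all_reduce(t, op=dist.ReduceOp.SUM)
print(f"rank {rank}: {t.tolist()}")

dist.destroy_process_group()
```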

A sophisticated pre-silicon environment guided the Maia 200 architecture from its earliest stages, modelling the computation and communication patterns of LLMs with high fidelity. The accelerator was also designed for fast, seamless availability in the datacenter from the beginning, with early validation of some of the most complex system elements, including the backend network and Microsoft's second-generation, closed-loop, liquid-cooling Heat Exchanger Unit.

Availability

Microsoft is inviting developers, AI startups, and academics to begin exploring early model and workload optimisation with the new Maia 200 software development kit (SDK). The SDK includes a Triton compiler, support for PyTorch, low-level programming in NPL, and a Maia simulator and cost calculator for finding efficiencies earlier in the code lifecycle.
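Triton lets developers write portable accelerator kernels in Python, which the SDK's compiler can then target at the chip. The vector-add kernel below is the standard introductory Triton example, not code from the Maia SDK, and any Maia-specific compiler options are assumptions outside its scope.

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n, BLOCK: tl.constexpr):
    # Each program instance handles one BLOCK-sized slice of the vectors.
    offs = tl.program_id(0) * BLOCK + tl.arange(0, BLOCK)
    mask = offs < n  # guard the tail when n is not a multiple of BLOCK
    x = tl.load(x_ptr + offs, mask=mask)
    y = tl.load(y_ptr + offs, mask=mask)
    tl.store(out_ptr + offs, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)  # one program per 1024-element block
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```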

Maia 200 integrates seamlessly with Azure, and the Maia SDK is available in preview for building and optimising models for the accelerator.
