Microsoft Unveils Maia 200 AI Accelerator for Next-Gen Cloud Inference
Quick Report
Microsoft has introduced the Maia 200, its latest AI inference accelerator, designed to boost the performance and efficiency of large-scale cloud AI workloads. Built on TSMC's 3 nm process, the Maia 200 features native FP8/FP4 tensor cores, 216 GB of HBM3e memory delivering 7 TB/s of bandwidth, and over 140 billion transistors.
The Maia 200 delivers over 10 petaFLOPS of FP4 and 5 petaFLOPS of FP8 compute per chip at a 750 W TDP. It is deployed in Microsoft's US Central datacenter, with additional regions planned. The accelerator is integrated with Azure and serves the latest OpenAI GPT-5.2 models, offering 30% better performance per dollar than Microsoft's previous-generation hardware. Maia 200's architecture includes a novel Ethernet-based scale-up network and a cloud-native development stack, aimed at reducing power usage and total cost of ownership for hyperscale AI. Microsoft is also previewing a Maia SDK for developers, with PyTorch and Triton support.
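The Triton support suggests that portable, Python-authored kernels are the intended on-ramp for custom workloads on the chip. The sketch below is a standard Triton vector-add kernel launched from PyTorch, shown only to illustrate the kind of code such an SDK would consume; it is not Maia-specific, and since the Maia SDK's actual backend and device names are not detailed in the announcement summarized here, the device placement is an assumption for illustration.

```python
import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the tensors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard against the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    # Launch one program per block; the same kernel source could, in principle,
    # be compiled for any backend that ships a Triton compiler target.
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out


if __name__ == "__main__":
    # "cuda" is used here only because it is a widely available Triton target;
    # whatever device string the Maia SDK exposes is not public in this report.
    device = "cuda"
    a = torch.randn(1 << 20, device=device)
    b = torch.randn(1 << 20, device=device)
    assert torch.allclose(add(a, b), a + b)
```

The appeal of this model for a new accelerator is that kernel authors write against Triton's block-level abstractions rather than vendor intrinsics, so existing kernels can be retargeted largely by recompilation.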
Written using GitHub Copilot (GPT-4.1) in agentic mode, instructed to follow the current codebase style and conventions for writing articles.
Source(s)
- TPU
- Microsoft Blog