Z.ai has released GLM-5.2, now the leading open-weights model on the Artificial Analysis Intelligence Index. It is a roughly 753B-parameter Mixture-of-Experts model (about 40B active) shipped under the permissive MIT license, and on Artificial Analysis' new AA-Briefcase agentic knowledge-work eval it scored above GPT-5.5. Z.ai positions it as the most powerful text-only open-weights LLM, built for long-horizon tasks, and it tops its cohort on the new agentic benchmark (alongside Claude Fable in a separate cohort).
Why it matters
For the first time, the strongest open-weights model is not just competitive on static benchmarks but ahead of a closed frontier model on real agentic knowledge work. As Simon Willison notes, GLM-5.2 leads the Intelligence Index among open models while costing a fraction of GPT-5.5 on hosted endpoints. The MIT license removes the usual asterisks: no regional limits, full commercial and research use, and the freedom to self-host. If you have been waiting for an open model you can actually deploy for serious agent workloads, this is it. It is the clearest signal yet that open weights are now a frontier story, not a budget alternative.
The best agentic model you can download and own now beats a leading closed model on knowledge work.
What changes in practice
- Self-hosting is viable for top-tier agents. A 1M-token context and MoE efficiency mean long-horizon trajectories run without a closed API in the loop.
- Cost math flips. Hosted GLM-5.2 runs near $1.40 in / $4.40 out per million tokens, versus roughly $5 / $30 for GPT-5.5.
- Token budgets grow. GLM-5.2 consumes about 43k output tokens per Intelligence Index task, well above leaner models, so reasoning depth is not free.
- Text-only is fine for code. No image input, yet it ranks second on Code Arena WebDev behind only Claude Fable, much like the text-first tradeoff seen in Gemma 4 12B discussions.
How to use it
- Pull the weights from Hugging Face or ModelScope and serve with vLLM, SGLang, or transformers for full control.
- Start hosted via OpenRouter or Z.ai to benchmark on your own tasks before committing infra.
- Tune reasoning effort. Use High or Max levels for long-horizon agent runs, and dial down for cheap routine calls to control that 43k-token tail.
- Wire it into existing agent harnesses (Claude Code, ZCode, OpenCode) since it slots into standard tool-calling loops.
- Watch your output token meter in production: depth is the point, but it is also the bill.
The open-weights frontier just caught up, and it fits on your own hardware.
READY TO ASCEND
Get AI news that respects your time
The signal, distilled. Curated AI news and prompt-engineering insight. No noise.