Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
GLM-4.5 is an open-source foundation model series developed by Z.AI (formerly Zhipu AI) that unifies agentic task execution, advanced reasoning, and coding in a single architecture. The repository, hosted at github.com/zai-org/GLM-4.5, also covers the updated GLM-4.6 and GLM-4.7 releases. With 4,200 GitHub stars and an Apache-2.0 license, the GLM series has established itself as China's leading contribution to competitive open-source LLMs, achieving benchmark performance comparable to models from OpenAI and Anthropic.

## Architecture and Scale

GLM-4.5 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters and 32 billion active parameters. The Air variant reduces the footprint to 106 billion total and 12 billion active parameters, maintaining strong performance with lower deployment requirements. The MoE design activates only a fraction of the parameters for each token, making high parameter counts economically viable for inference.

All models in the series support a 128K native context window, expandable to 200K in GLM-4.6 through architectural improvements. The extended context is particularly valuable for agentic workflows that process long documents, codebases, or extended conversation histories.

## Hybrid Reasoning Modes

A defining feature is the hybrid reasoning system, which lets the model switch between thinking mode and non-thinking mode within the same deployment. Thinking mode activates extended chain-of-thought reasoning for complex problems, using additional compute to trace through multi-step logic. Non-thinking mode returns immediate responses for straightforward queries where deliberate reasoning is unnecessary. Users can select a mode through a parameter flag or let the model choose automatically.

GLM-4.7 introduced interleaved thinking, where the model alternates between reasoning steps and tool calls within a single agentic workflow.
This allows the model to gather information through tool use, reason about it, gather more information, and reason again within a single coherent inference pass, rather than requiring multiple separate calls.

## Benchmark Performance

GLM-4.5 scored 63.2 on a combined evaluation covering 12 benchmarks across agentic tasks, reasoning, and coding, ranking third among all proprietary and open-source models at the time of release. GLM-4.7 improved SWE-bench performance to 73.8% (a 5.8 percentage point gain over GLM-4.6) and HLE reasoning to 42.8% (a 12.4 percentage point improvement). On SWE-bench Multilingual, which tests coding across non-English programming contexts, GLM-4.7 achieved 66.7%, an improvement of 12.9 percentage points.

GLM-5, the successor released on HuggingFace in early 2026, scaled to 744B total parameters with 40B active and claims top performance among open-source models on reasoning, coding, and agentic tasks, including first place among open-source models on Vending Bench 2 with a score of $4,432.

## Agentic Capabilities

The models are designed specifically for tool-using and agentic applications. All variants support function calling with structured output, web browsing, code execution, and file manipulation tools. The GLM-4.7 Flash variant (30B-A3B) is optimized for multi-turn agent interactions where response latency and cost matter, and runs on a single H100 GPU. Integration with popular agent frameworks, including Claude Code, Cline, and Roo Code, is tested and documented.

## Inference Framework Support

GLM-4.5 and its successors work with all major open-source inference frameworks: Hugging Face Transformers for research use, vLLM for high-throughput production serving, SGLang for structured generation workflows, and xLLM for Huawei Ascend NPU hardware. Fine-tuning is supported through Llama Factory and Swift, with configurations for 4 to 128 H20 GPUs depending on model size and precision.
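As a minimal sketch of what querying a vLLM OpenAI-compatible deployment could look like, the snippet below builds a chat request with the hybrid-reasoning switch. The model ID, the `chat_template_kwargs` pass-through, and the `enable_thinking` flag name are assumptions for illustration; check the repository's serving documentation for the exact names.

```python
import json

# Hypothetical request payload for a vLLM OpenAI-compatible server hosting
# the Air variant. Model ID and the "enable_thinking" flag are assumptions,
# not confirmed against the GLM-4.5 serving docs.
def build_chat_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "zai-org/GLM-4.5-Air",
        "messages": [{"role": "user", "content": prompt}],
        # Hybrid-reasoning switch: servers like vLLM can forward extra
        # template arguments to the model's chat template.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

# Non-thinking mode for a straightforward query.
payload = build_chat_request("Summarize this repository.", thinking=False)
print(json.dumps(payload, indent=2))
```

In practice this dictionary would be POSTed to the server's `/v1/chat/completions` endpoint; flipping `thinking=True` requests the extended chain-of-thought mode described above.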
## Accessibility

Model weights are distributed through HuggingFace and ModelScope in BF16 and FP8 precision formats. The Apache-2.0 license permits commercial use without restrictions. Quantized versions and GGUF-format models are community-maintained, enabling local deployment on consumer hardware for the smaller variants.

## Limitations

Full GLM-4.5 inference requires 16 H100 GPUs for BF16 precision, placing it beyond individual developer resources. The Flash variants are more accessible but show capability tradeoffs versus the full models. As a model series developed primarily in Chinese research contexts, English instruction following can occasionally show idiosyncrasies. Benchmark scores reflect controlled evaluation settings that may not capture real-world performance variability across diverse production tasks.
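The 16-GPU requirement follows from simple arithmetic on the parameter counts quoted above. A rough sketch, assuming 2 bytes per parameter for BF16 and 80 GB per H100 (ignoring KV cache, activations, and framework overhead, which consume the remaining headroom):

```python
def bf16_weight_gb(total_params_billions: float) -> float:
    """Weight storage at 2 bytes/param, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * 2  # 1e9 params * 2 bytes = 2 GB per billion

full = bf16_weight_gb(355)  # 710 GB for full GLM-4.5 weights alone
air = bf16_weight_gb(106)   # 212 GB for the Air variant
pool = 16 * 80              # 1280 GB of memory across 16 H100s (80 GB each)
print(full, air, pool)
```

The ~570 GB of headroom beyond the weights is what makes long-context serving (128K+ tokens of KV cache) feasible on the 16-GPU configuration, while the Air variant's 212 GB fits in a much smaller pool.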

- **Shubhamsaboo**: Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models.
- **infiniflow**: Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.