Open Source
Explore the latest AI open-source projects from GitHub and HuggingFace.
GLM-4.5 is an open-source foundation model series developed by Z.AI (formerly Zhipu AI) that unifies agentic task execution, advanced reasoning, and coding in a single architecture. The repository, hosted at github.com/zai-org/GLM-4.5, also covers the updated GLM-4.6 and GLM-4.7 releases. With 4,200 GitHub stars and an Apache-2.0 license, the GLM series has established itself as China's leading contribution to competitive open-source LLMs, achieving benchmark performance comparable to models from OpenAI and Anthropic.

## Architecture and Scale

GLM-4.5 uses a Mixture-of-Experts (MoE) architecture with 355 billion total parameters and 32 billion active parameters. The Air variant reduces the footprint to 106 billion total and 12 billion active parameters, maintaining strong performance with lower deployment requirements. The MoE design activates only a fraction of the parameters for each token, making high parameter counts economically viable for inference.

All models in the series support a 128K native context window, expandable to 200K in GLM-4.6 through architectural improvements. The extended context is particularly valuable for agentic workflows that process long documents, codebases, or extended conversation histories.

## Hybrid Reasoning Modes

A defining feature is the hybrid reasoning system, which lets the model switch between thinking mode and non-thinking mode within the same deployment. Thinking mode activates extended chain-of-thought reasoning for complex problems, using additional compute to trace through multi-step logic. Non-thinking mode returns immediate responses for straightforward queries where deliberate reasoning is unnecessary. Users can select a mode through a parameter flag or let the model choose automatically.

GLM-4.7 introduced interleaved thinking, where the model alternates between reasoning steps and tool calls within a single agentic workflow.
This allows the model to gather information through tool use, reason about it, gather more information, and reason again within a single coherent inference pass, rather than requiring multiple separate calls.

## Benchmark Performance

GLM-4.5 scored 63.2 on a combined evaluation covering 12 benchmarks across agentic tasks, reasoning, and coding, ranking third among all proprietary and open-source models at the time of release. GLM-4.7 improved SWE-bench performance to 73.8% (a 5.8 percentage point gain over GLM-4.6) and HLE reasoning to 42.8% (a 12.4 percentage point improvement). On SWE-bench Multilingual, which tests coding across non-English programming contexts, GLM-4.7 achieved 66.7%, an improvement of 12.9 percentage points.

GLM-5, the successor released on HuggingFace in early 2026, scaled to 744B total parameters with 40B active and claims top performance among open-source models on reasoning, coding, and agentic tasks, including first place among open-source models on Vending Bench 2 with a score of $4,432.

## Agentic Capabilities

The models are designed specifically for tool-using and agentic applications. All variants support function calling with structured output, web browsing, code execution, and file manipulation tools. The GLM-4.7 Flash variant (30B-A3B) is optimized for multi-turn agent interactions where response latency and cost matter, and runs on a single H100 GPU. Integration with popular agent frameworks, including Claude Code, Cline, and Roo Code, is tested and documented.

## Inference Framework Support

GLM-4.5 and its successors work with all major open-source inference frameworks: Hugging Face Transformers for research use, vLLM for high-throughput production serving, SGLang for structured generation workflows, and xLLM for Huawei Ascend NPU hardware. Fine-tuning is supported through Llama Factory and Swift, with configurations for 4 to 128 H20 GPUs depending on model size and precision.
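As a minimal sketch of what querying a vLLM OpenAI-compatible deployment could look like, the snippet below builds a chat request with the hybrid-reasoning switch. The model ID, the `chat_template_kwargs` pass-through, and the `enable_thinking` flag name are assumptions for illustration; check the repository's serving documentation for the exact names.

```python
import json

# Hypothetical request payload for a vLLM OpenAI-compatible server hosting
# the Air variant. Model ID and the "enable_thinking" flag are assumptions,
# not confirmed against the GLM-4.5 serving docs.
def build_chat_request(prompt: str, thinking: bool) -> dict:
    return {
        "model": "zai-org/GLM-4.5-Air",
        "messages": [{"role": "user", "content": prompt}],
        # Hybrid-reasoning switch: servers like vLLM can forward extra
        # template arguments to the model's chat template.
        "chat_template_kwargs": {"enable_thinking": thinking},
    }

# Non-thinking mode for a straightforward query.
payload = build_chat_request("Summarize this repository.", thinking=False)
print(json.dumps(payload, indent=2))
```

In practice this dictionary would be POSTed to the server's `/v1/chat/completions` endpoint; flipping `thinking=True` requests the extended chain-of-thought mode described above.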
## Accessibility

Model weights are distributed through HuggingFace and ModelScope in BF16 and FP8 precision formats. The Apache-2.0 license permits commercial use without restrictions. Quantized versions and GGUF-format models are community-maintained, enabling local deployment on consumer hardware for the smaller variants.

## Limitations

Full GLM-4.5 inference requires 16 H100 GPUs for BF16 precision, placing it beyond individual developer resources. The Flash variants are more accessible but show capability tradeoffs versus the full models. As a model series developed primarily in Chinese research contexts, English instruction following can occasionally show idiosyncrasies. Benchmark scores reflect controlled evaluation settings that may not capture real-world performance variability across diverse production tasks.
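The 16-GPU requirement follows from simple arithmetic on the parameter counts quoted above. A rough sketch, assuming 2 bytes per parameter for BF16 and 80 GB per H100 (ignoring KV cache, activations, and framework overhead, which consume the remaining headroom):

```python
def bf16_weight_gb(total_params_billions: float) -> float:
    """Weight storage at 2 bytes/param, in GB (1 GB = 1e9 bytes)."""
    return total_params_billions * 2  # 1e9 params * 2 bytes = 2 GB per billion

full = bf16_weight_gb(355)  # 710 GB for full GLM-4.5 weights alone
air = bf16_weight_gb(106)   # 212 GB for the Air variant
pool = 16 * 80              # 1280 GB of memory across 16 H100s (80 GB each)
print(full, air, pool)
```

The ~570 GB of headroom beyond the weights is what makes long-context serving (128K+ tokens of KV cache) feasible on the 16-GPU configuration, while the Air variant's 212 GB fits in a much smaller pool.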

- **Shubhamsaboo**: Collection of 100+ production-ready LLM apps with AI agents, RAG, voice agents, and MCP using OpenAI, Anthropic, Gemini, and open-source models.
- **infiniflow**: Leading open-source RAG engine with deep document understanding, grounded citations, and agent capabilities, with 73K+ GitHub stars.