NEW DELHI, Oct 5: US-based tech company IBM has launched ‘Granite 4.0’, the next generation of IBM language models, an official blog said. Granite 4.0 features a new hybrid Mamba/transformer architecture that greatly reduces memory requirements without sacrificing performance.
“The launch of Granite 4.0 initiates a new era for IBM’s family of enterprise-ready large language models, leveraging novel architecture advancements to double down on small, efficient language models that provide competitive performance at reduced costs and latency.
The Granite 4.0 models were developed with a particular emphasis on essential tasks for agentic workflows, both in standalone deployments and as cost-efficient building blocks in complex systems alongside larger reasoning models,” the official blog said.
The Granite 4.0 collection comprises multiple model sizes and architecture styles to provide optimal performance across a wide array of hardware constraints: (i) Granite-4.0-H-Small, (ii) Granite-4.0-H-Tiny, (iii) Granite-4.0-H-Micro and (iv) Granite-4.0-Micro. Granite-4.0-H-Small is a workhorse model for strong, cost-effective performance on enterprise workflows like multi-tool agents and customer support automation.
Moreover, the Tiny and Micro models are designed for low-latency, edge and local applications, and can serve as building blocks within larger agentic workflows for fast execution of key tasks such as function calling. Granite 4.0 benchmark performance shows substantial improvements over prior generations; even the smallest Granite 4.0 models significantly outperform Granite 3.3 8B, despite being less than half its size. But their most notable strength is a remarkable increase in inference efficiency.
“Relative to conventional LLMs, our hybrid Granite 4.0 models require significantly less RAM to run, especially for tasks involving long context lengths (like ingesting a large code base or extensive documentation) and multiple sessions at the same time (like a customer service agent handling many detailed user inquiries simultaneously),” it said.
“Granite 4.0 models are now available across a broad spectrum of platform providers and inference frameworks for use as both fast and efficient standalone workhorse models and key building blocks of ensemble workflows alongside leading large frontier models. You can try them out on the Granite Playground.”
“The new Granite Hybrid architecture has full, optimized support in vLLM 0.10.2 and Hugging Face Transformers. The Granite Hybrid architecture is also supported in llama.cpp and MLX, though work to fully optimize throughput in these runtimes is still ongoing. We thank our ecosystem partners for their collaboration and hope that our work will help facilitate further experimentation with hybrid models,” the official blog said.
(UNI)
