NVIDIA Backs Hermes Agent With Qwen AI Models


NVIDIA introduced official support for Nous Research’s Hermes Agent across RTX PCs, RTX PRO workstations, and DGX Spark systems. The company also published a deployment guide that pairs the open-source agent with Alibaba’s Qwen 3.6 models for local AI workloads.

Hermes Agent has grown rapidly since its February 2026 launch. Last week the framework became the most-used AI agent on OpenRouter, processing more than 224 billion tokens per day and overtaking OpenClaw in global usage rankings.

NVIDIA’s rollout focuses on delivering advanced AI agent capabilities directly on local hardware, letting developers run persistent AI systems without depending entirely on cloud infrastructure.

Hermes Agent Learns Over Time

Unlike traditional chatbots, Hermes Agent continuously improves through long-term memory and reusable skill generation. The framework stores completed tasks and user feedback as reusable workflows that expand the agent’s capabilities over time.

In addition, Hermes uses isolated sub-agents with focused contexts to complete specialized tasks efficiently. As a result, the framework performs effectively with smaller local AI models instead of relying on extremely large context windows.
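The sub-agent pattern — handing a specialized task to a worker that sees only the context slice it needs — can be illustrated as follows. The function and variable names are hypothetical; the point is that the delegated prompt stays small even when the orchestrator’s history is long:

```python
def run_subagent(task: str, focused_context: list[str]) -> str:
    """Run a task against only the slice of context it needs."""
    # A real agent would send this prompt to a local model; here we just
    # build it to show that the sub-agent's context window stays small.
    prompt = "\n".join(focused_context) + f"\nTASK: {task}"
    return f"completed ({len(prompt)} chars of context)"

full_history = [f"message {i}" for i in range(1000)]  # orchestrator's long history
relevant = full_history[-3:]                          # sub-agent sees only what matters
print(run_subagent("summarize recent messages", relevant))
```

Keeping each sub-agent’s context focused is what lets the framework work well with smaller local models instead of demanding a very large context window.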

Nous Research also curates the skills, tools, and plug-ins included with Hermes. In developer testing, the framework has shown strong performance across multiple agent workflows compared with competing frameworks.

Qwen Models Power Local AI Systems

NVIDIA highlighted Alibaba’s Qwen 3.6 models as optimized companions for Hermes Agent deployments. The Qwen 3.6 35B model uses a mixture-of-experts architecture that activates only 3 billion of its 35 billion parameters per token and runs in about 20GB of memory.
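The ~20GB figure is consistent with 4-bit quantized weights plus some runtime overhead. A back-of-the-envelope check (the 4-bit quantization level is an assumption, not stated in the article):

```python
def weight_memory_gb(total_params: float, bits_per_param: float) -> float:
    """Approximate weight memory in gigabytes: params * bits / 8 bits-per-byte / 1e9."""
    return total_params * bits_per_param / 8 / 1e9

total_params = 35e9  # Qwen 3.6 35B: total parameters, not the 3B active per token
mem_4bit = weight_memory_gb(total_params, 4)
print(f"~{mem_4bit:.1f} GB for 4-bit weights")  # → ~17.5 GB for 4-bit weights
```

Adding KV cache and runtime buffers on top of ~17.5 GB of weights lands near the ~20GB cited. Note that memory is driven by total parameters, while per-token compute scales with the 3 billion active ones.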


Meanwhile, the dense Qwen 3.6 27B model reportedly delivers performance similar to significantly larger models at a fraction of the size. Both open-weight models launched in April and target agentic coding and reasoning tasks.

NVIDIA also positioned DGX Spark as a dedicated platform for continuous AI agent workloads. The system combines 128GB of unified memory with one petaflop of AI performance for local deployment environments.

For RTX users, NVIDIA stated that RTX PRO GPUs can generate tokens up to three times faster when running Qwen 3.6 models through llama.cpp. Developers can also deploy Hermes through LM Studio and Ollama with minimal setup.


© 2024 The Technology Express. All Rights Reserved.