Top Models Used by Hermes Agent This Month on OpenRouter

Why This Matters for Hermes Agent

Hermes Agent is an autonomous agent that makes hundreds of API calls per session — browsing, executing code, reading files, reasoning through tasks. The model you choose directly impacts your cost, speed, and task quality.

This ranking is based on actual OpenRouter API traffic data — measured in trillions of tokens routed through the platform. The models listed here are the ones Hermes Agent users are actively reaching for right now.

10.9T

Total Tokens Routed

Models Ranked

Providers

Free Models

"The top model alone accounts for 37% of all Hermes Agent API traffic on OpenRouter — nearly 4 trillion tokens this month. The gap between #1 and the rest is staggering." — ZVHH OpenRouter Usage Analysis, June 2026

The Rankings

Top 10 most used models by Hermes Agent users on OpenRouter — ranked by total tokens processed

Owl Alpha

Model ID: openrouter/owl-alpha · OpenRouter · Free
Traffic This Month: 4.08T tokens

Context

1.05M tokens

Max Output

262K tokens

Input

Free

Output

Free

Modalities

Text + Image

Released

Apr 2026

👑 The undisputed #1 — free, 1M context, and nearly 4 trillion tokens routed this month

Owl Alpha dominates Hermes Agent workloads on OpenRouter by an enormous margin. At 4.08 trillion tokens — more than the next three models combined — it's the go-to model for agents that make hundreds of API calls per session.

Being free removes all cost barriers for long-running agent sessions, while its 1.05M context window gives agents room to process entire codebases, long conversation histories, and multi-step reasoning chains. Its native tool-use support and strong agentic workflow performance make it a natural fit.

Feature	Value
Tool/Function Calling	✅ Native support
Reasoning	✅ Adjustable (low/medium/high)
1M+ Context	✅ 1.05M tokens
Code Generation	✅ Strong
Cost	✅ Free
Best For	Primary agent model, long sessions

DeepSeek V4 Flash

Model ID: deepseek/deepseek-v4-flash · DeepSeek · Budget
Traffic This Month: 3.9T tokens

Context

1.05M tokens

Max Output

131K tokens

Parameters

284B / 13B (MoE)

Input

$0.098/M

Output

$0.197/M

Released

Apr 2026

⚡ The value powerhouse — 1.05M context, ultra-low cost, nearly as much traffic as #1

DeepSeek V4 Flash is a Mixture-of-Experts model with 284B total parameters but only 13B activated per token, giving it extraordinary inference speed. At just $0.098/M input and $0.197/M output, it's one of the cheapest models that still delivers top-tier reasoning.

With selectable reasoning levels (high/xhigh), 1.05M context, and hybrid attention for efficient long-context processing, it's built for coding assistants and high-throughput agent workloads. Its near-parity with Owl Alpha in traffic usage suggests many agents route here for cost-sensitive batch operations.

Feature	Value
Tool/Function Calling	✅ Excellent
Reasoning	✅ High / xhigh selectable
1M+ Context	✅ 1.05M tokens
Coding	✅ Strong
Cost	✅ Ultra-low ($0.098/$0.197 per M)
Best For	Batch processing, coding, cost-sensitive agents

DeepSeek V4 Pro

Model ID: deepseek/deepseek-v4-pro · DeepSeek
Traffic This Month: 1.14T tokens

Context

1M tokens

Parameters

1.6T / 49B (MoE)

Input

$0.435/M

Output

$0.87/M

Modalities

Text + Image

Released

Apr 2026

🧠 The heavy hitter — largest model in the top 10 at 1.6T parameters, built for complex agentic work

DeepSeek V4 Pro is the largest model in this ranking at 1.6 trillion total parameters (49B activated), and it shows. Designed for advanced reasoning, long-horizon agent workflows, and full-codebase analysis, it's the choice when Hermes Agent needs maximum intelligence rather than raw throughput.

It shares V4 Flash's architecture — MoE with hybrid attention — but at a much larger scale. Pricing at $0.435/$0.87 per M tokens is still reasonable for a model of this capability, and it supports the same high/xhigh reasoning modes. Agents typically route here for deep research, complex code architecture, and strategic planning tasks.

Feature	Value
Tool/Function Calling	✅ Excellent
Reasoning	✅ High / xhigh selectable
1M Context	✅ Yes
Code Analysis	✅ Full-codebase support
Cost	✅ Moderate ($0.435/$0.87 per M)
Best For	Deep research, complex reasoning, long-horizon agents

Nemotron 3 Super

Model ID: nvidia/nemotron-3-super-120b-a12b · NVIDIA · Free
Traffic This Month: 723B tokens

Context

1M tokens

Max Output

262K tokens

Parameters

120B / 12B (MoE)

Input

Free

Output

Free

Released

Mar 2026

🆓 The free heavyweight — 120B parameters, 1M context, and completely free on OpenRouter

NVIDIA Nemotron 3 Super is a 120B-parameter open model using a hybrid Mamba-Transformer MoE architecture with multi-token prediction. It activates just 12B parameters for cost-efficient inference, yet delivers over 50% higher token generation than leading open models.

It's open under the NVIDIA Open License and free on OpenRouter — making it an excellent choice for budget-conscious agents. Strong performance on AIME 2025, TerminalBench, and SWE-Bench Verified means it punches well above its weight class for a free model.

Feature	Value
Architecture	Hybrid Mamba-Transformer MoE
Multi-Token Prediction	✅ Yes (50% faster generation)
1M Context	✅ Yes
Open Weights	✅ NVIDIA Open License
Cost	✅ Free
Best For	Budget agents, coding, free-tier workflows

MiniMax M3

Model ID: minimax/minimax-m3 · MiniMax
Traffic This Month: 674B tokens

Context

1M tokens

Max Output

512K tokens

Parameters

674B

Input

$0.30/M

Output

$1.20/M

Released

May 2026

🎬 The multimodal workhorse — text, image, and video input with a massive 512K output window

MiniMax M3 is the most multimodal model in the top 10, supporting text, image, video, and file inputs. Its MiniMax Sparse Attention (MSA) technology replaces full attention with KV-block selection, cutting per-token compute at long context to roughly 1/20 the cost of the previous generation.

Its 512K maximum output — the largest in this ranking — makes it uniquely suited for agents that need to generate extended responses: full documents, code files, or structured data in a single call. The 1M context window means the agent can process an enormous amount of context alongside it.

Feature	Value
Tool/Function Calling	✅ Excellent
Vision	✅ Image + Video + File
1M Context	✅ Yes
Max Output	✅ 512K (largest in top 10)
Cost	✅ Low ($0.30/$1.20 per M)
Best For	Multi-modal agents, long output generation

Step 3.7 Flash

Model ID: stepfun/step-3.7-flash · StepFun
Traffic This Month: 572B tokens

Context

256K tokens

Max Output

256K tokens

Parameters

196B / ~11B (MoE)

Input

$0.20/M

Output

$1.15/M

Released

May 2026

🚀 The speed champion — 48 tokens/sec with adjustable reasoning levels and native video understanding

Step 3.7 Flash is StepFun's high-efficiency multimodal MoE model with 196B parameters but only ~11B activated per token. It achieves 48 tokens/sec throughput — one of the fastest models on OpenRouter — while still supporting selectable reasoning levels (high/medium/low) for trading off speed, cost, and reasoning depth.

Its native image and video understanding (via a vision encoder) makes it a rare choice among fast models that can process multimedia inputs. At $0.20/$1.15 per M tokens, it's priced for high-throughput agent workloads that need to be both fast and cheap.

<>✅ Low ($0.20/$1.15 per M)

Feature	Value
Tool/Function Calling	✅ Excellent
Reasoning Levels	✅ High / Medium / Low
Throughput	✅ 48 tok/s
Vision	✅ Image + Video
Cost
Best For	Fast agent responses, multimedia, coding

Claude Sonnet 4.6

Model ID: anthropic/claude-sonnet-4.6 · Anthropic · Premium
Traffic This Month: 429B tokens

Context

1M tokens

Input

$3.00/M

Output

$15.00/M

Modalities

Text + Image + File + Audio

Released

Feb 2026

🏆 The premium choice — Anthropic's most capable Sonnet model with frontier agent performance

Claude Sonnet 4.6 is Anthropic's most capable Sonnet-class model, delivering frontier performance across coding, agents, and professional work. Despite its premium pricing ($3/$15 per M tokens), it remains in the top 10 because agent workloads that require maximum reliability and quality justify the cost.

It excels at iterative development, complex codebase navigation, end-to-end project management with memory, and confident computer use for web QA and workflow automation. Available through multiple providers including Amazon Bedrock, Claude on AWS, and Google Vertex.

Feature	Value
Tool/Function Calling	✅ Best-in-class
Computer Use	✅ Confident web QA
1M Context	✅ Yes
Structured Outputs	✅ Supported
Providers	AWS, Bedrock, Vertex
Best For	Premium agent tasks, codebase navigation

MiniMax M2.7

Model ID: minimax/minimax-m2.7 · MiniMax
Traffic This Month: 408B tokens

Context

205K tokens

Max Output

131K tokens

Parameters

408B

Input

$0.30/M

Output

$1.20/M

Released

Mar 2026

💼 The productivity specialist — multi-agent collaboration, live debugging, full document generation

MiniMax M2.7 is designed for autonomous, real-world productivity. It integrates multi-agent collaboration for workflows like live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint.

With 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, it demonstrates strong agentic coding capabilities. It's available through multiple providers (NovitaAI, Morph, Fireworks) and continues to improve with each update.

Feature	Value
Multi-Agent	✅ Collaboration support
Tool/Function Calling	✅ Excellent
Vision	✅ Image support
SWE-Pro	✅ 56.2%
Cost	✅ Low ($0.30/$1.20 per M)
Best For	Productivity agents, document generation

Qwen3.6 Plus

Model ID: qwen/qwen3.6-plus · Alibaba
Traffic This Month: 323B tokens

Context

1M tokens

Max Output

65K tokens

Architecture

Hybrid linear attention + sparse MoE

Input

$0.325/M

Output

$1.95/M

Released

Apr 2026

🇨🇳 The vibe coding champion — hybrid architecture with major gains in agentic coding and front-end development

Qwen 3.6 Plus uses a hybrid architecture combining efficient linear attention with sparse MoE routing, enabling strong scalability and high-performance inference. It delivers major gains in agentic coding, front-end development, and 3D scene generation.

With a 78.8 score on SWE-bench Verified and multi-modal interactive agent capability (perceiving real-world scenes and interacting with GUIs), it's a strong choice for agents that need to navigate real-world interfaces. It also supports audio input for multimodal workflows.

Feature	Value
SWE-bench Verified	✅ 78.8 score
1M Context	✅ Yes
Vision + Audio	✅ Multimodal
GUI Interaction	✅ Native agent capability
Cost	✅ Low ($0.325/$1.95 per M)
Best For	Agentic coding, GUI interaction, vibe coding

Kimi K2.6

Model ID: moonshotai/kimi-k2.6 · Moonshot AI
Traffic This Month: 193B tokens

Context

262K tokens

Max Output

262K tokens

Parameters

193B

Input

$0.68/M

Output

$3.41/M

Released

Apr 2026

🐝 The agent swarm — scales to hundreds of parallel sub-agents, delivers complete projects from a single prompt

Kimi K2.6 is Moonshot AI's next-generation model with an agent swarm architecture that scales to hundreds of parallel sub-agents for autonomous task decomposition. It can deliver complete documents, websites, and spreadsheets in a single run without human oversight.

Its long-horizon coding capabilities across Python, Rust, and Go — combined with coding-driven UI/UX generation from prompts and visual inputs — make it uniquely suited for agents that need to build complete applications end-to-end. Available through io.net, Baidu Qianfan, and Inceptron.

Feature	Value
Agent Swarm	✅ Hundreds of parallel sub-agents
Tool/Function Calling	✅ Excellent
Vision	✅ Image + File
Multi-Language	✅ Python, Rust, Go
Cost	✅ Moderate ($0.68/$3.41 per M)
Best For	Long-horizon coding, project generation

Usage Distribution

How Hermes Agent API traffic breaks down across provider families this month:

Category	Est. Traffic	Models
DeepSeek Family	5.04T (46%)	V4 Flash, V4 Pro
OpenRouter / Owl	4.08T (37%)	Owl Alpha
MiniMax Family	1.08T (10%)	M3, M2.7
NVIDIA	723B (7%)	Nemotron 3 Super
Anthropic	429B (4%)	Claude Sonnet 4.6
StepFun	572B (5%)	Step 3.7 Flash
Alibaba / Moonshot	516B (5%)	Qwen3.6 Plus, Kimi K2.6

Pricing Comparison

A quick reference for cost-conscious model selection:

Rank	Model	Input/M	Output/M	Context	Cost Tier
🥇 1	Owl Alpha	Free	Free	1.05M	Free
🥈 2	DeepSeek V4 Flash	$0.098	$0.197	1.05M	Ultra Budget
🥉 3	DeepSeek V4 Pro	$0.435	$0.87	1M	Budget
4	Nemotron 3 Super	Free	Free	1M	Free
5	MiniMax M3	$0.30	$1.20	1M	Budget
6	Step 3.7 Flash	$0.20	$1.15	256K	Budget
7	Claude Sonnet 4.6	$3.00	$15.00	1M	Premium
8	MiniMax M2.7	$0.30	$1.20	205K	Budget
9	Qwen3.6 Plus	$0.325	$1.95	1M	Budget
10	Kimi K2.6	$0.68	$3.41	262K	Moderate

My Recommendation for Hermes Agent

👑 Primary Model: Owl Alpha

Free, 1.05M context, and the most used model by a wide margin. Start here for any agent task and you won't go wrong.

🧠 Deep Reasoning: DeepSeek V4 Pro

When you need maximum intelligence — complex code analysis, strategic planning, or deep research — V4 Pro's 1.6T parameters deliver the best results.

⚡ Speed + Budget: DeepSeek V4 Flash

Nearly as much traffic as #1 but at 1/500th the cost of Claude. Perfect for high-throughput batch operations where speed matters.

🆓 Best Free Option: Nemotron 3 Super

When Owl Alpha isn't suitable, Nemotron 3 Super offers 120B parameters, 1M context, and strong benchmarks — all free.

Conclusion

The most used models for Hermes Agent on OpenRouter this month tell a clear story: free and budget-friendly models dominate, with DeepSeek and OpenRouter's own Owl Alpha capturing 83% of all API traffic. Premium models like Claude Sonnet 4.6 maintain a presence but represent a small fraction of total usage.

The trend is clear: Hermes Agent users are optimizing for throughput and cost-effectiveness, reserving premium models for tasks that genuinely need maximum intelligence. The gap between #1 (Owl Alpha at 4.08T tokens) and the rest is enormous — nearly 4 trillion more tokens than any other model.