Models Jun 09, 2026 · Top 10 models · 7 sections

Top Models Used by Hermes Agent This Month

The definitive ranking of the most used AI models by Hermes Agent users on OpenRouter this month — ranked by real API traffic measured in trillions of tokens.

OpenRouter Hermes Agent Usage Stats June 2026 API Traffic

Why This Matters for Hermes Agent

Hermes Agent is an autonomous agent that makes hundreds of API calls per session — browsing, executing code, reading files, reasoning through tasks. The model you choose directly impacts your cost, speed, and task quality.

This ranking is based on actual OpenRouter API traffic data — measured in trillions of tokens routed through the platform. The models listed here are the ones Hermes Agent users are actively reaching for right now.

10.9T
Total Tokens Routed
10
Models Ranked
7
Providers
2
Free Models
"The top model alone accounts for 37% of all Hermes Agent API traffic on OpenRouter — nearly 4 trillion tokens this month. The gap between #1 and the rest is staggering." — ZVHH OpenRouter Usage Analysis, June 2026

The Rankings

Top 10 most used models by Hermes Agent users on OpenRouter — ranked by total tokens processed

1

Owl Alpha

Model ID: openrouter/owl-alpha · OpenRouter · Free
Traffic This Month: 4.08T tokens

Context
1.05M tokens
Max Output
262K tokens
Input
Free
Output
Free
Modalities
Text + Image
Released
Apr 2026
👑 The undisputed #1 — free, 1M context, and nearly 4 trillion tokens routed this month

Owl Alpha dominates Hermes Agent workloads on OpenRouter by an enormous margin. At 4.08 trillion tokens — more than the next three models combined — it's the go-to model for agents that make hundreds of API calls per session.

Being free removes all cost barriers for long-running agent sessions, while its 1.05M context window gives agents room to process entire codebases, long conversation histories, and multi-step reasoning chains. Its native tool-use support and strong agentic workflow performance make it a natural fit.

FeatureValue
Tool/Function Calling✅ Native support
Reasoning✅ Adjustable (low/medium/high)
1M+ Context✅ 1.05M tokens
Code Generation✅ Strong
Cost✅ Free
Best ForPrimary agent model, long sessions

2

DeepSeek V4 Flash

Model ID: deepseek/deepseek-v4-flash · DeepSeek · Budget
Traffic This Month: 3.9T tokens

Context
1.05M tokens
Max Output
131K tokens
Parameters
284B / 13B (MoE)
Input
$0.098/M
Output
$0.197/M
Released
Apr 2026
⚡ The value powerhouse — 1.05M context, ultra-low cost, nearly as much traffic as #1

DeepSeek V4 Flash is a Mixture-of-Experts model with 284B total parameters but only 13B activated per token, giving it extraordinary inference speed. At just $0.098/M input and $0.197/M output, it's one of the cheapest models that still delivers top-tier reasoning.

With selectable reasoning levels (high/xhigh), 1.05M context, and hybrid attention for efficient long-context processing, it's built for coding assistants and high-throughput agent workloads. Its near-parity with Owl Alpha in traffic usage suggests many agents route here for cost-sensitive batch operations.

FeatureValue
Tool/Function Calling✅ Excellent
Reasoning✅ High / xhigh selectable
1M+ Context✅ 1.05M tokens
Coding✅ Strong
Cost✅ Ultra-low ($0.098/$0.197 per M)
Best ForBatch processing, coding, cost-sensitive agents

3

DeepSeek V4 Pro

Model ID: deepseek/deepseek-v4-pro · DeepSeek
Traffic This Month: 1.14T tokens

Context
1M tokens
Parameters
1.6T / 49B (MoE)
Input
$0.435/M
Output
$0.87/M
Modalities
Text + Image
Released
Apr 2026
🧠 The heavy hitter — largest model in the top 10 at 1.6T parameters, built for complex agentic work

DeepSeek V4 Pro is the largest model in this ranking at 1.6 trillion total parameters (49B activated), and it shows. Designed for advanced reasoning, long-horizon agent workflows, and full-codebase analysis, it's the choice when Hermes Agent needs maximum intelligence rather than raw throughput.

It shares V4 Flash's architecture — MoE with hybrid attention — but at a much larger scale. Pricing at $0.435/$0.87 per M tokens is still reasonable for a model of this capability, and it supports the same high/xhigh reasoning modes. Agents typically route here for deep research, complex code architecture, and strategic planning tasks.

FeatureValue
Tool/Function Calling✅ Excellent
Reasoning✅ High / xhigh selectable
1M Context✅ Yes
Code Analysis✅ Full-codebase support
Cost✅ Moderate ($0.435/$0.87 per M)
Best ForDeep research, complex reasoning, long-horizon agents

4

Nemotron 3 Super

Model ID: nvidia/nemotron-3-super-120b-a12b · NVIDIA · Free
Traffic This Month: 723B tokens

Context
1M tokens
Max Output
262K tokens
Parameters
120B / 12B (MoE)
Input
Free
Output
Free
Released
Mar 2026
🆓 The free heavyweight — 120B parameters, 1M context, and completely free on OpenRouter

NVIDIA Nemotron 3 Super is a 120B-parameter open model using a hybrid Mamba-Transformer MoE architecture with multi-token prediction. It activates just 12B parameters for cost-efficient inference, yet delivers over 50% higher token generation than leading open models.

It's open under the NVIDIA Open License and free on OpenRouter — making it an excellent choice for budget-conscious agents. Strong performance on AIME 2025, TerminalBench, and SWE-Bench Verified means it punches well above its weight class for a free model.

FeatureValue
ArchitectureHybrid Mamba-Transformer MoE
Multi-Token Prediction✅ Yes (50% faster generation)
1M Context✅ Yes
Open Weights✅ NVIDIA Open License
Cost✅ Free
Best ForBudget agents, coding, free-tier workflows

5

MiniMax M3

Model ID: minimax/minimax-m3 · MiniMax
Traffic This Month: 674B tokens

Context
1M tokens
Max Output
512K tokens
Parameters
674B
Input
$0.30/M
Output
$1.20/M
Released
May 2026
🎬 The multimodal workhorse — text, image, and video input with a massive 512K output window

MiniMax M3 is the most multimodal model in the top 10, supporting text, image, video, and file inputs. Its MiniMax Sparse Attention (MSA) technology replaces full attention with KV-block selection, cutting per-token compute at long context to roughly 1/20 the cost of the previous generation.

Its 512K maximum output — the largest in this ranking — makes it uniquely suited for agents that need to generate extended responses: full documents, code files, or structured data in a single call. The 1M context window means the agent can process an enormous amount of context alongside it.

FeatureValue
Tool/Function Calling✅ Excellent
Vision✅ Image + Video + File
1M Context✅ Yes
Max Output✅ 512K (largest in top 10)
Cost✅ Low ($0.30/$1.20 per M)
Best ForMulti-modal agents, long output generation

6

Step 3.7 Flash

Model ID: stepfun/step-3.7-flash · StepFun
Traffic This Month: 572B tokens

Context
256K tokens
Max Output
256K tokens
Parameters
196B / ~11B (MoE)
Input
$0.20/M
Output
$1.15/M
Released
May 2026
🚀 The speed champion — 48 tokens/sec with adjustable reasoning levels and native video understanding

Step 3.7 Flash is StepFun's high-efficiency multimodal MoE model with 196B parameters but only ~11B activated per token. It achieves 48 tokens/sec throughput — one of the fastest models on OpenRouter — while still supporting selectable reasoning levels (high/medium/low) for trading off speed, cost, and reasoning depth.

Its native image and video understanding (via a vision encoder) makes it a rare choice among fast models that can process multimedia inputs. At $0.20/$1.15 per M tokens, it's priced for high-throughput agent workloads that need to be both fast and cheap.

<>✅ Low ($0.20/$1.15 per M)
FeatureValue
Tool/Function Calling✅ Excellent
Reasoning Levels✅ High / Medium / Low
Throughput✅ 48 tok/s
Vision✅ Image + Video
Cost
Best ForFast agent responses, multimedia, coding

7

Claude Sonnet 4.6

Model ID: anthropic/claude-sonnet-4.6 · Anthropic · Premium
Traffic This Month: 429B tokens

Context
1M tokens
Input
$3.00/M
Output
$15.00/M
Modalities
Text + Image + File + Audio
Released
Feb 2026
🏆 The premium choice — Anthropic's most capable Sonnet model with frontier agent performance

Claude Sonnet 4.6 is Anthropic's most capable Sonnet-class model, delivering frontier performance across coding, agents, and professional work. Despite its premium pricing ($3/$15 per M tokens), it remains in the top 10 because agent workloads that require maximum reliability and quality justify the cost.

It excels at iterative development, complex codebase navigation, end-to-end project management with memory, and confident computer use for web QA and workflow automation. Available through multiple providers including Amazon Bedrock, Claude on AWS, and Google Vertex.

FeatureValue
Tool/Function Calling✅ Best-in-class
Computer Use✅ Confident web QA
1M Context✅ Yes
Structured Outputs✅ Supported
ProvidersAWS, Bedrock, Vertex
Best ForPremium agent tasks, codebase navigation

8

MiniMax M2.7

Model ID: minimax/minimax-m2.7 · MiniMax
Traffic This Month: 408B tokens

Context
205K tokens
Max Output
131K tokens
Parameters
408B
Input
$0.30/M
Output
$1.20/M
Released
Mar 2026
💼 The productivity specialist — multi-agent collaboration, live debugging, full document generation

MiniMax M2.7 is designed for autonomous, real-world productivity. It integrates multi-agent collaboration for workflows like live debugging, root cause analysis, financial modeling, and full document generation across Word, Excel, and PowerPoint.

With 56.2% on SWE-Pro and 57.0% on Terminal Bench 2, it demonstrates strong agentic coding capabilities. It's available through multiple providers (NovitaAI, Morph, Fireworks) and continues to improve with each update.

FeatureValue
Multi-Agent✅ Collaboration support
Tool/Function Calling✅ Excellent
Vision✅ Image support
SWE-Pro✅ 56.2%
Cost✅ Low ($0.30/$1.20 per M)
Best ForProductivity agents, document generation

9

Qwen3.6 Plus

Model ID: qwen/qwen3.6-plus · Alibaba
Traffic This Month: 323B tokens

Context
1M tokens
Max Output
65K tokens
Architecture
Hybrid linear attention + sparse MoE
Input
$0.325/M
Output
$1.95/M
Released
Apr 2026
🇨🇳 The vibe coding champion — hybrid architecture with major gains in agentic coding and front-end development

Qwen 3.6 Plus uses a hybrid architecture combining efficient linear attention with sparse MoE routing, enabling strong scalability and high-performance inference. It delivers major gains in agentic coding, front-end development, and 3D scene generation.

With a 78.8 score on SWE-bench Verified and multi-modal interactive agent capability (perceiving real-world scenes and interacting with GUIs), it's a strong choice for agents that need to navigate real-world interfaces. It also supports audio input for multimodal workflows.

FeatureValue
SWE-bench Verified✅ 78.8 score
1M Context✅ Yes
Vision + Audio✅ Multimodal
GUI Interaction✅ Native agent capability
Cost✅ Low ($0.325/$1.95 per M)
Best ForAgentic coding, GUI interaction, vibe coding

10

Kimi K2.6

Model ID: moonshotai/kimi-k2.6 · Moonshot AI
Traffic This Month: 193B tokens

Context
262K tokens
Max Output
262K tokens
Parameters
193B
Input
$0.68/M
Output
$3.41/M
Released
Apr 2026
🐝 The agent swarm — scales to hundreds of parallel sub-agents, delivers complete projects from a single prompt

Kimi K2.6 is Moonshot AI's next-generation model with an agent swarm architecture that scales to hundreds of parallel sub-agents for autonomous task decomposition. It can deliver complete documents, websites, and spreadsheets in a single run without human oversight.

Its long-horizon coding capabilities across Python, Rust, and Go — combined with coding-driven UI/UX generation from prompts and visual inputs — make it uniquely suited for agents that need to build complete applications end-to-end. Available through io.net, Baidu Qianfan, and Inceptron.

FeatureValue
Agent Swarm✅ Hundreds of parallel sub-agents
Tool/Function Calling✅ Excellent
Vision✅ Image + File
Multi-Language✅ Python, Rust, Go
Cost✅ Moderate ($0.68/$3.41 per M)
Best ForLong-horizon coding, project generation

Usage Distribution

How Hermes Agent API traffic breaks down across provider families this month:

CategoryEst. TrafficModels
DeepSeek Family5.04T (46%)V4 Flash, V4 Pro
OpenRouter / Owl4.08T (37%)Owl Alpha
MiniMax Family1.08T (10%)M3, M2.7
NVIDIA723B (7%)Nemotron 3 Super
Anthropic429B (4%)Claude Sonnet 4.6
StepFun572B (5%)Step 3.7 Flash
Alibaba / Moonshot516B (5%)Qwen3.6 Plus, Kimi K2.6

Pricing Comparison

A quick reference for cost-conscious model selection:

RankModelInput/MOutput/MContextCost Tier
🥇 1Owl AlphaFreeFree1.05MFree
🥈 2DeepSeek V4 Flash$0.098$0.1971.05MUltra Budget
🥉 3DeepSeek V4 Pro$0.435$0.871MBudget
4Nemotron 3 SuperFreeFree1MFree
5MiniMax M3$0.30$1.201MBudget
6Step 3.7 Flash$0.20$1.15256KBudget
7Claude Sonnet 4.6$3.00$15.001MPremium
8MiniMax M2.7$0.30$1.20205KBudget
9Qwen3.6 Plus$0.325$1.951MBudget
10Kimi K2.6$0.68$3.41262KModerate

My Recommendation for Hermes Agent

👑 Primary Model: Owl Alpha

Free, 1.05M context, and the most used model by a wide margin. Start here for any agent task and you won't go wrong.

🧠 Deep Reasoning: DeepSeek V4 Pro

When you need maximum intelligence — complex code analysis, strategic planning, or deep research — V4 Pro's 1.6T parameters deliver the best results.

⚡ Speed + Budget: DeepSeek V4 Flash

Nearly as much traffic as #1 but at 1/500th the cost of Claude. Perfect for high-throughput batch operations where speed matters.

🆓 Best Free Option: Nemotron 3 Super

When Owl Alpha isn't suitable, Nemotron 3 Super offers 120B parameters, 1M context, and strong benchmarks — all free.

Conclusion

The most used models for Hermes Agent on OpenRouter this month tell a clear story: free and budget-friendly models dominate, with DeepSeek and OpenRouter's own Owl Alpha capturing 83% of all API traffic. Premium models like Claude Sonnet 4.6 maintain a presence but represent a small fraction of total usage.

The trend is clear: Hermes Agent users are optimizing for throughput and cost-effectiveness, reserving premium models for tasks that genuinely need maximum intelligence. The gap between #1 (Owl Alpha at 4.08T tokens) and the rest is enormous — nearly 4 trillion more tokens than any other model.