Models Jun 09, 2026 · Official recommendations · 7 sections

Recommended Models for Hermes Agent
Based on Hermes Agent Creator

The official model recommendations from Nous Research's Hermes Agent — ranked by tier (frontier, fast, free, local), with configuration tips and cost optimization strategies from the creators themselves.

What the Creators Say

Hermes Agent, built by Nous Research, has a built-in model catalog and configuration system that defines exactly which models work best for different agent tasks. This article compiles the official recommendations from the Hermes Agent documentation, model catalog, and configuration guides.

Hermes Agent splits work across two kinds of model: a main model for reasoning and a set of auxiliary models (up to 11 slots) for side-jobs such as vision, compression, and title generation. Each slot can be configured independently.

4 Model Tiers · 300+ Models (Portal) · 11 Auxiliary Slots · 18 Providers

"Claude Sonnet 4.6 is the 'best general-purpose agentic model.' GPT-5.5 Pro for 'strong reasoning + tool calling.' Gemini 3 Pro for 'huge context window.' DeepSeek V4 Pro for 'cost-effective coder.'" — Hermes Agent Creator Documentation

Model Tiers — Official Recommendations

Hermes Agent Creator categorizes models into four tiers by use case.

Tier 1 — Frontier Agentic

Best for Complex Reasoning & Multi-Step Tool-Calling

These are the recommended models for agent work when you need maximum intelligence. Use these as your main model.

# Recommended frontier models per Hermes Agent Creator
# Best general-purpose agentic model:
anthropic/claude-sonnet-4.6

# Strong reasoning + tool calling:
openai/gpt-5.5-pro

# Huge context window:
google/gemini-3-pro-preview

# Cost-effective coder:
deepseek/deepseek-v4-pro

# Additional frontier options:
anthropic/claude-opus-4.8
openai/gpt-5.5
moonshotai/kimi-k2.6
x-ai/grok-4.3

Minimum context requirement: 64K tokens (128K recommended for optimal multi-step tool-calling workflows).
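The context thresholds above can be expressed as a small check. This is a minimal sketch, not part of Hermes Agent: the function name and the three labels are hypothetical, and it assumes "64K" and "128K" mean 65,536 and 131,072 tokens.

```python
# Assumption: "K" means 1024 tokens, so 64K = 65,536 and 128K = 131,072.
MIN_CONTEXT_TOKENS = 64 * 1024
RECOMMENDED_CONTEXT_TOKENS = 128 * 1024


def context_fit(context_tokens: int) -> str:
    """Classify a model's context window against the stated thresholds.

    Hypothetical helper for illustration only.
    """
    if context_tokens < MIN_CONTEXT_TOKENS:
        return "too_small"      # below the 64K minimum
    if context_tokens < RECOMMENDED_CONTEXT_TOKENS:
        return "minimum"        # usable, but below the 128K recommendation
    return "recommended"        # 128K or more


if __name__ == "__main__":
    for window in (32_768, 65_536, 131_072):
        print(window, context_fit(window))
```

A check like this is mainly useful for local models, where the context window is something you set yourself rather than a fixed property of a hosted endpoint.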

Tier 2 — Fast Economy

Faster, Cheaper Models for Simple Tasks

Recommended for simple tasks like formatting, renaming, boilerplate generation, and auxiliary side-jobs. Use /model to switch from frontier to fast models mid-session.

# Fast economy models per Hermes Agent Creator
openai/gpt-5.4-mini
google/gemini-3.5-flash
anthropic/claude-haiku-4.5
deepseek/deepseek-v4-flash
google/gemini-3.1-pro-preview
qwen/qwen3.7-plus

Tier 3 — Free Tier

Free Models for Cost-Effective Experimentation

Available through OpenRouter and Nous Portal free tier. Great for experimentation and lightweight tasks.

# Free models per Hermes Agent Creator
openrouter/elephant-alpha
openrouter/owl-alpha
poolside/laguna-m.1:free
tencent/hy3-preview:free
nvidia/nemotron-3-super-120b-a12b:free
nvidia/nemotron-3-ultra-550b-a55b:free
inclusionai/ring-2.6-1t:free

Note: the free Nous Portal offer for Nemotron 3 Ultra runs June 4-18, 2026. Owl Alpha is currently free on OpenRouter; use it while it lasts.

Tier 4 — Local Models

Self-Hosted for Privacy & Zero API Costs

Hermes Agent Creator's recommended local models for running on your own hardware:

# Best local model (8-16 GB machines):
Qwen3.5-9B (Q4_K_M GGUF)
# Size: 5.3 GB · RAM: ~10 GB · Context: 128K
# Backend: llama.cpp

# Best for Apple Silicon:
Qwen3.5-9B (mlx-lm MXFP4)
# Size: ~5 GB · RAM: ~12 GB
# Backend: omlx (Apple MLX) — 37% faster than llama.cpp

Additional local options: Qwen3.5-4B-MTP (minimal RAM), Qwen3.5:397b (Ollama Cloud), Qwen3-Coder:480b (Ollama Cloud), Mistral-Large-3:675b (Ollama Cloud).

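Important local model flag: Set --ctx-size 65536 (short form: -c 65536) for llama.cpp to meet the minimum context requirement. Ollama does not take this flag; set its context length to 65536 through the num_ctx parameter or the OLLAMA_CONTEXT_LENGTH environment variable.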
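As an illustration of the llama.cpp side, the sketch below builds a llama-server command line that refuses a context size under the 64K minimum. The helper name, the model path, and the port are assumptions made for the example; only the -m, --ctx-size, and --port flags come from llama.cpp itself.

```python
import shlex

MIN_CTX = 65536  # minimum context size stated in this article


def llama_server_command(model_path: str, ctx_size: int = MIN_CTX, port: int = 8080) -> list[str]:
    """Build an argv list for llama-server (hypothetical helper).

    Raises ValueError if ctx_size is below the article's 64K minimum.
    """
    if ctx_size < MIN_CTX:
        raise ValueError(f"ctx_size {ctx_size} is below the {MIN_CTX} minimum")
    return [
        "llama-server",
        "-m", model_path,             # path to the GGUF file
        "--ctx-size", str(ctx_size),  # context window in tokens
        "--port", str(port),          # HTTP port to listen on
    ]


if __name__ == "__main__":
    # The model path is a placeholder; substitute wherever your GGUF file lives.
    argv = llama_server_command("models/Qwen3.5-9B-Q4_K_M.gguf")
    print(shlex.join(argv))
```

Building the command as a list rather than a string keeps it safe to pass to subprocess.run without shell quoting issues.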

⚠️ Important: Models NOT Recommended Inside Hermes Agent

Hermes-4-70B / Hermes-4-405B — NOT for Inside Agent

Nous Research's own models are NOT recommended for use INSIDE Hermes Agent. They are frontier hybrid-reasoning models tuned for chat and reasoning, not for the rapid-fire tool-calling loop the agent relies on.

Use them for Nous Chat, research workflows, or via subscription proxy — but not as your agent's main model.

Auxiliary Models — The 11 Side-Job Slots

Hermes Agent uses auxiliary (smaller) models for side-jobs. Each has its own slot and can be overridden independently from the main model. This is where you save money.

📝 Title Gen

A cheap flash model writes session titles as well as Opus does. Suggested: google/gemini-3-flash-preview on OpenRouter.

👁️ Vision

Used when the main model lacks vision. Point this slot at google/gemini-2.5-flash or gpt-4o-mini for image analysis.

📦 Compression

Override this slot if you are burning reasoning tokens on Opus just to summarize context. A fast chat model does the job at 1/50th the cost.

✅ Approval

For approval_mode: smart. A fast/cheap model (Haiku, Flash, GPT-5-mini) decides whether to auto-approve low-risk commands.

🌐 Web Extract

Worth overriding when you use web_extract heavily. Summarization doesn't need reasoning, so use a cheap flash model.

🔧 Skills Hub

Usually fine at auto (use main model). hermes skills search uses this slot.

🔌 MCP

Usually fine at auto (use main model). MCP tool routing.

🔀 Triage Specifier

Routes the Kanban triage specifier. A cheap, capable model works well.

📋 Kanban Decomposer

Routes Kanban task decomposition — splits triage into child tasks.

👤 Profile Describer

Generates profile descriptions. The call is short and cheap.

🧹 Curator

Can run for minutes on reasoning models, so a cheaper aux model is often worthwhile. Routes the curator skill-usage review pass.
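The slot behavior described above (use the override if one is set, otherwise fall back to the main model) can be sketched in a few lines. This is an illustration of the idea, not Hermes Agent's actual resolution code: ModelRef and resolve_slot are hypothetical names, and the "curator" slot key is assumed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRef:
    """A provider plus model id pair (hypothetical type for this sketch)."""
    provider: str
    model: str


MAIN = ModelRef("nous", "anthropic/claude-sonnet-4.6")

# Overrides copied from the configuration example in this article.
OVERRIDES = {
    "title_gen": ModelRef("openrouter", "google/gemini-3-flash-preview"),
    "compression": ModelRef("openrouter", "deepseek/deepseek-v4-flash"),
}


def resolve_slot(slot: str, main: ModelRef, overrides: dict[str, ModelRef]) -> ModelRef:
    """Return the override for a slot, or the main model when none is set ("auto")."""
    return overrides.get(slot, main)


if __name__ == "__main__":
    print(resolve_slot("compression", MAIN, OVERRIDES))  # overridden slot
    print(resolve_slot("curator", MAIN, OVERRIDES))      # falls back to the main model
```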

"Override auxiliary tasks with cheaper models to reduce costs by up to 50x for summarization tasks. A fast chat model does compression at 1/50th the cost of Opus." — Hermes Agent Creator Documentation
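To make the 1/50th figure concrete, here is the arithmetic as a short sketch. The prices and token volume are made up for illustration and are not real rates for any model; only the 50x ratio comes from the quote above.

```python
def summarization_cost(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost in USD of processing `tokens` at a flat per-million-token price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# Hypothetical prices: NOT real rates, chosen only to show the 50x ratio.
MAIN_PRICE = 15.00            # USD per million tokens on the main model
AUX_PRICE = MAIN_PRICE / 50   # an auxiliary model at 1/50th the cost

TOKENS = 2_000_000            # hypothetical monthly compression workload

main_cost = summarization_cost(TOKENS, MAIN_PRICE)
aux_cost = summarization_cost(TOKENS, AUX_PRICE)

if __name__ == "__main__":
    print(f"main: ${main_cost:.2f}, auxiliary: ${aux_cost:.2f}, saved: ${main_cost - aux_cost:.2f}")
```

With these illustrative numbers the same workload costs $30.00 on the main model and $0.60 on the auxiliary one. The saving scales with how much of your token usage is summarization, so it applies to the side-job, not to the whole bill.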

Configuration — How to Set It Up

Here's how to configure your recommended models in ~/.hermes/config.yaml:

# Main model configuration
model:
  provider: "nous"
  default: "anthropic/claude-sonnet-4.6"
  base_url: "https://inference-api.nousresearch.com/v1"
  api_mode: "chat_completions"

# Auxiliary model overrides (cost optimization)
auxiliary:
  title_gen:
    provider: "openrouter"
    model: "google/gemini-3-flash-preview"
  vision:
    provider: "openrouter"
    model: "google/gemini-2.5-flash"
  compression:
    provider: "openrouter"
    model: "deepseek/deepseek-v4-flash"
  approval:
    provider: "openrouter"
    model: "anthropic/claude-haiku-4.5"
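A quick sanity check on a configuration like the one above can catch a slot that is missing its provider or model. The sketch below works on a plain dict that mirrors the YAML, so it assumes the file has already been parsed by a YAML library. The function name is hypothetical, and which fields are truly required is an assumption based on the example, not on the Hermes Agent schema.

```python
# A dict mirroring part of the YAML example (assumed already parsed).
CONFIG = {
    "model": {
        "provider": "nous",
        "default": "anthropic/claude-sonnet-4.6",
    },
    "auxiliary": {
        "title_gen": {"provider": "openrouter", "model": "google/gemini-3-flash-preview"},
        "compression": {"provider": "openrouter", "model": "deepseek/deepseek-v4-flash"},
    },
}


def find_problems(config: dict) -> list[str]:
    """List missing fields (hypothetical check; required keys are assumed)."""
    problems = []
    main = config.get("model") or {}
    for key in ("provider", "default"):
        if not main.get(key):
            problems.append(f"model.{key} is missing")
    for slot, entry in (config.get("auxiliary") or {}).items():
        for key in ("provider", "model"):
            if not (entry or {}).get(key):
                problems.append(f"auxiliary.{slot}.{key} is missing")
    return problems


if __name__ == "__main__":
    print(find_problems(CONFIG) or "no problems found")
```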

Recommended Providers

🏆 Nous Portal — RECOMMENDED

One OAuth login covers 300+ frontier agentic models plus the Tool Gateway (web search, image generation, TTS, browser automation). 10% off token-billed providers.

hermes setup --portal

🔄 OpenRouter — Most Models

400+ models with multi-provider routing for cost/speed optimization. Set OPENROUTER_API_KEY in ~/.hermes/.env.
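If OpenRouter calls fail, a first thing to check is whether the key is actually present in the .env file. The sketch below is a simplified KEY=value parser written for this article; both function names are hypothetical, and it does not claim to match how Hermes Agent itself reads the file.

```python
from pathlib import Path


def read_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_openrouter_key(env_path: Path) -> bool:
    """True if the file exists and defines a non-empty OPENROUTER_API_KEY."""
    if not env_path.is_file():
        return False
    values = read_env_text(env_path.read_text(encoding="utf-8"))
    return bool(values.get("OPENROUTER_API_KEY"))


if __name__ == "__main__":
    path = Path.home() / ".hermes" / ".env"
    print("OPENROUTER_API_KEY set:", has_openrouter_key(path))
```

The script only reports whether the key is set and never prints its value, which keeps the secret out of terminal history and logs.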

All supported providers: Nous Portal, OpenRouter, OpenAI Codex, Anthropic, Google Gemini, GitHub Copilot, DeepSeek, Alibaba/DashScope, Z.AI/GLM, Kimi/Moonshot, MiniMax, xAI/Grok, AWS Bedrock, Azure AI Foundry, NVIDIA NIM, HuggingFace, Ollama Cloud, LM Studio, and Custom Endpoints.

Cost Optimization Strategies

From the Hermes Agent Creator documentation:

  1. Use auxiliary models — Override auxiliary tasks with cheaper flash models. Compression can run at 1/50th the cost of the main model.
  2. Use fast models for simple tasks — Switch to faster models for formatting, renaming, or boilerplate generation.
  3. Use free tier — Free models available through OpenRouter and Nous Portal free tier.
  4. Run local models — Zero API costs with local deployment. Best for privacy and high-volume usage.
  5. Compress long sessions — Run /compress before hitting token limits to summarize conversation history.
  6. Delegate for parallel work — Use delegate_task for parallel subtasks to reduce main conversation token usage.
  7. Use execute_code — Write Python scripts for batch operations instead of running terminal commands one at a time.
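As an example of strategy 7, the script below renames a batch of files in one pass instead of issuing one terminal command per file. It is a generic sketch of the kind of script meant there: the function name and the .txt to .md rename are hypothetical, and how execute_code is invoked is not covered here.

```python
import tempfile
from pathlib import Path


def batch_rename(directory: Path, old_suffix: str, new_suffix: str) -> list[tuple[str, str]]:
    """Rename every file ending in old_suffix; return (old_name, new_name) pairs."""
    if not old_suffix:
        raise ValueError("old_suffix must not be empty")
    renamed = []
    # sorted() snapshots the listing, so renaming while looping is safe.
    for path in sorted(directory.iterdir()):
        if not path.is_file() or not path.name.endswith(old_suffix):
            continue
        target = path.with_name(path.name[: -len(old_suffix)] + new_suffix)
        if target.exists():
            continue  # never overwrite an existing file
        path.rename(target)
        renamed.append((path.name, target.name))
    return renamed


if __name__ == "__main__":
    # Demo in a throwaway directory so nothing real is touched.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("a.txt", "b.txt", "notes.md"):
            (root / name).write_text("demo", encoding="utf-8")
        print(batch_rename(root, ".txt", ".md"))
```

One script like this replaces many individual tool calls, which is where the token saving comes from: the agent reads one result instead of one per file.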

Conclusion

The Hermes Agent Creator's official recommendations are clear: Claude Sonnet 4.6 as the best general-purpose agentic model, GPT-5.5 Pro for reasoning and tool calling, Gemini 3 Pro for its large context window, and DeepSeek V4 Pro for cost-effective coding. For budget users, the free-tier models and a local Qwen3.5-9B offer excellent value.

The biggest cost savings come from the auxiliary model system: overriding compression, vision, and title generation with cheap flash models cuts the cost of those side-jobs, by up to 50x for summarization. Nous Portal's bundle (300+ models plus the Tool Gateway, with 10% off token-billed providers) is the recommended way to run Hermes Agent.

Remember: don't use Hermes-4-70B/405B inside the agent — they're tuned for chat, not the rapid-fire tool-calling loop. Use them for Nous Chat instead.