
Optimizing token usage

amazeeClaw runs on OpenClaw, an agent framework that sends conversation history, system instructions, tool definitions, and workspace files to the AI model on every turn. This means token consumption grows with conversation length and the number of tools available — not just the length of your messages.

This page covers practical steps to reduce token spend without losing functionality.


Why agents use more tokens than chatbots

A standard chatbot sends your message and gets a response. An agent does more:

  • The system prompt (instructions, personality, and workspace files like AGENTS.md and SOUL.md) is sent on every turn.
  • Tool schemas add another 8,000–15,000 tokens of JSON definitions for things like exec, browser, and web search.
  • The full session transcript (your messages, the agent's replies, and every tool call and its output) is replayed on each turn.
  • When the agent runs a command or fetches a web page, that output enters the context and stays there for the rest of the session.

With a higher-end model like Claude Sonnet, a single turn in a long session with large tool outputs can cost $0.10–0.50. Multiply that by the back-and-forth of a working session and $10 in 30 minutes is plausible.
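
To see how these pieces add up, here is a rough back-of-the-envelope sketch. Every number in it (the fixed overhead, the per-turn growth, and the price) is an illustrative assumption, not a measured value or an amazee.ai rate, and it counts input tokens only.

```python
def session_input_tokens(turns, fixed_overhead, growth_per_turn):
    """Total input tokens over a session where the transcript is replayed each turn.

    On turn k (1-based) the model receives the fixed overhead (system prompt
    plus tool schemas) and the transcript of the k-1 earlier turns.
    """
    return sum(fixed_overhead + (k - 1) * growth_per_turn for k in range(1, turns + 1))


# All numbers below are illustrative assumptions.
FIXED_OVERHEAD = 12_000    # system prompt + tool schemas, in tokens
GROWTH_PER_TURN = 3_000    # message + reply + tool output added per turn
PRICE_PER_MILLION = 3.00   # hypothetical input price in USD per 1M tokens

total = session_input_tokens(25, FIXED_OVERHEAD, GROWTH_PER_TURN)
cost = total / 1_000_000 * PRICE_PER_MILLION
print(f"{total:,} input tokens, about ${cost:.2f}")
# 1,200,000 input tokens, about $3.60
```

Under these assumptions, the replayed transcript accounts for three quarters of the input tokens, which is why larger tool outputs or longer sessions push the total up so quickly.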


1. Choose a cheaper model

Model choice has the single largest impact on cost. Claude Sonnet is the default model in most regions but also one of the most expensive options.

To switch models, either:

  • Ask your agent: "Switch to claude-haiku"
  • Change the model in your OpenClaw gateway settings

Cheaper alternatives with function calling support:

These models all support the tool use that amazeeClaw needs. Cost savings are approximate and based on current pricing.

| Model | Relative cost | Good for |
| --- | --- | --- |
| Mistral Small | ~30× cheaper than Sonnet | Straightforward tasks, Q&A, writing |
| Qwen3 32B | ~20× cheaper than Sonnet | Reasoning, multilingual, general use |
| GPT-4.1 Mini | ~7× cheaper than Sonnet | All-round tasks, coding |
| DeepSeek V3.2 | ~5× cheaper than Sonnet | Reasoning, coding |
| Claude Haiku | ~3× cheaper than Sonnet | Best drop-in replacement if you want to stay on Claude |

Exact pricing depends on your region. Check your amazee.ai portal for current per-token rates, or see Available Models.

!!! tip "Start with Claude Haiku or Qwen3"
    If you want to stay on Anthropic, Claude Haiku is the safest switch. If you're open to other providers, Qwen3 32B gives you strong function calling at a fraction of the cost.

!!! note "Model availability varies by region"
    Not every model is available in every region. Ask your agent "What models can I use?" or check the models list in your portal.

What about Gemini Flash?

Gemini 2.5 Flash is available in some regions and does support function calling. If tool calls fail on Gemini Flash in your region, try Mistral Small or Qwen3 32B instead.


2. Start new sessions instead of continuing long ones

Every message in a session stays in the context window. A 50-message conversation means the model processes all 50 messages plus their tool results on turn 51. This is the main driver of cost growth over time.

What to do:

  • Type /new to start a fresh session when you shift to a different task. (This is an OpenClaw gateway command — if it doesn't work, ask your agent "Start a new session.")
  • Don't treat your agent like a single continuous thread. Treat each topic as a separate session.
  • After a long research or coding session, start fresh before asking your next question.

The rule of thumb: if you've been chatting for 20+ minutes on the same topic, /new will save you money on every subsequent turn.
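
The effect of splitting work into sessions can be sketched with simple arithmetic. The per-turn token figure below is an assumption for illustration. The fixed overhead (system prompt and tool schemas) is left out because it is the same on every turn either way.

```python
def replayed_tokens(turns, tokens_per_turn):
    """Transcript tokens re-sent over one session: turn k replays the k-1 earlier turns."""
    return tokens_per_turn * turns * (turns - 1) // 2


TOKENS_PER_TURN = 2_000  # assumed average added per turn (message, reply, tool output)

one_long = replayed_tokens(50, TOKENS_PER_TURN)
five_short = 5 * replayed_tokens(10, TOKENS_PER_TURN)

print(one_long)    # 2450000
print(five_short)  # 450000
```

The same 50 turns replay roughly five times fewer transcript tokens when split into five sessions, because replay cost grows with the square of session length.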


3. Compress history with /compact

When you don't want to lose session context but the conversation is getting long, type:

/compact

This is an OpenClaw command that summarizes older messages into a compressed entry and keeps only recent messages intact. The full history stays on disk — compaction only changes what the model processes on the next turn.

You can guide the summary:

/compact Keep the API design decisions and code examples

If /compact isn't available in your setup, you can ask your agent: "Summarize our conversation so far and start fresh with just the summary."
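
Conceptually, compaction as described above replaces older messages with one summary entry and leaves the recent ones untouched. The following is a minimal sketch of that idea, not OpenClaw's actual implementation. The `Message` class, the `compact` function, and the placeholder summarizer are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str
    text: str


def compact(history, keep_recent, summarize):
    """Return the context for the next turn: one summary entry plus recent messages.

    `history` itself is left untouched, mirroring the idea that the full
    transcript stays on disk.
    """
    if len(history) <= keep_recent:
        return list(history)
    split = len(history) - keep_recent
    older, recent = history[:split], history[split:]
    summary = Message(role="system", text=summarize(older))
    return [summary] + recent


def naive_summary(messages):
    # Placeholder: a real setup would ask the model to write the summary.
    return f"[summary of {len(messages)} earlier messages]"


history = [Message("user", f"message {i}") for i in range(1, 11)]
context = compact(history, keep_recent=3, summarize=naive_summary)
print(len(history), len(context))  # 10 4
print(context[0].text)             # [summary of 7 earlier messages]
```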


4. Write concise prompts

Token consumption grows with both input and output length. Shorter questions tend to produce shorter answers.

Instead of:

"Can you please help me write a comprehensive blog post about the benefits of containerization for modern application development, covering topics like portability, scalability, consistency across environments, and DevOps integration?"

Try:

"Write a blog post: benefits of containerization. Cover portability, scalability, environment consistency, DevOps integration. 800 words."

Specifying a word count or format prevents the model from generating more output than you need.


5. Avoid re-explaining context the agent already has

If you told your agent about your project setup in message 3, you don't need to repeat it in message 15. The agent has the full session history.

Restating context doesn't cost much on input (it's a few hundred tokens), but it often prompts the agent to restate things back to you in its reply — and output tokens are 3–5× more expensive than input tokens.


6. Check your usage

Type /status in your agent chat to see how full your context window is and how many tokens the current session has used. Type /context list for a breakdown of what's consuming space. These are OpenClaw gateway commands — if they aren't available, ask your agent how much context it's using.

Your usage meter is also visible in the amazee.ai portal.


7. Disable tools you don't need

Every tool the agent has access to adds its JSON schema to every API call. If you don't use browser automation, code execution, or image generation, disabling those tools in your OpenClaw gateway settings removes their schemas from the context and saves tokens per turn.

Check /context detail (if available) to see which tool schemas are largest.
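
If you have the tool definitions as JSON, a rough way to compare their footprint is to serialize each one and apply the common 4-characters-per-token heuristic. The schemas below are hypothetical and heavily shortened, so treat the numbers as relative, not absolute. Real tokenizers vary by model.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic; real tokenizers vary by model


def schema_tokens(schema):
    """Estimate the tokens a tool schema adds to each API call."""
    return len(json.dumps(schema)) // CHARS_PER_TOKEN


# Hypothetical, heavily shortened schemas for illustration only.
tools = {
    "exec": {
        "description": "Run a shell command and return its output.",
        "parameters": {"command": {"type": "string"}, "timeout": {"type": "integer"}},
    },
    "browser": {
        "description": "Open a page, click, type, and take screenshots.",
        "parameters": {
            "url": {"type": "string"},
            "action": {"type": "string", "enum": ["open", "click", "type", "screenshot"]},
            "selector": {"type": "string"},
            "text": {"type": "string"},
        },
    },
    "web_search": {
        "description": "Search the web.",
        "parameters": {"query": {"type": "string"}},
    },
}

# Largest schemas first: these are the best candidates to disable if unused.
ranked = sorted(tools, key=lambda name: schema_tokens(tools[name]), reverse=True)
for name in ranked:
    print(f"{name:12} ~{schema_tokens(tools[name])} tokens per turn")
```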


8. Keep workspace files small

OpenClaw injects workspace files (AGENTS.md, SOUL.md, TOOLS.md, etc.) into the system prompt on every turn. Large files eat into your context budget.

  • Keep AGENTS.md under 2,000 characters.
  • If TOOLS.md is being truncated (check /context list for TRUNCATED), consider splitting it or removing sections you don't actively use.
  • Remove SOUL.md if you haven't customized it — the default agent behavior works without one.
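
If you have file access to your workspace, a short script can report how large these files are. This is a sketch under assumptions: the workspace path is a placeholder you would need to adjust, and the token figure uses a rough 4-characters-per-token estimate.

```python
from pathlib import Path

AGENTS_MD_LIMIT = 2_000  # characters, per the guideline above
WORKSPACE_FILES = ("AGENTS.md", "SOUL.md", "TOOLS.md")


def workspace_report(workspace):
    """Return (filename, character count) for each workspace file that exists."""
    report = []
    for name in WORKSPACE_FILES:
        path = Path(workspace) / name
        if path.is_file():
            report.append((name, len(path.read_text(encoding="utf-8"))))
    return report


if __name__ == "__main__":
    # "." is a placeholder; point this at wherever your workspace files live.
    for name, chars in workspace_report("."):
        over = name == "AGENTS.md" and chars > AGENTS_MD_LIMIT
        flag = "  <- over 2,000 characters" if over else ""
        print(f"{name:10} {chars:>7} chars (~{chars // 4} tokens){flag}")
```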

Quick cost comparison

For a typical 30-minute working session (roughly 25 back-and-forth turns), switching models changes the bill dramatically:

| Model | Approximate session cost | Relative to Sonnet |
| --- | --- | --- |
| Claude Sonnet | $$$ | Baseline |
| Claude Haiku | $ | ~3× cheaper |
| Qwen3 32B | ¢ | ~20× cheaper |

Heavy tool use (long command outputs, large file reads, web fetches) multiplies costs on any model. The relative savings from switching stay roughly the same.

For current per-token rates, check the models table in your amazee.ai portal.

!!! warning "Tool results are the hidden cost multiplier"
    A single exec call that returns 10,000 characters of output adds ~2,500 tokens to every subsequent turn. If your agent runs commands that produce verbose output, that output stays in the context for the rest of the session unless you /compact or /new.
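
The arithmetic behind that warning can be sketched as follows, assuming roughly 4 characters per token (the ratio implied by the figures above) and an arbitrary example of 20 remaining turns.

```python
def carried_tokens(output_chars, remaining_turns, chars_per_token=4):
    """Extra input tokens caused by one tool output over the rest of a session."""
    return (output_chars // chars_per_token) * remaining_turns


print(carried_tokens(10_000, 1))   # 2500
print(carried_tokens(10_000, 20))  # 50000
```

One verbose command early in a session can therefore cost more in replayed input than the rest of the conversation combined.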


Quick reference

| Action | Effort | Impact |
| --- | --- | --- |
| Switch to a cheaper model | 30 seconds | High: 3× to 30× cheaper |
| Start new sessions per topic | None | High: prevents context snowball |
| Use /compact | 5 seconds | Medium: compresses accumulated history |
| Write concise prompts | Low | Medium: less input, less output |
| Disable unused tools | One-time setup | Low-medium: depends on tool count |
| Keep workspace files lean | One-time cleanup | Low: saves on every turn |

Questions? Email ai.support@amazee.io