Optimizing token usage
amazeeClaw runs on OpenClaw, an agent framework that sends conversation history, system instructions, tool definitions, and workspace files to the AI model on every turn. This means token consumption grows with conversation length and the number of tools available — not just the length of your messages.
This page covers practical steps to reduce token spend without losing functionality.
Why agents use more tokens than chatbots
A standard chatbot sends your message and gets a response. An agent does more:
- The system prompt (instructions, personality, workspace files like AGENTS.md and SOUL.md) is sent on every turn.
- Tool schemas add another 8,000–15,000 tokens of JSON definitions for things like exec, browser, and web search.
- The full session transcript (your messages, the agent's replies, and every tool call and its output) is replayed on each turn.
- When the agent runs a command or fetches a web page, that output enters the context and stays there for the rest of the session.
With a higher-end model like Claude Sonnet, a single turn in a long session with large tool outputs can cost $0.10–0.50. Multiply that by the back-and-forth of a working session and $10 in 30 minutes is plausible.
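To make the growth concrete, here is a minimal Python sketch of how input tokens accumulate from turn to turn. The numbers are illustrative assumptions, not measurements: `fixed_overhead` (12,000 tokens) stands in for the system prompt plus tool schemas, and `tokens_per_exchange` (800 tokens) is a guess at the average size of one message, reply, and tool output.

```python
def turn_input_tokens(turn: int,
                      fixed_overhead: int = 12_000,
                      tokens_per_exchange: int = 800) -> int:
    """Input tokens sent on a given 1-based turn.

    The fixed overhead (system prompt, tool schemas) is sent every time,
    and every earlier exchange in the session is replayed on top of it.
    Both default values are illustrative assumptions.
    """
    return fixed_overhead + (turn - 1) * tokens_per_exchange


def session_input_tokens(turns: int,
                         fixed_overhead: int = 12_000,
                         tokens_per_exchange: int = 800) -> int:
    """Total input tokens processed across a whole session."""
    return sum(
        turn_input_tokens(t, fixed_overhead, tokens_per_exchange)
        for t in range(1, turns + 1)
    )
```

Under these assumptions, turn 25 alone sends 31,200 input tokens, and a 25-turn session processes 540,000 input tokens in total, even though the user typed only a small fraction of that.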
1. Choose a cheaper model
Model choice has the single largest impact on cost. Claude Sonnet is the default model in most regions but also one of the most expensive options.
To switch models, either:
- Ask your agent: "Switch to claude-haiku"
- Change the model in your OpenClaw gateway settings
Cheaper alternatives with function calling support:
These models all support the tool use that amazeeClaw needs. Cost savings are approximate and based on current pricing.
| Model | Relative cost | Good for |
|---|---|---|
| Mistral Small | ~30× cheaper than Sonnet | Straightforward tasks, Q&A, writing |
| Qwen3 32B | ~20× cheaper than Sonnet | Reasoning, multilingual, general use |
| GPT-4.1 Mini | ~7× cheaper than Sonnet | All-round tasks, coding |
| DeepSeek V3.2 | ~5× cheaper than Sonnet | Reasoning, coding |
| Claude Haiku | ~3× cheaper than Sonnet | Best drop-in replacement if you want to stay on Claude |
Exact pricing depends on your region. Check your amazee.ai portal for current per-token rates, or see Available Models.
!!! tip "Start with Claude Haiku or Qwen3"
    If you want to stay on Anthropic, Claude Haiku is the safest switch. If you're open to other providers, Qwen3 32B gives you strong function calling at a fraction of the cost.

!!! note "Model availability varies by region"
    Not every model is available in every region. Ask your agent "What models can I use?" or check the models list in your portal.
What about Gemini Flash?
Gemini 2.5 Flash is available in some regions and does support function calling. If tool calls fail on Gemini Flash in your region, try Mistral Small or Qwen3 32B instead.
2. Start new sessions instead of continuing long ones
Every message in a session stays in the context window. A 50-message conversation means the model processes all 50 messages plus their tool results on turn 51. This is the main driver of cost growth over time.
What to do:
- Type /new to start a fresh session when you shift to a different task. (This is an OpenClaw gateway command. If it doesn't work, ask your agent "Start a new session.")
- Don't treat your agent like a single continuous thread. Treat each topic as a separate session.
- After a long research or coding session, start fresh before asking your next question.
The rule of thumb: if you've been chatting for 20+ minutes on the same topic, /new will save you money on every subsequent turn.
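The saving from splitting a session can be sketched in Python. This is a rough model under stated assumptions: each turn resends a fixed overhead (assumed 12,000 tokens) and replays every earlier exchange (assumed 800 tokens each). Neither figure comes from amazeeClaw; they only illustrate the shape of the curve.

```python
def session_input_tokens(turns: int,
                         fixed_overhead: int = 12_000,
                         tokens_per_exchange: int = 800) -> int:
    """Total input tokens for one session (closed form).

    Turn t replays (t - 1) earlier exchanges, so the replayed history
    sums to tokens_per_exchange * turns * (turns - 1) / 2.
    Default values are illustrative assumptions.
    """
    replayed = tokens_per_exchange * turns * (turns - 1) // 2
    return turns * fixed_overhead + replayed


def tokens_saved_by_splitting(total_turns: int, sessions: int) -> int:
    """Input tokens saved by splitting one session into equal parts."""
    if total_turns % sessions != 0:
        raise ValueError("total_turns must divide evenly into sessions")
    one_long = session_input_tokens(total_turns)
    split = sessions * session_input_tokens(total_turns // sessions)
    return one_long - split
```

With these assumed figures, one 50-turn session processes 1,580,000 input tokens, while two 25-turn sessions process 1,080,000 in total. The history term grows with the square of the session length, which is why starting fresh pays off more the longer a session has run.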
3. Compress history with /compact
When you don't want to lose session context but the conversation is getting long, type:
/compact
This is an OpenClaw command that summarizes older messages into a compressed entry and keeps only recent messages intact. The full history stays on disk — compaction only changes what the model processes on the next turn.
You can guide the summary:
/compact Keep the API design decisions and code examples
If /compact isn't available in your setup, you can ask your agent: "Summarize our conversation so far and start fresh with just the summary."
4. Write concise prompts
Token consumption scales with both input and output length, and shorter questions tend to produce shorter answers.
Instead of:
"Can you please help me write a comprehensive blog post about the benefits of containerization for modern application development, covering topics like portability, scalability, consistency across environments, and DevOps integration?"
Try:
"Write a blog post: benefits of containerization. Cover portability, scalability, environment consistency, DevOps integration. 800 words."
Specifying a word count or format prevents the model from generating more output than you need.
5. Avoid re-explaining context the agent already has
If you told your agent about your project setup in message 3, you don't need to repeat it in message 15. The agent has the full session history.
Restating context costs little on input (a few hundred tokens), but it often prompts the agent to restate things back to you in its reply, and output tokens are typically 3–5× more expensive than input tokens.
6. Check your usage
Type /status in your agent chat to see how full your context window is and how many tokens the current session has used. Type /context list for a breakdown of what's consuming space. These are OpenClaw gateway commands — if they aren't available, ask your agent how much context it's using.
Your usage meter is also visible in the amazee.ai portal.
7. Disable tools you don't need
Every tool the agent has access to adds its JSON schema to every API call. If you don't use browser automation, code execution, or image generation, disabling those tools in your OpenClaw gateway settings removes their schemas from the context and saves tokens per turn.
Check /context detail (if available) to see which tool schemas are largest.
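If /context detail is not available, a rough size estimate can be made from the JSON definitions themselves. The Python sketch below assumes about 4 characters per token (the same ratio implied by 10,000 characters being roughly 2,500 tokens) and uses hypothetical tool schemas. The real schemas and their exact token counts come from your gateway, so treat the output as a ranking aid only.

```python
import json

CHARS_PER_TOKEN = 4  # rough heuristic, not an exact tokenizer


def estimate_schema_tokens(schema: dict) -> int:
    """Estimate the token cost of one tool schema from its JSON length."""
    return len(json.dumps(schema)) // CHARS_PER_TOKEN


def rank_tools(schemas: dict[str, dict]) -> list[tuple[str, int]]:
    """Return (tool name, estimated tokens) pairs, largest first."""
    sizes = {name: estimate_schema_tokens(s) for name, s in schemas.items()}
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)


# Hypothetical schemas for illustration only; not the real definitions.
example_schemas = {
    "exec": {"name": "exec", "description": "Run a shell command."},
    "browser": {"name": "browser", "description": "Automate a browser. " * 40},
}
largest_first = rank_tools(example_schemas)
```

The tools at the top of the ranking are the ones whose removal saves the most tokens per turn.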
8. Keep workspace files small
OpenClaw injects workspace files (AGENTS.md, SOUL.md, TOOLS.md, etc.) into the system prompt on every turn. Large files eat into your context budget.
- Keep AGENTS.md under 2,000 characters.
- If TOOLS.md is being truncated (check /context list for TRUNCATED), consider splitting it or removing sections you don't actively use.
- Remove SOUL.md if you haven't customized it; the default agent behavior works without one.
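
A quick way to audit workspace files is a small script that compares each file's length against a character budget. The following Python sketch is one possible approach. Only the 2,000-character budget for AGENTS.md comes from the guidance above; the function name and any budgets you add for other files are your own choices.

```python
from pathlib import Path

# Only the AGENTS.md figure comes from this page. Add other files and
# budgets as you see fit; those would be your own assumptions.
DEFAULT_BUDGETS = {"AGENTS.md": 2_000}


def oversized_files(workspace: Path,
                    budgets: dict[str, int] = DEFAULT_BUDGETS
                    ) -> list[tuple[str, int, int]]:
    """Return (filename, actual_chars, budget) for files over budget."""
    report = []
    for name, budget in budgets.items():
        path = workspace / name
        if not path.is_file():
            continue  # a missing file adds nothing to the prompt
        size = len(path.read_text(encoding="utf-8"))
        if size > budget:
            report.append((name, size, budget))
    return report
```

Calling `oversized_files(Path("."))` from the workspace directory returns an empty list when every tracked file is within budget.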
Quick cost comparison
For a typical 30-minute working session (roughly 25 back-and-forth turns), switching models changes the bill dramatically:
| Model | Approximate session cost | Relative to Sonnet |
|---|---|---|
| Claude Sonnet | $$$ | Baseline |
| Claude Haiku | $ | ~3× cheaper |
| Qwen3 32B | ¢ | ~20× cheaper |
Heavy tool use (long command outputs, large file reads, web fetches) multiplies costs on any model. The relative savings from switching stay roughly the same.
For current per-token rates, check the models table in your amazee.ai portal.
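For a rough dollar figure, the approximate factors from the model table in section 1 can be applied to a session cost you have observed on Sonnet. The Python sketch below does only that division. The factors are the approximate ones quoted on this page, and the $10 baseline in the usage note is the illustrative figure from the introduction, not a quoted price.

```python
# Approximate "times cheaper than Sonnet" factors quoted on this page.
SAVINGS_FACTOR = {
    "Claude Sonnet": 1,
    "Claude Haiku": 3,
    "DeepSeek V3.2": 5,
    "GPT-4.1 Mini": 7,
    "Qwen3 32B": 20,
    "Mistral Small": 30,
}


def estimated_session_cost(sonnet_cost: float, model: str) -> float:
    """Scale an observed Sonnet session cost by the model's factor."""
    return sonnet_cost / SAVINGS_FACTOR[model]
```

For example, a session that cost $10 on Sonnet would come to roughly $3.33 on Claude Haiku and $0.50 on Qwen3 32B under these factors. Actual rates vary by region, so the portal remains the authoritative source.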
!!! warning "Tool results are the hidden cost multiplier"
    A single exec call that returns 10,000 characters of output adds ~2,500 tokens to every subsequent turn. If your agent runs commands that produce verbose output, that output stays in the context for the rest of the session unless you /compact or /new.
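
The arithmetic behind that warning can be written out as a short Python sketch. It assumes roughly 4 characters per token, which is the ratio implied by 10,000 characters being about 2,500 tokens; real tokenizers vary with the content.

```python
CHARS_PER_TOKEN = 4  # rough ratio implied by 10,000 chars ~ 2,500 tokens


def carried_tokens(output_chars: int, remaining_turns: int) -> int:
    """Extra input tokens caused by one tool output over the rest of a session.

    The output is replayed on every later turn, so its token count is
    paid once per remaining turn.
    """
    tokens = output_chars // CHARS_PER_TOKEN
    return tokens * remaining_turns
```

Under this assumption, a 10,000-character output produced with 20 turns still to go adds about 50,000 input tokens to the session.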
Quick reference
| Action | Effort | Impact |
|---|---|---|
| Switch to a cheaper model | 30 seconds | High: 3× to 30× cheaper |
| Start new sessions per topic | None | High: prevents context snowball |
| Use /compact | 5 seconds | Medium: compresses accumulated history |
| Write concise prompts | Low | Medium: less input, less output |
| Disable unused tools | One-time setup | Low-medium: depends on tool count |
| Keep workspace files lean | One-time cleanup | Low: saves on every turn |
Questions? Email ai.support@amazee.io