Token Economy
How to control Claude API costs with ulk — context hygiene, compression tools, and effort levers.
Running 90 agents across a busy development workflow consumes tokens. This document covers the levers ulk provides to reduce that consumption, from free hygiene rules to opt-in tools.
All gains below are measured on the ulk project itself. Baseline: ~$353/month at 258M tokens (April 2026 measurement).
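As a quick sanity check on the baseline, the sketch below derives the blended rate per million tokens and expresses a monthly saving as a share of the baseline. The function names are illustrative and not part of ulk; only the two baseline figures come from the measurement above.

```python
BASELINE_USD_PER_MONTH = 353.0   # April 2026 measurement
BASELINE_TOKENS = 258_000_000    # tokens per month in the same measurement


def blended_usd_per_million(usd: float, tokens: int) -> float:
    """Average price paid per million tokens, all token types mixed together."""
    return usd / (tokens / 1_000_000)


def share_of_baseline(saving_usd: float, baseline_usd: float = BASELINE_USD_PER_MONTH) -> float:
    """A monthly saving expressed as a percentage of the baseline spend."""
    return 100.0 * saving_usd / baseline_usd


if __name__ == "__main__":
    rate = blended_usd_per_million(BASELINE_USD_PER_MONTH, BASELINE_TOKENS)
    print(f"blended rate: ${rate:.2f} per million tokens")
```

The blended rate is low compared with list prices because most of the volume is cached input, which is why the cache-related rules below matter.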
Why Token Economy Matters
Claude API pricing has three components: input tokens (including cache writes), cached input tokens (0.1× the input price), and output tokens. The main levers are:
- Reduce context rot: a polluted context forces Claude to re-read what it already processed
- Compress verbose outputs: CLI commands like gh pr list --json can return 140 KB for a simple query
- Avoid redundant file reads: reading a 500-line file to find one function
- Right-size the model and effort: Opus at xhigh burns ~2× the tokens of Sonnet at medium for routine tasks
The 5 context hygiene rules (free, zero install) are the foundation. No compression tool compensates for context rot.
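To make the three pricing components concrete, here is a minimal cost model. It assumes only what is stated above (cached input is billed at 0.1× the input price); the per-million-token prices are parameters, and the values used in the usage note are placeholders rather than real Claude prices.

```python
def request_cost_usd(
    uncached_input: int,
    cached_input: int,
    output: int,
    input_price_per_mtok: float,
    output_price_per_mtok: float,
) -> float:
    """Cost of one request from its three token counts.

    Prices are in USD per million tokens and must be supplied by the caller.
    """
    cached_price_per_mtok = 0.1 * input_price_per_mtok  # cached input: 0.1x input price
    total = (
        uncached_input * input_price_per_mtok
        + cached_input * cached_price_per_mtok
        + output * output_price_per_mtok
    )
    return total / 1_000_000
```

With placeholder prices of 10.0 (input) and 50.0 (output), 100,000 context tokens cost 1.00 when read uncached and 0.10 when served from cache. That gap is the reason context hygiene comes before any compression tool.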
The 5 Context Hygiene Rules
These rules are mandatory for all ulk sessions. They apply to the user, not just to agents.
Rule 1 — /rewind instead of correcting
When Claude goes in the wrong direction, do not try to correct it with another message. The failed attempt stays in context and pollutes everything that follows.
/rewind
Then reformulate the request. The bad attempt is gone.
Estimated gain: avoids 1–5K tokens of corrective attempts per deviation.
Rule 2 — /clear when changing tasks
New task = new session. This is the most commonly ignored rule.
Checklist before /clear:
- Changes committed (git status clean)
- docs/todo.md updated
- Next steps documented
- External state synced (Linear, Notion, GitHub)
/clear
Estimated gain: prevents cross-task context pollution entirely.
Rule 3 — Sub-agents for heavy exploration
When exploring code requires reading many files, do not do it in the main session.
Launch a sub-agent to summarize how the authentication module works
The sub-agent starts with a clean context, reads everything it needs, and returns only the synthesis. The main session context stays intact.
Estimated gain: preserves the main context window; the sub-agent’s context is discarded after the task.
Rule 4 — /compact proactively at 50-60%
The automatic compact triggers at 80%, but by then Claude is often already drifting. Compact manually at 50-60% with explicit instructions on what to preserve.
/compact Preserve: arch decision (option B), files in progress (src/auth.ts), active bug (#A042). Discard: abandoned approaches, initial exploration.
Estimated gain: prevents the drift that accumulates between 50% and 80% context.
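The thresholds in this rule can be written as a small decision helper. This is an illustrative sketch, not an ulk feature: the function name and the returned labels are hypothetical, and only the 50% and 80% cut-offs come from the rule above.

```python
def compact_advice(used_tokens: int, window_tokens: int) -> str:
    """Map current context usage to the action suggested by Rule 4."""
    if window_tokens <= 0:
        raise ValueError("window_tokens must be positive")
    usage = used_tokens / window_tokens
    if usage >= 0.80:
        # Automatic compact territory: drift has likely already started.
        return "auto-compact zone"
    if usage >= 0.50:
        # Manual compact with explicit preserve/discard instructions.
        return "compact now"
    return "ok"
```

On a 200K window, the manual compact point falls between roughly 100K and 120K tokens.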
Rule 5 — Lock tools and model at session start (Session Lock)
Never add or remove an MCP server, or change the model, mid-session. Each change invalidates the cache prefix and forces a full re-read of the context (cache miss = 10× the cost of a hit).
Cache economics:
- Cache hit: 0.1× the input price
- Cache write: 1.25× the input price (TTL: 5 minutes)
Lock tools and model before the first request. Do not use /model mid-session.
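A rough way to see what one mid-session change costs: compare a turn served from cache with a turn where the prefix has to be written again. The sketch assumes the entire context is re-written to the cache after an invalidation; the function name is illustrative, and costs are expressed in units of one uncached input token so that no real price is needed.

```python
CACHE_HIT_MULTIPLIER = 0.1     # cache hit: 0.1x the input price
CACHE_WRITE_MULTIPLIER = 1.25  # cache write: 1.25x the input price


def turn_cost_units(context_tokens: int, cache_valid: bool) -> float:
    """Cost of sending the context for one turn, in uncached-input-token units."""
    multiplier = CACHE_HIT_MULTIPLIER if cache_valid else CACHE_WRITE_MULTIPLIER
    return context_tokens * multiplier


if __name__ == "__main__":
    hit = turn_cost_units(100_000, cache_valid=True)
    miss = turn_cost_units(100_000, cache_valid=False)
    print(f"hit={hit:.0f} units, invalidated={miss:.0f} units, ratio={miss / hit:.1f}x")
```

Under that assumption the invalidated turn costs 12.5× the cached one, slightly above the 10× figure that compares a plain uncached read with a hit.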
Recommended settings in ~/.claude/settings.json:
{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}
- CLAUDE_CODE_DISABLE_1M_CONTEXT=1: disables Opus’s 1M context (4.7/4.8) and forces 200K (auto-compact at ~155K, more predictable and cheaper)
- CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80: triggers compact at 80% rather than at overflow
Apply via: ./install.sh --with-session-defaults (installs the /session-defaults skill)
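If you prefer to edit the settings file by hand, the two variables can be merged into an existing file without overwriting other keys. The sketch below is one way to do that; it is not what install.sh does internally, and the helper names are hypothetical.

```python
import json
from pathlib import Path

# The two variables from the recommended settings above.
SESSION_DEFAULTS = {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80",
}


def merge_session_defaults(settings: dict) -> dict:
    """Return a copy of settings with the session defaults added under "env"."""
    merged = dict(settings)
    env = dict(merged.get("env", {}))  # keep any variables already present
    env.update(SESSION_DEFAULTS)
    merged["env"] = env
    return merged


def apply_to_file(path: Path) -> None:
    """Read the JSON settings file if it exists, merge, and write it back."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(merge_session_defaults(current), indent=2) + "\n")
```

Calling apply_to_file(Path.home() / ".claude" / "settings.json") would update the file named above; back it up first, since the whole file is rewritten.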
Concision Heuristics (All Agents)
These anti-patterns add tokens with no informational value. All ulk agents are configured to avoid them. Users should avoid them too.
| Anti-pattern | Token impact | Replacement |
|---|---|---|
| Preamble (“I’m going to do X…”) | +20–80 tok/phase | Do X directly |
| Transitional summary (“I did X, now I’ll do Y…”) | +30–100 tok | Result of X, continue |
| Final recap after commit/diff | +50–200 tok | Nothing — the diff is sufficient |
| Repeating the problem before solving | +40–120 tok | Solve directly |
| Unnecessary hedging (“It would seem that…”) | +10–30 tok | Direct assertion |
| “As mentioned previously…” | +10–20 tok | Reference by ID or title |
| Affirmation (“Certainly! I’d be happy to…”) | +20–50 tok | Direct response |
When to be verbose (never compress these):
- Architectural decisions: explain the WHY for future readers
- Blocking errors: provide full file:line:pattern
- Security findings: exhaustive details always
RTK — Command Output Compression (Base — Always Active)
RTK (Rust Token Killer) is a CLI proxy that compresses verbose command outputs by 60–90%.
# Explicit usage
rtk proxy git log --stat -20
rtk proxy gh pr list --json
rtk proxy npm test
rtk proxy terraform plan
# View savings statistics
rtk gain
rtk gain --history
RTK is included in the base ulk installation. A PostToolUse hook suggests rtk proxy for commands that exceed the output threshold.
Install (if not already present):
brew install rtk
# or
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh
Important: if rtk gain fails, you may have the wrong rtk installed (reachingforthejack/rtk — Rust Type Kit). Check with rtk --version.
Measured gain: -60% to -90% on verbose command outputs.
/context-mode — Verbose Output Storage (Opt-in)
A PostToolUse hook intercepts outputs larger than 8 KB from Bash and mcp__github__* calls and stores them in SQLite. The context receives a compact pointer instead of the raw content.
gh pr list --json (140 KB)
↓ hook
[context-mode#784e352c] 140000 bytes (~35000 tokens) stored.
To access: /context-mode query 784e352c
The /context-mode skill provides query access:
/context-mode query <id> # Retrieve a stored result
/context-mode list # See the N most recent
/context-mode stats # Interception rate, tokens saved
/context-mode purge --older-than 7d
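The interception and query steps can be pictured with a short sketch. This is not the actual hook: the table schema, the hash-derived ID, and the estimate of about 4 bytes per token (chosen to match the 140000 bytes / ~35000 tokens example above) are all assumptions made for illustration.

```python
import hashlib
import sqlite3
from typing import Optional

THRESHOLD_BYTES = 8 * 1024  # outputs larger than 8 KB are intercepted


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open the SQLite store and create the (hypothetical) outputs table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outputs (id TEXT PRIMARY KEY, body TEXT NOT NULL)"
    )
    return conn


def intercept(conn: sqlite3.Connection, output: str) -> str:
    """Return small outputs unchanged; store large ones and return a pointer."""
    raw = output.encode("utf-8")
    if len(raw) <= THRESHOLD_BYTES:
        return output
    short_id = hashlib.sha256(raw).hexdigest()[:8]  # assumed ID scheme
    conn.execute(
        "INSERT OR REPLACE INTO outputs (id, body) VALUES (?, ?)", (short_id, output)
    )
    conn.commit()
    approx_tokens = len(raw) // 4  # assumed ratio of ~4 bytes per token
    return (
        f"[context-mode#{short_id}] {len(raw)} bytes (~{approx_tokens} tokens) stored.\n"
        f"To access: /context-mode query {short_id}"
    )


def query(conn: sqlite3.Connection, short_id: str) -> Optional[str]:
    """Fetch a stored output by ID, or None if the ID is unknown."""
    row = conn.execute("SELECT body FROM outputs WHERE id = ?", (short_id,)).fetchone()
    return row[0] if row else None
```

The point of the design is that the context only ever sees the two-line pointer; the full payload is paid for only when a later step actually queries it.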
Activation:
./install.sh --with-context-mode
Prerequisites: python3 + sqlite3 (present by default on macOS/Linux).
Measured gain: -$8 to -$24/month (~5–14% of baseline) on output tokens.
/symbols — LSP Navigation (Opt-in)
Instead of reading an entire file to find a function, query the TypeScript/JavaScript language server. Returns signatures, types, and references without loading the whole file.
/symbols list <file> # List all symbols
/symbols view <file> <symbol> # Body of the symbol only
/symbols refs <file> <symbol> # References across the codebase
Fallback rule: if documentSymbol returns 0 symbols (barrel/re-export file) → use Read directly.
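For orientation, this is roughly what a documentSymbol query and the fallback rule look like at the protocol level. The JSON-RPC method and Content-Length framing come from the Language Server Protocol; how the /symbols skill itself talks to the server is not described here, so the helper names and overall structure are assumptions. A real session also needs the initialize handshake and a didOpen notification before this request.

```python
import json
from pathlib import Path
from typing import Optional


def document_symbol_request(file_path: str, request_id: int = 1) -> bytes:
    """Build a framed LSP textDocument/documentSymbol request for one file."""
    payload = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/documentSymbol",
        "params": {"textDocument": {"uri": Path(file_path).resolve().as_uri()}},
    }
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def choose_strategy(symbols: Optional[list]) -> str:
    """Fallback rule: no symbols (barrel/re-export file) means read the file."""
    return "symbols" if symbols else "read"
```

The bytes returned by document_symbol_request would be written to the language server's stdin; an empty or null result then routes the caller to a plain Read.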
Activation:
npm install -g typescript-language-server typescript
# The skill is bundled in ~/.claude/skills/symbols/ by default
Measured gains: -43% on a 191-line interface file, -74% on a 221-line function file. No gain on barrel files.
/caveman — Terse Output Mode (Opt-in)
Injects a system prompt that forces terse mode on all Claude reports in the session.
/caveman # Activate for the session
/caveman off # Deactivate
Rule: clean phases → caveman everywhere. Blocking error (typecheck failure) or security finding → revert to normal mode for that phase only.
Activation:
./install.sh --with-caveman-output-skill
Measured gains: -79% on phase reports (2b3, CI Guard, checkpoints). -18% to -24% on full session.
Effort Levels
Reasoning effort is adjusted per-prompt, not per-session. Opus 4.8 defaults to high (Opus 4.7 defaulted to xhigh); xhigh burns ~2× the tokens of medium for most tasks.
/effort low # Mechanical fixes, reformatting, no judgment required
/effort medium # Most prompts — massive savings vs default
/effort high # Default for Opus 4.8 agentic coding
/effort xhigh # Hardest tasks + long async workflows (was the Opus 4.7 default)
/effort max # Diminishing returns — rarely justified (~2× xhigh cost)
Rule: reserve xhigh/max for prompts that genuinely require planning or trade-off decisions. Mechanical tasks in low, most prompts in medium.
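The rule can be read as a small lookup. The sketch below encodes only the ratios stated above (xhigh ≈ 2× medium, max ≈ 2× xhigh); no ratio is given for low or high, so they are left out of the cost table. The function name and its two flags are hypothetical.

```python
# Relative token cost, normalized to medium = 1.0 (ratios from the text above).
RELATIVE_COST = {
    "medium": 1.0,
    "xhigh": 2.0,  # ~2x the tokens of medium
    "max": 4.0,    # ~2x xhigh
}


def pick_effort(mechanical: bool, needs_planning: bool) -> str:
    """Choose an effort level for one prompt, following the rule above."""
    if mechanical:
        return "low"      # fixes, reformatting, no judgment required
    if needs_planning:
        return "xhigh"    # planning or trade-off decisions
    return "medium"       # most prompts
```

Read this way, a prompt sent at max costs about four times what the same prompt costs at medium.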
Large Codebases (>50K LOC)
At scale, the native ulk strategy (read + grep + /symbols) loads too many tokens per exploration. The decision matrix:
| Project size | Recommended approach |
|---|---|
| < 10K LOC | Native ulk tools (/symbols, RTK, sub-agents) |
| 10–50K LOC | Native ulk + Context Mode + hygiene rules |
| 50–100K LOC | Native ulk + Code Review Graph (dependency graph) |
| > 100K LOC (non-confidential) | Zilliz Cloud (free tier: 2 collections, 1M vectors) + native ulk |
| > 100K LOC (confidential) | Milvus local (Docker) + native ulk |
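The matrix translates directly into a lookup function. The table does not say which row applies at exactly 10K, 50K, or 100K LOC, so the boundary handling below is an assumption, as is the function name.

```python
def recommend_approach(loc: int, confidential: bool = False) -> str:
    """Return the recommended approach for a codebase of the given size."""
    if loc < 0:
        raise ValueError("loc must be non-negative")
    if loc < 10_000:
        return "Native ulk tools (/symbols, RTK, sub-agents)"
    if loc < 50_000:
        return "Native ulk + Context Mode + hygiene rules"
    if loc <= 100_000:
        return "Native ulk + Code Review Graph (dependency graph)"
    # Above 100K LOC the choice depends on whether code may leave the machine.
    if confidential:
        return "Milvus local (Docker) + native ulk"
    return "Zilliz Cloud + native ulk"
```

For example, recommend_approach(250_000, confidential=True) selects the local Milvus row.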
Code Review Graph:
npx code-review-graph index --path .
npx code-review-graph query "refresh token handling"
Zilliz Cloud (advertised as cutting session cost by 40% on large monorepos):
claude mcp add zilliz-context \
--env ZILLIZ_URI="https://..." \
--env ZILLIZ_TOKEN="..." \
-- npx @zilliz/claude-context-server
npx @zilliz/claude-context-cli index --path . --collection my-codebase
These advanced options are not integrated into the core ulk installation. They are documented in docs/guides/large-codebase.md.
Summary — Levers and Gains
| Lever | Type | Install | Measured gain |
|---|---|---|---|
| Context hygiene (5 rules) | Mandatory | Zero | Foundational |
| Concision heuristics | Mandatory | Zero | -20 to -200 tok/turn |
| RTK | Base (included) | brew install rtk | -60% to -90% on command outputs |
| /context-mode | Opt-in | --with-context-mode | -$8 to -$24/month |
| /symbols | Opt-in (bundled) | npm install -g typescript-language-server typescript | -43% to -74% on TS/JS files > 150 lines |
| /caveman | Opt-in | --with-caveman-output-skill | -79% on phase reports |
| Session defaults | Opt-in | --with-session-defaults | Prevents cache invalidation |
Optimal stack:
./install.sh --with-context-mode --with-caveman-output-skill --with-session-defaults
npm install -g typescript-language-server typescript
# In session
/caveman
/ulk:gandalf status # monitor context zone
Estimated combined gain: -$40 to -$80/month against the April 2026 baseline.