Token Economy

How to control Claude API costs with ulk — context hygiene, compression tools, and effort levers.

Running 90 agents across a busy development workflow consumes a large volume of tokens. This document covers the levers ulk provides to reduce that consumption, from free hygiene rules to opt-in tools.

All gains below are measured on the ulk project itself. Baseline: ~$353/month at 258M tokens (April 2026 measurement).


Why Token Economy Matters

Claude API pricing has three components: input tokens (including cache writes), cached input tokens (0.1× the input price), and output tokens. The main levers are:

  1. Reduce context rot — a polluted context forces Claude to re-read what it already processed
  2. Compress verbose outputs — CLI commands like gh pr list --json can return 140 KB for a simple query
  3. Avoid redundant file reads — reading a 500-line file to find one function
  4. Right-size the model and effort — Opus at xhigh burns ~2× the tokens of Sonnet at medium for routine tasks
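The billable buckets above combine into a simple per-request formula. The following is a minimal sketch. It assumes illustrative prices of $3 per million input tokens and $15 per million output tokens. These are placeholders, not quoted rates, so check current pricing for your model. It also uses the 1.25× cache-write and 0.1× cache-hit multipliers given under Rule 5. The function name `request_cost` is hypothetical.

```python
# Placeholder prices in USD per million tokens; substitute current rates.
INPUT_PRICE_PER_MTOK = 3.00
OUTPUT_PRICE_PER_MTOK = 15.00

CACHE_WRITE_MULTIPLIER = 1.25  # 5-minute TTL cache write
CACHE_READ_MULTIPLIER = 0.10   # cache hit


def request_cost(input_tokens: int, cache_write_tokens: int,
                 cache_read_tokens: int, output_tokens: int) -> float:
    """USD cost of one request, summed over the billable token buckets."""
    per_input_token = INPUT_PRICE_PER_MTOK / 1_000_000
    per_output_token = OUTPUT_PRICE_PER_MTOK / 1_000_000
    return (
        input_tokens * per_input_token
        + cache_write_tokens * per_input_token * CACHE_WRITE_MULTIPLIER
        + cache_read_tokens * per_input_token * CACHE_READ_MULTIPLIER
        + output_tokens * per_output_token
    )


if __name__ == "__main__":
    # 100K tokens of context served from cache, 2K fresh input, 1K output.
    print(f"${request_cost(2_000, 0, 100_000, 1_000):.4f}")
```

With these placeholder prices, that turn costs about $0.051. The same 102K tokens of input sent uncached costs $0.321. The gap comes almost entirely from the cache multiplier.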

The 5 context hygiene rules (free, zero install) are the foundation. No compression tool compensates for context rot.


The 5 Context Hygiene Rules

These rules are mandatory for all ulk sessions. They apply to the user, not just to agents.

Rule 1 — /rewind instead of correcting

When Claude goes in the wrong direction, do not try to correct it with another message. The failed attempt stays in context and pollutes everything that follows.

/rewind

Then reformulate the request. The bad attempt is gone.

Estimated gain: avoids 1–5K tokens of corrective attempts per deviation.

Rule 2 — /clear when changing tasks

New task = new session. This is the most commonly ignored rule.

Checklist before /clear:

  • Changes committed (git status clean)
  • docs/todo.md updated
  • Next steps documented
  • External state synced (Linear, Notion, GitHub)

/clear

Estimated gain: prevents cross-task context pollution entirely.
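Only the first checklist item can be checked mechanically. The sketch below uses `git status --porcelain`, which prints nothing when there are no staged, modified, or untracked changes. The function names are hypothetical and are not part of ulk. The todo, next-steps, and external-sync items still need a human check.

```python
import subprocess


def worktree_is_clean(porcelain_output: str) -> bool:
    """`git status --porcelain` prints one line per changed or untracked file."""
    return porcelain_output.strip() == ""


def ready_to_clear(repo_path: str = ".") -> bool:
    """Automate the 'changes committed' item of the pre-/clear checklist."""
    result = subprocess.run(
        ["git", "status", "--porcelain"],
        cwd=repo_path,
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError outside a git repository
    )
    return worktree_is_clean(result.stdout)
```

Run `ready_to_clear()` from the repository root before typing `/clear`. A `False` result means something still needs to be committed or stashed.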

Rule 3 — Sub-agents for heavy exploration

When exploring code requires reading many files, do not do it in the main session.

Launch a sub-agent to summarize how the authentication module works

The sub-agent starts with a clean context, reads everything it needs, and returns only the synthesis. The main session context stays intact.

Estimated gain: preserves the main context window; the sub-agent’s context is discarded after the task.

Rule 4 — /compact proactively at 50-60%

The automatic compact triggers at 80%, but by then Claude is often already drifting. Compact manually at 50-60% with explicit instructions on what to preserve.

/compact Preserve: arch decision (option B), files in progress (src/auth.ts), active bug (#A042). Discard: abandoned approaches, initial exploration.

Estimated gain: prevents the drift that accumulates between 50% and 80% context.
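The thresholds translate directly into token counts. This sketch assumes a 200K window, which is what `CLAUDE_CODE_DISABLE_1M_CONTEXT=1` forces in Rule 5. On that window, the 50–60% band is 100K–120K tokens and 80% is 160K. The function and its return labels are hypothetical; this is not how `/ulk:gandalf status` is implemented.

```python
def compact_advice(used_tokens: int, window_tokens: int = 200_000) -> str:
    """Classify context usage against the Rule 4 thresholds.

    "ok" below 50%, "compact-now" from 50% up to 80%,
    "overdue" at or above 80%, where auto-compact triggers.
    """
    usage = used_tokens / window_tokens
    if usage >= 0.80:
        return "overdue"
    if usage >= 0.50:
        return "compact-now"
    return "ok"
```

For example, `compact_advice(110_000)` returns `"compact-now"`. That is the point at which a `/compact` with explicit preserve/discard instructions pays off.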

Rule 5 — Lock tools and model at session start (Session Lock)

Never add or remove an MCP server, or switch models, mid-session. Each change invalidates the cache prefix and forces a full re-read of the context (a cache miss costs 10× a hit, or 12.5× if the prefix is written to the cache again).

Cache economics:

  • Cache hit: 0.1× the input price
  • Cache write: 1.25× the input price (TTL: 5 minutes)
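The cost of breaking the cache can be sketched with the two multipliers above. This toy model has simplifying assumptions. The prompt prefix is fixed in size, and every turn lands within the 5-minute TTL. A single mid-session change (for example, adding an MCP server) forces one extra cache write. Costs are in "base-price input tokens". `prefix_cost` is a hypothetical helper.

```python
from typing import Optional

CACHE_WRITE = 1.25  # multiplier on the base input price (5-minute TTL)
CACHE_HIT = 0.10    # multiplier on the base input price


def prefix_cost(prefix_tokens: int, turns: int,
                invalidate_at: Optional[int] = None) -> float:
    """Cost of re-sending a fixed prefix every turn, in base-price input tokens.

    Turn 0 writes the cache and later turns read it. On turn `invalidate_at`
    (tool or model change) the prefix is written again instead of hit.
    """
    total = 0.0
    for turn in range(turns):
        if turn == 0 or turn == invalidate_at:
            total += prefix_tokens * CACHE_WRITE
        else:
            total += prefix_tokens * CACHE_HIT
    return total


locked = prefix_cost(40_000, turns=20)
changed = prefix_cost(40_000, turns=20, invalidate_at=10)
print(f"locked: {locked:,.0f}  one mid-session change: {changed:,.0f}")
```

Consider a 40K-token prefix over 20 turns. Locking the session costs 126,000 token-equivalents. One change on turn 10 raises that to 172,000, about 36.5% more, and the invalidated turn alone costs 12.5× a normal hit.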

Lock tools and model before the first request. Do not use /model mid-session.

Recommended settings in ~/.claude/settings.json:

{
  "env": {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80"
  }
}

  • CLAUDE_CODE_DISABLE_1M_CONTEXT=1 — disables Opus’s 1M context (4.7/4.8), forces 200K (auto-compact at ~155K, more predictable and cheaper)
  • CLAUDE_AUTOCOMPACT_PCT_OVERRIDE=80 — triggers compact at 80% rather than at overflow

Apply via: ./install.sh --with-session-defaults (installs the /session-defaults skill)
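For readers who prefer to edit the file by hand, the sketch below shows a non-destructive merge of the two recommended variables into `settings.json`. This is a hypothetical equivalent, not the code behind `--with-session-defaults`, whose internals are not documented here. It keeps every other key and env var and overwrites only the two recommended keys.

```python
import json
from pathlib import Path

SESSION_DEFAULTS = {
    "CLAUDE_CODE_DISABLE_1M_CONTEXT": "1",
    "CLAUDE_AUTOCOMPACT_PCT_OVERRIDE": "80",
}


def merge_session_defaults(settings: dict) -> dict:
    """Return a copy of `settings` with the recommended env vars added."""
    merged = dict(settings)
    merged["env"] = {**settings.get("env", {}), **SESSION_DEFAULTS}
    return merged


def apply_session_defaults(
    path: Path = Path.home() / ".claude" / "settings.json",
) -> None:
    """Read settings.json (if present), merge the defaults, write it back."""
    current = json.loads(path.read_text() or "{}") if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(merge_session_defaults(current), indent=2) + "\n")
```

As with the installer, apply it before starting a session so that the cache prefix is not invalidated mid-session.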


Concision Heuristics (All Agents)

These anti-patterns add tokens with no informational value. All ulk agents are configured to avoid them. Users should avoid them too.

| Anti-pattern | Token impact | Replacement |
| --- | --- | --- |
| Preamble (“I’m going to do X…”) | +20–80 tok/phase | Do X directly |
| Transitional summary (“I did X, now I’ll do Y…”) | +30–100 tok | Result of X, continue |
| Final recap after commit/diff | +50–200 tok | Nothing (the diff is sufficient) |
| Repeating the problem before solving | +40–120 tok | Solve directly |
| Unnecessary hedging (“It would seem that…”) | +10–30 tok | Direct assertion |
| “As mentioned previously…” | +10–20 tok | Reference by ID or title |
| Affirmation (“Certainly! I’d be happy to…”) | +20–50 tok | Direct response |

When to be verbose (never compress these):

  • Architectural decisions — explain the WHY for future readers
  • Blocking errors — provide full file:line:pattern
  • Security findings — exhaustive details always

RTK — Command Output Compression (Base — Always Active)

RTK (Rust Token Killer) is a CLI proxy that compresses verbose command outputs by 60–90%.

# Explicit usage
rtk proxy git log --stat -20
rtk proxy gh pr list --json
rtk proxy npm test
rtk proxy terraform plan

# View savings statistics
rtk gain
rtk gain --history

RTK is included in the base ulk installation. A PostToolUse hook suggests rtk proxy for commands that exceed the output threshold.

Install (if not already present):

brew install rtk
# or
curl -fsSL https://raw.githubusercontent.com/rtk-ai/rtk/refs/heads/master/install.sh | sh

Important: if rtk gain fails, you may have the wrong rtk installed (reachingforthejack/rtk — Rust Type Kit). Check with rtk --version.

Measured gain: -60% to -90% on verbose command outputs.
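A minimal sketch of the check such a hook might perform follows. The threshold value and the function name are hypothetical, since this page does not state the RTK hook's cut-off. The estimate of roughly 4 bytes per token matches the 140000 bytes ≈ 35000 tokens example in the /context-mode section.

```python
from typing import Optional

# Placeholder: the RTK hook's actual threshold is not documented on this page.
OUTPUT_THRESHOLD_BYTES = 8 * 1024
BYTES_PER_TOKEN = 4  # rough estimate, consistent with 140000 bytes ≈ 35000 tokens


def rtk_suggestion(command: str, output: str,
                   threshold: int = OUTPUT_THRESHOLD_BYTES) -> Optional[str]:
    """Return a hint to re-run `command` through RTK, or None if not needed."""
    size = len(output.encode("utf-8"))
    if size <= threshold or command.startswith("rtk "):
        return None  # small output, or already going through rtk
    return (f"Output was {size} bytes (~{size // BYTES_PER_TOKEN} tokens). "
            f"Consider: rtk proxy {command}")
```

Skipping commands that already start with `rtk` keeps the hint from firing on its own suggestion.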


/context-mode — Verbose Output Storage (Opt-in)

A PostToolUse hook intercepts outputs larger than 8 KB from Bash and mcp__github__* calls and stores them in SQLite. The context receives a compact pointer instead of the raw content.

gh pr list --json (140 KB)
    ↓ hook
[context-mode#784e352c] 140000 bytes (~35000 tokens) stored.
To access: /context-mode query 784e352c

The /context-mode skill provides query access:

/context-mode query <id>         # Retrieve a stored result
/context-mode list               # See the N most recent
/context-mode stats              # Interception rate, tokens saved
/context-mode purge --older-than 7d
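The intercept-and-pointer flow, together with query and purge, can be sketched with Python's standard-library `sqlite3` module. This matches the python3 + sqlite3 prerequisite. It illustrates the mechanism and is not the ulk hook's source. Several details are assumptions: the table layout, the 8-character ID taken from a SHA-256 digest, the 8 × 1024-byte threshold, and the 4-bytes-per-token estimate.

```python
import hashlib
import sqlite3
import time
from typing import Optional

THRESHOLD_BYTES = 8 * 1024  # "larger than 8 KB"; exact cut-off assumed
BYTES_PER_TOKEN = 4         # rough estimate: 140000 bytes ≈ 35000 tokens


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS outputs ("
        "id TEXT PRIMARY KEY, tool TEXT, content TEXT, stored_at REAL)"
    )
    return conn


def is_intercepted(tool: str) -> bool:
    return tool == "Bash" or tool.startswith("mcp__github__")


def intercept(conn: sqlite3.Connection, tool: str, output: str) -> str:
    """Return what the context receives: the raw output, or a compact pointer."""
    size = len(output.encode("utf-8"))
    if not is_intercepted(tool) or size <= THRESHOLD_BYTES:
        return output
    short_id = hashlib.sha256(output.encode("utf-8")).hexdigest()[:8]
    conn.execute(
        "INSERT OR REPLACE INTO outputs (id, tool, content, stored_at) "
        "VALUES (?, ?, ?, ?)",
        (short_id, tool, output, time.time()),
    )
    conn.commit()
    return (f"[context-mode#{short_id}] {size} bytes "
            f"(~{size // BYTES_PER_TOKEN} tokens) stored.\n"
            f"To access: /context-mode query {short_id}")


def query(conn: sqlite3.Connection, short_id: str) -> Optional[str]:
    row = conn.execute(
        "SELECT content FROM outputs WHERE id = ?", (short_id,)
    ).fetchone()
    return row[0] if row else None


def purge_older_than(conn: sqlite3.Connection, days: float) -> int:
    """Delete entries older than `days`; return the number of rows removed."""
    cutoff = time.time() - days * 86_400
    deleted = conn.execute(
        "DELETE FROM outputs WHERE stored_at < ?", (cutoff,)
    ).rowcount
    conn.commit()
    return deleted
```

Hashing the content makes the ID deterministic, so re-running the same verbose command reuses one row instead of accumulating duplicates.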

Activation:

./install.sh --with-context-mode

Prerequisites: python3 + sqlite3 (present by default on macOS/Linux).

Measured gain: -$8 to -$24/month on output tokens (~5–14% of baseline; against the $353/month total, about 2–7%).


/symbols — LSP Navigation (Opt-in)

Instead of reading an entire file to find a function, query the TypeScript/JavaScript language server. The skill returns signatures, types, and references without loading the whole file.

/symbols list <file>              # List all symbols
/symbols view <file> <symbol>     # Body of the symbol only
/symbols refs <file> <symbol>     # References across the codebase

Fallback rule: if documentSymbol returns 0 symbols (barrel/re-export file) → use Read directly.
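The fallback rule is easier to follow once you know what a language-server request looks like. The sketch below builds a `textDocument/documentSymbol` request using the Language Server Protocol framing: a `Content-Length` header, a blank line, then the JSON-RPC body. It flattens the returned symbol tree and applies the zero-symbols fallback. It omits the `initialize` handshake and `textDocument/didOpen` notification that a real session sends first. The helper names are hypothetical, not the `/symbols` skill's code.

```python
import json
from pathlib import Path


def lsp_frame(message: dict) -> bytes:
    """Encode a JSON-RPC message with the LSP base-protocol header."""
    body = json.dumps(message).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body


def document_symbol_request(request_id: int, file_path: str) -> dict:
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "textDocument/documentSymbol",
        "params": {"textDocument": {"uri": Path(file_path).resolve().as_uri()}},
    }


def flatten_symbols(symbols: list, prefix: str = "") -> list:
    """Flatten hierarchical DocumentSymbol results into dotted names."""
    names = []
    for sym in symbols:
        name = f"{prefix}{sym['name']}"
        names.append(name)
        names.extend(flatten_symbols(sym.get("children", []), f"{name}."))
    return names


def navigation_strategy(symbols: list) -> str:
    """Zero symbols (barrel/re-export file) means: fall back to a plain Read."""
    return "symbols" if symbols else "read"
```

Write `lsp_frame(document_symbol_request(1, "src/auth.ts"))` to the server's stdin. Only the names and ranges in the reply enter the context, not the file body.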

Activation:

npm install -g typescript-language-server typescript
# The skill is bundled in ~/.claude/skills/symbols/ by default

Measured gains: -43% on a 191-line interface file, -74% on a 221-line function file. No gain on barrel files.


/caveman — Terse Output Mode (Opt-in)

Injects a system prompt that forces terse mode on all Claude reports in the session.

/caveman        # Activate for the session
/caveman off    # Deactivate

Rule: clean phases → caveman everywhere. Blocking error (typecheck failure) or security finding → revert to normal mode for that phase only.
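The rule is per phase, not per session. The sketch below makes that explicit: a blocking error in one phase switches only that phase's report back to normal mode. The `Phase` type and `report_modes` function are hypothetical illustrations, not part of the `/caveman` skill.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Phase:
    name: str
    blocking_error: bool = False    # e.g. a typecheck failure
    security_finding: bool = False


def report_modes(phases: List[Phase], caveman_on: bool = True) -> Dict[str, str]:
    """Terse for clean phases; normal only for phases that need full detail."""
    modes = {}
    for phase in phases:
        needs_detail = phase.blocking_error or phase.security_finding
        modes[phase.name] = "terse" if caveman_on and not needs_detail else "normal"
    return modes
```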

Activation:

./install.sh --with-caveman-output-skill

Measured gains: -79% on phase reports (2b3, CI Guard, checkpoints), -18% to -24% over a full session.


Effort Levels

Reasoning effort is adjusted per-prompt, not per-session. Opus 4.8 defaults to high (Opus 4.7 defaulted to xhigh); xhigh burns ~2× the tokens of medium for most tasks.

/effort low     # Mechanical fixes, reformatting, no judgment required
/effort medium  # Most prompts — massive savings vs default
/effort high    # Default for Opus 4.8 agentic coding
/effort xhigh   # Hardest tasks + long async workflows (was the Opus 4.7 default)
/effort max     # Diminishing returns — rarely justified (~2× xhigh cost)

Rule: reserve xhigh/max for prompts that genuinely require planning or trade-off decisions. Mechanical tasks in low, most prompts in medium.
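The rule reduces to a small decision function. In this hypothetical sketch, `max` is never selected automatically because of diminishing returns. `high` is listed only because it is the Opus 4.8 default.

```python
from enum import Enum


class Effort(str, Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"    # Opus 4.8 default; not selected by this sketch
    XHIGH = "xhigh"
    MAX = "max"      # diminishing returns; left as a manual choice


def pick_effort(mechanical: bool, needs_planning: bool = False,
                long_async: bool = False) -> Effort:
    """low for mechanical work, xhigh for planning/trade-offs, else medium."""
    if mechanical:
        return Effort.LOW
    if needs_planning or long_async:
        return Effort.XHIGH
    return Effort.MEDIUM


def effort_command(level: Effort) -> str:
    return f"/effort {level.value}"
```

`effort_command(pick_effort(mechanical=True))` yields `/effort low`, which suits a reformatting pass.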


Large Codebases (>50K LOC)

At scale, the native ulk strategy (read + grep + /symbols) loads too many tokens per exploration. The decision matrix:

| Project size | Recommended approach |
| --- | --- |
| < 10K LOC | Native ulk tools (/symbols, RTK, sub-agents) |
| 10–50K LOC | Native ulk + Context Mode + hygiene rules |
| 50–100K LOC | Native ulk + Code Review Graph (dependency graph) |
| > 100K LOC (non-confidential) | Zilliz Cloud (free tier: 2 collections, 1M vectors) + native ulk |
| > 100K LOC (confidential) | Milvus local (Docker) + native ulk |
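The matrix can be encoded as a lookup. This is a hypothetical sketch. The table leaves exact boundaries open, so it assumes that 50K falls in the 10–50K row and 100K in the 50–100K row. It also defaults `confidential` to `True`, so that code is never sent to a cloud index unless you opt in explicitly.

```python
def recommended_approach(loc: int, confidential: bool = True) -> str:
    """Map project size (lines of code) to the decision-matrix row."""
    if loc < 10_000:
        return "Native ulk tools (/symbols, RTK, sub-agents)"
    if loc <= 50_000:
        return "Native ulk + Context Mode + hygiene rules"
    if loc <= 100_000:
        return "Native ulk + Code Review Graph (dependency graph)"
    if confidential:
        return "Milvus local (Docker) + native ulk"
    return "Zilliz Cloud + native ulk"
```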

Code Review Graph:

npx code-review-graph index --path .
npx code-review-graph query "refresh token handling"

Zilliz Cloud (vendor-reported: -40% session cost on large monorepos):

claude mcp add zilliz-context \
  --env ZILLIZ_URI="https://..." \
  --env ZILLIZ_TOKEN="..." \
  -- npx @zilliz/claude-context-server
npx @zilliz/claude-context-cli index --path . --collection my-codebase

These advanced options are not integrated into the core ulk installation. They are documented in docs/guides/large-codebase.md.


Summary — Levers and Gains

| Lever | Type | Install | Measured gain |
| --- | --- | --- | --- |
| Context hygiene (5 rules) | Mandatory | None | Foundational |
| Concision heuristics | Mandatory | None | -20 to -200 tok/turn |
| RTK | Base (included) | brew install rtk | -60% to -90% on command outputs |
| /context-mode | Opt-in | --with-context-mode | -$8 to -$24/month |
| /symbols | Opt-in (bundled) | npm install -g typescript-language-server typescript | -43% to -74% on TS/JS files > 150 lines |
| /caveman | Opt-in | --with-caveman-output-skill | -79% on phase reports |
| Session defaults | Opt-in | --with-session-defaults | Prevents cache invalidation |

Optimal stack:

./install.sh --with-context-mode --with-caveman-output-skill --with-session-defaults
npm install -g typescript-language-server typescript

# In session
/caveman
/ulk:gandalf status    # monitor context zone

Estimated combined gain: -$40 to -$80/month against the April 2026 baseline.
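Relating the dollar figures to the April 2026 baseline ($353/month, 258M tokens) takes two divisions. The constant and function names below are illustrative.

```python
BASELINE_USD = 353              # per month, April 2026 measurement
BASELINE_TOKENS = 258_000_000   # per month


def share_of_baseline(saving_usd: float) -> float:
    """Fraction of total monthly spend represented by a saving."""
    return saving_usd / BASELINE_USD


blended_rate = BASELINE_USD / (BASELINE_TOKENS / 1_000_000)  # USD per million tokens
print(f"Blended rate: ${blended_rate:.2f} per million tokens")
for label, low, high in [("/context-mode", 8, 24), ("combined stack", 40, 80)]:
    print(f"{label}: {share_of_baseline(low):.1%} to "
          f"{share_of_baseline(high):.1%} of total spend")
```

This prints a blended rate of $1.37 per million tokens. It puts /context-mode at 2.3% to 6.8% of total spend and the combined stack at 11.3% to 22.7%.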