Public Service Announcement

Tokens are
currency.
Spend wisely.

Every word you type to an AI costs money. Most people burn through tokens without knowing why — or how to stop.

SCROLL TO LEARN
"Hello, how are you?" = 6 tokens System prompts re-sent with every message Conversation history compounds on every turn GPT-4o Input: $2.50/M tokens · Output: $10/M tokens Claude Sonnet 4: $3/M input · $15/M output Gemini 1.5 Pro: $1.25/M input · $5/M output Whitespace counts · Punctuation counts · Everything counts "Hello, how are you?" = 6 tokens System prompts re-sent with every message Conversation history compounds on every turn GPT-4o Input: $2.50/M tokens · Output: $10/M tokens Claude Sonnet 4: $3/M input · $15/M output Gemini 1.5 Pro: $1.25/M input · $5/M output Whitespace counts · Punctuation counts · Everything counts
01 — Fundamentals

What even
is a token?

Tokens are the chunks AI models break text into — roughly 4 characters or ¾ of a word. They're not words. Not characters. Something in between. Type anything below and watch it get sliced.

0 Tokens
0 Characters
0 Words
0 Chars/Token
02 — Model Breakdown

How each model
handles tokens

Claude, Gemini Pro, and ChatGPT/Codex all tokenize and price differently. Here's how they actually compare.

Anthropic
Claude Sonnet 4
Context window 200K tokens
Input cost $3.00 / 1M
Output cost $15.00 / 1M
Tokenizer BPE (custom)
~Chars/token ~3.5–4.5
Multimodal ✓ Images, PDFs
Google
Gemini 1.5 Pro
Context window 1M tokens
Input cost $1.25 / 1M
Output cost $5.00 / 1M
Tokenizer SentencePiece
~Chars/token ~3.0–4.0
Multimodal ✓ Video, Audio, Images
OpenAI
GPT-4o / Codex
Context window 128K tokens
Input cost $2.50 / 1M
Output cost $10.00 / 1M
Tokenizer tiktoken (cl100k)
~Chars/token ~4.0–4.5
Multimodal ✓ Images

Important: Output tokens cost 3–5× more than input tokens across all models. That verbose AI response you love? It's costing 5x more per token than your question did. The model "thinking out loud" in chain-of-thought reasoning also burns output tokens silently before it gives you the final answer.

03 — Hidden Costs

Token vampires:
what's draining you

These are the silent token consumers most people never think about. Click each to reveal how bad it really is.

👻
System Prompts
▲ RESENT EVERY SINGLE MESSAGE
Your system prompt isn't sent once — it's attached to every single API call. A 500-token system prompt on 1,000 daily API calls = 500,000 extra tokens per day. That's $1.50/day just in system prompt overhead with Claude Sonnet.
// 500 token system prompt × 1,000 calls/day = 500,000 extra tokens/day = ~$1.50/day just in overhead = ~$547/year in wasted system prompts
Fix: Keep system prompts lean. Move static reference docs to retrieval (RAG) instead of stuffing them in the prompt.
📜
Conversation History
▲ GROWS QUADRATICALLY PER CHAT
AI models have no memory. Every message in a chat gets re-sent in full to the API. By message 20, you might be sending 5,000 tokens of old conversation just to ask one new question. A 30-turn conversation can easily run 15,000+ input tokens — even if each message was short.
Turn 1: 100 tokens sent Turn 5: 600 tokens sent (all history) Turn 10: 1,400 tokens sent Turn 20: 4,200 tokens sent Turn 30: 9,800 tokens sent 🔥
Fix: Implement conversation summarization — replace old messages with a compressed summary every N turns.
🖼️
Images & Vision
▲ 1 IMAGE = UP TO 1,700 TOKENS
Uploading an image to a vision model doesn't cost "a little extra." A high-res image with Claude can cost up to 1,700 tokens — just for the image itself, before you've typed a word. Low-res mode can drop this to ~85 tokens, but you often get that tradeoff automatically.
High-res image → up to 1,700 tokens Low-res image → ~85 tokens Full-page PDF → ~1,500+ tokens per page Video (Gemini) → charged per frame extracted
Fix: Resize images before sending. Most tasks don't need full resolution. Use low-res mode when available.
💬
Verbose Prompting
▲ PLEASANTRIES ARE EXPENSIVE
"Hi! I hope you're doing well today. Could you please help me with something? I'm working on a project and I was wondering if you might be able to..." — this preamble costs ~40 tokens and adds zero value. At scale, politeness is pricey.
❌ "Hi! I hope this finds you well. Could you please summarize the following text for me?" ✓ "Summarize:"
Fix: Be direct. AI doesn't need social warmth. Remove preamble, filler, and redundant context.
🔄
Re-asking for Context
▲ COPY-PASTING DOCS REPEATEDLY
When you paste a long document into the chat and ask multiple questions about it in the same session, you're re-sending the entire document with every new message. A 10,000-word document pasted into chat becomes ~13,000 tokens, re-sent on every turn.
Turn 1: Paste 13,000 token document + question Turn 2: Same 13,000 tokens + new question Turn 3: Same 13,000 tokens + new question = 39,000+ tokens instead of 13,000
Fix: Use RAG (retrieval-augmented generation) to fetch only the relevant chunk — not the whole document — for each query.
🤔
Chain-of-Thought / Reasoning
▲ THINKING OUT LOUD COSTS OUTPUT $$$
Extended thinking models (like Claude with extended thinking, or o1/o3) emit reasoning tokens before they answer. These are output tokens — the most expensive kind — and you're charged for the model's internal scratchpad even though you often don't see it in the UI.
Your question: 50 tokens Model thinking: 2,000 output tokens (charged!) Final answer: 300 output tokens Total output: 2,300 tokens charged Cost: ~$0.023 for one question
Fix: Only use extended thinking / reasoning modes when complexity genuinely requires it. For simple tasks, it's overkill.
04 — Common Misconceptions

Myths vs.
reality

Click each statement to reveal the truth.

"Shorter sentences use fewer tokens."
Partially true
Mostly yes — but it's not about sentence length, it's about word complexity. "OK" = 1 token. "Entrepreneurship" = 4 tokens. Rare, long, or technical words get split into many subword tokens. A short sentence with complex vocabulary can cost more than a longer one with common words.
"Spaces and punctuation don't cost tokens."
Myth
Every character counts. Spaces are often baked into tokens (e.g., " the" is one token), but punctuation like commas, periods, quotes, and brackets can each consume a token. JSON, with all its brackets and quotes, is notoriously token-expensive compared to plain text.
"All AI models count tokens the same way."
Myth
Claude, GPT-4, and Gemini use different tokenizers with different vocabularies. The same sentence can have different token counts across models. GPT-4 uses cl100k_base, Claude uses a custom BPE, and Gemini uses SentencePiece. Results can vary by 10–25% for the same text.
"Asking the AI to 'be concise' saves you money."
Barely
It can reduce output tokens — but the instruction itself costs input tokens. More importantly, the model may not always comply. You're better off setting `max_tokens` in the API to hard-cap output length, rather than politely asking.
"Using claude.ai or ChatGPT.com means I don't pay per token."
True (for you)
Subscription users on flat-rate plans ($20/month etc.) don't get charged per token directly. But Anthropic and OpenAI do — which is why these services have rate limits, soft caps, and use cheaper/faster models for heavy users. Someone's always paying the token bill.
"Emojis are basically free — just one character."
Myth
Emojis are multi-byte Unicode characters and often consume 1–3 tokens each. Some complex compound emojis (like skin-tone modifiers 👋🏽) can use even more. That emoji-heavy Slack integration might be surprisingly expensive at scale.
"Code is more token-efficient than prose."
Myth
Code is often token-expensive. Variable names, indentation, brackets, semicolons, and comments all stack up. A 50-line Python function might use 400–600 tokens. JSON config files are particularly dense. Minified code helps, but readability usually matters more.
"The model sees my full conversation history as one coherent memory."
Myth
There is no memory. The model receives a flat text block of the full conversation on every call. It processes it all fresh. That's why very long conversations degrade in quality near context limits — the model struggles to attend to early context when it's buried under thousands of tokens.
05 — Cost Calculator

How much is
your workflow costing?

Estimate your monthly API spend across all three platforms.

Claude Sonnet 4
$0.00
Gemini 1.5 Pro
$0.00
GPT-4o
$0.00
06 — See It In Action

The Fluff Stripper

Pleasantries and rambling cost you money at scale. Paste a verbose prompt below to see how a prompt optimizer reduces your token footprint instantly.

* For demonstration only. The actual API uses advanced NLP context compression, not basic regex.
Your Verbose Prompt ~54 Tokens
Lean Machine Prompt 0 Tokens
Savings: 0%
Tokens Saved: 0

Stop burning tokens
you don't need to.

Small changes in how you write prompts and design AI workflows can cut your token usage — and costs — by 40–70%.

01
Trim your system prompt
Audit your system prompt monthly. Every redundant sentence costs money on every single API call. Keep instructions minimal and precise.
02
Summarize long chats
After 10–15 turns, replace old messages with a condensed summary. Your context stays fresh and your token count stays manageable.
03
Set max_tokens hard caps
Don't just ask for concise — enforce it in the API. Setting max_tokens prevents runaway long responses before they happen.
04
Use RAG for documents
Never paste full documents into chat. Embed them in a vector store and retrieve only the relevant chunks per query. 10x more efficient.
05
Match model to task
GPT-4o and Claude Sonnet are overkill for simple classification or extraction tasks. Use smaller, cheaper models (Haiku, Flash, GPT-4o-mini) when you don't need full capability.
06
Skip the pleasantries
AI doesn't need "please" and "thank you." Get straight to the task. In batch workflows, this alone can trim 5–15% of input token overhead.
08 — What's Next

The Automation Tool

Want early access?

I'm building a tool that does this automatically. It acts as a proxy to strip fluff, compress context, and cache repetitive prompts to instantly cut your API bills without degrading quality.

Help shape the product

I want to build what actually solves your problem. Which of these implementations would you pay for today?

Prompt Optimizer Proxy API 42%
IDE / Chrome Extension 28%
API Usage & Analytics Dashboard 18%
Team Token Budget Alerts (Slack) 12%