Tokens: The Hidden Currency of AI

02 — Model Breakdown

How each model
handles tokens

Claude, Gemini Pro, and ChatGPT/Codex all tokenize and price differently. Here's how they actually compare.

Anthropic

Claude Sonnet 4

Context window 200K tokens

Input cost $3.00 / 1M

Output cost $15.00 / 1M

Tokenizer BPE (custom)

~Chars/token ~3.5–4.5

Multimodal ✓ Images, PDFs

Google

Gemini 1.5 Pro

Context window 1M tokens

Input cost $1.25 / 1M

Output cost $5.00 / 1M

Tokenizer SentencePiece

~Chars/token ~3.0–4.0

Multimodal ✓ Video, Audio, Images

OpenAI

GPT-4o / Codex

Context window 128K tokens

Input cost $2.50 / 1M

Output cost $10.00 / 1M

Tokenizer tiktoken (cl100k)

~Chars/token ~4.0–4.5

Multimodal ✓ Images

Important: Output tokens cost 3–5× more than input tokens across all models. That verbose AI response you love? It's costing 5x more per token than your question did. The model "thinking out loud" in chain-of-thought reasoning also burns output tokens silently before it gives you the final answer.

03 — Hidden Costs

Token vampires:
what's draining you

These are the silent token consumers most people never think about. Click each to reveal how bad it really is.

👻

System Prompts

▲ RESENT EVERY SINGLE MESSAGE

Your system prompt isn't sent once — it's attached to every single API call. A 500-token system prompt on 1,000 daily API calls = 500,000 extra tokens per day. That's $1.50/day just in system prompt overhead with Claude Sonnet.

// 500 token system prompt × 1,000 calls/day = 500,000 extra tokens/day = ~$1.50/day just in overhead = ~$547/year in wasted system prompts

Fix: Keep system prompts lean. Move static reference docs to retrieval (RAG) instead of stuffing them in the prompt.

📜

Conversation History

▲ GROWS QUADRATICALLY PER CHAT

AI models have no memory. Every message in a chat gets re-sent in full to the API. By message 20, you might be sending 5,000 tokens of old conversation just to ask one new question. A 30-turn conversation can easily run 15,000+ input tokens — even if each message was short.

Turn 1: 100 tokens sent Turn 5: 600 tokens sent (all history) Turn 10: 1,400 tokens sent Turn 20: 4,200 tokens sent Turn 30: 9,800 tokens sent 🔥

Fix: Implement conversation summarization — replace old messages with a compressed summary every N turns.

🖼️

Images & Vision

▲ 1 IMAGE = UP TO 1,700 TOKENS

Uploading an image to a vision model doesn't cost "a little extra." A high-res image with Claude can cost up to 1,700 tokens — just for the image itself, before you've typed a word. Low-res mode can drop this to ~85 tokens, but you often get that tradeoff automatically.

High-res image → up to 1,700 tokens Low-res image → ~85 tokens Full-page PDF → ~1,500+ tokens per page Video (Gemini) → charged per frame extracted

Fix: Resize images before sending. Most tasks don't need full resolution. Use low-res mode when available.

💬

Verbose Prompting

▲ PLEASANTRIES ARE EXPENSIVE

"Hi! I hope you're doing well today. Could you please help me with something? I'm working on a project and I was wondering if you might be able to..." — this preamble costs ~40 tokens and adds zero value. At scale, politeness is pricey.

❌ "Hi! I hope this finds you well. Could you please summarize the following text for me?" ✓ "Summarize:"

Fix: Be direct. AI doesn't need social warmth. Remove preamble, filler, and redundant context.

🔄

Re-asking for Context

▲ COPY-PASTING DOCS REPEATEDLY

When you paste a long document into the chat and ask multiple questions about it in the same session, you're re-sending the entire document with every new message. A 10,000-word document pasted into chat becomes ~13,000 tokens, re-sent on every turn.

Turn 1: Paste 13,000 token document + question Turn 2: Same 13,000 tokens + new question Turn 3: Same 13,000 tokens + new question = 39,000+ tokens instead of 13,000

Fix: Use RAG (retrieval-augmented generation) to fetch only the relevant chunk — not the whole document — for each query.

🤔

Chain-of-Thought / Reasoning

▲ THINKING OUT LOUD COSTS OUTPUT $$$

Extended thinking models (like Claude with extended thinking, or o1/o3) emit reasoning tokens before they answer. These are output tokens — the most expensive kind — and you're charged for the model's internal scratchpad even though you often don't see it in the UI.

Your question: 50 tokens Model thinking: 2,000 output tokens (charged!) Final answer: 300 output tokens Total output: 2,300 tokens charged Cost: ~$0.023 for one question

Fix: Only use extended thinking / reasoning modes when complexity genuinely requires it. For simple tasks, it's overkill.

04 — Common Misconceptions

Myths vs.
reality

Click each statement to reveal the truth.

"Shorter sentences use fewer tokens."

Partially true

Mostly yes — but it's not about sentence length, it's about word complexity. "OK" = 1 token. "Entrepreneurship" = 4 tokens. Rare, long, or technical words get split into many subword tokens. A short sentence with complex vocabulary can cost more than a longer one with common words.

"Spaces and punctuation don't cost tokens."

Myth

Every character counts. Spaces are often baked into tokens (e.g., " the" is one token), but punctuation like commas, periods, quotes, and brackets can each consume a token. JSON, with all its brackets and quotes, is notoriously token-expensive compared to plain text.

"All AI models count tokens the same way."

Myth

Claude, GPT-4, and Gemini use different tokenizers with different vocabularies. The same sentence can have different token counts across models. GPT-4 uses cl100k_base, Claude uses a custom BPE, and Gemini uses SentencePiece. Results can vary by 10–25% for the same text.

"Asking the AI to 'be concise' saves you money."

Barely

It can reduce output tokens — but the instruction itself costs input tokens. More importantly, the model may not always comply. You're better off setting `max_tokens` in the API to hard-cap output length, rather than politely asking.

"Using claude.ai or ChatGPT.com means I don't pay per token."

True (for you)

Subscription users on flat-rate plans ($20/month etc.) don't get charged per token directly. But Anthropic and OpenAI do — which is why these services have rate limits, soft caps, and use cheaper/faster models for heavy users. Someone's always paying the token bill.

"Emojis are basically free — just one character."

Myth

Emojis are multi-byte Unicode characters and often consume 1–3 tokens each. Some complex compound emojis (like skin-tone modifiers 👋🏽) can use even more. That emoji-heavy Slack integration might be surprisingly expensive at scale.

"Code is more token-efficient than prose."

Myth

Code is often token-expensive. Variable names, indentation, brackets, semicolons, and comments all stack up. A 50-line Python function might use 400–600 tokens. JSON config files are particularly dense. Minified code helps, but readability usually matters more.

"The model sees my full conversation history as one coherent memory."

Myth

There is no memory. The model receives a flat text block of the full conversation on every call. It processes it all fresh. That's why very long conversations degrade in quality near context limits — the model struggles to attend to early context when it's buried under thousands of tokens.

07 — What You Can Do

Stop burning tokens
you don't need to.

Small changes in how you write prompts and design AI workflows can cut your token usage — and costs — by 40–70%.

Trim your system prompt

Audit your system prompt monthly. Every redundant sentence costs money on every single API call. Keep instructions minimal and precise.

Summarize long chats

After 10–15 turns, replace old messages with a condensed summary. Your context stays fresh and your token count stays manageable.

Set max_tokens hard caps

Don't just ask for concise — enforce it in the API. Setting max_tokens prevents runaway long responses before they happen.

Use RAG for documents

Never paste full documents into chat. Embed them in a vector store and retrieve only the relevant chunks per query. 10x more efficient.

Match model to task

GPT-4o and Claude Sonnet are overkill for simple classification or extraction tasks. Use smaller, cheaper models (Haiku, Flash, GPT-4o-mini) when you don't need full capability.

Skip the pleasantries

AI doesn't need "please" and "thank you." Get straight to the task. In batch workflows, this alone can trim 5–15% of input token overhead.

Tokens are
currency.
Spend wisely.

What even
is a token?

How each model
handles tokens

Token vampires:
what's draining you

Myths vs.
reality

How much is
your workflow costing?

The Fluff Stripper

Stop burning tokens
you don't need to.

The Automation Tool

Want early access?

Help shape the product

Tokens arecurrency.Spend wisely.

What evenis a token?

How each modelhandles tokens

Token vampires:what's draining you

Myths vs.reality

How much isyour workflow costing?

The Fluff Stripper

Stop burning tokensyou don't need to.

The Automation Tool

Want early access?

Help shape the product

Tokens are
currency.
Spend wisely.

What even
is a token?

How each model
handles tokens

Token vampires:
what's draining you

Myths vs.
reality

How much is
your workflow costing?

Stop burning tokens
you don't need to.