How LLMs Count Tokens: Vocabulary, Encoding & Costs

Explore how vocabulary-based tokenization works in LLMs and how it impacts prompt design, context length, and overall system performance.

If you’ve worked with large language models (LLMs), you’ve probably asked:

“Why am I paying for tokens and not words?”

At first, it sounds odd. But once you understand how these models work, it becomes clear that tokens aren’t just a billing unit. Tokens are the fundamental building blocks LLMs actually use to read, understand, and generate text.

Understanding tokenization is essential for optimizing prompts, managing costs, and avoiding truncation issues in your LLM applications.

Key Takeaways

Tokens, not words, are the currency of LLMs: text must be converted into numeric token IDs before an LLM can process it, making tokenization the foundational step in all LLM workflows.
Subword tokenization (BPE) strikes the balance: full-word vocabularies would require 500K+ entries and character-level models lose semantic context, while BPE with 32K–100K tokens efficiently handles multilingual, technical, and messy real-world text.
Vocabulary size directly affects token count and cost: GPT-4’s 100K vocabulary typically encodes more text per token than Llama 2’s 32K vocabulary, so the same prompt can produce meaningfully different token counts, and therefore costs, across models.
Token count ≠ word count: technical jargon, code snippets, and non-Latin scripts generate significantly more tokens per word, making rare or complex language disproportionately expensive in API billing.
Token management affects more than cost: exceeding context window limits causes truncation in RAG pipelines and multi-turn chats, and more tokens mean more embedding vectors to process, higher memory usage, and slower inference.

Why LLMs Use Tokens Instead of Words

Tokens exist because machines don’t understand words the way humans do. LLMs don’t process text the way we read it; they only work with numbers. So before text enters the model, it must be converted into a form the model can handle: tokens.

Full words are messy for machines. Tokens solve three major problems:

1. Languages are different

English uses spaces, Chinese doesn’t, and German often glues words together.
Words can’t be consistently separated across all languages.

2. Word vocabulary explodes

Every word comes in many variations: run, runs, ran, running, runner.
Treating each variation as a separate unit would require a massive vocabulary. Tokens let the model reuse subword pieces.

3. Real-world text is messy

Usernames, URLs, typos, and code identifiers aren’t standard words but still need to be processed.

Example

unhappiness = un + happi + ness (3 tokens)
xK93LmQp_auth = x + K93 + Lm + Qp + _ + auth (6 tokens)

Key reason LLMs use subword tokens instead of full words:

Full-word vocabularies would require 500K+ entries to cover English alone
Character-level models lose semantic context and create extremely long sequences
Subword tokenization (BPE) balances coverage and efficiency with 32K–100K tokens
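
To make the trade-off above concrete, here is a minimal sketch that compares how long the same sentence becomes under word-level and character-level splitting. The sample sentence is an arbitrary choice for illustration; a subword tokenizer would typically land somewhere between the two counts.

```python
# Compare sequence lengths for word-level vs. character-level splitting.
text = "Tokenization determines how much text fits in a prompt."

word_units = text.split()  # whitespace-separated words
char_units = list(text)    # every character, including spaces

print(len(word_units))  # 9
print(len(char_units))  # 55
```

The character-level sequence is about six times longer here, which is the "extremely long sequences" problem, while the word-level split would need a vocabulary entry for every distinct word.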

What Exactly is a Token?

Tokens are the atomic units LLMs understand. They can be:

Whole words: “cat”, “house”
Subwords: “running” = “run” + “ning”
Punctuation or spaces: “!” or “ ”

Text Unit	Example	Approximate Tokens
Common short word	“the”, “is”, “run”	1 token
Average English word	“important”, “system”	1–2 tokens
Technical/rare word	“tokenization”, “embeddings”	2–3 tokens
Code snippet	def calculate():	~4–6 tokens
Non-Latin script (Chinese/Japanese)	3-character phrase	3–6 tokens

Why split like this? LLMs have a fixed-size vocabulary, typically between about 32K and 256K tokens. Each token has an embedding in the model. If the model tried to treat every word separately, the vocabulary would become unmanageably large.

How Every LLM Vocabulary Works

Every LLM comes with a vocabulary file — essentially a lookup table mapping strings to token IDs. When text is input:

The tokenizer scans the text
Finds the longest matches in the vocabulary
Maps them to token IDs

Without a vocabulary, the model wouldn’t know how to represent text internally, and token counting would be impossible.
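
The three steps above can be sketched as a greedy longest-match lookup. This is a simplified illustration, not how any particular production tokenizer is implemented: BPE tokenizers apply learned merge rules rather than a pure longest-match scan, and the `TOY_VOCAB` table and `tokenize` function below are hypothetical names invented for this example.

```python
# Toy vocabulary mapping strings to token IDs (hypothetical; real vocabularies
# hold tens of thousands of entries).
TOY_VOCAB = {"I": 0, " love": 1, " program": 2, "ming": 3, "!": 4, " AI": 5}


def tokenize(text, vocab):
    """Greedy longest-match: at each position, take the longest vocabulary entry."""
    max_len = max(len(piece) for piece in vocab)
    pieces, pos = [], 0
    while pos < len(text):
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if candidate in vocab:
                pieces.append(candidate)
                pos += length
                break
        else:
            raise ValueError(f"no vocabulary entry matches text at position {pos}")
    return pieces


pieces = tokenize("I love programming!", TOY_VOCAB)
ids = [TOY_VOCAB[piece] for piece in pieces]
print(pieces)  # ['I', ' love', ' program', 'ming', '!']
print(ids)     # [0, 1, 2, 3, 4]
```

The token count is simply `len(ids)`. This toy version raises an error on text it cannot match; real tokenizers usually avoid that case by including single bytes or characters in the vocabulary.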

Model	Tokenizer	Vocabulary Size	Notes
GPT-4	cl100k_base (tiktoken)	~100,000	Larger vocab = fewer tokens per prompt
GPT-4o	o200k_base (tiktoken)	~200,000	Larger vocabulary than GPT-4
GPT-3.5	cl100k_base	~100,000	Same tokenizer as GPT-4
Claude 3 / Claude 3.5 / Claude 4	Anthropic (BPE-based)	~100,000+	Exact size undisclosed; context window up to 200K tokens
Llama 2	SentencePiece	32,000	Smaller vocab; more tokens per equivalent text
Llama 3 / Llama 3.1	tiktoken-based	128,000	Significantly larger than Llama 2
Gemini 1.5 / 2.0	SentencePiece	~256,000	Optimized for multilingual coverage
Mistral 7B	SentencePiece	32,000	Efficient for low-resource deployment

Did you know?

The same text can produce different token counts depending on which model you use. GPT-4’s 100K vocabulary encodes more information per token than Llama 2’s 32K vocabulary. Always use the model-specific tokenizer when counting tokens for cost estimation.
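
As a rough illustration of why the same text yields different counts, the sketch below tokenizes one sentence against two hypothetical vocabularies using a simplified longest-match rule. `SMALL_VOCAB`, `LARGE_VOCAB`, and `count_tokens` are made up for this example and do not correspond to any real model's tokenizer.

```python
def count_tokens(text, vocab):
    """Count tokens using greedy longest-match over a set of vocabulary strings."""
    max_len = max(len(piece) for piece in vocab)
    count, pos = 0, 0
    while pos < len(text):
        for length in range(min(max_len, len(text) - pos), 0, -1):
            if text[pos:pos + length] in vocab:
                count += 1
                pos += length
                break
        else:
            raise ValueError(f"cannot tokenize at position {pos}")
    return count


# Hypothetical vocabularies: the larger one adds longer pieces to the smaller one.
SMALL_VOCAB = {"token", "iz", "ation", " ", "cost", "s", "add", "up"}
LARGE_VOCAB = SMALL_VOCAB | {"tokenization", " costs", " add", " up"}

text = "tokenization costs add up"
small_count = count_tokens(text, SMALL_VOCAB)
large_count = count_tokens(text, LARGE_VOCAB)
print(small_count)  # 10
print(large_count)  # 4
```

For real cost estimates, count with the tokenizer that ships with the model you are billing against rather than a stand-in like this one.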

The Role of Vocabulary in Token Counting

1. Vocabulary is the Token Dictionary

As described above, the tokenizer splits input text into pieces found in the vocabulary and maps each piece to its token ID.

Example:

Text: "I love programming!"
Tokens: ["I", " love", " program", "ming", "!"] (illustrative split; a real tokenizer may keep " programming" as a single token)

2. Vocabulary Determines Token Count

Token count doesn’t equal word count. It depends on how the tokenizer matches text to its vocabulary:

Common words – usually 1 token
Rare words – split into multiple tokens
Punctuation – usually separate tokens; a leading space is typically attached to the word that follows

Example:

Text: "I love AI!" = ["I", " love", " AI", "!"] - 4 tokens
Text: "Antidisestablishmentarianism" = ["Anti", "dis", "establish", "ment", "arian", "ism"] - 6 tokens

The vocabulary design affects tokenization:

Larger vocabulary – fewer tokens per rare word, larger embedding tables
Smaller vocabulary – more tokens per rare word, smaller embedding tables
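
The embedding-table side of this trade-off is simple arithmetic: one learned vector per vocabulary entry. The sketch below assumes a model width of 4,096, chosen only for illustration; the vocabulary sizes are the ones from the table earlier in this article.

```python
def embedding_table_params(vocab_size, hidden_dim):
    """Number of parameters in an embedding table: one vector per vocabulary entry."""
    return vocab_size * hidden_dim


HIDDEN_DIM = 4096  # assumed model width, for illustration only

for vocab_size in (32_000, 100_000, 256_000):
    params = embedding_table_params(vocab_size, HIDDEN_DIM)
    print(f"{vocab_size:>7,} entries -> {params / 1e6:,.0f}M embedding parameters")
```

Under this assumption a 32K vocabulary needs about 131M embedding parameters and a 256K vocabulary needs about 1,049M, so the savings in token count are paid for in model size.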

Why Token Counting Matters

1. Cost Control

Most LLM APIs (OpenAI, Azure, Anthropic) charge per token, not per word. That means:

A short prompt with rare words can cost more than a longer prompt with common words.
Miscounting tokens can blow up your bill, especially in batch processing or embeddings.

Example:

Prompt A: "I love AI!" - 4 tokens
Prompt B: "Antidisestablishmentarianism rocks" - 7 tokens

Even though Prompt B has fewer words, its rare word splits into multiple tokens, almost doubling the cost.

2. Prompt Length Management

Every LLM has a maximum token limit (context window). If your prompt + completion exceeds this:

Input may be truncated
Outputs can be cut off unexpectedly

Accurate token counting helps you:

Split long text into chunks to fit the context window
Design prompts to ensure the model sees all necessary information

This is especially critical for RAG pipelines, summarization, and multi-turn chat applications.
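
A minimal sketch of both checks, assuming the text has already been converted to a list of token IDs by the model's tokenizer. The function names, the overlap parameter, and the numbers in the usage lines are illustrative choices, not part of any specific API.

```python
def fits_context(prompt_tokens, max_completion_tokens, context_window):
    """True if the prompt plus the reserved completion budget fits in the window."""
    return prompt_tokens + max_completion_tokens <= context_window


def chunk_tokens(token_ids, max_tokens, overlap=0):
    """Split token IDs into windows of at most max_tokens that share `overlap` tokens."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must be in [0, max_tokens)")
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(token_ids), step):
        chunks.append(token_ids[start:start + max_tokens])
        if start + max_tokens >= len(token_ids):
            break  # the last window already reaches the end of the input
    return chunks


token_ids = list(range(10))  # stand-in for real token IDs
chunks = chunk_tokens(token_ids, max_tokens=4, overlap=1)
print(chunks)  # [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
print(fits_context(prompt_tokens=7_500, max_completion_tokens=1_000, context_window=8_192))  # False
```

Chunking on token IDs rather than characters or words means each chunk's size is measured in the same unit as the context limit. The overlap keeps some shared context between neighboring chunks.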

3. Embedding Size & Memory

Each token corresponds to an embedding vector in the model. That means:

More tokens – more vectors – higher memory usage
Large token sequences can slow inference or even cause out-of-memory errors

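Example: Generating embeddings for a 10k-token document vs. a 2k-token document can be roughly 5x heavier on memory, even if both have similar word counts, and the gap can be larger in implementations where attention memory grows quadratically with sequence length.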
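
The arithmetic behind that comparison can be sketched as follows. The hidden size and the 2 bytes per value are assumptions made for illustration. The quadratic figure applies only to implementations that build the full attention score matrix.

```python
def hidden_state_bytes(num_tokens, hidden_dim, bytes_per_value=2):
    """Memory for one layer's hidden states: one vector per token."""
    return num_tokens * hidden_dim * bytes_per_value


def attention_score_entries(num_tokens):
    """Entries in one full attention score matrix (tokens x tokens)."""
    return num_tokens * num_tokens


HIDDEN_DIM = 4096  # assumed model width, for illustration only
short_doc, long_doc = 2_000, 10_000

linear_ratio = hidden_state_bytes(long_doc, HIDDEN_DIM) / hidden_state_bytes(short_doc, HIDDEN_DIM)
quadratic_ratio = attention_score_entries(long_doc) / attention_score_entries(short_doc)
print(linear_ratio)     # 5.0
print(quadratic_ratio)  # 25.0
```

Per-token memory scales linearly with sequence length, which gives the 5x figure, while a naive attention matrix scales with the square of the length.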

Efficient token management ensures:

Faster inference
Lower memory footprint
Predictable performance

How Token Count Affects API Costs

Because pricing is per token, the same word count can map to noticeably different costs depending on the kind of text:

Scenario	Word Count	Token Count	Estimated Cost (GPT-4o, $2.50/M input)
Simple English prompt	100 words	~133 tokens	~$0.0003
Technical prompt with jargon	100 words	~150–200 tokens	~$0.0004–0.0005
Code-heavy prompt	100 words	~200–300 tokens	~$0.0005–0.0008
Chinese/Japanese text	100 characters	~150–300 tokens	~$0.0004–0.0008
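
The cost column follows from a single multiplication. The sketch below reproduces it using the $2.50 per million input tokens rate from the table header; the function name and the scenario dictionary are illustrative, and actual prices should be checked against the provider's current pricing.

```python
def input_cost_usd(token_count, price_per_million_usd):
    """Input cost for a prompt at a given price per million tokens."""
    return token_count * price_per_million_usd / 1_000_000


PRICE_PER_MILLION = 2.50  # USD per million input tokens, taken from the table above

# (low, high) token estimates per 100 words, taken from the table above.
scenarios = {
    "Simple English prompt": (133, 133),
    "Technical prompt with jargon": (150, 200),
    "Code-heavy prompt": (200, 300),
}

for name, (low, high) in scenarios.items():
    low_cost = input_cost_usd(low, PRICE_PER_MILLION)
    high_cost = input_cost_usd(high, PRICE_PER_MILLION)
    print(f"{name}: ${low_cost:.6f} to ${high_cost:.6f}")
```

Per-request costs look tiny, so it helps to multiply by expected request volume: at one million requests, the simple prompt costs about $332 in input tokens and the code-heavy prompt up to $750.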

Conclusion

Tokens aren’t just a billing detail; they directly impact cost, performance, and context management. For developers building LLM applications, thinking in tokens instead of words is essential. Understanding your model’s vocabulary and tokenization behavior allows you to:

Optimize prompt length
Avoid unexpected truncation
Control costs effectively
Ensure predictable memory usage

By mastering tokens, you can build efficient, reliable, and cost-effective LLM-powered applications.

FAQs

What role does vocabulary size play in token counting?

LLMs use fixed vocabularies (roughly 32K to 256K tokens in current models), commonly built via Byte-Pair Encoding (BPE), where the most frequent adjacent symbol pairs are merged into new tokens during tokenizer training. Larger vocabularies reduce subword splits, lowering token counts (e.g., GPT-4’s 100K vocabulary packs more text per token than a 32K vocabulary), but increase embedding matrix size and compute. The optimal size balances coverage and efficiency.

How does BPE tokenization convert text to tokens?

BPE starts with individual characters (or bytes) and iteratively merges the most frequent adjacent pair in the training corpus until the target vocabulary size is reached. At encoding time, the learned merges are applied to new text, so words that are not in the vocabulary split into known subwords, and the token count is simply the number of resulting pieces. This handles morphology efficiently, as shown in Sennrich et al. (2016), outperforming word-level tokenization.
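
The merge loop can be sketched in a few lines. This is a simplified illustration of the training idea, not the code of any specific tokenizer library; the function names and the small word-frequency table are assumptions made for the example.

```python
from collections import Counter


def merge_pair(symbols, pair):
    """Replace every adjacent occurrence of `pair` in `symbols` with one merged symbol."""
    merged, i = [], 0
    while i < len(symbols):
        if i + 1 < len(symbols) and (symbols[i], symbols[i + 1]) == pair:
            merged.append(symbols[i] + symbols[i + 1])
            i += 2
        else:
            merged.append(symbols[i])
            i += 1
    return tuple(merged)


def learn_bpe_merges(word_counts, num_merges):
    """Learn up to num_merges merge rules from a {word: frequency} mapping."""
    # Start with every word split into single characters.
    corpus = {tuple(word): count for word, count in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pair_counts = Counter()
        for symbols, count in corpus.items():
            for pair in zip(symbols, symbols[1:]):
                pair_counts[pair] += count
        if not pair_counts:
            break  # every word is already a single symbol
        best = max(pair_counts, key=pair_counts.get)  # most frequent adjacent pair
        merges.append(best)
        corpus = {merge_pair(symbols, best): count for symbols, count in corpus.items()}
    return merges, corpus


word_counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}  # toy corpus
merges, corpus = learn_bpe_merges(word_counts, num_merges=4)
print(list(corpus))
# [('low',), ('low', 'e', 'r'), ('n', 'e', 'w', 'est'), ('w', 'i', 'd', 'est')]
```

After four merges the frequent word "low" has become a single token, while the rarer "lower" still takes three, which is the pattern behind common words costing one token and rare words costing several.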

Why do LLMs struggle with exact letter counting?

Tokenization maps character sequences to vocabulary IDs, so the model never sees individual letters directly. After tokenization, transformers operate on embeddings rather than raw strings, which leads to failures in letter-counting and string-reversal tasks. Research suggests that mid-layer FFNs attempt detokenization via prefix aggregation, peaking at layer 7 before degradation.

What is the difference between WordPiece and BPE for token counts?

WordPiece (BERT) chooses the merge that most increases the likelihood of the training data; BPE (GPT/Claude) greedily merges the most frequent pair. BPE yields compact counts for morphologically rich languages; WordPiece better handles rare terms. Both achieve 75% word coverage at 30k vocab, but BPE dominates modern LLMs for OOV robustness.

How do LLMs internally reconstruct words from subword tokens?

Models aggregate prefix token representations into the final token’s hidden state, then FFN layers retrieve full-word concepts. Retrieval peaks in the middle layers, indicating emergent de-tokenization beyond the explicit tokenization step.

How can my team reduce LLM token costs in a production AI application?

The fastest wins are prompt caching, retrieval optimization, and model tiering. Prompt caching (available via Anthropic and OpenAI) eliminates repeated processing of static system prompts — reducing input costs by up to 90% on cached portions. In RAG pipelines, reducing top-K retrieval from 10 chunks to 3–5 chunks typically cuts per-query token usage by 50–70% with minimal accuracy loss. Model tiering routes simpler queries to cheaper models (GPT-4o Mini, Claude Haiku) and reserves frontier models for complex reasoning. Structured data compression formats like TOON can reduce token usage by 40–50% for structured reference data in prompts. Together, these strategies can reduce total token costs by 60–80% in high-volume enterprise deployments.
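
To see how reducing top-K retrieval translates into token savings, here is a back-of-the-envelope sketch. Every number in it (system prompt size, question size, chunk size) is a hypothetical value chosen for illustration, and the function name is likewise made up.

```python
def rag_prompt_tokens(system_tokens, question_tokens, chunk_tokens, top_k):
    """Approximate input tokens for one RAG query: fixed parts plus retrieved chunks."""
    return system_tokens + question_tokens + chunk_tokens * top_k


# Hypothetical sizes: 200-token system prompt, 50-token question, 300-token chunks.
before = rag_prompt_tokens(200, 50, 300, top_k=10)
after = rag_prompt_tokens(200, 50, 300, top_k=4)
reduction = 1 - after / before

print(before)              # 3250
print(after)               # 1450
print(f"{reduction:.0%}")  # 55%
```

With these assumed sizes, dropping from 10 chunks to 4 cuts input tokens by about 55%. The actual saving depends on how large the fixed parts of the prompt are relative to the retrieved chunks.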

Does Intuz help businesses optimize LLM token usage and reduce AI infrastructure costs?

Yes. Intuz works with B2B teams to audit and redesign LLM pipelines — including tokenization strategy, RAG architecture, prompt compression, and model selection. If your AI system is consuming more tokens than expected, or if you’re scaling an LLM application and want to control costs before they compound, our team can review your highest-cost workflows and identify immediate optimization opportunities. Most enterprise teams see 40–60% token cost reduction after a structured pipeline review. Start with a consultation to benchmark your current token consumption against best-practice architecture patterns.

Kamal Rupareliya

Director of Products

Kamal Rupareliya, a Director of Products at Intuz, focuses on innovation through technology such as IoT, JAMStack, and Serverless Computing. He is an expert in IoT, Mobile Design, and Product Strategy, and he loves applying inventive ways to utilize technology and empathy towards creating remarkable digital software products.
