This blog explains how TOON format helps enterprises cut LLM token usage by 40–50%, significantly reducing AI costs without compromising output quality. Learn a practical, production-ready approach to optimize prompts, pipelines, and large document processing for scalable, cost-efficient AI systems.
LLM token consumption is one of those problems that sneaks up on you. You start with a simple use case, JSON seems perfectly fine, and then your bill shows up. If you’re sending structured reference data in prompts such as product catalogs, entity lists, and reference libraries, you’re probably paying 40-60% more than necessary.
The culprit? JSON repeats field names for every single object. That’s fine for APIs and databases, but terrible for token-based LLM pricing.
Enter TOON (Token-Oriented Object Notation). It encodes the same JSON data model but uses CSV-style rows for arrays of objects, declaring field names once instead of repeating them in every object.
- TOON (Token-Oriented Object Notation) combines YAML’s readability with CSV’s compactness — it declares field names once instead of repeating them per object like JSON does
- Switching from JSON to TOON for structured reference data (e.g., a 50-KPI library) reduces token count by ~47–50%, directly cutting API costs at scale
- Output tokens cost more than input tokens in most APIs — using TOON for LLM output extraction yields an additional 40% savings on responses
- The recommended hybrid approach: store everything as JSON (databases, APIs, logic), convert to TOON only at LLM boundaries, then convert back to JSON immediately
- TOON only delivers gains with 3+ objects having 4+ fields each — below that threshold, the header overhead outweighs the savings
Problem: JSON is Token-Expensive
Consider this common scenario: extracting KPIs from sustainability reports. You need to send a reference library so the LLM knows what to look for.
Standard JSON format
{
  "kpi_categories": {
    "Environment": [
      {
        "name": "Total GHG Emissions",
        "keywords": ["emissions", "carbon", "CO2"]
      },
      {
        "name": "Renewable Energy Percentage",
        "keywords": ["renewable", "clean energy", "solar"]
      },
      {
        "name": "Water Consumption",
        "keywords": ["water usage", "consumption", "m3"]
      }
    ]
  }
}
Token count: 127 tokens (Claude tokenizer)
TOON Format
kpi_categories:
  Environment[3]{name,keywords}:
    Total GHG Emissions,"emissions,carbon,CO2"
    Renewable Energy Percentage,"renewable,clean energy,solar"
    Water Consumption,"water usage,consumption,m3"
Token count: 61 tokens (Claude tokenizer)
Savings: ~50% reduction. Scale this across 50 API calls per document and hundreds of documents, and you’re looking at real money.
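The scaling claim is simple arithmetic. The sketch below plugs in the token counts from this example; the call volume, the document count, and the $3-per-million-token price are hypothetical placeholders, so substitute your own measured counts and your provider’s current input price. `estimate_savings` is an illustrative helper, not part of any library.

```python
def estimate_savings(json_tokens, toon_tokens, calls_per_doc, num_docs, usd_per_million_tokens):
    """Return (fractional reduction, tokens saved, dollars saved) for a batch of documents."""
    # Fraction of tokens removed per call, e.g. 1 - 61/127 ≈ 0.52
    reduction = 1 - toon_tokens / json_tokens
    # Tokens saved across every call on every document
    tokens_saved = (json_tokens - toon_tokens) * calls_per_doc * num_docs
    # usd_per_million_tokens is an assumed input price, not a quoted rate
    usd_saved = tokens_saved / 1_000_000 * usd_per_million_tokens
    return reduction, tokens_saved, usd_saved


# Hypothetical volume: 50 calls per document, 500 documents, $3 per million input tokens
reduction, tokens_saved, usd_saved = estimate_savings(127, 61, 50, 500, 3.0)
print(f"{reduction:.1%} fewer tokens, {tokens_saved:,} tokens saved, ${usd_saved:,.2f} saved")
```

With the 50-KPI library figures from later in this post (4,400 vs. 2,300 tokens), each call saves 2,100 tokens instead of 66, so the same hypothetical volume saves 52,500,000 tokens.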
What is TOON?
TOON combines YAML’s readability with CSV’s compactness. The key innovation: arrays of objects declare field names once, then list values row by row.
JSON (repetitive):
[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"}
]
TOON (tabular):
users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user
TOON Syntax Essentials
Objects: YAML-style indentation
user:
  name: Alice
  role: admin
Primitive Arrays:
tags[3]: javascript,python,rust
Arrays of Objects (where TOON shines):
products[2]{id,name,price,stock}:
  101,Laptop,1200,true
  102,Mouse,25,false
The header products[2]{id,name,price,stock}: specifies:
- products – the key name
- [2] – the array length (helps detect truncation)
- {id,name,price,stock} – the field names, in the order values appear in each row
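Because the header carries the key, the declared length, and the field list, a decoder can read all three with one regular expression. This is a minimal sketch assuming the comma-delimited form used throughout this post; it does not handle other header variants the TOON specification may allow, and `parse_header` is a hypothetical helper, not part of any TOON library.

```python
import re

# key, then [length], then {comma-separated fields}, then a trailing colon
HEADER_RE = re.compile(r"^\s*(?P<key>[^\[\]{}:]+)\[(?P<length>\d+)\]\{(?P<fields>[^}]*)\}:\s*$")


def parse_header(line):
    """Split a tabular TOON header into (key, declared_length, field_names)."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a tabular TOON header: {line!r}")
    fields = [f.strip() for f in match.group("fields").split(",")]
    return match.group("key").strip(), int(match.group("length")), fields


print(parse_header("products[2]{id,name,price,stock}:"))
```

Primitive-array lines such as tags[3]: javascript,python,rust have no {fields} part, so this sketch rejects them with a ValueError rather than guessing.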
Pros
- 30-50% fewer tokens for uniform arrays
- Self-documenting (explicit structure)
- LLMs parse it reliably
- Built-in validation via length declarations
Cons
- Requires uniform structure (all objects need the same fields)
- Smaller ecosystem than JSON
- Team learning curve
Pro Tips
TOON only helps with 3+ objects having 4+ fields each. Below that, header overhead dominates.
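The rule of thumb above, together with the uniform-structure requirement from the cons list, fits in a small pre-check you can run before choosing a format. The thresholds mirror the numbers in this post and are assumptions to tune against your own token measurements; `should_use_toon` is a hypothetical helper, and it does not verify that values are primitives.

```python
def should_use_toon(rows, min_objects=3, min_fields=4):
    """Rule-of-thumb check for whether an array of objects is worth encoding as tabular TOON."""
    # Too few rows (or not an array of objects): header overhead outweighs the savings
    if len(rows) < min_objects or not all(isinstance(r, dict) for r in rows):
        return False
    fields = set(rows[0])
    # Too few fields: little key repetition to remove
    if len(fields) < min_fields:
        return False
    # Tabular rows need every object to share exactly the same fields
    return all(set(r) == fields for r in rows)


kpis = [
    {"name": "Total GHG Emissions", "value": 125000, "unit": "tCO2e", "year": 2024},
    {"name": "Renewable Energy", "value": 45, "unit": "percent", "year": 2024},
    {"name": "Water Consumption", "value": 15000, "unit": "m3", "year": 2024},
]
print(should_use_toon(kpis))      # True: 3 uniform objects with 4 fields each
print(should_use_toon(kpis[:2]))  # False: only 2 objects
```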
Real-World Use Case: KPI Extraction
Let’s examine a concrete problem: extracting Key Performance Indicators from documents using an LLM.
Problem 1: Sending Reference Data
The LLM needs a KPI library showing what metrics to extract and how to classify them.
Before (JSON): A 50-KPI library in formatted JSON: ~4,400 tokens.
After (TOON):
def create_kpi_reference_toon(kpi_library):
    lines = ["kpi_categories:"]
    for category, kpis in kpi_library["kpi_categories"].items():
        count = len(kpis)
        # Header: category name, row count, and field names declared once
        lines.append(f"  {category}[{count}]{{name,keywords}}:")
        for kpi in kpis:
            # Keywords collapse into one quoted cell so their commas don't split the row
            keywords = ",".join(kpi["keywords"])
            lines.append(f'    {kpi["name"]},"{keywords}"')
    return "\n".join(lines)
Same 50-KPI library in TOON: ~2,300 tokens, a 47.7% reduction.
Pros
- Massive token savings on reference data
- Clear tabular structure
- Length validation prevents truncation
Cons
- One-time conversion overhead
- Need to maintain conversion function
Problem 2: Receiving Structured Output
When the LLM returns extracted KPIs, JSON is also wasteful.
JSON output (10 KPIs): ~1,500 tokens
TOON output (10 KPIs, first 3 rows shown):
kpis[10]{name,value,unit,year,category,type}:
  Total GHG Emissions,125000,tCO2e,2024,Environment,actual
  Renewable Energy,45,percent,2024,Environment,actual
  Water Consumption,15000,m3,2024,Environment,actual
Token count: ~900 tokens. 40% reduction.
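Parsing this output back into records takes two steps: check the row count against the header, then convert numeric columns, since every TOON cell arrives as text. This is a minimal sketch that assumes the six-column kpis header shown above, numeric value and year cells, and no embedded newlines; `parse_kpi_output` is a hypothetical name.

```python
import csv


def parse_kpi_output(toon_text):
    """Parse a tabular kpis[N]{...}: block into typed dicts, checking the declared row count."""
    lines = [line.strip() for line in toon_text.strip().splitlines() if line.strip()]
    header, rows = lines[0], lines[1:]
    declared = int(header[header.index("[") + 1 : header.index("]")])
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    if len(rows) != declared:
        # A mismatch usually means a truncated response or stray extra lines
        raise ValueError(f"header declares {declared} rows, got {len(rows)}")
    records = []
    # csv.reader keeps double-quoted cells that contain commas together
    for values in csv.reader(rows):
        record = dict(zip(fields, values))
        record["value"] = float(record["value"])  # assumes a numeric value
        record["year"] = int(record["year"])
        records.append(record)
    return records


sample = """kpis[3]{name,value,unit,year,category,type}:
  Total GHG Emissions,125000,tCO2e,2024,Environment,actual
  Renewable Energy,45,percent,2024,Environment,actual
  Water Consumption,15000,m3,2024,Environment,actual"""
print(parse_kpi_output(sample)[0])
```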
Pros
- Output tokens cost more than input in most APIs
- LLMs follow structure more consistently
- Easy validation (count rows vs header)
Cons
- Need parser to convert back to JSON
- Must show LLM an example in prompt
Pro Tips
Always provide a TOON template in your prompt. Don’t just describe – show the exact format.
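One way to keep the template in the prompt and the parser in sync is to render the template from the same field list the parser expects. A minimal sketch: `toon_output_template` is a hypothetical helper, the example values are assumed not to contain commas, and the literal [N] placeholder follows the return-format example in the implementation tips, with the model asked to fill in the real row count.

```python
def toon_output_template(key, fields, example_rows):
    """Render a TOON return-format template (header plus example rows) for a prompt."""
    # [N] is left literal so the model fills in the actual row count
    header = f"{key}[N]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(value) for value in row) for row in example_rows]
    return "\n".join([header, *rows])


template = toon_output_template(
    "kpis",
    ["name", "value", "unit", "year", "category", "type"],
    [["Total GHG Emissions", 125000, "tCO2e", 2024, "Environment", "actual"]],
)
print(template)
```

Reusing one field list for both the template and the parser means a schema change cannot silently leave the prompt showing stale columns.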
Simple Conversion Code
You don’t need complex libraries for basic use cases.
JSON to TOON:
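def _cell(v):
    # One tabular cell; simplified: embedded quotes/newlines are not escaped
    if isinstance(v, bool):
        v = "true" if v else "false"
    elif isinstance(v, list):
        v = ",".join(map(str, v))  # same convention as create_kpi_reference_toon
    s = str(v)
    return f'"{s}"' if "," in s else s
def to_toon(data, indent=0):
    pad = "  " * indent
    lines = []
    for key, value in data.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            # Array of objects - tabular; assumes all objects share value[0]'s keys
            fields = list(value[0])
            lines.append(f"{pad}{key}[{len(value)}]{{{','.join(fields)}}}:")
            lines += [pad + "  " + ",".join(_cell(o[f]) for f in fields) for o in value]
        elif isinstance(value, list):  # array of primitives - inline
            lines.append(f"{pad}{key}[{len(value)}]: " + ",".join(map(_cell, value)))
        elif isinstance(value, dict):  # nested object - YAML-style indentation
            lines.append(f"{pad}{key}:")
            lines.append(to_toon(value, indent + 1))
        else:
            lines.append(f"{pad}{key}: {_cell(value)}")
    return "\n".join(lines)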
TOON to JSON:
import csv
def from_toon(toon_str):
    # Handles one tabular array, e.g. "users[2]{id,name,role}:"; values stay strings
    lines = [l.strip() for l in toon_str.strip().split("\n") if l.strip()]
    header = lines[0]
    # Extract fields from header
    fields = header[header.index("{") + 1 : header.index("}")].split(",")
    # Parse rows; csv.reader keeps quoted values such as "a,b,c" in one cell
    return [dict(zip(fields, values)) for values in csv.reader(lines[1:])]
These cover common cases (uniform arrays of flat objects, values without embedded quotes or newlines). For production, use the official TOON library.
Storage Strategy
Critical rule: Use the right format for the right job.
Database: Always JSON
import json
# GOOD: store JSON ("db" is your database connection; "documents" is a placeholder table)
def save_data(data):
    db.execute("INSERT INTO documents (data) VALUES (?)", [json.dumps(data)])
# BAD - don't store TOON
def save_data(data_toon_string):
    db.execute("INSERT INTO documents (data) VALUES (?)", [data_toon_string])
LLM Prompts: Use TOON
def extract_with_llm(reference_json):
    # Convert to TOON for the prompt
    reference_toon = to_toon(reference_json)
    prompt = f"""
Reference data:
```toon
{reference_toon}
```
Extract entities in TOON format."""
    # call_llm stands in for your LLM client call
    response = call_llm(prompt)
    # Convert back to JSON for storage
    return from_toon(response)
Principle: JSON for storage/processing, TOON only at LLM boundaries.
When to Use TOON vs JSON
Use TOON When:
- Sending reference datasets to LLMs (catalogs, libraries, entity lists)
- Receiving structured arrays from LLMs
- Data is uniform with 3+ objects and 4+ fields
- Token costs are significant (high volume, large contexts)
Use JSON When:
- Storing in databases or calling APIs
- Data is deeply nested or non-uniform
- Arrays are small (< 3 objects)
- Team velocity matters more than cost
Hybrid Approach (Recommended):
- Store everything as JSON
- Convert to TOON before LLM calls
- Request TOON from LLM for structured outputs
- Convert back to JSON immediately
- Keep business logic in JSON
Benchmarks and Savings
TOON benchmarks (Claude tokenizer):
- Mixed structures: 22% average savings vs JSON (though JSON is still preferable for complex structures)
- Tabular data: 40-50% savings vs JSON
- Nested configs: 30-35% savings vs JSON
Practical Implementation Tips
1. Wrap TOON in a code block:
prompt = f"""
Task referring to the tabular structured data:
```toon
{toon_data}
```
Your task: ...
"""
2. Show examples, don’t describe
prompt = f"""
Return format:
results[N]{{id,name,score}}:
  1,Alice,95
  2,Bob,87
Your task: (describe the actual task here), returning the results in the TOON format above.
"""
3. Validate structure
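def validate(response):
    lines = [l for l in response.strip().split("\n") if l.strip()]
    header = lines[0]
    # Declared row count sits between the brackets, e.g. results[2]{...}:
    count = int(header[header.index("[") + 1 : header.index("]")])
    rows = lines[1:]
    # Raise instead of assert so the check survives python -O
    if len(rows) != count:
        raise ValueError(f"header declares {count} rows, found {len(rows)}")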
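If the prompt wraps TOON in a code block, the model may fence its reply the same way, and the validation above would then read the fence line as the header. A small pre-processing step can unwrap the first fenced block tagged toon (or untagged) before validating or parsing. This is a sketch under that assumption; `extract_toon` is a hypothetical helper, and replies fenced with a different language tag are returned as-is apart from trimmed whitespace.

```python
import re

FENCE = "`" * 3  # a Markdown code fence, built here to keep this listing readable

# Opening fence (optionally tagged "toon"), lazily captured body, closing fence
FENCE_RE = re.compile(FENCE + r"(?:toon)?[ \t]*\n(.*?)\n?" + FENCE, re.DOTALL)


def extract_toon(response):
    """Return the body of the first toon-tagged or untagged fenced block, else the whole reply."""
    match = FENCE_RE.search(response)
    return (match.group(1) if match else response).strip()


reply = "Here are the results:\n" + FENCE + "toon\nresults[2]{id,name,score}:\n  1,Alice,95\n  2,Bob,87\n" + FENCE
print(extract_toon(reply))
```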
4. Keep conversion at boundaries
def process(data_json):
    # Business logic uses JSON
    processed = transform(data_json)
    # TOON only for LLM
    toon = to_toon(processed)
    llm_response = call_llm(toon)
    return from_toon(llm_response)
Conclusion: Solve the Token Cost Problem
TOON addresses a specific problem: JSON is token-inefficient for structured data in LLM prompts.
TOON is effective for:
- Reference data with uniform structure
- Structured LLM outputs (arrays of objects)
- Data that’s 30-70% tabular
Skip TOON for:
- Deeply nested unique objects
- Tiny datasets (< 3 objects)
- When team velocity > cost optimization
Implementation strategy:
- Keep JSON everywhere (storage, APIs, logic)
- Use TOON at LLM boundaries only
- Start with highest token-usage prompt
- Measure savings and reliability
- Expand gradually
The token savings are measurable: 40-50% reduction on prompts with structured reference data. For high-volume LLM applications, that translates to real cost reduction. More importantly, TOON’s explicit structure often improves LLM response reliability – fewer malformed outputs, easier validation.
The conversion code is simple (20-30 lines). If you’re spending significant money on LLM tokens and working with structured data, TOON is worth testing on your highest-cost prompt.
Want to cut your LLM costs by 40–50% in production?
Intuz helps teams redesign prompts, data formats, and AI pipelines for real, measurable savings.
Book a free 45-minute consultation to review your highest-cost LLM workflows and identify immediate optimization opportunities.
FAQs
How does the TOON format reduce LLM token usage by 40–50%?
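TOON reduces token usage by replacing JSON’s repeated keys with a compact, schema-driven structure. For arrays of uniform objects, field names are declared once in a header and each object becomes a single row of values in a fixed order, so most of the repeated labels, braces, and quotes disappear. The 40–50% figure applies to tabular data; mixed structures save less (around 22% in the benchmarks above). The LLM receives the same information in significantly fewer tokens while semantic clarity and output accuracy are maintained.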
Is TOON better than JSON for enterprise LLM pipelines?
For the LLM-facing parts of high-volume pipelines, often yes. JSON is human-readable but token-inefficient for LLMs, while TOON is optimized for LLM consumption, making it a good fit for prompts, agent memory, tool responses, and intermediate outputs whenever the data is uniform and tabular. JSON remains the right choice for storage, APIs, and deeply nested or irregular data. Enterprises processing millions of tokens daily see the largest cost and latency improvements.
Will using TOON impact LLM accuracy or response quality?
No, when implemented correctly. TOON preserves semantic meaning while reducing structural overhead. In many cases, response quality improves because shorter prompts reduce context noise and truncation risks. Enterprises often observe more consistent outputs, especially in agent workflows, retrieval-augmented generation (RAG), and multi-step reasoning pipelines.
Where does TOON deliver the highest cost savings in AI systems?
TOON delivers maximum savings in prompt construction, agent-to-agent communication, tool outputs, and RAG contexts. These layers repeatedly pass structured data to LLMs, making them prime candidates for token optimization. Enterprises running chatbots, autonomous agents, document processing, or analytics pipelines benefit the most.
How difficult is it to migrate existing LLM pipelines to TOON?
Migration is straightforward and low-risk because TOON is introduced only at the LLM boundary: storage, APIs, and business logic stay in JSON. Most teams add a lightweight encoder/decoder, update prompt templates with a TOON example, and validate row counts on outputs. Starting with the highest-cost prompt, teams can often complete a first migration in days rather than weeks.