Reduce LLM Token Costs 50% Using TOON Format

This blog explains how TOON format helps enterprises cut LLM token usage by 40–50%, significantly reducing AI costs without compromising output quality. Learn a practical, production-ready approach to optimize prompts, pipelines, and large document processing for scalable, cost-efficient AI systems.

LLM token consumption is one of those problems that sneaks up on you. You start with a simple use case, JSON seems perfectly fine, and then your bill shows up. If you’re sending structured reference data in prompts such as product catalogs, entity lists, and reference libraries, you’re probably paying 40-60% more than necessary.

The culprit? JSON repeats field names for every single object. That’s fine for APIs and databases, but terrible for token-based LLM pricing.

Enter TOON (Token-Oriented Object Notation). It encodes the same JSON data model but uses CSV-style rows for arrays, declaring field names once instead of repeating them for every object.

Key Takeaways


TOON (Token-Oriented Object Notation) combines YAML’s readability with CSV’s compactness — it declares field names once instead of repeating them per object like JSON does
Switching from JSON to TOON for structured reference data (e.g., a 50-KPI library) reduces token count by ~47–50%, directly cutting API costs at scale
Output tokens cost more than input tokens in most APIs — using TOON for LLM output extraction yields an additional 40% savings on responses
The recommended hybrid approach: store everything as JSON (databases, APIs, logic), convert to TOON only at LLM boundaries, then convert back to JSON immediately
TOON only delivers gains with 3+ objects having 4+ fields each — below that threshold, the header overhead outweighs the savings

Problem: JSON is Token-Expensive

Consider this common scenario: extracting KPIs from sustainability reports. You need to send a reference library so the LLM knows what to look for.

Standard JSON format

{
  "kpi_categories": {
    "Environment": [
      {
        "name": "Total GHG Emissions",
        "keywords": ["emissions", "carbon", "CO2"]
      },
      {
        "name": "Renewable Energy Percentage",
        "keywords": ["renewable", "clean energy", "solar"]
      },
      {
        "name": "Water Consumption",
        "keywords": ["water usage", "consumption", "m3"]
      }
    ]
  }
}

Token count: 127 tokens (Claude tokenizer)

TOON Format

kpi_categories:
  Environment[3]{name,keywords}:
    Total GHG Emissions,"emissions,carbon,CO2"
    Renewable Energy Percentage,"renewable,clean energy,solar"
    Water Consumption,"water usage,consumption,m3"

Token count: 61 tokens (Claude tokenizer)

Savings: ~50% reduction. Scale this across 50 API calls per document and hundreds of documents, and you’re looking at real money.
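To see how "real money" adds up, here is the arithmetic as a small Python sketch. The 127 and 61 token counts come from the example above and 50 calls per document comes from the text; the document volume and the $3-per-million-token input price are hypothetical placeholders, not quoted rates, so substitute your own volumes and your provider's current pricing.

```python
def token_savings(json_tokens, toon_tokens, calls_per_doc, docs, usd_per_million):
    """Return (tokens saved, dollars saved) from sending TOON instead of JSON.

    usd_per_million is an input price you supply; it is not a real quoted rate.
    """
    saved_per_call = json_tokens - toon_tokens
    tokens_saved = saved_per_call * calls_per_doc * docs
    return tokens_saved, tokens_saved * usd_per_million / 1_000_000


# 127 vs 61 tokens and 50 calls per document come from the example above;
# 500 documents and $3 per million input tokens are hypothetical values.
tokens, dollars = token_savings(127, 61, calls_per_doc=50, docs=500, usd_per_million=3.0)
print(tokens, round(dollars, 2))  # 1650000 4.95
```

Savings grow linearly with the size of the reference data: the 50-KPI library discussed later saves about 2,100 tokens per call (4,400 vs 2,300), roughly 30 times the 66 tokens saved by this three-KPI sample.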

What is TOON?

TOON combines YAML’s readability with CSV’s compactness. The key innovation: arrays of objects declare field names once, then list values row by row.

JSON (repetitive):

{
  "users": [
    {"id": 1, "name": "Alice", "role": "admin"},
    {"id": 2, "name": "Bob", "role": "user"}
  ]
}

TOON (tabular):

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

TOON Syntax Essentials

Objects: YAML-style indentation

user:
  name: Alice
  role: admin

Primitive Arrays (declared length, values inline):

tags[3]: javascript,python,rust

Arrays of Objects (where TOON shines):

products[2]{id,name,price,stock}:
  101,Laptop,1200,true
  102,Mouse,25,false

The header products[2]{id,name,price,stock}: specifies:

products – the key that holds the array
[2] – array length (helps detect truncation)
{id,name,price,stock} – field names, in the order values appear in each row
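The three forms above can be produced by a small recursive encoder. The following is a sketch, not the official TOON encoder: it assumes two-space indentation, the comma delimiter, and primitive values inside table rows, and it quotes a value only when it contains a comma, colon, quote, or backslash (or has surrounding spaces, or is empty). The full spec has more quoting rules, for example for strings that look like numbers or booleans.

```python
def _scalar(v):
    """Format one primitive value; quote it only when it would be ambiguous."""
    if isinstance(v, bool):
        return "true" if v else "false"
    if v is None:
        return "null"
    s = str(v)
    if any(ch in s for ch in ',:"\\') or s != s.strip() or s == "":
        # Backslash-escape quotes and backslashes inside the quoted cell
        return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return s


def encode(value, key=None, indent=0):
    """Encode a dict as a list of TOON lines (sketch, not the full spec)."""
    pad = "  " * indent
    lines = []
    if isinstance(value, dict):
        # Nested object: "key:" then its fields one level deeper
        if key is not None:
            lines.append(f"{pad}{key}:")
            indent += 1
        for k, v in value.items():
            lines.extend(encode(v, k, indent))
    elif isinstance(value, list) and value and all(isinstance(x, dict) for x in value):
        fields = list(value[0])
        if any(set(x) != set(fields) for x in value):
            raise ValueError(f"{key}: tabular rows need identical fields")
        # Header once (key, row count, field order), then one row per object
        lines.append(f"{pad}{key}[{len(value)}]{{{','.join(fields)}}}:")
        for row in value:
            lines.append(pad + "  " + ",".join(_scalar(row[f]) for f in fields))
    elif isinstance(value, list):
        # Primitive arrays stay inline after the header
        items = ",".join(_scalar(v) for v in value)
        lines.append(f"{pad}{key}[{len(value)}]:" + (f" {items}" if items else ""))
    else:
        lines.append(f"{pad}{key}: {_scalar(value)}")
    return lines


doc = {
    "user": {"name": "Alice", "role": "admin"},
    "tags": ["javascript", "python", "rust"],
    "products": [
        {"id": 101, "name": "Laptop", "price": 1200, "stock": True},
        {"id": 102, "name": "Mouse", "price": 25, "stock": False},
    ],
}
print("\n".join(encode(doc)))
```

Raising on mismatched fields, rather than silently filling gaps, keeps the header honest about what every row contains.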

Pros

30-50% fewer tokens for uniform arrays
Self-documenting (explicit structure)
LLMs parse it reliably
Built-in validation via length declarations

Cons

Requires uniform structure (all objects need the same fields)
Smaller ecosystem than JSON
Team learning curve

Pro Tips

TOON only helps with 3+ objects having 4+ fields each. Below that, header overhead dominates.
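The rule of thumb can be checked in code before choosing a format. This is a hypothetical helper that combines the tip's 3-object, 4-field threshold with the uniformity requirement from the Cons list; the thresholds are heuristics, not hard limits.

```python
PRIMITIVES = (str, int, float, bool, type(None))


def should_use_toon(rows, min_objects=3, min_fields=4):
    """Heuristic: True if rows is a uniform list of flat dicts large enough to benefit."""
    if len(rows) < min_objects or not all(isinstance(r, dict) for r in rows):
        return False
    fields = set(rows[0])
    if len(fields) < min_fields:
        return False
    # Every object must share the same fields and hold only primitive values,
    # because a tabular row has exactly one cell per header field
    return all(
        set(r) == fields and all(isinstance(v, PRIMITIVES) for v in r.values())
        for r in rows
    )


products = [
    {"id": 101, "name": "Laptop", "price": 1200, "stock": True},
    {"id": 102, "name": "Mouse", "price": 25, "stock": False},
    {"id": 103, "name": "Keyboard", "price": 80, "stock": True},
]
print(should_use_toon(products))      # True: 3 objects, 4 fields
print(should_use_toon(products[:2]))  # False: below the 3-object threshold
```

Note the primitive-value check: the KPI example later in this article gets around it by joining each KPI's keyword list into a single quoted string.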

Real-World Use Case: KPI Extraction

Let’s examine a concrete problem: extracting Key Performance Indicators from documents using an LLM.

Problem 1: Sending Reference Data

The LLM needs a KPI library showing what metrics to extract and how to classify them.

Before (JSON): a 50-KPI library in formatted JSON takes ~4,400 tokens.

After (TOON):

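def create_kpi_reference_toon(kpi_library):
    """Render a {"kpi_categories": {category: [kpi, ...]}} library as TOON."""
    lines = ["kpi_categories:"]
    for category, kpis in kpi_library["kpi_categories"].items():
        # Header: category name, row count, and field order
        lines.append(f"  {category}[{len(kpis)}]{{name,keywords}}:")
        for kpi in kpis:
            # Join keywords into one quoted cell so their commas
            # are not read as field delimiters
            keywords = ",".join(kpi["keywords"])
            lines.append(f'    {kpi["name"]},"{keywords}"')
    return "\n".join(lines)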

Same 50-KPI library in TOON: ~2,300 tokens, a 47.7% reduction.
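Exact token counts depend on the tokenizer, so measure on your own prompts with your provider's token-counting tooling. As a quick, tokenizer-free sanity check, the sketch below compares the character length of pretty-printed JSON (an assumed `indent=2`) with the TOON text for the three-KPI sample from the start of the article. Character savings only approximate token savings.

```python
import json


def char_savings(json_text, toon_text):
    """Fraction of characters saved by TOON; a rough proxy for token savings."""
    return 1 - len(toon_text) / len(json_text)


# The three-KPI sample from the start of the article
library = {
    "kpi_categories": {
        "Environment": [
            {"name": "Total GHG Emissions", "keywords": ["emissions", "carbon", "CO2"]},
            {"name": "Renewable Energy Percentage", "keywords": ["renewable", "clean energy", "solar"]},
            {"name": "Water Consumption", "keywords": ["water usage", "consumption", "m3"]},
        ]
    }
}
toon_text = "\n".join([
    "kpi_categories:",
    "  Environment[3]{name,keywords}:",
    '    Total GHG Emissions,"emissions,carbon,CO2"',
    '    Renewable Energy Percentage,"renewable,clean energy,solar"',
    '    Water Consumption,"water usage,consumption,m3"',
])
json_text = json.dumps(library, indent=2)
print(f"JSON: {len(json_text)} chars, TOON: {len(toon_text)} chars, "
      f"saved {char_savings(json_text, toon_text):.0%}")
```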

Pros

Massive token savings on reference data
Clear tabular structure
Length validation prevents truncation

Cons

One-time conversion overhead
Need to maintain conversion function

Problem 2: Receiving Structured Output

When the LLM returns extracted KPIs, JSON is also wasteful.

JSON output (10 KPIs): ~1,500 tokens

TOON output (10 KPIs; first three rows shown):

kpis[10]{name,value,unit,year,category,type}:
  Total GHG Emissions,125000,tCO2e,2024,Environment,actual
  Renewable Energy,45,percent,2024,Environment,actual
  Water Consumption,15000,m3,2024,Environment,actual

Token count: ~900 tokens. 40% reduction.
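TOON rows come back as text, so numbers need converting before the results go into JSON storage. This sketch assumes the six-column header above, that value is numeric and year is an integer (the `NUMERIC_FIELDS` mapping is a hypothetical choice), and that the reply is a single tabular array. The sample uses three rows, so its header declares [3].

```python
import csv

# Hypothetical column types for the KPI output header above
NUMERIC_FIELDS = {"value": float, "year": int}


def parse_kpi_rows(toon_text):
    """Parse a single tabular TOON array and convert numeric columns."""
    lines = [l.strip() for l in toon_text.strip().splitlines() if l.strip()]
    header, rows = lines[0], lines[1:]
    fields = header[header.index("{") + 1:header.index("}")].split(",")
    kpis = []
    # csv.reader keeps quoted cells (which may contain commas) intact
    for values in csv.reader(rows):
        record = dict(zip(fields, values))
        for name, cast in NUMERIC_FIELDS.items():
            if name in record:
                record[name] = cast(record[name])
        kpis.append(record)
    return kpis


sample = """kpis[3]{name,value,unit,year,category,type}:
  Total GHG Emissions,125000,tCO2e,2024,Environment,actual
  Renewable Energy,45,percent,2024,Environment,actual
  Water Consumption,15000,m3,2024,Environment,actual"""

for kpi in parse_kpi_rows(sample):
    print(kpi["name"], kpi["value"], kpi["year"])
```

A production parser should also decide what to do when the model returns a blank or non-numeric value, for example by logging the row and skipping it.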

Pros

Output tokens cost more than input in most APIs
LLMs follow structure more consistently
Easy validation (count rows vs header)

Cons

Need parser to convert back to JSON
Must show LLM an example in prompt

Pro Tips

Always provide a TOON template in your prompt. Don’t just describe – show the exact format.
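One way to follow this tip is to generate the template from the same field list your parser expects, so the prompt and the parser cannot drift apart. Both functions below are hypothetical helpers; the `[N]` placeholder mirrors the return-format example later in this article.

```python
def toon_output_template(key, fields, example_rows):
    """Render a TOON example block that shows the model the exact output shape."""
    # "N" is left for the model to fill in with the real row count
    header = f"{key}[N]{{{','.join(fields)}}}:"
    rows = ["  " + ",".join(str(v) for v in row) for row in example_rows]
    return "\n".join([header, *rows])


def build_extraction_prompt(document_text):
    """Hypothetical prompt for the KPI extraction task described above."""
    template = toon_output_template(
        "kpis",
        ["name", "value", "unit", "year", "category", "type"],
        [["Total GHG Emissions", 125000, "tCO2e", 2024, "Environment", "actual"]],
    )
    return (
        "Extract every KPI from the document below.\n"
        "Reply with TOON only, using exactly this format "
        "(replace N with the number of rows you return):\n"
        f"{template}\n\n"
        f"Document:\n{document_text}"
    )


print(build_extraction_prompt("<document text here>"))
```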

Simple Conversion Code

You don’t need complex libraries for basic use cases.

JSON to TOON:

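def to_toon(data):
    """Encode a flat dict as TOON; lists of dicts become tabular arrays."""
    def fmt(v):
        # TOON writes booleans and null the way JSON does
        if isinstance(v, bool):
            return "true" if v else "false"
        if v is None:
            return "null"
        s = str(v)
        # Quote values containing the delimiter so columns stay aligned
        return f'"{s}"' if "," in s else s

    lines = []
    for key, value in data.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            # Array of objects - tabular: declare fields once
            fields = list(value[0].keys())
            lines.append(f"{key}[{len(value)}]{{{','.join(fields)}}}:")
            for obj in value:
                # Read values in header order, not each dict's own key order
                lines.append("  " + ",".join(fmt(obj[f]) for f in fields))
        elif isinstance(value, list):
            # Primitive array - values inline
            lines.append(f"{key}[{len(value)}]: {','.join(fmt(v) for v in value)}")
        else:
            lines.append(f"{key}: {fmt(value)}")
    return "\n".join(lines)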

TOON to JSON:

import csv

def from_toon(toon_str):
    """Parse one tabular TOON array into a list of dicts (values stay strings)."""
    lines = [l for l in toon_str.strip().split("\n") if l.strip()]
    header = lines[0]
    # Extract field names from the header: key[N]{a,b,c}:
    fields = header[header.index("{") + 1:header.index("}")].split(",")
    # Parse rows; csv.reader keeps quoted cells like "emissions,carbon,CO2" whole
    rows = csv.reader(line.strip() for line in lines[1:])
    return [dict(zip(fields, values)) for values in rows]

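These helpers cover simple, flat cases (from_toon reads a single tabular array). For production, use the official TOON library, which implements the full spec, including nesting, quoting, and escaping.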
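Quoted cells are the main trap in hand-rolled parsing. A plain `str.split(",")` breaks the quoted keyword cell from the KPI example into separate columns, while Python's standard `csv` module keeps it intact. One caveat: `csv` escapes embedded quotes by doubling them, which may not match the TOON spec's escaping rules, so treat it as a shortcut for simple data.

```python
import csv

row = 'Total GHG Emissions,"emissions,carbon,CO2"'

# Naive split treats every comma as a delimiter, even inside quotes
naive = row.split(",")
print(naive)   # ['Total GHG Emissions', '"emissions', 'carbon', 'CO2"']

# csv.reader honors the double quotes and returns two cells
parsed = next(csv.reader([row]))
print(parsed)  # ['Total GHG Emissions', 'emissions,carbon,CO2']
```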

Storage Strategy

Critical rule: Use the right format for the right job.

Database: Always JSON

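import json

# GOOD: persist JSON ("records" is a placeholder table name)
def save_data(db, data):
    db.execute("INSERT INTO records (data) VALUES (?)", [json.dumps(data)])

# BAD: don't persist TOON; keep it at the LLM boundary
def save_data_toon(db, data_toon_string):
    db.execute("INSERT INTO records (data) VALUES (?)", [data_toon_string])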

LLM Prompts: Use TOON

def extract_with_llm(reference_json):
    # Convert to TOON for the prompt
    reference_toon = to_toon(reference_json)
    fence = "`" * 3  # triple-backtick fence around the TOON block
    prompt = (
        "Reference data:\n"
        f"{fence}toon\n{reference_toon}\n{fence}\n\n"
        "Extract entities in TOON format."
    )
    response = call_llm(prompt)
    # Convert back to JSON for storage
    return from_toon(response)

Principle: JSON for storage/processing, TOON only at LLM boundaries.

When to Use TOON vs JSON

Use TOON When:

Sending reference datasets to LLMs (catalogs, libraries, entity lists)
Receiving structured arrays from LLMs
Data is uniform with 3+ objects and 4+ fields
Token costs are significant (high volume, large contexts)

Use JSON When:

Storing in databases or calling APIs
Data is deeply nested or non-uniform
Arrays are small (< 3 objects)
Team velocity matters more than cost

Hybrid Approach (Recommended):

Store everything as JSON
Convert to TOON before LLM calls
Request TOON from LLM for structured outputs
Convert back to JSON immediately
Keep business logic in JSON

Benchmarks and Savings

TOON benchmarks (Claude tokenizer):

Mixed structures: 22% average savings vs JSON (though JSON is still preferred for complex structures)
Tabular data: 40-50% savings vs JSON
Nested configs: 30-35% savings vs JSON

Practical Implementation Tips

1. Wrap TOON in a code block:

fence = "`" * 3  # triple-backtick fence around the TOON data
prompt = f"""
Task referring to the tabular data below:
{fence}toon
{toon_data}
{fence}
Your task: ...
"""

2. Show examples, don’t describe

prompt = f"""
Return format:
results[N]{{id,name,score}}:
  1,Alice,95
  2,Bob,87
Your task: <the actual task, referring to the TOON format above>"""

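3. Validate structure

def validate(response):
    lines = [l for l in response.strip().split("\n") if l.strip()]
    header = lines[0]
    # The declared row count sits between "[" and "]" in the header
    count = int(header[header.index("[") + 1:header.index("]")])
    rows = lines[1:]
    # Raise instead of assert: asserts are skipped under python -O
    if len(rows) != count:
        raise ValueError(f"header declares {count} rows, got {len(rows)}")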
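Counting rows catches truncation, but not a row that lost or gained a cell. A slightly stricter sketch, assuming a single tabular array, the comma delimiter, and no surrounding code fence in the reply, also checks each row against the header's field list and returns a list of problems instead of raising, so the caller can decide whether to retry the request.

```python
import csv
import re

# key[N]{f1,f2,...}:  ->  key, declared row count, field list
HEADER = re.compile(r"^(?P<key>[^\[\s]+)\[(?P<count>\d+)\]\{(?P<fields>[^}]*)\}:$")


def validate_table(response):
    """Return a list of problems with a tabular TOON response (empty if OK)."""
    lines = [l.strip() for l in response.strip().splitlines() if l.strip()]
    if not lines:
        return ["empty response"]
    m = HEADER.match(lines[0])
    if m is None:
        return [f"bad header: {lines[0]!r}"]
    count, fields = int(m.group("count")), m.group("fields").split(",")
    rows = list(csv.reader(lines[1:]))
    problems = []
    if len(rows) != count:
        problems.append(f"header declares {count} rows, got {len(rows)}")
    for i, row in enumerate(rows, start=1):
        if len(row) != len(fields):
            problems.append(f"row {i}: expected {len(fields)} cells, got {len(row)}")
    return problems


print(validate_table("results[2]{id,name,score}:\n  1,Alice,95\n  2,Bob,87"))  # []
print(validate_table("results[3]{id,name,score}:\n  1,Alice,95\n  2,Bob"))
# ['header declares 3 rows, got 2', 'row 2: expected 3 cells, got 2']
```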

4. Keep conversion at boundaries

def process(data_json):
    # Business logic uses JSON
    processed = transform(data_json)
    # TOON only for LLM
    toon = to_toon(processed)
    llm_response = call_llm(toon)
    return from_toon(llm_response)

Conclusion: Solve the Token Cost Problem

TOON addresses a specific problem: JSON is token-inefficient for structured data in LLM prompts.

TOON is effective for:

Reference data with uniform structure
Structured LLM outputs (arrays of objects)
Data that’s 30-70% tabular

Skip TOON for:

Deeply nested unique objects
Tiny datasets (< 3 objects)
When team velocity matters more than cost optimization

Implementation strategy:

Keep JSON everywhere (storage, APIs, logic)
Use TOON at LLM boundaries only
Start with highest token-usage prompt
Measure savings and reliability
Expand gradually

The token savings are measurable: 40-50% reduction on prompts with structured reference data. For high-volume LLM applications, that translates to real cost reduction. More importantly, TOON’s explicit structure often improves LLM response reliability – fewer malformed outputs, easier validation.

The conversion code is simple (20-30 lines). If you’re spending significant money on LLM tokens and working with structured data, TOON is worth testing on your highest-cost prompt.

Want to cut your LLM costs by 40–50% in production?

Intuz helps teams redesign prompts, data formats, and AI pipelines for real, measurable savings.

Book a free 45-minute consultation to review your highest-cost LLM workflows and identify immediate optimization opportunities.

FAQs

How does the TOON format reduce LLM token usage by 40–50%?

TOON reduces token usage by declaring field names once per array instead of repeating them in every object, and by dropping most of JSON’s braces and quotes. Values appear in a fixed column order defined by the header, so the LLM receives the same information in significantly fewer tokens while the structure stays explicit. The largest gains come from uniform arrays of objects; deeply nested data benefits much less.

Is TOON better than JSON for enterprise LLM pipelines?

For the LLM-facing parts of high-volume pipelines, often yes. JSON is human-readable but token-inefficient for LLMs. TOON is optimized for passing uniform, structured data to and from models, making it a good fit for prompts, agent memory, tool responses, and intermediate outputs. Enterprises processing millions of tokens daily can see immediate cost and latency improvements, while JSON remains the right format for storage and APIs.

Will using TOON impact LLM accuracy or response quality?

No, when implemented correctly. TOON preserves semantic meaning while reducing structural overhead. In many cases, response quality improves because shorter prompts reduce context noise and truncation risks. Enterprises often observe more consistent outputs, especially in agent workflows, retrieval-augmented generation (RAG), and multi-step reasoning pipelines.

Where does TOON deliver the highest cost savings in AI systems?

TOON delivers maximum savings in prompt construction, agent-to-agent communication, tool outputs, and RAG contexts. These layers repeatedly pass structured data to LLMs, making them prime candidates for token optimization. Enterprises running chatbots, autonomous agents, document processing, or analytics pipelines benefit the most.

How difficult is it to migrate existing LLM pipelines to TOON?

Migration is straightforward and low-risk. TOON acts as a drop-in replacement for structured inputs and outputs, requiring minimal changes to business logic. Most teams convert schemas, update prompt templates, and add a lightweight encoder/decoder. Enterprises typically complete migration in days—not weeks.

Kamal Rupareliya

Director of Products

Kamal Rupareliya, a Director of Products at Intuz, focuses on innovation through technology such as IoT, JAMStack, and Serverless Computing. He is an expert in IoT, Mobile Design, and Product Strategy, and he loves applying inventive ways to utilize technology and empathy towards creating remarkable digital software products.

