Reduce LLM Token Costs 40–50% Using TOON Format

8 minutes

This blog explains how TOON format helps enterprises cut LLM token usage by 40–50%, significantly reducing AI costs without compromising output quality. Learn a practical, production-ready approach to optimize prompts, pipelines, and large document processing for scalable, cost-efficient AI systems.

LLM token consumption is one of those problems that sneaks up on you. You start with a simple use case, JSON seems perfectly fine, and then your bill shows up. If you’re sending structured reference data in prompts, such as product catalogs, entity lists, and reference libraries, a large share of those tokens is avoidable: 40-50% of them in the examples below.

The culprit? JSON repeats field names for every single object. That’s fine for APIs and databases, but expensive under token-based LLM pricing.

Enter TOON (Token-Oriented Object Notation). It encodes the same data model as JSON but uses CSV-style rows for arrays, declaring field names once instead of repeating them for every object.

Key Takeaways

  • TOON (Token-Oriented Object Notation) combines YAML’s readability with CSV’s compactness — it declares field names once instead of repeating them per object like JSON does
  • Switching from JSON to TOON for structured reference data (e.g., a 50-KPI library) reduces token count by ~47–50%, directly cutting API costs at scale
  • Output tokens cost more than input tokens in most APIs — using TOON for LLM output extraction yields an additional 40% savings on responses
  • The recommended hybrid approach: store everything as JSON (databases, APIs, logic), convert to TOON only at LLM boundaries, then convert back to JSON immediately
  • TOON only delivers gains with 3+ objects having 4+ fields each — below that threshold, the header overhead outweighs the savings

Problem: JSON is Token-Expensive

Consider this common scenario: extracting KPIs from sustainability reports. You need to send a reference library so the LLM knows what to look for.

Standard JSON format

{
  "kpi_categories": {
    "Environment": [
      {
        "name": "Total GHG Emissions",
        "keywords": ["emissions", "carbon", "CO2"]
      },
      {
        "name": "Renewable Energy Percentage",
        "keywords": ["renewable", "clean energy", "solar"]
      },
      {
        "name": "Water Consumption",
        "keywords": ["water usage", "consumption", "m3"]
      }
    ]
  }
}

Token count: 127 tokens (Claude tokenizer)

TOON Format

kpi_categories:
  Environment[3]{name,keywords}:
    Total GHG Emissions,"emissions,carbon,CO2"
    Renewable Energy Percentage,"renewable,clean energy,solar"
    Water Consumption,"water usage,consumption,m3"

Token count: 61 tokens (Claude tokenizer)

Savings: ~50% reduction (127 tokens down to 61). Scale this across 50 API calls per document and hundreds of documents, and you’re looking at real money.
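
To make the scaling claim concrete, here is a rough back-of-the-envelope sketch. The per-call token counts come from the example above; the 500 documents and the price per million tokens are placeholder assumptions, not quoted rates, so plug in your own numbers.

```python
def tokens_saved(json_tokens, toon_tokens, calls_per_doc, num_docs):
    """Total input tokens saved when every call sends TOON instead of JSON."""
    return (json_tokens - toon_tokens) * calls_per_doc * num_docs


def cost_saved(tokens, usd_per_million_tokens):
    """Convert a token count into dollars at a given (assumed) price."""
    return tokens / 1_000_000 * usd_per_million_tokens


# 127 vs 61 tokens per call (from the example above), 50 calls per document.
# 500 documents and 3 USD per million input tokens are hypothetical values.
saved = tokens_saved(127, 61, calls_per_doc=50, num_docs=500)
print(saved)                                 # 1650000
print(f"{cost_saved(saved, 3.0):.2f}")       # 4.95
```

The three-KPI example is tiny, so the absolute number is small. The same arithmetic applied to a larger reference library is where the savings start to matter.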

What is TOON?

TOON combines YAML’s readability with CSV’s compactness. The key innovation: arrays of objects declare field names once, then list values row by row.

JSON (repetitive):

[
  {"id": 1, "name": "Alice", "role": "admin"},
  {"id": 2, "name": "Bob", "role": "user"}
]

TOON (tabular):

users[2]{id,name,role}:
  1,Alice,admin
  2,Bob,user

TOON Syntax Essentials

Objects: YAML-style indentation

user:
  name: Alice
  role: admin

Primitive Arrays (values stay inline, on the same line as the key):

tags[3]: javascript,python,rust

Arrays of Objects (where TOON shines):

products[2]{id,name,price,stock}:
  101,Laptop,1200,true
  102,Mouse,25,false

The header products[2]{id,name,price,stock}: specifies:

  • products – field name
  • [2] – array length (helps detect truncation)
  • {id,name,price,stock} – field order
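
The three parts of the header can be pulled apart with a single regular expression. This is a minimal sketch for the header shape shown above, not the official parser; `parse_header` and `HEADER_RE` are illustrative names.

```python
import re

# Matches headers such as: products[2]{id,name,price,stock}:
HEADER_RE = re.compile(
    r"^\s*(?P<name>[^\[\]{}:]+)\[(?P<length>\d+)\]\{(?P<fields>[^}]*)\}:\s*$"
)


def parse_header(line):
    """Split a tabular TOON header into (name, declared_length, fields)."""
    match = HEADER_RE.match(line)
    if match is None:
        raise ValueError(f"not a tabular TOON header: {line!r}")
    fields = [f.strip() for f in match.group("fields").split(",")]
    return match.group("name").strip(), int(match.group("length")), fields


print(parse_header("products[2]{id,name,price,stock}:"))
# ('products', 2, ['id', 'name', 'price', 'stock'])
```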

Pros

  • 30-50% fewer tokens for uniform arrays
  • Self-documenting (explicit structure)
  • LLMs parse it reliably
  • Built-in validation via length declarations

Cons

  • Requires uniform structure (all objects need the same fields)
  • Smaller ecosystem than JSON
  • Team learning curve

Pro Tips

TOON only helps with 3+ objects having 4+ fields each. Below that, header overhead dominates.
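
That rule of thumb, together with the uniform-structure requirement, is easy to check before converting. A small sketch, assuming the thresholds stated above (the function name and defaults are illustrative):

```python
def is_toon_candidate(rows, min_rows=3, min_fields=4):
    """Return True if `rows` is a uniform list of dicts large enough to benefit."""
    if not isinstance(rows, list) or len(rows) < min_rows:
        return False
    if not all(isinstance(row, dict) for row in rows):
        return False
    fields = list(rows[0].keys())
    if len(fields) < min_fields:
        return False
    # A tabular array needs every object to carry the same fields in the same order
    return all(list(row.keys()) == fields for row in rows)


products = [
    {"id": 101, "name": "Laptop", "price": 1200, "stock": True},
    {"id": 102, "name": "Mouse", "price": 25, "stock": False},
    {"id": 103, "name": "Monitor", "price": 300, "stock": True},
]
print(is_toon_candidate(products))       # True
print(is_toon_candidate(products[:2]))   # False
```

When the check fails, sending the data as plain JSON is the simpler choice.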

Real-World Use Case: KPI Extraction

Let’s examine a concrete problem: extracting Key Performance Indicators from documents using an LLM.

Problem 1: Sending Reference Data

The LLM needs a KPI library showing what metrics to extract and how to classify them.

Before (JSON): A 50-KPI library in formatted JSON: ~4,400 tokens.

After (TOON):

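def create_kpi_reference_toon(kpi_library):
    """Render the KPI library as TOON, one tabular array per category."""
    lines = ["kpi_categories:"]
    for category, kpis in kpi_library["kpi_categories"].items():
        count = len(kpis)
        # The header declares the row count and the field names once
        lines.append(f"  {category}[{count}]{{name,keywords}}:")
        for kpi in kpis:
            # Keywords are packed into one quoted, comma-separated cell
            keywords = ",".join(kpi["keywords"])
            lines.append(f'    {kpi["name"]},"{keywords}"')
    return "\n".join(lines)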

Same 50-KPI library in TOON: ~2,300 tokens, a reduction of about 48%.

Pros

  • Massive token savings on reference data
  • Clear tabular structure
  • Length declarations help detect truncation

Cons

  • One-time conversion overhead
  • Need to maintain conversion function

Problem 2: Receiving Structured Output

When the LLM returns extracted KPIs, JSON is also wasteful.

JSON output (10 KPIs): ~1,500 tokens

TOON output (10 KPIs, first three rows shown):

kpis[10]{name,value,unit,year,category,type}:
  Total GHG Emissions,125000,tCO2e,2024,Environment,actual
  Renewable Energy,45,percent,2024,Environment,actual
  Water Consumption,15000,m3,2024,Environment,actual
  ...

Token count: ~900 tokens. 40% reduction.

Pros

  • Output tokens cost more than input in most APIs
  • LLMs follow structure more consistently
  • Easy validation (count rows vs header)

Cons

  • Need parser to convert back to JSON
  • Must show LLM an example in prompt
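
Because output tokens are usually priced higher than input tokens, the two savings combine into one blended figure. The sketch below uses the approximate token counts quoted in this article; the prices (3 and 15 USD per million tokens) are hypothetical placeholders, so substitute your provider's actual rates.

```python
def call_cost(input_tokens, output_tokens, input_price, output_price):
    """Cost of one call in micro-dollars, given prices in USD per million tokens."""
    return input_tokens * input_price + output_tokens * output_price


# JSON: ~4,400 input and ~1,500 output tokens; TOON: ~2,300 and ~900.
json_cost = call_cost(4400, 1500, input_price=3, output_price=15)
toon_cost = call_cost(2300, 900, input_price=3, output_price=15)

print(json_cost, toon_cost)                                  # 35700 20400
print(round(100 * (json_cost - toon_cost) / json_cost, 1))   # 42.9
```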

Pro Tips

Always provide a TOON template in your prompt. Don’t just describe – show the exact format.
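
One way to follow this tip is to generate the template from the field list, so the prompt and your parser never drift apart. A minimal sketch; `build_extraction_prompt` and its parameters are illustrative names, and the wording of the instructions is only an example.

```python
def build_extraction_prompt(document_text, array_name, fields, example_rows):
    """Build a prompt that shows the model the exact TOON shape to return."""
    field_list = ",".join(fields)
    header = f"{array_name}[N]{{{field_list}}}:"
    example = "\n".join("  " + ",".join(str(v) for v in row) for row in example_rows)
    return (
        "Extract the KPIs from the document below.\n"
        "Reply with TOON only, in exactly this format (replace N with the row count):\n"
        f"{header}\n{example}\n\n"
        f"Document:\n{document_text}"
    )


prompt = build_extraction_prompt(
    document_text="...report text...",
    array_name="kpis",
    fields=["name", "value", "unit", "year", "category", "type"],
    example_rows=[["Total GHG Emissions", 125000, "tCO2e", 2024, "Environment", "actual"]],
)
print(prompt)
```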

Simple Conversion Code

You don’t need complex libraries for basic use cases.

JSON to TOON:

def to_toon(data):
    """Encode a flat dict as TOON; lists of dicts become tabular arrays."""
    lines = []
    for key, value in data.items():
        if isinstance(value, list) and value and isinstance(value[0], dict):
            # Array of objects: declare the fields once, then one row per object
            fields = ",".join(value[0].keys())
            lines.append(f"{key}[{len(value)}]{{{fields}}}:")
            for obj in value:
                # Quote any value that contains the delimiter
                vals = [f'"{v}"' if "," in str(v) else str(v)
                        for v in obj.values()]
                lines.append(f"  {','.join(vals)}")
        else:
            lines.append(f"{key}: {value}")
    return "\n".join(lines)

TOON to JSON:

import csv

def from_toon(toon_str):
    """Parse a single tabular TOON array back into a list of dicts."""
    lines = toon_str.strip().split("\n")
    header = lines[0]
    # Extract field names from the {...} part of the header
    fields = header[header.index("{") + 1:header.index("}")].split(",")
    # Parse rows with csv so quoted values containing commas stay intact
    rows = [line.strip() for line in lines[1:] if line.strip()]
    return [dict(zip(fields, values)) for values in csv.reader(rows)]

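These cover the simple cases: flat data with a single tabular array. They don’t handle nesting, type conversion, or escaped quotes. For production, use the official TOON library.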
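
One gap worth closing early is value encoding: Python’s `str()` turns `True` into `True` and `None` into `None`, and strings containing delimiters need quoting. The sketch below borrows JSON’s literals and escaping rules for that. Treat this as an assumption made for illustration rather than a statement of the official spec, and note that a matching decoder would need to undo the same escapes.

```python
import json


def encode_value(value):
    """Encode one Python value as a TOON-style cell (a sketch, not the full spec)."""
    if value is None:
        return "null"
    if isinstance(value, bool):          # check bool before int: True is an int
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    text = str(value)
    # Quote strings that would otherwise be misread: empty strings, padded
    # strings, text containing a delimiter or quote, or text equal to a literal
    needs_quotes = (
        text == ""
        or text != text.strip()
        or any(ch in text for ch in ',:"\n')
        or text in ("true", "false", "null")
    )
    return json.dumps(text, ensure_ascii=False) if needs_quotes else text


row = [101, "Laptop", 1200, True, None, "fast, light"]
print(",".join(encode_value(v) for v in row))
# 101,Laptop,1200,true,null,"fast, light"
```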

Storage Strategy

Critical rule: Use the right format for the right job.

Database: Always JSON

import json

# GOOD: store JSON (`db` is your database connection; `records` is a placeholder table)
def save_data(data):
    db.execute("INSERT INTO records (data) VALUES (?)", [json.dumps(data)])

# BAD: don't store TOON
def save_data(data_toon_string):
    db.execute("INSERT INTO records (data) VALUES (?)", [data_toon_string])
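
For a version you can run as-is, here is the same idea against an in-memory SQLite database using Python’s standard library. The table name `kpi_results` and the helper names are made up for the example.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")   # in-memory database, for the demo only
conn.execute("CREATE TABLE kpi_results (id INTEGER PRIMARY KEY, data TEXT)")


def save_data(conn, data):
    """Store the canonical record as JSON text."""
    conn.execute("INSERT INTO kpi_results (data) VALUES (?)", [json.dumps(data)])


def load_all(conn):
    """Read every stored record back into Python objects."""
    cursor = conn.execute("SELECT data FROM kpi_results ORDER BY id")
    return [json.loads(row[0]) for row in cursor]


save_data(conn, {"name": "Water Consumption", "value": 15000, "unit": "m3"})
print(load_all(conn))
# [{'name': 'Water Consumption', 'value': 15000, 'unit': 'm3'}]
```

Stored this way, the data stays queryable and typed, and nothing downstream needs to know TOON exists.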

LLM Prompts: Use TOON

def extract_with_llm(reference_json):
    # Convert to TOON for the prompt
    reference_toon = to_toon(reference_json)
    prompt = f"""
Reference data:
```toon
{reference_toon}
```
Extract entities in TOON format."""
    response = call_llm(prompt)
    # Convert back to JSON for storage
    return from_toon(response)

Principle: JSON for storage/processing, TOON only at LLM boundaries.

When to Use TOON vs JSON

Use TOON When:

  • Sending reference datasets to LLMs (catalogs, libraries, entity lists)
  • Receiving structured arrays from LLMs
  • Data is uniform with 3+ objects and 4+ fields
  • Token costs are significant (high volume, large contexts)

Use JSON When:

  • Storing in databases or calling APIs
  • Data is deeply nested or non-uniform
  • Arrays are small (< 3 objects)
  • Team velocity matters more than cost

Recommended Hybrid Approach:

  1. Store everything as JSON
  2. Convert to TOON before LLM calls
  3. Request TOON from LLM for structured outputs
  4. Convert back to JSON immediately
  5. Keep business logic in JSON

Benchmarks and Savings

TOON benchmarks (Claude tokenizer):

  • Mixed structures: 22% average savings vs JSON (although JSON is still preferable for complex structures)
  • Tabular data: 40-50% savings vs JSON
  • Nested configs: 30-35% savings vs JSON

Practical Implementation Tips

1. Wrap TOON in a code block:

# toon_data holds the TOON text, already wrapped in a code fence
prompt = f"""
Task referring to the tabular data below:
{toon_data}
Your task: ...
"""

2. Show examples, don’t describe

prompt = f"""
Return format:
results[N]{{id,name,score}}:
  1,Alice,95
  2,Bob,87
Your task: the actual task, referring to the TOON format above
"""
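
Tips 1 and 2 combine naturally: fence the reference data, then fence an example of the format you expect back. A minimal sketch with illustrative helper names (`fenced`, `build_prompt`); the fence string is assembled from single backticks only so that this snippet can itself sit inside a fenced block.

```python
FENCE = "`" * 3  # three backticks


def fenced(text, label="toon"):
    """Wrap text in a labelled code fence."""
    return f"{FENCE}{label}\n{text}\n{FENCE}"


def build_prompt(toon_data, example, task):
    """Fence the reference data, then show the expected output format by example."""
    return (
        f"Reference data:\n{fenced(toon_data)}\n\n"
        f"Return format:\n{fenced(example)}\n\n"
        f"Your task: {task}"
    )


prompt = build_prompt(
    toon_data="products[2]{id,name,price,stock}:\n  101,Laptop,1200,true\n  102,Mouse,25,false",
    example="results[N]{id,name,score}:\n  1,Alice,95\n  2,Bob,87",
    task="Score each product.",
)
print(prompt)
```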

3. Validate structure

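def validate(response):
    lines = response.strip().split("\n")
    header = lines[0]
    # The declared row count sits between the square brackets
    count = int(header[header.index("[") + 1:header.index("]")])
    rows = [l for l in lines[1:] if l.strip()]
    assert len(rows) == count, f"expected {count} rows, got {len(rows)}"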
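
Counting rows catches truncation, but a row can also come back with too few or too many values. A slightly stricter sketch that reports every problem instead of raising on the first one (`check_toon_table` is an illustrative name, and it assumes a single tabular array in the response):

```python
import csv
import re


def check_toon_table(response):
    """Return a list of problems found in a single tabular TOON array."""
    lines = [line for line in response.strip().split("\n") if line.strip()]
    match = re.match(r"^\s*[^\[]+\[(\d+)\]\{([^}]*)\}:\s*$", lines[0]) if lines else None
    if match is None:
        return ["missing or malformed header"]
    declared = int(match.group(1))
    num_fields = len(match.group(2).split(","))
    # csv keeps quoted values that contain commas in one piece
    rows = list(csv.reader(line.strip() for line in lines[1:]))
    problems = []
    if len(rows) != declared:
        problems.append(f"header declares {declared} rows, found {len(rows)}")
    for number, row in enumerate(rows, start=1):
        if len(row) != num_fields:
            problems.append(f"row {number} has {len(row)} values, expected {num_fields}")
    return problems


print(check_toon_table("users[2]{id,name,role}:\n  1,Alice,admin\n  2,Bob,user"))
# []
print(check_toon_table("users[3]{id,name,role}:\n  1,Alice,admin\n  2,Bob"))
# ['header declares 3 rows, found 2', 'row 2 has 2 values, expected 3']
```

An empty list means the response is structurally sound and safe to hand to the decoder.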

4. Keep conversion at boundaries

def process(data_json):
    # Business logic uses JSON
    processed = transform(data_json)
    # TOON only for LLM
    toon = to_toon(processed)
    llm_response = call_llm(toon)
    return from_toon(llm_response)

Conclusion: Solve the Token Cost Problem

TOON addresses a specific problem: JSON is token-inefficient for structured data in LLM prompts.

TOON is effective for:

  • Reference data with uniform structure
  • Structured LLM outputs (arrays of objects)
  • Data that’s 30-70% tabular

Skip TOON for:

  • Deeply nested unique objects
  • Tiny datasets (< 3 objects)
  • When team velocity > cost optimization

Implementation strategy:

  1. Keep JSON everywhere (storage, APIs, logic)
  2. Use TOON at LLM boundaries only
  3. Start with highest token-usage prompt
  4. Measure savings and reliability
  5. Expand gradually
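
Step 3 assumes you know which prompt costs the most. If your application logs token usage per call, a few lines are enough to rank prompts. The record shape used here (`prompt`, `input_tokens`) is a made-up log format for illustration; adapt the keys to whatever your logging actually records.

```python
from collections import Counter


def rank_prompts_by_tokens(usage_log):
    """Sum input tokens per prompt name and return (name, total) pairs, largest first."""
    totals = Counter()
    for record in usage_log:
        totals[record["prompt"]] += record["input_tokens"]
    return totals.most_common()


# Hypothetical usage records
log = [
    {"prompt": "kpi_extraction", "input_tokens": 4400},
    {"prompt": "summary", "input_tokens": 900},
    {"prompt": "kpi_extraction", "input_tokens": 4400},
]
print(rank_prompts_by_tokens(log))
# [('kpi_extraction', 8800), ('summary', 900)]
```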

The token savings are measurable: 40-50% reduction on prompts with structured reference data. For high-volume LLM applications, that translates to real cost reduction. More importantly, TOON’s explicit structure often improves LLM response reliability – fewer malformed outputs, easier validation.

The conversion code is simple (20-30 lines). If you’re spending significant money on LLM tokens and working with structured data, TOON is worth testing on your highest-cost prompt.

FAQs

How does the TOON format reduce LLM token usage by 40–50%?

TOON reduces token usage by declaring field names once per array instead of repeating keys for every object, and by dropping most of JSON’s braces, brackets, and quotes. Values sit in fixed positions within each row, so the LLM receives the same information in significantly fewer tokens while the structure stays explicit. The gains are largest on uniform, tabular data.

Is TOON better than JSON for enterprise LLM pipelines?

Yes, at the LLM boundary, and especially in high-volume enterprise pipelines. JSON remains the right format for storage, APIs, and business logic, but it is token-inefficient inside prompts. TOON suits prompts, agent memory, tool responses, and intermediate outputs that carry uniform structured data. Enterprises processing millions of tokens daily can see immediate cost and latency improvements using TOON.

Will using TOON impact LLM accuracy or response quality?

No, when implemented correctly. TOON preserves semantic meaning while reducing structural overhead. In many cases, response quality improves because shorter prompts reduce context noise and truncation risks. Enterprises often observe more consistent outputs, especially in agent workflows, retrieval-augmented generation (RAG), and multi-step reasoning pipelines.

Where does TOON deliver the highest cost savings in AI systems?

TOON delivers maximum savings in prompt construction, agent-to-agent communication, tool outputs, and RAG contexts. These layers repeatedly pass structured data to LLMs, making them prime candidates for token optimization. Enterprises running chatbots, autonomous agents, document processing, or analytics pipelines benefit the most.

How difficult is it to migrate existing LLM pipelines to TOON?

Migration is straightforward and low-risk. Because TOON is used only at LLM boundaries, it requires minimal changes to business logic. Most teams update prompt templates and add a lightweight encoder/decoder. Enterprises typically complete migration in days, not weeks.
