How to Reduce LLM API Costs by 40% with TOON Format
TOON format cuts LLM token usage by 30-60% compared to JSON. Learn what TOON is, how it works, when to use it, and how to convert your data instantly.
- toon
- llm
- ai
- api-costs
- tokens
Date: 2026-05-29
LLM APIs charge by the token. Every brace, quote, and repeated key in your JSON prompts costs money. TOON (Token-Oriented Object Notation) is a format designed specifically to reduce token count when sending structured data to LLMs — typically saving 30–60% compared to JSON. This guide shows you what TOON is, where the savings come from, and how to convert your data without changing your application logic.
The Problem: JSON Is Wasteful for LLMs
JSON was designed for machines to parse, not for token-priced models to consume. Every object in an array repeats every key: send 100 users and the literal word name shows up 100 times. Braces, brackets, quotes, and colons all consume tokens while carrying no data. A prompt that carries 1,000 tokens of structured JSON context will often land at 400–700 tokens once converted to TOON, in line with the 30–60% savings range.
The cost adds up faster than people expect. Using approximate GPT-4o input pricing of ~$2.50 per 1M tokens (as of mid-2026):
- 10M tokens/month of structured JSON in prompts ≈ $25/month in input costs.
- The same data in TOON, assuming a 40% reduction: ~6M tokens ≈ $15/month, a $10/month saving on input alone.
- At GPT-4 Turbo–tier pricing (~$10 per 1M input tokens), the same workload saves around $40/month.
- Scale to 100M tokens/month and you're saving $100–400/month, depending on model tier.
These are input-token costs only. Output tokens are typically priced 3–5× higher than input tokens, so the savings compound when the model also returns structured data in TOON instead of JSON — and for agent loops where the same data round-trips through multiple LLM calls, that compounding becomes the dominant cost factor.
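The input-cost arithmetic in the bullets above can be reproduced with a few lines of JavaScript. This is a minimal sketch: `monthlyInputCost` is a hypothetical helper rather than part of any SDK, and the prices and the 40% reduction are the approximate figures assumed in this section.

```javascript
// Hypothetical helper (not from any SDK): USD cost of a monthly token volume.
function monthlyInputCost(tokens, usdPerMillionTokens) {
  return (tokens / 1_000_000) * usdPerMillionTokens;
}

const jsonTokens = 10_000_000; // structured JSON tokens per month
const toonTokens = 6_000_000;  // same data after an assumed 40% reduction

// Compare the two price tiers used in the bullets above.
for (const price of [2.5, 10]) {
  const saved =
    monthlyInputCost(jsonTokens, price) - monthlyInputCost(toonTokens, price);
  console.log(`$${price}/1M tokens: saves $${saved.toFixed(2)}/month`);
}
// $2.5/1M tokens: saves $10.00/month
// $10/1M tokens: saves $40.00/month
```

Swap in your own token volumes and your provider's current prices to get figures for your workload.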
What Is TOON Format?
TOON stands for Token-Oriented Object Notation. It's a lossless data format that represents the same data as JSON using fewer tokens. It does that by declaring array schemas once in a header, dropping the per-object braces, and using indentation instead of structural characters. Anything encoded in TOON can be decoded back to identical JSON — no information is lost.
Here's the same data in both formats.
JSON (a uniform array of three records, ~58 tokens):
[
  {"id": 1, "name": "Alice", "role": "admin", "active": true},
  {"id": 2, "name": "Bob", "role": "user", "active": false},
  {"id": 3, "name": "Carol", "role": "editor", "active": true}
]
TOON equivalent (~24 tokens):
[3]{id,name,role,active}:
  1,Alice,admin,true
  2,Bob,user,false
  3,Carol,editor,true
What disappeared: id, name, role, and active were written three times each in JSON; they appear once in the TOON header. The per-object braces are gone, the quotes around keys and string values are gone, and the colons inside each object are gone. The explicit row count ([3]) and the schema ({id,name,role,active}) can also help the LLM reason about the data, since it knows up front how many rows to expect and what fields each row has.
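To make the mechanics concrete, here is a simplified sketch of how a uniform array could be turned into the header-plus-rows layout shown above. It is an illustration only, not the implementation used by any TOON library: `encodeUniformArray` is a hypothetical name, the two-space row indent is an assumption, and the sketch ignores quoting, escaping, nesting, and empty arrays.

```javascript
// Simplified sketch, NOT a real TOON library implementation.
// Assumes a non-empty array of flat objects with identical keys, and values
// that are numbers, booleans, or strings needing no quoting or escaping.
function encodeUniformArray(rows) {
  if (!Array.isArray(rows) || rows.length === 0) {
    throw new Error('expected a non-empty array');
  }
  const fields = Object.keys(rows[0]);
  for (const row of rows) {
    const keys = Object.keys(row);
    const sameShape =
      keys.length === fields.length && fields.every((f) => keys.includes(f));
    if (!sameShape) throw new Error('rows do not share the same keys');
  }
  // Header: row count and field names, declared once for the whole array.
  const header = `[${rows.length}]{${fields.join(',')}}:`;
  // Body: one indented, comma-separated line per row, in header field order.
  const lines = rows.map(
    (row) => '  ' + fields.map((f) => String(row[f])).join(',')
  );
  return [header, ...lines].join('\n');
}

const users = [
  { id: 1, name: 'Alice', role: 'admin', active: true },
  { id: 2, name: 'Bob', role: 'user', active: false },
  { id: 3, name: 'Carol', role: 'editor', active: true },
];
console.log(encodeUniformArray(users));
```

The key names are emitted once in the header, so each additional row adds only its values and separators.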
When TOON Saves the Most Tokens
| Data shape | Typical savings | Example |
| --- | --- | --- |
| Uniform array of objects (5+ items) | 40–60% | User lists, product catalogs, log entries |
| Mixed object with some arrays | 25–40% | API responses with metadata + data array |
| Simple flat object | 15–25% | Config objects, single records |
| Deeply nested, non-uniform | 0–10% | Complex nested trees, recursive structures |
TOON's biggest wins come from arrays of objects with the same shape: the more items in the array, the bigger the percentage savings, because each item no longer repeats the key names. The first row of a TOON tabular array roughly breaks even with JSON; every subsequent row is nearly pure payload. For deeply nested or non-uniform data, where there are few repeated keys to factor out, savings shrink toward zero and JSON can actually be more compact; the converter will tell you so with a warning rather than silently producing worse output.
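A rough pre-check along these lines can flag payloads that fall into the best-case row of the table. This is a heuristic sketch, not part of any TOON tooling: `looksTabular` is a hypothetical name, and the default `minRows` of 5 is simply taken from the "5+ items" row above.

```javascript
// Heuristic sketch: is `value` a uniform array of flat objects?
// `looksTabular` and `minRows` are illustrative names, not a library API.
function looksTabular(value, minRows = 5) {
  if (!Array.isArray(value) || value.length < minRows) return false;
  // A flat object has only primitive (or null) values.
  const isFlatObject = (item) =>
    item !== null &&
    typeof item === 'object' &&
    !Array.isArray(item) &&
    Object.values(item).every((v) => v === null || typeof v !== 'object');
  if (!value.every(isFlatObject)) return false;
  // Every item must expose the same set of keys, in any order.
  const shape = (item) => JSON.stringify(Object.keys(item).sort());
  const first = shape(value[0]);
  return value.every((item) => shape(item) === first);
}

const logs = Array.from({ length: 6 }, (_, i) => ({ id: i, level: 'info' }));
console.log(looksTabular(logs)); // true
console.log(looksTabular([{ id: 1, tags: ['a'] }], 1)); // false: nested value
```

A `false` result does not mean TOON is worse for that payload, only that it is outside the shape where the largest savings are expected; measure real token counts before deciding.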
When NOT to Use TOON
TOON is a specialized tool, not a JSON replacement. Reach for it only when LLM tokens are part of your cost equation. Skip it for:
- Traditional REST APIs between services. JSON is universally supported; TOON adds a conversion step with no upside if no LLM is involved.
- Client-server communication that doesn't touch an LLM. Token savings don't matter if you aren't paying per token.
- Deeply nested, non-uniform data. Savings shrink and readability can suffer.
- Models that can't read TOON reliably. Large models such as GPT-4, Claude, and Gemini handle it well; smaller models may not. Always test on a representative prompt before switching production traffic.
How to Convert JSON to TOON
- Paste your JSON into the CodeScrub JSON ↔ TOON Converter.
- Read the token savings bar — it shows exactly how many tokens you save for that specific payload.
- Copy the TOON output and drop it into your LLM prompt where the JSON used to live.
The converter runs entirely in the browser. Your data is never uploaded, stored, or logged anywhere.
Using TOON in Your Code
For programmatic conversion, the @toon-format/toon npm package handles encode and decode in a few lines:
import { encode, decode } from '@toon-format/toon';

// Convert JSON data to TOON before sending to the LLM
const data = [
  { id: 1, name: 'Alice', role: 'admin' },
  { id: 2, name: 'Bob', role: 'user' },
];
const toonString = encode(data);
// Pass toonString in your prompt instead of JSON.stringify(data)

// Decode TOON back to plain JavaScript data, e.g. when the model replies in TOON.
// Here the string we just encoded stands in for a model response.
const parsed = decode(toonString);
That's the whole integration. No prompt rewrites, no schema changes, no application logic to refactor — just swap JSON.stringify(data) for encode(data) in the call site that builds your prompt.
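One way to keep that swap to a single call site is to pass the serializer into the prompt builder. The sketch below is hypothetical: `buildPrompt` and its prompt wording are illustrative, and it defaults to `JSON.stringify` so it runs without any dependency.

```javascript
// Hypothetical prompt builder; `serialize` is injected so the data format
// can be swapped without touching the rest of the prompt.
function buildPrompt(question, data, serialize = JSON.stringify) {
  return [
    'Answer the question using the data below.',
    '',
    'Data:',
    serialize(data),
    '',
    `Question: ${question}`,
  ].join('\n');
}

const users = [
  { id: 1, name: 'Alice' },
  { id: 2, name: 'Bob' },
];

// Baseline: the data section is rendered as JSON.
console.log(buildPrompt('Who has id 2?', users));
```

To switch formats, pass the `encode` function from `@toon-format/toon` as the third argument. Keeping the JSON default also makes it easy to A/B test both formats against the same model.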
Real-World Savings Estimate
Three steps to estimate your own savings:
- Open your LLM API dashboard and find your monthly input-token usage.
- Estimate what share of those tokens is structured data (JSON in prompts). For data-heavy applications — agents, RAG pipelines, function-calling-heavy apps — this is typically 30–60%.
- Multiply your total input tokens by that share, then by 0.3–0.6 (the typical TOON savings rate), to get your estimated reduction in monthly input tokens.
A worked example: 50M total input tokens/month × 40% structured = 20M structured tokens, and 20M × 45% TOON savings = 9M tokens saved. At ~$2.50 per 1M input tokens, that's about $22.50/month back in your pocket. At enterprise scale (500M tokens/month with the same mix), the same math lands near $225/month, and that's before counting output-token reductions if the model also returns TOON. Dashboard totals already include every call in an agent loop, so no extra multiplier is needed there; if you are instead estimating from the size of a single payload, multiply by the average number of agent steps per task, since each iteration round-trips the structured state through the LLM.
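The three-step estimate can be written as a small function. This is a sketch under the same assumptions as the worked example; `estimateMonthlySavings` and its parameter names are hypothetical, and the inputs are estimates you supply, not measured values.

```javascript
// Hypothetical estimator for the three-step calculation above.
function estimateMonthlySavings({
  totalInputTokens,    // from your LLM API dashboard
  structuredShare,     // fraction of input tokens that is structured data
  toonSavingsRate,     // assumed fraction of structured tokens TOON removes
  usdPerMillionTokens, // your model's input price
}) {
  const structuredTokens = totalInputTokens * structuredShare;
  const tokensSaved = structuredTokens * toonSavingsRate;
  const usdSaved = (tokensSaved / 1_000_000) * usdPerMillionTokens;
  return { structuredTokens, tokensSaved, usdSaved };
}

const estimate = estimateMonthlySavings({
  totalInputTokens: 50_000_000,
  structuredShare: 0.4,
  toonSavingsRate: 0.45,
  usdPerMillionTokens: 2.5,
});
console.log(`~$${estimate.usdSaved.toFixed(2)}/month`); // ~$22.50/month
```

Running it with a low and a high `toonSavingsRate` gives a range rather than a single figure, which is usually the more honest way to present the estimate.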
The exact number isn't the point. The point is that the savings scale linearly with traffic, require no model change, and need only a one-line conversion in your prompt-building code.
Bottom line
TOON won't replace JSON everywhere, and it shouldn't try to. But for the specific use case of sending structured data to LLMs, the token savings are real, measurable, and free — they require zero changes to your application logic beyond a conversion step. Run a representative payload through the CodeScrub JSON ↔ TOON Converter and see your exact savings on real data before deciding whether to ship it.