input tokens: $0.5 per 1,000
output tokens: $1.5 per 1,000
that's either one hell of a typo or my god I'll be broke in an hour if I accidentally use this service
Yeah, agreed. This sounds like a proper scam to me...
The website looks like AI, so do we call it a typo or a hallucination?
Not only is it AI, it's outdated AI.
https://tokensaver.org/api/pricing
is offering GPT 3.5 Turbo and Gemini 1.5 Pro.
That was exactly my impression. I thought the price was for 1 million tokens.
Mom, I want openrouter.
We have openrouter at home!
In all seriousness, the value proposition is weird to me. The most expensive queries are the ones with huge contexts, and therefore the ones where I'd be less likely to use cheap models.
I am not sure who the intended customer for this service is.
The prompt and the model go hand in hand. If you randomly select the model, the likelihood of getting something consistent is basically zero.
Also, model pricing doesn't vary that much. I have never heard of a spot-instance equivalent for inference, although that would be cool. The demand for GPUs is so high right now that I think most datacenters are at 100% utilisation.
Btw, the landing page does not inspire much confidence that this is serious. You might want to change it to communicate better and also to be attractive to "developers", I guess.
Depends on what you're doing. Something like "read this text and extract all the phone numbers" or "write a 3-point summary of this email" will perform about the same on all good models.
> Also, model pricing doesn't vary that much.
I'm curious when AI pricing will couple with energy markets. Then the location of the datacentre will matter considerably.
The equivalent of a "Spot Instance" is basically the OpenAI Batch API.
The authentication for the API seems poorly designed. The auth token is your email address rather than a real auth token. If I know someone uses this service, I can send a massive number of requests and cause a large credit card charge with just their email address. I thought this was just a mistake in the obviously LLM-written home page, but after testing, the API really does work this way.
On top of that, logging in does not require a password, just an email address.
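For contrast, here is a minimal sketch of what a real auth token could look like. It is not how the service works: the in-memory store `TOKENS` and the helpers `issue_token` and `authenticate` are hypothetical names made up for illustration. The idea is that the server issues a random secret, stores only its hash, and never treats the email address itself as a credential.

```python
import hashlib
import hmac
import secrets

# Hypothetical in-memory store: email -> SHA-256 hex digest of the issued token.
TOKENS: dict[str, str] = {}


def _digest(token: str) -> str:
    return hashlib.sha256(token.encode("utf-8")).hexdigest()


def issue_token(email: str) -> str:
    """Create a random secret for this account and store only its hash."""
    token = secrets.token_urlsafe(32)
    TOKENS[email] = _digest(token)
    return token  # shown to the user once, never stored in plain text


def authenticate(email: str, presented: str) -> bool:
    """Accept a request only if the presented secret matches the stored hash."""
    stored = TOKENS.get(email)
    if stored is None:
        return False
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(stored, _digest(presented))
```

With something like this, knowing a customer's email address is not enough to spend their credits, because the email is only an account identifier.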
This has a nice made-up "case study": https://tokensaver.org/blog/how-i-saved-500-dollars-on-ai-co...
> Six months ago, I was running a customer support chatbot for a SaaS product. Nothing fancy - ...
I'm sure this toooootally happened
> curl -X POST https://tokensaver.org/api/chat \
>   -H "Content-Type: application/json" \
>   -d '{ "email": "your@email.com", "messages": [ {"role": "user", "content": "Hello!"} ] }'
Am I getting this right that there's no auth? Just provide an email and get free requests?
Edit:
This seems to always use Sonnet 3.5 no matter the request. I asked it a USAMO problem and it still used Sonnet (and hallucinated the wrong answer, of course).
Vibe coded most likely. The creator might figure out the problems with that approach the hard way.
Inferred tokens are a commodity. It's crude oil, not an oil painting.
https://news.ycombinator.com/item?id=45837691
Out of frustration, I built an AI API proxy that automatically routes each request to the cheapest available provider in real-time.
The problem: AI API pricing is a mess. OpenAI, Anthropic, and Google all have different pricing models, rate limits, and availability. Switching providers means rewriting code. Most devs just pick one and overpay.
The solution: One endpoint. Drop-in replacement for OpenAI's API. Behind the scenes, it checks current pricing and routes to whichever provider (GPT-4o, Claude, Gemini) costs least for that specific request. If one fails, it falls back to the next cheapest.
How it works:
- Estimates token count before routing
- Queries real-time provider costs from database
- Routes to cheapest available option
- Automatic fallback on provider errors
- Unified response format regardless of provider
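The steps above can be sketched roughly as follows. This is an illustration under stated assumptions, not the service's actual code: the `Provider` class, the prices passed into it, the four-characters-per-token estimate, and the `route` function are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Provider:
    name: str
    input_price_per_1m: float   # USD per 1M input tokens (placeholder values)
    output_price_per_1m: float  # USD per 1M output tokens (placeholder values)
    call: Callable[[str], str]  # hypothetical function that sends the prompt


def estimate_tokens(text: str) -> int:
    # Crude assumption of about 4 characters per token; a real router
    # would use each provider's tokenizer.
    return max(1, len(text) // 4)


def estimated_cost(p: Provider, prompt: str, max_output_tokens: int) -> float:
    return (estimate_tokens(prompt) * p.input_price_per_1m
            + max_output_tokens * p.output_price_per_1m) / 1_000_000


def route(prompt: str, providers: list[Provider], max_output_tokens: int = 256) -> dict:
    """Try providers from cheapest to most expensive, falling back on errors."""
    ranked = sorted(providers, key=lambda p: estimated_cost(p, prompt, max_output_tokens))
    errors = []
    for p in ranked:
        try:
            text = p.call(prompt)
        except Exception as exc:  # any provider failure triggers the fallback
            errors.append(f"{p.name}: {exc}")
            continue
        # Unified response shape regardless of which provider answered.
        return {"provider": p.name, "text": text}
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Note that this sketch ranks purely on estimated price. It makes no attempt to judge whether the cheapest model can actually handle the prompt.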
Typical savings: 60-90% on most requests, since Gemini Flash is often free/cheapest, but you still get Claude or GPT-4 when needed.
30 free requests, no card required: https://tokensaver.org
Technical deep-dive on provider pricing: https://tokensaver.org/blog/openai-vs-anthropic-vs-gemini-pr...
I wrote up how to reduce AI costs without switching providers entirely: https://tokensaver.org/blog/reduce-ai-api-costs-without-swit...
Happy to answer questions about the routing logic, pricing model, or architecture.
You should probably take the service down before the HN crowd maxes out your credit card with the already discovered security and auth issues. Then find a technical co-founder if you still want to pursue this idea, and build it from scratch.
Hi, curious, did you know about OpenRouter before building this?
> OpenRouter provides a unified API that gives you access to hundreds of AI models through a single endpoint, while automatically handling fallbacks and selecting the most cost-effective options. Get started with just a few lines of code using your preferred SDK or framework.
It isn't OpenAI API compatible as far as I know, but they have been providing this service for a while...
OpenRouter can also prioritize providers by price: https://openrouter.ai/docs/guides/routing/provider-selection...
> Typical savings: 60-90% on most requests, since Gemini Flash is often free/cheapest, but you still get Claude or GPT-4 when needed.
This claim seems overstated. Accurately routing arbitrary prompts to the cheapest viable model is a hard problem. If it were reliably solvable, it would fundamentally disrupt the pricing models of OpenAI and Anthropic. In practice, you'd either sacrifice quality on edge cases or end up re-running failed requests on pricier models anyway, eating into those "savings".
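A back-of-the-envelope sketch of that last point, with made-up numbers. It assumes a failed cheap attempt is always detected and re-run once on the pricier model, which is itself generous; the function names and prices are hypothetical.

```python
def expected_cost(cheap: float, pricey: float, failure_rate: float) -> float:
    """Expected cost per request when failed cheap attempts are re-run on the pricier model."""
    return cheap + failure_rate * pricey


def effective_savings(cheap: float, pricey: float, failure_rate: float) -> float:
    """Fraction saved relative to always using the pricier model."""
    return 1 - expected_cost(cheap, pricey, failure_rate) / pricey


# Hypothetical prices: the cheap model costs 10% of the pricier one.
for rate in (0.0, 0.2, 0.5):
    print(f"failure rate {rate:.0%}: savings {effective_savings(1.0, 10.0, rate):.0%}")
```

With these numbers, a nominal 90% saving drops to 70% at a 20% failure rate and to 40% at a 50% failure rate.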
I genuinely wonder what the use cases are where the required accuracy is so low (or I guess the prompts are so strong) that you don't need to rigorously use evals to prevent regressions with the model that works best, let alone actually change models on the fly based on what's cheaper.
Yes, and in addition, for some reason that use case is also not a fit for a cheap open-source model like Qwen or Kimi, but must be run on the cheapest model of the big three.
For input,
- GPT-5.1 is $1.25 / 1M tokens
- You are $0.50 / 1,000 tokens
Output:
- GPT-5.1 is $10.00 / 1M tokens
- You are $1.50 / 1,000 tokens
Am I reading that wrong? Is that a typo?
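Taking the listed numbers at face value, a quick conversion to the same unit shows the gap. The prices are the ones quoted above; the helper name `per_million` is just for illustration.

```python
def per_million(price_per_thousand: float) -> float:
    """Convert a price quoted per 1,000 tokens to a price per 1M tokens."""
    return price_per_thousand * 1_000


# Prices as quoted in this comment (USD).
listed = {"input": 0.50, "output": 1.50}    # per 1,000 tokens
gpt_5_1 = {"input": 1.25, "output": 10.00}  # per 1M tokens

for kind in ("input", "output"):
    converted = per_million(listed[kind])
    ratio = converted / gpt_5_1[kind]
    print(f"{kind}: ${converted:,.2f} / 1M tokens, {ratio:.0f}x GPT-5.1")
```

That works out to $500 per 1M input tokens (400x) and $1,500 per 1M output tokens (150x), unless "per 1,000" was meant to be "per 1M".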
Really interesting approach.
As these routing engines evolve, I wonder how you see them handling drift or divergence when different models produce structurally incompatible outputs.
Any thoughts on lightweight harmonization layers?
A free router is https://huggingface.co/models
This landing page is vibe coded and littered with mistakes/typos (per 1,000 tokens) and outdated models (Gemini 1.5?). The security link at the bottom of the page is an href=#, and I can see the "dashboard" without logging in or signing up.
> Message Privacy: Your API requests are processed and immediately forwarded. We never store or log conversation content.
> Minimal Data: We only store your email and usage records. Nothing else. Your data stays yours.
Source: trust me bro.
[flagged]
I see that you have a few LLM-generated comments here, although this one is probably the most egregious amongst them. HN is mostly for humans to interact, so please do not do this here. Thank you.
This comment is likely AI. Consider author post history as well.
(this comment)
> This idea sits in a really interesting space because on paper
(previous comments)
> I really like this class of work because it sits at a strange intersection:
> It’s wild how Voyager forces two truths to sit together:
The pattern is "<compliment intellectual stimulation> <make note of juxtaposition>"
> becomes workflow-aware, not just price-aware.
> human mental model instead of the mathematically convenient one.
> run for that mindset more than for the tech.
My meta comment is not breaking the HN guidelines by letter, but may be spiritually breaking guidelines. See:
> Please don't post insinuations about astroturfing, shilling, brigading, foreign agents, and the like.
I think clarification on AI-accusations should be added to the guidelines if it falls under this class of comment.
Interesting, I read the comment and it had some very valid points and didn’t veer off into AI babble. If it is AI, I’d like to see the prompt!
The post itself is AI too, so we've got a perfect closed circle. Soon HN won't even need the humans anymore, as long as it drives enough hype and seed round funding.
Thanks for the AI slop that tries to debunk some other AI slop