Many figures have been published, but it's hard to tell how trustworthy they are. Companies aren't required to disclose their costs, beyond what shows up in their financial statements. Some companies (Google) are fairly trustworthy and have shared some of their research, but you can probably only trust them so far. Open-weights models can be downloaded and run on your own hardware, so at least there you can measure the running cost yourself.
Anyway, yes, you're correct. Although there have been orders-of-magnitude improvements in efficiency, they are also building bigger models, and those models produce more tokens, mostly because of looping behavior. It's also partly a Jevons paradox situation, where crazy people like @SpaceLobster use AI to write truly vast amounts of code.
A couple of benchmarks explicitly track API costs (which could be subsidized). One is the self-proclaimed "AGI" benchmark, ARC Prize - Leaderboard, where you can see scores, efficiency, and costs all increasing at the same time. There's also a less pretty but actually meaningful benchmark: https://www.swebench.com/
Further, on your very specific question about Mythos: yes. Here's a quote from the red team paper:
This was the most critical vulnerability we discovered in OpenBSD with Mythos Preview after a thousand runs through our scaffold. Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings. While the specific run that found the bug above cost under $50, that number only makes sense with full hindsight. Like any search process, we can’t know in advance which run will succeed.
The AISI paper that poly linked similarly reports on success after spending a billion tokens.
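The figures quoted from the red team paper imply an average cost per scaffold run. Below is a quick back-of-envelope sketch. It treats "under $20,000" as an upper bound of exactly $20,000, which is an assumption. The helper name `average_cost_per_run` is hypothetical.

```python
def average_cost_per_run(total_cost_usd: float, num_runs: int) -> float:
    """Average spend per run when a search budget is spread across many runs."""
    if num_runs <= 0:
        raise ValueError("num_runs must be positive")
    return total_cost_usd / num_runs

# Figures from the quote. "Under $20,000" is treated as an upper bound of
# $20,000 (assumption), so the result is also an upper bound.
avg_upper = average_cost_per_run(20_000, 1_000)
print(f"Average cost per run: under ${avg_upper:.2f}")
```

The sub-$50 winning run is only a little above this average. The real price of finding the bug is the whole search budget, because nobody knew in advance which run would succeed.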
I've used over 1 billion tokens in April so far!
From Google AI:
Processing 1 billion (1B) AI tokens typically costs between $200 and $30,000+, depending heavily on the model's speed and intelligence (e.g., GPT-5.4 Nano vs. Pro). As of April 2026, efficient, smaller models (e.g., Gemini Flash-Lite, GPT-5.4 Nano) cost as low as $200–$250 per billion tokens ($0.20–$0.25/1M), while advanced, higher-context models (e.g., GPT-5.4 Pro) can exceed $10,000–$30,000+ per billion.
I have not heard anything yet at my company about token usage, but the idea of having access to this thing and being able to incur costs of several thousand dollars in a month is…something.
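Providers usually quote prices per million tokens. Multiplying by 1,000 gives the per-billion figures in the Google AI answer. This is a minimal sketch that assumes a single blended price per token. In practice, input and output tokens are usually priced differently, so treat it as a rough estimate. The function name `cost_per_billion` is hypothetical.

```python
TOKENS_PER_MILLION = 1_000_000
TOKENS_PER_BILLION = 1_000_000_000

def cost_per_billion(price_per_million_usd: float) -> float:
    """Convert a per-1M-token price into the cost of processing 1B tokens.

    Assumes one blended price for all tokens (input and output are often
    priced differently in reality).
    """
    return price_per_million_usd * (TOKENS_PER_BILLION / TOKENS_PER_MILLION)

# The low-end prices quoted above: $0.20 and $0.25 per 1M tokens.
for price in (0.20, 0.25):
    print(f"${price:.2f}/1M tokens -> ${cost_per_billion(price):,.0f} per 1B tokens")
```

Run in reverse, the $10,000–$30,000 per billion quoted for the high-end models works out to roughly $10–$30 per million tokens.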
Tangent: we have a lot of trouble balancing AI models against available hardware. Cost and VRAM are pretty much 100% correlated, and so are model capability and the amount of VRAM. Better AI results = higher costs. When we bought the hardware, we took our best guess at the minimum VRAM we needed, so that's fixed. You can't readily throw in another 48 GB of VRAM like you can with regular memory.
Now we're playing the game of trying to fit various models into a fixed amount of VRAM, which mostly works until it doesn't.
Damn you, minimum viable product.
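Fitting models into a fixed VRAM budget usually starts from a simple rule of thumb: weight memory is roughly the parameter count times the bytes per parameter. That is 2 bytes for FP16, 1 for 8-bit, and 0.5 for 4-bit quantization. The sketch below adds a flat 20% overhead to stand in for the KV cache, activations, and runtime buffers. That overhead figure is an assumption, not a measurement. The 70B model and the helper names are hypothetical.

```python
# Rough check of whether a model fits in a fixed VRAM budget.
# Assumptions (rules of thumb, not measurements):
#   - weight memory = parameters x bytes per parameter
#   - a flat 20% overhead covers KV cache, activations and runtime buffers
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

def estimated_vram_gb(params_billion: float, precision: str, overhead: float = 1.2) -> float:
    """Estimated VRAM in GB (1 GB = 1e9 bytes) needed to serve a model."""
    weight_bytes = params_billion * 1e9 * BYTES_PER_PARAM[precision]
    return weight_bytes * overhead / 1e9

def fits(params_billion: float, precision: str, vram_gb: float, overhead: float = 1.2) -> bool:
    """True if the estimate stays within the available VRAM."""
    return estimated_vram_gb(params_billion, precision, overhead) <= vram_gb

# Hypothetical example: a 70B-parameter model on a 48 GB budget.
for precision in ("fp16", "int8", "int4"):
    need = estimated_vram_gb(70, precision)
    verdict = "fits" if fits(70, precision, 48) else "does not fit"
    print(f"70B @ {precision}: ~{need:.0f} GB -> {verdict} in 48 GB")
```

This is where "mostly works until it doesn't" comes from. The KV cache grows with context length and batch size, so a long prompt can push a model that "fits" past a flat overhead estimate.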
98% done: https://badenai.com
A CRM built using Claude Code.
Plus a full suite of RAG-augmented insurance services: underwriting guidelines, tax info, qualifications for simplified issue, call transcripts, and professional proposals generated in minutes.
So that's what I've been working on. It hasn't launched yet, but we've been using it internally, and I have beta testers coming online next week.
I've been fighting with AI for 3-4 hours now, and it's sure this version of the script will work perfectly, unlike the last 20 iterations.
I used Claude for a bit, then Copilot for a bit, to redesign the interior of a room… It's pretty good, but it's frustrating how it'll just gloss over accuracy (e.g., where the windows are, how many there are, their size, etc.). You can correct it, and it'll compliment you, apologize, and then still do it wrong…
Gemini has been pretty good with this, although it seems better to start a new chat with a rewritten prompt to get something new, rather than ask for incremental changes in the first chat. It sort of gets stuck by the third round and just sends you back the same photo.
It ended up taking about 5 hours, and I had to figure out an alternative approach to suggest myself. AI isn't ready to run unsupervised: so many mistakes, and so many repeats of mistakes it had already identified.
No, it isn't. But therein lies the opportunity! We all have the same tool, so that's not the advantage anymore.
I'm booking a meeting with an upstream entity to try to get them to partner with me on this AI stuff. They're already talking to someone else. Bad news for that someone else: I'm going to be raising a lot of compliance and regulatory issues, so if they don't have their i's dotted, it's not gonna be a good look. The upstream entity likely hasn't thought about compliance, but they will once I raise the subject. And if the competing products aren't on top of their game (and most aren't), they'll be dead in the water.
I think I found a picture of this entity . . .