Opus 4.6 came out today. I wonder if that’s part of the stock move from 2 days ago?
It’s a small improvement, but I think small improvements are enough to pass thresholds people care about.
It didn’t improve at all on coding, but they apparently fleshed out its other skills, so it’s now SOTA at random things like surfing the web and making PowerPoints and spreadsheets, and it has an experimental 1M-token context window.
There is a benchmark, “GDPval,” that is supposedly a test of whether it can do people’s white-collar jobs. And it is apparently crushing that, so I guess that could affect the stock market. It will be interesting to see what new tasks Opus can do better than us pathetic humans.
This test gives short shrift to the importance of context. The paper on it says as much.
Again, this is the rationalism behind the AI optimism: the idea that there is a single algorithm that knows the “right way to think” regardless of context.
This used to be called the “scientific way” of doing things.
The reality seems to be that the right way of thinking depends on a rich understanding of context, including values involved. This was supposed to be why working in the office was so important.
One kind of work that is carefully designed to not require a deep understanding of business context: software development.
I agree. It’s overselling itself as AI taking jobs. It is a good measure of AI being useful for many well defined tasks, which could certainly impact the job market.
Also, I’m kind of the AI optimist. Well “optimist” might be strong since I’m not convinced that AI is actually a good thing.
But anyway, I think it has a lot of potential simply because it continues to improve, and I don’t think it’s right to assume it will stop improving.
I also think that people are underestimating it in some fields and tasks, while overestimating it in others. There are no benchmarks around creative and social tasks, because they are not objective. We also avoid talking about things like medical advice.
I also think humans do a lot of stupid work. AI doesn’t need to solve Erdos problems to work retail.
I probably largely agree with you, although I think improvements will have to be of a different kind now that most of the available training data has already been used.
What I meant is that I don’t think there is a single optimal way to think (because the world is too radically contingent for that). This implies that “super intelligence” may be an oxymoron.
I also don’t think that predicting the next word is equivalent to thinking.
This makes me tend to emphasize the importance of context, and of values, which I think will limit the applications of these LLMs in ways some people are not currently appreciating. Though there will still be many important applications.
And I certainly don’t believe that these models can optimize themselves for super intelligence. Or that super intelligence can even exist (at least in finite beings).
I’m using AI for my coding work and am finding it’s saving me time. I think the thing we still need to see is if the cost of using AI is cheaper than me figuring it out on my own. These tech companies have massive planned capital spends, but I’m not sure how big the market is for their products or what personal or business consumers are willing to pay.
I think there’s a big mystery about whether the companies will be profitable given both their capital expenditures and their lack of moat. The companies might lose tons of money. Some could go bankrupt. However the models themselves will continue to exist.
And the models are really cheap to run compared to any salary. That’s not to say that AI will be helpful for everything, but it makes financial sense to throw $20 at it to see if it can help you with your $10,000 task.
I’ve got colleagues who’ve burned through their monthly allocation of tokens in a few days using AI integrated into their programming platform.
It may be relatively cheap to use it for a costly programming thing, but I wonder how the costs look if you have a bunch of employees who heavily use it in place of Google or just personal knowledge to handle every question they have.
My senior dev guy doesn’t really use AI yet, but I believe he’s thinking about it. The young fella I’m doing this AI thing with is paying $150/month for Claude Code and using every penny of it.
At the $150 level, I think it’s a gimme even just as a productivity-enhancement tool. That’s a reasonable number for professional subscription type of things.
I got curious and asked Copilot. It sounds like a 200-300 word response costs anywhere from $0.0002 to $0.02.
For businesses, costs seem to range from $100 to $5,000 per month, with full AI agents potentially costing $50k per month. So I guess a lot more affordable than I suspected. On the other hand, this is also introductory pricing.
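For what it’s worth, you can sanity-check those per-response numbers with some back-of-envelope math, since API usage is typically billed per token. This is just a sketch: the $15-per-million-output-tokens rate and the 0.75 words-per-token rule of thumb are assumptions, not any vendor’s actual pricing.

```python
# Back-of-envelope cost of one chat response under assumed per-token pricing.
# The price used below is a hypothetical placeholder, not a real vendor rate.

def response_cost(words, price_per_million_tokens):
    """Estimate the cost of generating `words` words of output.

    Uses the rough rule of thumb that 1 token is about 0.75 English words,
    so token count is approximately words / 0.75.
    """
    tokens = words / 0.75
    return tokens * price_per_million_tokens / 1_000_000

# A 250-word response at an assumed $15 per million output tokens:
cost = response_cost(250, 15.0)
print(f"${cost:.4f}")  # prints $0.0050, i.e. about half a cent
```

That lands comfortably inside the $0.0002–$0.02 range quoted above; the spread mostly reflects which model tier you pick and how long the prompt and response are.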
From what I have read, this part is very important.
There is a lot of capital being invested right now that is keeping costs down.
It seems clear these LLMs will have some good applications.
But will they be worth the cost without being subsidized by these huge investments? I have no intuition for that. But I am pretty sure I’ve read that some people are skeptical.
I’m no expert at pricing stuff, but the big money is being spent on hardware that allows them to create the models. What they’re selling is the completed model. So basically it’s the initial infrastructure that’s expensive. Once that’s built, it’s almost free money.
I think inference is also computationally expensive. Nothing like training, but training is done only a few times while inference runs constantly.
But in any case, i assume there would also be debt service on the hardware costs.
Yep. And they want you to become dependent on their service, so when they have built enough infrastructure, and have enough % of the market, you can fully expect those introductory prices to be jacked up considerably.
I raised this as a potential risk with our CFO, as we still need to have in-house coding experts to back up using these types of services. You really do not want to create a material dependency on these services in the long run as a business.
I think you don’t want to become locked into a specific ecosystem, same as any product.
But there are very strong open source models, that, afaict, anyone can run and offer as a cloud service. These provide a price floor.
They may not work exactly as well, or have all the bells and whistles, but it’s reasonable to assume you will always have something similar for a similar price.
Probably the biggest risk is horizontal integration, i.e., Microsoft jacking up the price on Microsoft Office-specific AI.
Lengthy rambling on my experimentation with AI today
So, today’s installment in my ongoing effort to try to get somewhat caught up with using AI was triggered because my employer’s marketing folks recently tweaked the corporate color scheme…and being the picky kind of person I am, I wanted to update my Office theme colors so that my spreadsheets and powerpoint decks would fit within the new decrees.
However, I like using distinctive colors in my exhibits. Many moons ago, I worked for someone who was color-blind, which trained me to be mindful of my color choices; and the new corporate color scheme, while certainly attractive and appropriate for external publications, just isn’t going to cut it in my normal work products. I’ll obviously comply with the new rules when my exhibits are needed for external documents…but I want any re-work to require as little additional effort as I can manage.
Copilot did an excellent job in helping me set up a color scheme that meets my XLS/PPT aesthetic and works with the new corporate color scheme.
But then, since I was on a roll, I moved on to updating my own personal templates/stationery. I also decided to give AI a shot at creating a new logo for me to use for the trade (“dba”) name I sometimes use for correspondence or the very rare side gig.
You can tell that generative AI is heavily influenced by the content it’s trained on. It took several rounds of prompting to get a logo that I liked which Google image search didn’t ping as being someone else’s trademark.
But Google’s image search also apparently has its limitations. Google didn’t pick up on the fact that the most recent iteration of the AI-generated logo has a similar shape (but different coloring) to the current SOA logo.
(It took me a while to realize that too… but in my defense, I’m a CAS guy, not an SOA guy.)
Building on my monologue from yesterday… I’m also experimenting with using AI to update my avatars for online forums.
I’ve been doing a bit of a Norman Rockwell theme for those updates, but for my persona here…I’m not certain that South Park really translates to a Norman Rockwell theme.