Artificial Intelligence Discussion

This article seems to go against some of the recent statements about AI being cheaper than employees…

I kind of wonder if part of the difference for @SpaceLobster is that he’s hiring a programmer by the task, whereas staff programmers have a lot more responsibility and need to deal with long-term issues rather than producing one-offs.

1 Like

Yes, these days most LLMs have what amounts to an inner monologue that lets them produce somewhat better answers. You can also decide how much “thinking” they do before giving up.

do not seem to scale well with problem complexity. This is analogous to saying that an algorithm’s computation cost increases exponentially with problem size; it will never handle large problems. (For comparison, an example from introductory CS classes is that an optimal sort algorithm scales like N log(N), which is quite good scaling.) I say analogous because there isn’t the same theoretical understanding (to put it mildly) of this kind of “reasoning scaling” as there is for the computational cost of algorithms. The paper criticizes the kinds of tests being done in general.

Yes. The various problems discussed each have literal algorithmic solutions with varying degrees of complexity.

They also have varying degrees of output length, which is a separate issue.

The Towers of Hanoi problem, for example, is extremely simple, but the solution takes exponentially more steps as disks are added (2^n - 1 moves for n disks). And the solution is well known. And the LLMs could easily describe the algorithm, and code it. What they could not do is execute the algorithm flawlessly. At some point, in the thousands of steps, they messed up. Was it because of some statistical error? Probably: the models actually work in part by intentionally introducing randomness (the amount is a parameter you set, usually called temperature). Or maybe it was an unintentional error, i.e. “hallucination”. Or was it because of context window size (i.e. working memory)? The models only have the working memory of a novella or so before they start glitching.
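To put rough numbers on "easy to code, hard to execute flawlessly", here is a minimal Python sketch. It generates the standard recursive solution and then estimates the chance of getting every single move right. The per-move accuracy of 0.999 and the assumption that errors are independent from move to move are both made up for illustration; the paper does not report anything like this, and the function names are my own.

```python
def hanoi_moves(n, source="A", target="C", spare="B"):
    """Yield (from_peg, to_peg) moves that solve an n-disk Towers of Hanoi."""
    if n == 0:
        return
    # Move the top n-1 disks out of the way, onto the spare peg.
    yield from hanoi_moves(n - 1, source, spare, target)
    # Move the largest disk to the target peg.
    yield (source, target)
    # Move the n-1 disks from the spare peg onto the largest disk.
    yield from hanoi_moves(n - 1, spare, target, source)


def flawless_probability(n_disks, per_move_accuracy):
    """Chance of getting all 2^n - 1 moves right.

    ASSUMPTION: each move is right with the same probability, independently
    of the others. Real model errors are unlikely to be this tidy.
    """
    total_moves = 2 ** n_disks - 1
    return per_move_accuracy ** total_moves


if __name__ == "__main__":
    hypothetical_accuracy = 0.999  # made-up number, not a measurement
    for n in (3, 10, 12):
        move_count = sum(1 for _ in hanoi_moves(n))
        p = flawless_probability(n, hypothetical_accuracy)
        print(f"{n} disks: {move_count} moves, P(flawless) ~ {p:.3f}")
```

Under those assumptions the algorithm itself is a few lines, while the odds of a perfect transcript drop quickly as the move count doubles with each extra disk. That is consistent with the failure being about long error-free execution rather than about understanding the puzzle.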

The river crossing was a better example. It is complex, and I think the models failed to make much progress. I suspect, from playing with LLMs elsewhere, they weren’t suitably systematic. That’s a real issue, but it’s hard to understand exactly why they are systematic sometimes and not other times. It’s also the kind of issue that can go away when a new model is introduced.

The paper didn’t really dig into the errors that the models were making, so it’s hard to tell what went wrong. They then made some very far-reaching conclusions about not really “thinking” or not “scaling”, but that’s different from being bad at Towers of Hanoi.

Finally, they note that the model “overthinks” the Towers of Hanoi problem, suggesting that there are some problems where no amount of thinking helps you. This is not really a surprise though. If you’ve ever actually done a Towers of Hanoi puzzle you know it doesn’t help to “think” about it. It’s a mindless task.

Annnnnnnnnnyway, similar to “LLMs can’t do chess”, it’s a paper with interesting results, but overly dramatic conclusions.

Anyone who took the conclusions too seriously would have been very surprised to see LLMs chaining together exploits, coding large-scale applications, or solving difficult math proofs.

And indeed those were all things that LLMs couldn’t do last year, but can do now.

I think more likely it’s just a clickbait article.

I’ve created an entire AI insurance ecosystem. It covers pretty much every Canadian company’s insurance policies, insurance strategies, simplified issue, underwriting guidelines, on and on.
And a full CRM that integrates with email, calendar AND text messaging. And commission calcs, and and and. I’ve certainly used devs for one-off tasks before (though we always built them for the future), but the first thing I did with AI is a full-blown project by any measure. My dev said it would take about a year of 1-2 devs, and frankly, no way would that have been enough for what I’m building (almost done, maybe today).

Second counterpoint: I spoke to my friend this weekend about how he’s faring shifting his org of hundreds of software engineers. And he’s building high-end electronics across decades, not one-off tasks. He said most of his developers are doing the ‘top ten list’ I mentioned earlier. Then the occasional one has fully embraced it, and those that have are already outperforming their entire team.

Again, I think the article is clickbait. Everyone’s got a long list of problems with AI. Meanwhile, some of us don’t care; we’re just doing it and seeing results that are 1, maybe 2, orders of magnitude bigger and faster than what a person can do.

2 Likes

And my friend agrees, as do I, with whoever posted in this thread about being a domain authority. That’s what makes this successful. If you know a bit about software, and you’re an expert in the area you’re working on, then it works well. If you don’t know what you’re talking about, well, results will be tailored accordingly.

Since actuaries are by default domain experts, it’s a perfect fit once you boomers get your head around the idea that it works.

1 Like

This. AI articles are very likely to be trash. Many are like a report about a tweet about a Medium article about a stupid thing a tech bro CEO said in an interview about an out-of-date scientific study, taken out of context 4 times and also wrong in the first place.

There is overwhelming demand for hot-takes right now.

Are you talking about the articles that are saying how good AI is or how bad AI is?

Both.

2 Likes

The youngest baby boomer is 61 years old, so they’re running out of time to get their head around it (for work purposes).

2 Likes

FWIW, there is a mature conversation to be had around how much companies spend on compute, what they are getting out of it, how much it is subsidized by investors, and how inference compares with training… But that conversation really needs to be data-driven, and not anecdata from tech bro interviews.

On a basic level, developers often spend something like $100/month on AI, and receive something like $10,000/month in compensation. But it’s possible to spend a lot more or be paid a lot less.

I’ve been seeing some stuff online about software people being pushed to burn tokens as it’s good for stock price.

There’s some very clickbaity stuff, like an AI CEO saying something like “you need to use more AI or you’re not AIing hard enough!” As a starting place, I’d recommend ignoring everything an AI executive says. It’s pretty dumb to begin with, and secondary sources are even dumber.

By contrast, I’m spending I think about $1000/month? Doing some hours daily. But I’m doing interpreted code. My buddy is doing compiled code, I dunno but I suspect he burns a lot more tokens than I do, because it’s compiling while he’s waiting.
My cost is via API, maybe it would be cheaper if I was on a plan.

2 Likes

Oof. So the forum earned its keep today. I’ve spent 1200 loonies in April. I asked Claude if I should do a different plan than the API, and it said yes, max pro 20X plan would cost me $300/month, idiot.

Well, the idiot part I inferred.

4 Likes

From what I’ve heard, Claude has had some pretty tight hourly limits on its subscription plans, so there’s a pretty good chance you’d be spending a lot on API anyway. Especially these days, when they appear to be trying to cut compute.

…They had a backup system in place.

1 Like

Possibly. It did say I might get metered, but I’m not in a rush. We’ll see, if it’s a pita, then it’s just a cost tradeoff.

Claude (and western AI in general) is being banned in HK

Are there figures published on the energy costs associated with these sorts of jumps? It seems like most of these advances are the result of adding layers of logical loops within a larger decision tree, or creating agents to run routines as part of a larger instruction. Our cloud grid for running larger actuarial models seems surprisingly expensive, and I know part of this is that it runs on CPUs rather than GPUs, but we can incur a large cost for running models with junk inputs and that is just part of the cost of business (some error rate exists).

Feels like you could incur an even bigger runaway cost if the instructions to the AI are unclear, especially as we are all just learning how to use them. We can’t look at just the cost of a single AI query - that R&D aspect also needs to be captured, along with some frequency of bad queries being sent to the model.
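One rough way to fold the bad-query rate and the R&D overhead into a single figure is sketched below in Python. Everything here is an assumption for illustration: the function name, the idea that a bad query costs the same as a good one and simply gets rerun, the independence of retries, and all of the example numbers.

```python
def cost_per_useful_result(cost_per_query, bad_query_rate,
                           monthly_rnd_cost=0.0, useful_results_per_month=1):
    """Effective cost of one useful result (hypothetical model).

    ASSUMPTIONS: a bad query costs the same as a good one, is rerun until
    it succeeds, and each attempt fails independently with the same rate.
    R&D (learning, prompt tuning, etc.) is spread evenly over the month's
    useful results.
    """
    if not 0 <= bad_query_rate < 1:
        raise ValueError("bad_query_rate must be in [0, 1)")
    if useful_results_per_month <= 0:
        raise ValueError("useful_results_per_month must be positive")
    # Expected attempts until the first good result (geometric distribution).
    expected_attempts = 1 / (1 - bad_query_rate)
    amortized_rnd = monthly_rnd_cost / useful_results_per_month
    return cost_per_query * expected_attempts + amortized_rnd


if __name__ == "__main__":
    # Made-up inputs: $2 per query, half of queries wasted,
    # $300/month of R&D spread over 100 useful results.
    print(cost_per_useful_result(2.0, 0.5, 300.0, 100))
```

The point of the sketch is only that the headline per-query price understates the real cost once retries and learning overhead are counted; actual figures would have to come from your own usage data.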

1 Like

:waving_hand: