Artificial Intelligence Discussion

SredniVashtar · November 18, 2025, 3:15pm

Gemini 3.0 is dropping today. There’s good progress on the leaked benchmarks. I’m impressed.

There’s this conversation about whether LLMs are going to become “AGI” or “ASI” soon, or whether they are plateauing. Do we need to build ‘world models’ for LLMs to understand reality? Or do we need some other breakthroughs, and are those near? Or perhaps the whole project is hopeless?

It all comes down to the fact that every time an LLM improves, you expect it to stop improving next time. So it is always surprising, exciting, scary, that it improves again.

I’m not exactly into the singularity thing. But it really does feel like every few months we are inventing a new car that goes 5% faster than all the other cars.

Vorian_Atreides · November 18, 2025, 3:34pm

I’m constantly amazed at the general expectation that artificial neural net learning should be much, much faster than human neural net learning (assuming that the neural net theory is appropriately “accurate” for describing human learning).

SredniVashtar · November 18, 2025, 3:44pm

There’s an old quote about AI science that I love:

The quest for ‘artificial flight’ succeeded when the Wright brothers and others stopped imitating birds and started using wind tunnels and learning about aerodynamics. Aeronautical engineering texts do not define the goal of their field as making ‘machines that fly so exactly like pigeons that they can fool even other pigeons.’

It’s partially about how silly the Turing test is. Partially about how we should approach the engineering problem. But the other thing I like about it is how it quietly suggests how big and scary AI could become.

Like, yes we never invented anything as artful as the sparrow. No plane can dart through the branches on a battery of seeds. But instead we invented 400 ton machines that fly 30x as fast.

CuriousGeorge · November 18, 2025, 5:25pm

In many, many cases it is much, much faster, so I don’t find that expectation odd at all. In 24 hours of training, I can train a neural net that can play at the world class level. A human barely knows how to not leave pieces hanging after 24 hours.

Vorian_Atreides · November 18, 2025, 6:09pm

I assume that you’re referring to playing a game of chess . . . and the learning there is understanding the rules of movement, which I don’t think a human with an awareness of “rules of games” will take that much longer than a computer to understand.

The rest of what you describe is not really related to NNs so much as to a computer’s ability to run through a large number of scenarios and determine which move has the best probability of resulting in a win . . . that is, an optimization problem, at which I agree computers are generally better than most humans (or at least far more objective, depending on the programming).

However, take the realm of LLMs . . . that is where there are still significant gaps, and where a human with a few decades of broad experience is likely to outperform most LLMs (apart from the speed of pulling together “material” that appears to match a search criterion).

magillaG · November 19, 2025, 3:41am

Flight is about overcoming gravity. What you are flying above, or through, is not the point.

Intelligence is a kind of observation. It is, arguably, all about connecting with what you are knowing. To understand the sun is radically different from understanding a sparrow, from understanding the set of all integers, from understanding chess.

For that reason, I do not think it is an engineering problem at all.

But I don’t know for sure because there is no agreement on what intelligence is.

My worry, not entirely disagreeing with your post, is that instead of getting AGI, we will end up with a method of control that we just call intelligence.

Cooke · November 19, 2025, 2:54pm

magillaG · November 19, 2025, 5:36pm

I skimmed this and it seems to emphasize more traditional “ways of knowing” compared to modern, western, industrial ones.

That’s a good thing to bring up. But it isn’t the only problem.

The basic assumption of a common interpretation of these models is that you can “know” all there is to know about, say, dancing, by reading books about dancing. Of course not! The way to know about dancing is to get somebody to teach you to dance! Then you might be able to be helped by a book. But practically everything is like dancing. It is something you do, not just a set of things you write down.

Take science. From what I have read, part of the difficulty in assessing the “replication crisis” is that these papers can never write enough down to reproduce, say, a psychology experiment. You have to call the researcher and have a dialogue about all the little “tricks” they did in the experiment.

Of course, with programming, what is written is also what is done. (Though not the development process or its use in business.) So I think that the developers and engineers who are developing and selling these models may be less sensitive to these realities.

Actuary321 · November 19, 2025, 6:25pm

A friend asked Gemini to create a 5x7 poster that he could use to explain that, because the government is no longer minting pennies, they would be rounding cash to the nearest nickel.

It looked pretty good, except that it had a couple of obvious misspellings. One in particular was that it misspelled ‘penny’ as ‘penis’. He wasn’t pleased with the look of the logo on it either, so he told it that it had some misspellings and asked it to fix them and to change it so it had more contrast and the logo stood out better. It changed the contrast but left the misspellings as they were.

SredniVashtar · November 19, 2025, 6:34pm

Do you mean the newest model? I’m curious how it’s going to perform at that kind of thing.

I’d say before today Gemini was not the model to use. And the best model on the market was still pretty bad at anything visual.

The new Gemini is supposedly much better at screen reading, but I don’t know if that translates to spelling for example.

Actuary321 · November 19, 2025, 6:38pm

I don’t know what version he was using. But this was last night. His company uses Google for most of their IT stuff so Gemini was the AI they were supposed to use.

He did tell it that if he used that he would probably get fired because of what it wrote. It then gave a big explanation of why the things it put down were wrong and would not be appropriate. I didn’t read all of that but we basically determined that we didn’t think that we were all that worried about AI taking our jobs yet.

SteveGrondin · November 19, 2025, 8:28pm

Also

Fish_Actuary · November 19, 2025, 8:28pm

Try asking an AI what percentage warmer is a day with a temperature of 9° than a day with a temperature of -1° (you choose the units). See if it gets it right. To do so, it will have to account for absolute zero.
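For anyone who wants to check an AI's answer, here is a minimal sketch of the arithmetic in Python. It assumes "percentage warmer" means the percent increase in absolute temperature; the function name `percent_warmer` and the unit labels are made up for illustration, and the offsets are the standard Celsius-to-kelvin and Fahrenheit-to-Rankine constants.

```python
# Distance from absolute zero to the zero point of each scale.
# "C" converts to kelvin, "F" converts to degrees Rankine.
ABS_ZERO_OFFSET = {"C": 273.15, "F": 459.67}


def percent_warmer(hot, cold, unit="C"):
    """Percent increase in absolute temperature going from `cold` to `hot`.

    Hypothetical helper: assumes "warmer" is measured on an absolute scale.
    """
    offset = ABS_ZERO_OFFSET[unit]
    hot_abs = hot + offset
    cold_abs = cold + offset
    return 100.0 * (hot_abs - cold_abs) / cold_abs


# The naive calculation divides by -1 and gives a meaningless answer.
naive = 100.0 * (9 - (-1)) / -1
print(naive)                                   # -1000.0

print(round(percent_warmer(9, -1, "C"), 2))    # 3.67
print(round(percent_warmer(9, -1, "F"), 2))    # 2.18
```

So the answer depends on the units you pick, but either way it is a few percent, not the -1000% that the naive ratio produces.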

Fish_Actuary · November 19, 2025, 8:31pm

Just trying it now, if Google is using the new version of Gemini, it’s gotten better at it. Copilot got it right as well. Not sure if it’s improved AI or a better prompt.

dr_t_non-fan · November 19, 2025, 8:33pm

Should first check all questions for trickery.
“Is the Set of all sets contained within itself?”
“Determine the Truthfulness of this sentence: ‘This sentence is a lie.’”

quoll · November 19, 2025, 9:42pm

I think AI can only do Truthiness.

CuriousGeorge · November 20, 2025, 12:49am

??
This is exactly what neural net learning is - taking in data, evaluating it against a loss function, and training the neural net to minimize that loss function. For chess, once the training is done, the net itself has plenty of skill to outplay most humans, even without any search at all.

Maybe you intended to limit your comment to LLMs, but that is just a tiny portion of neural networks.
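The loop described above (take in data, score it against a loss function, adjust to reduce the loss) can be sketched in a few lines of plain Python. This is a toy under loud assumptions: the "net" is a single linear unit, the loss is mean squared error, and the data is a made-up line y = 2x + 1. It is nowhere near a chess network; it only shows the shape of the training loop.

```python
def mse(w, b, data):
    """Mean squared error of the model y = w*x + b on the data."""
    return sum((w * x + b - y) ** 2 for x, y in data) / len(data)


def train(data, lr=0.05, epochs=2000):
    """Fit w and b by plain gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(data)
    for _ in range(epochs):
        # Gradients of the MSE loss with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        # Step downhill on the loss surface.
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


# Made-up training data drawn from y = 2x + 1.
data = [(x, 2.0 * x + 1.0) for x in range(-3, 4)]

loss_before = mse(0.0, 0.0, data)
w, b = train(data)
loss_after = mse(w, b, data)
print(loss_before, loss_after)
print(round(w, 3), round(b, 3))   # 2.0 1.0
```

Real nets swap the single unit for millions of weights and the toy data for game positions, but the loop is the same idea.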

magillaG · November 20, 2025, 5:42am

My guess is that it still does a kind of search. But the possible moves are weighted in some way so that better moves are searched first.

That seems to be how humans master chess too. By reading lots and lots of prior chess games.

Chess still is a tiny universe of possibilities compared to the full human experience though.

Snikelfritz · November 20, 2025, 6:30am

AI today seems very computationally intense compared to people. Won’t rule the world until there’s a different approach to processing.

now_samantha · November 20, 2025, 1:53pm

Friends don’t let friends use AI to make maps. Not sure what the best part is, the confusion over where the Bronx and Brooklyn go? The double Queens? Actually, pretty sure it’s the second Staten Island in the middle of the water, b/c the first Staten Island was really Atlantis.