Artificial Intelligence Discussion

What’s funny is that they aren’t just quietly uncertain. They think the odds are really good that what we are building is garbage, humanlike, godlike, or the devil.

I think part of the reason for this is that the skills needed to do the “next thing” in AI research are very different from the skills needed to see a path to artificial general intelligence.

The big gains from 10 years ago came from figuring out how to optimize deep neural nets. This involved engineering tricks, like using a much simpler function to “wrap” each neural net node’s output.
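The “much simpler function” here is presumably the ReLU activation. A minimal NumPy sketch of the idea (the inputs below are just made up for illustration): a smooth activation like the sigmoid has gradients that shrink toward zero for large inputs, while ReLU’s gradient stays at 1 for any positive input, which made deep nets much easier to optimize.

```python
import numpy as np

def sigmoid(x):
    # Smooth "squashing" activation used in older nets.
    return 1.0 / (1.0 + np.exp(-x))

def relu(x):
    # The "much simpler function": zero out negatives, pass positives through.
    return np.maximum(0.0, x)

x = np.array([-6.0, 0.0, 6.0])

# Sigmoid's gradient shrinks toward 0 at large |x| (vanishing gradients)...
sigmoid_grad = sigmoid(x) * (1 - sigmoid(x))

# ...while ReLU's gradient is simply 1 wherever the input is positive.
relu_grad = (x > 0).astype(float)

print(sigmoid_grad)
print(relu_grad)
```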

Similarly, the big gain for large language models, the “transformer”, seems to have been figuring out a computationally efficient way to decide which context words to focus on when predicting the next word.
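That “which words to focus on” mechanism is scaled dot-product attention. A toy NumPy sketch (dimensions and random inputs are made up for illustration): each position computes a weighting over all context positions, then takes a weighted average of their values. It reduces to a few matrix multiplies, which is what makes it so efficient on GPUs.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: rows become weights that sum to 1.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention(Q, K, V):
    # Scaled dot-product attention: scores say how much each position
    # should "focus on" each other position; output is a weighted average.
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = softmax(scores, axis=-1)
    return weights @ V

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))  # 4 context positions, dimension 8
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))

out = attention(Q, K, V)
print(out.shape)  # (4, 8): one mixed-context vector per position
```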

As far as I can tell, these are more engineering-focused people. And engineers tend to see problems as engineering problems. In other words, as problems that can be worked out with enough money and elbow grease.


Yes. Agree completely. Every new model is a new optimization, data structure, feedback loop, GPU, or just building bigger.

Then after they build it, they find out if it’s better.

I think they are referring to GPQA. I agree it’s a stretch to call it “graduate level reasoning”, and certainly not the same as doing new research.

It’s also not what I would call the most cutting-edge benchmark.
Those are:
SWE-bench Verified, MLE-bench, SciCode -- real-world coding problems.
RE-Bench, FrontierMath -- current research examples.

These are benchmarks we’ve made up this year, and LLMs have not passed them yet. (They also haven’t yet passed the Putnam and Math Olympiad contests.)

To me, if AIs pass this last horizon, AIs will be able to do professional level work. But maybe you disagree?

Do you think there’s any possible test that an AI could pass that would prove it can do some professional science work?

I think a lot of people want to say “AI cannot ever do X”, and then, in order to protect that idea, they add “there is no possible way to test X”.

(sorry I edited that post like 80 times. I’m very sleepy. lol)

You know, there is this new thing called “AI” that could have made your post for you. Never gets sleepy, never makes a typo…


I would say the “test” for whether AI can do professional work is that it needs to do it. I haven’t seen that yet. But I also haven’t seen any indications that it can do it anytime soon.

As a crude analogy, suppose we are trying to compare two major league baseball pitchers.

One thing we could look at is the speed of their fastball. That is not a bad metric within a particular context, namely all the other capabilities a person must have to be a pitcher in the first place. But it does not follow that we could build a pitching machine that throws 200 mph fastballs and conclude it would be the best pitcher in the major leagues. The context is entirely different. We cannot count on the pitching machine being able to do the other things a pitcher must do, like bat or run the bases.

Here is a more realistic example. Suppose I score in the 90th percentile on the bar exam. I should know a lot about law, right? Surely when filing a brief, I would not simply make up a prior case. That would be true of a person with common sense. But ChatGPT did do well on the bar exam, and it also made up cases. Its scoring well on the bar does not mean it can do the same things a person who scores well on the bar can do.

I’d also argue there is an even deeper problem. There is a positivist theory of science that says it consists of two parts: first collecting primitive (or basic) observations, and second, mechanically applying logic and math to that data. This would imply that with access to more primitive observations and more computing power, you can mechanically do science better. This seems to be the view of science, and learning more generally, that is implicit in a lot of these predictions for these AI models.

However, this view of science was pretty much abandoned by the 1960s. Science cannot be understood in terms of mechanically applying logical rules to primitive observations. In particular, theory has a lot of influence on what data is collected and on how that data is interpreted. And data usually under-determines theory, so choosing among theories often depends on other criteria.

Additionally, scientific research is an embodied activity. You are constantly interacting with your environment. This is especially true for experimental work, but also true for theory (or for any real job). These models cannot perform that embodied activity, with all the contextual information that it gives.

Unless you tell it to do so

It might consider that an insult to its intelligence and override you.

MIT website:

Discusses the “AI overview” feature on Google. I am not that impressed, and I want to see sources. 99% of the world could not care any less unless negative caring is allowed.


Yeah. The thing is, Google is already great. And the “AI overview” is stupid.

I did use an AI for searching recently. I was trying to remember the name of a book and only had a vague description, but that was pretty random.

In general, it’s useless as a search engine, unless you’re a programmer.


It sometimes tells me what people on reddit post as an answer to my question.


How do you think it trained itself to solve those problems? It has to see a similar problem many, many times. This is simply not how PhDs, well, at least good ones, study the material.

I’ve tested ChatGPT on the preliminary exams. It is very good at Exam P questions, to the point where I think it could pass the exam. But it is horrible at the other exams, despite the fact that their questions are sometimes much easier than what is on Exam P. That’s because it hasn’t trained on certain types of questions; it is simply incapable of “transferring” learning. Every mathematician out there understands the power of being able to apply one concept or approach to another set of problems.

Like I said, I don’t think GPQA is really “graduate level reasoning”. I do think it’s fair to conclude it is doing something both useful and difficult if it can readily answer problems that human researchers (with Google) need hours to work out.

You need to specify version here. 3.5? 4? 4o? 4o-mini? o1-preview? o1-pro?

I know it sounds silly but they change a lot.

In a way, that’s why people are excited by AI.

Very well said. Like I mentioned before, I think science is a very slow process whereby scientists reach consensus about assumptions and theories. For me, AI is just a data-fitting tool. I don’t see much value in terms of actual scientific progress, due to the black-box nature of predictive models. In a way, it’s probably counterproductive to science. That being said, if you start with the premise that the problems are simply too complicated, then a computational approach might be useful for applications.


ChatGPT-4, I think.


Thanks, maybe I’ll throw some to the current generation and see what comes up. They are still pretty awful at abstract logic, with the possible exception of the expensive ones.

And I would think that (perhaps unlike for a person) abstract reasoning would be the easiest thing for a computer to do.

As an example, Mathematica has existed since the late 1980s. It was originally designed to apply logical rules to Feynman diagrams, but now does all sorts of mathematical transformations.

However, it applies them through simple brute force. As an example, I might give it an equation to solve, and it applies different valid transformations, in different orders, to try to arrive at a solution. I think this must be analogous to games like Chess or Go, and has the potential to be enormously improved by a deep neural net that could use reinforcement learning to learn which transformations to apply, and in what order. I suppose “winning” is not always well defined. I assume the economics do not support developing this application first.
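The brute-force search described above can be sketched in a few lines. This is a toy illustration, not how any real CAS works: the rewrite rules and the expressions are made up, and patterns are matched as plain strings. A breadth-first search blindly tries every rule at every position until it reaches the goal form; a real system has thousands of rules, which is exactly where learned move-ordering (as in Chess/Go engines) could help.

```python
from collections import deque

# Hypothetical rewrite rules for illustration: (pattern, replacement).
RULES = [
    ("x + 0", "x"),
    ("0 + x", "x"),
    ("x * 1", "x"),
    ("1 * x", "x"),
]

def rewrite_bfs(expr, goal, max_steps=10_000):
    """Brute force: try every rule at every position, breadth-first,
    until the expression matches the goal or the search is exhausted."""
    seen = {expr}
    queue = deque([expr])
    steps = 0
    while queue and steps < max_steps:
        current = queue.popleft()
        if current == goal:
            return True
        for pattern, replacement in RULES:
            idx = current.find(pattern)
            while idx != -1:
                nxt = current[:idx] + replacement + current[idx + len(pattern):]
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
                idx = current.find(pattern, idx + 1)
        steps += 1
    return False

# "x * 1 + 0"  ->  "x + 0"  ->  "x"
print(rewrite_bfs("x * 1 + 0", "x"))  # True
```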

Much harder for a disembodied machine is to apply reasoning to concrete situations. This is what has sometimes been called “prudence”. Often we do not consciously know what rules we are applying. These vary by situation.

But let’s keep ourselves to abstract mathematics. It is more than simply applying rules. Take an act of genius performed by Paul Dirac. His genius was to combine relativity and quantum mechanics. He found mathematical solutions to his differential equation with negative energy. The final part of his genius was to take those solutions seriously and interpret them as a new kind of particle, thereby predicting anti-matter. I don’t see why we think these AI models are anywhere close to doing something like that. Admittedly this is a once-in-a-generation act of genius, but some are claiming we are on the road to inventing super-intelligence.

Perhaps more to the point, part of the value of a scientific theory lies in its ability to make predictions in novel situations, and potentially be falsified. Fitting a model purely on training data makes it less reliable in novel scenarios.
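That point is easy to demonstrate with a toy example (the “true” process here is made up: y = 2x plus noise). A flexible model can look near-perfect on the training range yet go badly wrong on a novel region it never saw:

```python
import numpy as np

rng = np.random.default_rng(42)

# Training data from a simple "true" process: y = 2x + small noise.
x_train = np.linspace(0, 1, 20)
y_train = 2 * x_train + rng.normal(0, 0.05, size=x_train.size)

# A flexible degree-9 polynomial fits the training range almost perfectly...
coeffs = np.polyfit(x_train, y_train, deg=9)
train_err = np.abs(np.polyval(coeffs, x_train) - 2 * x_train).max()

# ...but on a novel region (extrapolating to x in [2, 3]) it can be
# wildly off, because it fit the noise rather than the underlying law.
x_new = np.linspace(2, 3, 20)
novel_err = np.abs(np.polyval(coeffs, x_new) - 2 * x_new).max()

print(train_err, novel_err)  # novel error dwarfs training error
```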
