Artificial Intelligence Discussion

There’s zero reason they would need an LLM to review claims. Predictive models (e.g., decision trees/forests on top of some light natural language processing) have been used to review P&C claims (flagging fraud, etc.) for maybe 10 years or so? (Though I don’t know how widespread across the industry.) Mechanically this doesn’t sound much different, though there may be a lot less human participation in the process.

LLMs are not the only model risk bogeyman out there, they’ve just been getting all the ink lately.


I’d like to know more about health denials. How are they managed and regulated? What do we do about errors? What about bias? I work in health but we never talk about it much.

I would assume most denials are mechanical? Like the date is missing from form xyz, or you need to use generic drug a before brand b, unless the doctor says c. Things that could be handled with rules rather than any kind of model.
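A sketch of what "rules rather than a model" might look like in practice (the form fields, drug names, and rules below are all hypothetical, just mirroring the examples above):

```python
# Hypothetical sketch of purely mechanical denial checks expressed as rules,
# no model required. Field names and rules are invented for illustration.

def check_claim(claim):
    """Return a list of denial reasons; an empty list means the claim passes."""
    reasons = []
    if not claim.get("service_date"):
        reasons.append("missing service date on form")
    if claim.get("drug") == "brand_b" and not (
        claim.get("tried_generic_a") or claim.get("doctor_override")
    ):
        reasons.append("step therapy: generic A required before brand B")
    return reasons

print(check_claim({"service_date": "2024-03-01", "drug": "brand_b"}))
# prints ['step therapy: generic A required before brand B']
```

Each rule is transparent and auditable, which is exactly why this kind of denial doesn't need (or want) a model behind it.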

I could see using an AI or even an LLM to flag suspicious looking claims for things like fraud. But auto-denying claims that appear fraudish seems like not a great idea. And any use seems like it could lend itself to unintended implicit bias.

I’d think that, at this stage at least, you’d want a human to review all decisions before they become final.

What I would want (and what I think would be acceptable) is for an AI to classify claims/applications, provide a few points of rationale, and submit them to the appropriate humans to review. Hopefully the humans reviewing would spend more time on adverse recommended decisions, and could quickly review/spot-check favorable recommended decisions, with reversals being used to further train the AI.

It’s not crazy hard to put together one of these models. Once you have the data together, you can often fit the model in a couple of hours or less. You would need a separate model for each claim code.

Insurance companies are swimming in data, so it should be straightforward for them to do this kind of work. The biggest issue is going to be data cleaning and making sure that the claims and denials were appropriate in the first place. The model fitting and validation process could be scripted which would make that part straightforward/repeatable.
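To make the "scripted and repeatable" point concrete, here is a minimal sketch of a fit-and-validate loop with one model per claim code (in Python with scikit-learn; the claim codes, features, and denial flag are all synthetic stand-ins, not real claims data):

```python
# Minimal sketch of a scripted fit-and-validate loop, one model per claim
# code. All names and data here are synthetic/illustrative, not real claims.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
models = {}
for claim_code in ["A100", "B200"]:           # hypothetical claim codes
    X = rng.normal(size=(500, 4))             # stand-in claim features
    y = (X[:, 0] + rng.normal(size=500) > 0).astype(int)  # stand-in denial flag
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, clf.predict_proba(X_te)[:, 1])
    models[claim_code] = (clf, auc)           # keep model + holdout AUC together
```

The loop body is the whole "repeatable" part: swap the synthetic data for a cleaned claims extract keyed by claim code and the same script refits and revalidates everything.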

Just for information purposes, here are a couple of tutorials for building SVM and k-nearest neighbour models in R.

https://www.datacamp.com/tutorial/support-vector-machines-r

https://www.datacamp.com/tutorial/k-nearest-neighbors-knn-classification-with-r-tutorial
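For readers who would rather stay in Python than R, the same k-nearest-neighbours idea can be sketched from scratch in a few lines (the training points and "pay"/"deny" labels below are invented purely for illustration; real work would use a library):

```python
# Tiny from-scratch k-nearest-neighbours classifier, for illustration only.
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """train: list of (feature_tuple, label); returns majority label of the
    k training points nearest to query by Euclidean distance."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Invented toy data: two clusters of claims.
train = [((0, 0), "deny"), ((0, 1), "deny"), ((5, 5), "pay"), ((6, 5), "pay")]
print(knn_predict(train, (5, 6)))  # prints pay
```

The whole method is just "look up the k most similar historical claims and take a vote", which is why it is often the first thing tried on tabular classification problems.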

This is probably the first time in my career that I’m wistful I didn’t become an actuary. This AI modelling stuff I could totally do knee-deep, 8 hours a day.
The other stuff, like what I was doing before, not so much.


It’s interesting work and hopefully will help me stay employed as the government turns more heavily towards using AI. The difficulty is explaining to the managers who want us to use AI what AI actually is.

I also find it interesting that the math behind this never seems much more complex than upper-year stats, at least for what I’ve seen in insurance modeling.


Then this would be my question: if a human is going to fully review each decision, why use AI at all? Sometimes, as in radiology, an error is so expensive that it is worth getting a full “second opinion” from a model.

Probably, though, there is some part of the work that you do not want the person re-checking, and that is the added value. In that case you are not really solving the problem of trusting the model by adding a person. At worst, you might be obscuring that the problem still exists.

Remember, too, that if the model is an LLM and gives a “reason”, that is not really the “reason” used by the model. The model does not “reason” about the objects referred to by the words like people do. Instead it calculates using the words themselves, and the frequencies with which they correlate with other words. This means the person checking the model output is in no way ensuring that the model is “working correctly” beyond checking that the predicted answer is correct. However, the “reason” provided by the model can be considered a second prediction that can make it easier for the human to check the primary prediction.

One thing you have to remember about insurance data compared to data in the natural sciences: the data is always much more heterogeneous. For example, all of those claims are going to be for different individual people with complicated correlations between them. And the insurance environment itself is going to be changing over time, sometimes in an intelligent way in response to the actions of the insurers. All this makes data harder to come by in important ways, and also worth less.

You haven’t dealt with ecological data…

We have year effects interacting with cohort effects interacting with spatial correlations interacting with vessel effects interacting with day-night behavioural differences, seasonal effects, and species interactions.

Your data sounds way easier to deal with. You only have to deal with one species, and you can actually examine them while they’re alive.

Are they going to have them review each decision or each rejection?

Haha, fair enough. You are right, I have not.

I probably should have been more modest and simply said that the use of data and models is so different between fields that it is hard to have a good intuition for what is hard and what is not in a strange field.

A good example of this is the Bayesian approach to prediction developed during the early and mid 20th century in insurance, oil/mineral prospecting, and animal breeding.

It is very different from the “classical” statistics developed by Fisher and others for, say, agriculture, during the same period because the needs were so different.

Agree with this. When I bailed out 20 yrs ago there were several reasons, but mainly it just looked like really boring spreadsheet/database stuff in my future, plus all the business aspect which didn’t interest me at all.

Going back even further, I wish I had stuck with computer science instead of switching to physics. In my mind I just didn’t want to get stuck staring at a computer all day, but look where we all ended up anyway.


What’s funny is I think both are being used to some extent interchangeably as people turn away from Fisherian statistics as the things we look at get more complicated.

I think I probably didn’t frame my thought quite right.

I think at the current time, if an AI indicated that an adverse action should be taken against a (potential) customer, such an indication needs to be reviewed by a human, due to the documented instances of AI errors. The suggested action can be described and reasons provided, and a human can review the summary against the actual information, and hopefully confirm the indication. I would expect in cases where the AI is making a good decision, there would be time saved because the human can focus on the specific information triggering the review, rather than necessarily reviewing a complete file.

For the cases where an AI is indicating no adverse action, hopefully the human review can be more of a spot-check, perhaps a random audit of the decisions, producing time-savings by virtue of not needing a human to touch each and every file.
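The routing policy described in the last two posts could be sketched like this (the 10% audit rate and the labels are invented for illustration, not a recommendation):

```python
# Sketch of the review-routing policy described above: every adverse AI
# indication gets a full human review, favorable ones get a random audit.
# The 10% audit rate is an invented illustration.
import random

random.seed(42)  # fixed seed so the sketch is reproducible

def route_for_review(ai_recommendation, audit_rate=0.1):
    """Map an AI claim recommendation to a human-review workflow."""
    if ai_recommendation == "adverse":
        return "full human review"    # humans confirm every adverse call
    if random.random() < audit_rate:
        return "random audit"         # spot-check a slice of favorable calls
    return "auto-accept"              # the time savings come from this branch

print(route_for_review("adverse"))    # prints full human review
```

The asymmetry is the point: adverse calls are never auto-finalized, while the audit rate on favorable calls is a dial you can turn up when the model is new and down as reversals become rare.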

It’s probably worth mentioning here that I am a P&C/General Insurance guy whose career has been mostly on the pricing/product/underwriting side of the world. In my days working with a particular E&S product that had a number of medium-to-large accounts, I spent quite a bit of time trying to find ways to identify/profile accounts so that underwriters could better prioritize their work queues and hopefully process more business effectively.

Hell, if we had had AI tools just to parse and organize the inconsistently-presented submission data, that would have been a significant efficiency boost (if done accurately).

Yesterday, GPT 5.2 apparently proved an open Erdős problem.

I think it will take some reviewing to make sure the result is complete and original.

This doesn’t mean omg super intelligence since some of the problems are not impossibly hard. And right now they can just point AI at everything and it will solve the easiest of the bunch.

Let’s assume this is true.

When a person “proves” something, we really mean the person asserts that it is true. For example, I can prove that the square root of 2 is irrational even though I obviously did not invent or discover the proof. One way to understand this is that I have an intellect that, when focused on the ideas of that proof, can see that it is necessarily true, at least given the assumptions.
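For concreteness, the proof in question is the standard textbook argument by contradiction:

```latex
% Classic proof that \sqrt{2} is irrational, by contradiction.
\begin{proof}
Suppose $\sqrt{2} = p/q$ with integers $p, q$ sharing no common factor.
Then $p^2 = 2q^2$, so $p^2$ is even, hence $p$ is even: write $p = 2m$.
Substituting gives $4m^2 = 2q^2$, so $q^2 = 2m^2$ and $q$ is also even,
contradicting the assumption that $p/q$ was in lowest terms.
Hence $\sqrt{2}$ is irrational.
\end{proof}
```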

There is something in the structure of my mind that sees this necessary truth and assents to it. This has often been called the highest part of intelligence.

I don’t think the model did that. Instead it probabilistically generated a series of statements that a human intellect then might feel compelled to assent is true (in a way that is not probabilistic).

In other words: saying something isn’t the same as knowing or proving it. A textbook does not “know” math proofs either. It contains instructions for a person to prove them.


I appreciate your point of view here. I disagree, but it’s really just a gut disagreement. I probably disagree with most people about these things.

I majored in math in college, so my lectures, homework, and tests were all proofs. If you had asked me what I was doing, I would have told you that I was learning how to generate a list of ideas that each sound true on their own, and end in the answer.

I would not have said the process is deterministic. It feels more like there is a random idea generator in my brain that comes up with all sorts of ideas that might contribute to the proof. That’s what creativity means to me: literally being random, paired with some kind of intuition that limits the scope of randomness. And then after the ideas are generated, they are scored for usefulness and validity by another part of my brain.

Maybe your point is that there is a feeling of truth associated with each step. And the LLM can’t have a feeling. I guess that’s fair? But I think it’s doing the same thing without the emotion.