I recently listened to Mr. Birnbaum’s presentation to the CAS Ratemaking Seminar on disparate impact by race in insurance pricing.
Mr. Birnbaum proposed a multiple regression model with race as a binary predictor variable.
I’m wondering if anyone has, or would be willing to create, a sample hypothetical dataset they can share for use in a multiple regression with a control variable for race. I emailed Mr. Birnbaum, and he does not have one.
Hi John. I would just like to see an example of this in action. I am retired, and I will not be using the results in a rate filing. I also no longer have access to any real-world ratemaking-type data.
I don’t think you need examples to understand how this works; it’s a pretty simple concept.
Assume your current model is Y ~ aX1 + bX2, where a and b are significant coefficients. All you need to do is throw race in as a third predictor:
Y ~ cX1 + dX2 + eX3 where X3 = race
If c and/or d become insignificant, there’s something going on with race: it’s siphoning away the predictive power of X1 and X2, which suggests those variables were acting at least partly as proxies for race.
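Since the original question was about a hypothetical dataset, here is a minimal sketch in Python (NumPy only) of the comparison described above. Everything in it is made up for illustration: the sample size, the 30% share for the race indicator, the coefficients, and the assumption that X1 is a pure proxy (correlated with race but with no effect of its own on Y). It is not Mr. Birnbaum’s dataset or his exact method, just one way to see the two fits side by side.

```python
import numpy as np

def ols(X, y):
    """Ordinary least squares; returns (coefficients, t-statistics)."""
    n, p = X.shape
    xtx_inv = np.linalg.inv(X.T @ X)
    beta = xtx_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - p)          # residual variance estimate
    se = np.sqrt(sigma2 * np.diag(xtx_inv))   # standard errors
    return beta, beta / se

# Hypothetical data: all values below are assumptions for illustration.
rng = np.random.default_rng(0)
n = 5000
race = rng.binomial(1, 0.3, size=n).astype(float)  # X3, binary indicator
x1 = race + rng.normal(0.0, 0.5, size=n)           # proxy: correlated with race
x2 = rng.normal(0.0, 1.0, size=n)                  # unrelated to race
# Y depends on X2 and race only; X1 has no direct effect in this setup.
y = 1.0 + 2.0 * x2 + 1.5 * race + rng.normal(0.0, 1.0, size=n)

ones = np.ones(n)
# Current model: Y ~ aX1 + bX2
beta_a, t_a = ols(np.column_stack([ones, x1, x2]), y)
# Model with race added: Y ~ cX1 + dX2 + eX3
beta_b, t_b = ols(np.column_stack([ones, x1, x2, race]), y)

print("Without race:")
for name, b, t in zip(["const", "X1", "X2"], beta_a, t_a):
    print(f"  {name:5s} coef={b: .3f}  t={t: .2f}")
print("With race:")
for name, b, t in zip(["const", "X1", "X2", "X3"], beta_b, t_b):
    print(f"  {name:5s} coef={b: .3f}  t={t: .2f}")
```

With this setup, X1 should look strongly significant in the first fit and shrink toward zero once race is included, while X2 barely moves. On real data the picture is rarely this clean, so a drop in significance is a flag for further investigation rather than proof.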
The big problem, of course, is that loads of things captured in U/W data correlate with race, but you never capture race itself in the applications (for obvious reasons).
It’s pretty tough for insurers to demonstrate that they’re not pricing by race, even when they don’t have that data.
Related: regulators really don’t like it when insurers use credit scores to price, even when you can show stronger correlations between credit scores and experience than any other variable. Life insurers are just starting to understand this.
To give you something to digest over the weekend (fun!), check out this blog post from a researcher who is disturbed by what his AI system can do – it can detect race where humans can’t - in noisy X-rays: