Standardized values, do you actually use them in practice?

So, in preparation for an exam I’m being introduced (or reintroduced) to standardized variables.

The context is that you’re analyzing your own data, checking for things like missing or incorrect data, as well as big-picture questions like how dispersed is my numeric data and where are the outliers.

So one of the useful tools, allegedly, for becoming familiar with your data is to standardize it.

Specifically, z_i = \frac{x_i - \bar{x}}{\sigma_x}, where \bar{x} is the mean of x and \sigma_x is its standard deviation.
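For what it's worth, here is a minimal Python sketch of that formula, assuming NumPy is available. The `standardize` helper, the claim amounts, and the cutoff of 2 are all made up for illustration.

```python
import numpy as np

def standardize(x):
    """Return z-scores: (x - mean) / sample standard deviation."""
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)  # ddof=1 gives the sample standard deviation

# Hypothetical claim amounts, invented for this example.
claims = np.array([1200.0, 950.0, 1100.0, 1300.0, 1000.0, 1150.0, 9800.0])
z = standardize(claims)

# Flag values more than 2 standard deviations from the mean.
# The cutoff is a judgment call, not a rule.
outliers = claims[np.abs(z) > 2]
print(outliers)
```

After the transformation the values have mean 0 and standard deviation 1, so "how far out is this point" can be read directly off the z-score.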

That sounds good, but in my career I’ve never seen anyone actually use standardized values in a one-way analysis or GLM; it’s always the raw data, straight out of the data tables, that gets used in analysis.

Has anyone personally used standardized values in actual actuarial work?

Till all are one,

Epistemus

Fitted parameters are often presented this way, and called the “z score”. That is, z = (fitted - mean) / standard error, where the “mean” is the value the parameter is being tested against (usually zero).
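A rough sketch of that calculation for an ordinary least-squares fit, using only NumPy. The data is simulated, and `null_value = 0.0` is my assumption about the reference value; real modeling software reports these figures for you.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
y = 1.0 + 0.5 * x + rng.normal(size=n)   # simulated data, for illustration only

X = np.column_stack([np.ones(n), x])     # design matrix with an intercept column
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])   # residual variance estimate
cov = sigma2 * np.linalg.inv(X.T @ X)       # covariance matrix of the estimates
se = np.sqrt(np.diag(cov))                  # standard error of each parameter

null_value = 0.0                            # assumed reference value
z_scores = (beta - null_value) / se
print(z_scores)
```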

But to transform the “X” values: some kinds of models benefit from it. Not GLMs so much. Regularized GLMs benefit; as I recall, the software often does this transformation automatically. Neural nets benefit too.
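To illustrate why the penalty makes scale matter, here is a small sketch using closed-form ridge regression in NumPy as a stand-in for a regularized GLM. The `ridge_fitted` helper, the simulated data, and the penalty of 50 are all assumptions for the example.

```python
import numpy as np

def ridge_fitted(X, y, lam):
    """Fitted values from ridge regression on centered data; lam=0 is plain OLS."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    coef = np.linalg.solve(Xc.T @ Xc + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return y.mean() + Xc @ coef

rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = X[:, 0] + X[:, 1] + rng.normal(size=n)

# Same information, but the second column is expressed in different units.
X_rescaled = X * np.array([1.0, 1000.0])

# Unpenalized fit: predictions do not care about the units.
ols_same = np.allclose(ridge_fitted(X, y, 0.0), ridge_fitted(X_rescaled, y, 0.0))
# Penalized fit: the penalty hits the two versions differently.
ridge_same = np.allclose(ridge_fitted(X, y, 50.0), ridge_fitted(X_rescaled, y, 50.0))
print(ols_same, ridge_same)
```

The unpenalized fit just absorbs the unit change into the coefficient, while the penalty is applied to the coefficient's size, which depends on the units. Standardizing first removes that dependence.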

Standardizing becomes very important when your input variables are on very different scales.

For example, if one input variable for a model (say a model to identify potential fraudulent claims) is the number of claims from the insured address and another is the (max/avg/etc.) size of the claims from the insured address, the parameters are likely going to be very different and you’re not going to easily see which of the two has the greater influence on model results.

But if both are standardized, then their parameters are “on equal footing” and it becomes much easier to compare. In practice, one might standardize only the claim amount variable, since the number of claims variable is likely to already be in the same “scale range” as Z-scores. (The raw count doesn’t allow for negative or zero values, though, which has its own implications for modeling and model interpretation; standardizing also “fixes” that.)
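A toy version of that comparison, assuming NumPy and simulated data. A real fraud model would more likely be logistic; I use a linear fit here only to keep the sketch short, and the variable names and coefficients are invented.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 5000
n_claims = rng.poisson(2.0, size=n).astype(float)        # claims per address
avg_size = rng.gamma(shape=2.0, scale=2500.0, size=n)    # average claim size in dollars
# Simulated score, constructed so avg_size matters more per standard deviation.
score = 0.3 * n_claims + 0.0002 * avg_size + rng.normal(scale=0.5, size=n)

def ols(X, y):
    """Least-squares slopes (the intercept is fitted, then dropped)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef[1:]

def zscore(x):
    return (x - x.mean()) / x.std(ddof=1)

raw = ols(np.column_stack([n_claims, avg_size]), score)
std = ols(np.column_stack([zscore(n_claims), zscore(avg_size)]), score)
print("raw coefficients:         ", raw)
print("standardized coefficients:", std)
```

On the raw scale the claim-count coefficient looks far larger simply because dollars are a much bigger unit. On the standardized scale each coefficient is the effect of a one standard deviation move, so the two can be compared directly.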

Note that in practice, Weight of Evidence coding (WOE coding) is also used to address problems in the distribution of an input variable.
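For anyone who hasn't seen WOE coding, a minimal sketch, assuming the convention WOE = ln(share of goods in the bin / share of bads in the bin). Some references flip the ratio. The `woe_by_bin` helper and the data are hypothetical.

```python
import numpy as np

def woe_by_bin(bins, is_bad):
    """Weight of Evidence per bin: ln(share of goods / share of bads).

    Bins with zero goods or zero bads are not handled here; in practice
    they need merging or smoothing.
    """
    bins = np.asarray(bins)
    is_bad = np.asarray(is_bad, dtype=bool)
    total_good = (~is_bad).sum()
    total_bad = is_bad.sum()
    woe = {}
    for b in np.unique(bins):
        in_bin = bins == b
        good_share = (in_bin & ~is_bad).sum() / total_good
        bad_share = (in_bin & is_bad).sum() / total_bad
        woe[b.item()] = float(np.log(good_share / bad_share))
    return woe

# Made-up example: two bins of a binned input variable and a bad/good flag.
bins = ["low"] * 4 + ["high"] * 4
is_bad = [0, 0, 0, 1, 0, 1, 1, 1]
woe = woe_by_bin(bins, is_bad)
print(woe)
```

Each bin's label is then replaced by its WOE value, which puts a badly distributed input on a log-odds-like scale.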

I’ve done some work researching a change in credit scoring models, where one model has a notably different distribution/range than the other. For some of that comparison we looked at the population more as a function of the standardized densities/CDFs and percentile matching than changes in raw scores. Other than that though, I can’t say I’ve used it much in practice.

This is a great example of finding “which predictor has a greater impact”. Though I don’t know how to implement standardization on interactions or other more complex modeling transformations like splines, binary variables, or even categoricals. Maybe standardizing is just for straightforward numerics.

You standardize the variable on its own, then use that in various ways for modeling.
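As a sketch of what that could look like, assuming NumPy and simulated inputs: the numeric variables are standardized first, then the interaction and a simple polynomial term are built from the standardized versions. Leaving the binary flag as 0/1 is one common choice, not a rule, and all names here are invented.

```python
import numpy as np

def zscore(x):
    x = np.asarray(x, dtype=float)
    return (x - x.mean()) / x.std(ddof=1)

rng = np.random.default_rng(7)
n = 1000
n_claims = rng.poisson(2.0, size=n).astype(float)
avg_size = rng.gamma(shape=2.0, scale=2500.0, size=n)
is_commercial = rng.integers(0, 2, size=n).astype(float)   # hypothetical binary flag

# Step 1: standardize each numeric variable on its own.
z_claims = zscore(n_claims)
z_size = zscore(avg_size)

# Step 2: build the derived terms from the standardized variables.
design = np.column_stack([
    z_claims,
    z_size,
    is_commercial,         # binary left as 0/1
    z_claims * z_size,     # interaction of the standardized inputs
    z_size ** 2,           # simple polynomial term
])
print(design.shape)
```

Note that the derived columns (the product and the square) are not themselves mean 0 and standard deviation 1; only the base variables are.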