Select distinct and count(*) from each field in big data

epistemus · January 25, 2024, 3:39pm

I have a large data set i’d like to do some EDA on.

For a given variable, i can run

select Variable1, count(*) as counter from bigdata group by variable1 order by counter desc;

But I have to do that for each variable, then look at the results.

Is there any way to create a loop in sql or python or R that will give me for each variable i’m interested in, the distinct values and the counts? So the ideal end data set would be

Variable1 variable1counter Variable2 variable2counter…VariableN VariableNcounter

I do recognize that variable1 may have 5 ditinct values, while variable2 has 3 or VariableN is only 1 or 2.

Any ideas?

Rastiln · January 25, 2024, 4:57pm

Ever done dynamic SQL?

Should be possible to make a list of variables, set them as @VarName, then run ‘Select ’ + @VarName ’ , COUNT(*) FROM…’

Just speculating how to loop through the list appropriately. If you can name them @Var1, @Var2, then you should be able to make it search for @Var+n.

Been a while since I’ve fiddled with dynamic SQL. Or loops in SQL.

Oh, and just GROUP BY to get each level and the count.