Beginner SQL help

1695814 · December 2, 2020, 7:56pm

okay, thank you. At least I’m not alone & not misunderstanding something.

SredniVashtar · December 2, 2020, 8:09pm

Just be glad you’re going from SAS to SQL and not the other way around.
If I have to google wtf an “informat” is one more time I’m going to literally hang myself.

BruteForce · December 2, 2020, 8:12pm

The alias you give it in the Select statement becomes the header of that column in your output. If you put those results into a table, then you would use that alias.

I have the most experience with Microsoft SQL Server (not sure what type of SQL/editor you are using). In that you don’t have to create the alias. If you don’t, the column just shows “column 1” or something like that for the header. I can’t remember exactly what shows up.

BruteForce · December 2, 2020, 8:12pm

I’m going from SQL to SAS at my current job. I’m hopelessly lost and have mostly just fallen to relying on others to pull/manipulate data for me.

SredniVashtar · December 2, 2020, 8:19pm

Remember that PROC SQL is your friend… or frenemy anyway.

1695814 · December 2, 2020, 9:18pm

Ah, yes. Thank you.

1695814 · December 2, 2020, 9:22pm

I encourage you to start a “SAS help” thread. I found SAS easier to understand, but that’s probably because I learned it first.

I took an online webinar on PROC SQL and came to the conclusion that there was nothing there that I didn’t already know how to do in SAS so why bother doing it a different way. I imagine there are good reasons to use PROC SQL but I don’t know what they are.

dr_t_non-fan · December 2, 2020, 9:56pm

How about a “SAS sucks” thread?
PROC SQL is for people who don’t want to learn the other procedural crap that is SAS, and just want to pull data using a much easier (and structured) query language.

1695814 · December 2, 2020, 10:44pm

Thems fightin’ words.

SQL is for people who yoda like talk.

dr_t_non-fan · December 2, 2020, 10:47pm

Oh yeah? Well PROC you!!!

1695814 · December 3, 2020, 2:28am

Your such a FREQ…if you know what I MEANS…NWAY, for all of your help you may move to the head of the CLASS. Now, I’ve got to RUN;

SredniVashtar · December 3, 2020, 2:58am

SAS is for NOOBS.

I would use an SQL insult except that whole language consists of 5 words that are completely ordinary and easy to understand

jraven · December 3, 2020, 3:23am

This is SQL-implementation dependent. I just checked, in case my memory was faulty, and a quick online search indicates that both MySQL & PostgreSQL let you use the aliases assigned in the SELECT statement in the GROUP BY & ORDER BY clauses. Oracle SQL doesn’t seem to let you use them in GROUP BY, though it looks like it might let you use the alias in ORDER BY?

Microsoft’s SQL products, though, well…

1695814 · December 3, 2020, 3:43am

SteveGrondin · December 3, 2020, 1:18pm

I think the absence or persistence of “renaming” in later clauses in various SQL types is adequately discussed in prior posts, so I am not commenting on that, other than to say it would be handy if MS versions did.

This brings up something I’d like to expand upon. The GROUP BY statement means you are creating a summary report. Records that share a commonality will be lumped into a single record in the resulting report/dataset/table or whatever it is you think is coming from your statement. If all records with the same characteristic in field A are dumped into a single row, you have to tell SQL how to combine the values of the other fields which were not GROUPed BY in the SELECT clause, say by SUM() or MAX().

You can do non-cumulative/non-grouping types of math functions (SELECT FieldA + FieldB AS RecTot) without a GROUP BY. By that I mean the math is operating on a single record at a time.

Also, SELECT is a bad English choice for a key word for what I think of as “GIVE ME” instead of the “choosing” sense of the English word “select”. Or you can think of it as I am choosing these fields, and WHERE is the qualifying aspect of which records to choose and which to exclude.

1695814 · March 16, 2021, 9:21pm

HAVING vs WHERE.

I understand that “The difference between the having and where clause in SQL is that the where clause cannot be used with aggregates, but the having clause can.” (Having vs Where Clause in SQL | GeeksforGeeks)

…but my question is, in what instance would you want to wait until your data is aggregated and use the HAVING clause vs just getting rid of the spurious data at the outset by using the WHERE clause.

I could see how this would work in SAS since you can use the same dataset over and over…but it just seems to me in SQL that you read and process/summarize your data all in one big honkin’ step.

Vorian_Atreides · March 16, 2021, 9:48pm

You have transactional data and you only want those customers whose total spending is above a certain threshold. (Or below.)

Marcie · March 16, 2021, 11:50pm

The way I think of it (which may be inaccurate in subtle ways that haven’t bitten me yet) is WHERE works on the source data table(s), filtering rows before SELECTing columns, while HAVING works on the result table, after SELECTion & possible aggregation.

@Vorian_Atreides’s example is a good one & the type of thing that came to mind first. Another example where HAVING might come in handy is when you want to filter on a field that has a lot of possible values but is not indexed in the source table (eg ZIP code). If I put zip_code = 12345 in the WHERE clause it might cause the query to run much longer as it scans a large table, but aggregating the data first & then filtering using HAVING can have better performance. YMMV.

knoath · March 17, 2021, 1:26am

If you’re using a dollar threshold for the having and it turns out that there are a lot of rows (pre-aggregation) with zero dollar amounts, you might want to exclude the zero dollar rows in your where statement as it will have no bearing on your having clause.

1695814 · May 27, 2021, 3:41pm

I have some policy data that has a record for each coverage on a policy.

Some of those coverages have inconsistent data and I’m trying to find them.

Here’s some sample data (assume that all coverages are for the base insured (there are no spousal or child riders here)):
policy,coverage,sex
ABC001,B,M
ABC001,R1,M
ABC001,R2,M
XYZ999,B,M
XYZ999,R1,F
XYZ999,R2,M

As you can see the R1 rider on XYZ999 has a sex of F even though it’s a rider on the base insured who is most certainly an M.

In SAS, I would do something like this (assume it’s actually a flat file and not comma delimited (also, I’m not quite sure of the syntax for some of this anymore)):

DATA BASE RDR;
INFILE POLDATA;
INPUT
@01 POLICY $6.
@07 COVERAGE $2.
@09 SEX $1.
;
IF COVERAGE = 'B' THEN OUTPUT BASE;
ELSE OUTPUT RDR;
RUN;

DATA BADSEX;
MERGE BASE(IN=A KEEP=POLICY SEX RENAME=(SEX=BASESEX)) RDR(IN=B RENAME=(SEX=RDRSEX));
BY POLICY;
IF A AND B AND BASESEX<>RDRSEX;
RUN;
 
PROC PRINT;
ID POLICY;
VAR COVERAGE BASESEX RDRSEX;
TITLE 'MISMATCHED SEX';

The output would (hopefully) give me “XYZ999 R1 M F”.

I need to figure out how to do an inner (?) join for a dataset onto itself.