Beginner SQL help

SteveGrondin · June 10, 2021, 10:23pm

I did a tiny bit of research, outer joins only slightly more expensive than a inner join, and I specifically wanted to find riders with no base. If you trust your data to not have such a monstrosity, then you could use an inner join.

I did get a chuckle when the source said the best code is one that gets the right answer.

SredniVashtar · June 11, 2021, 12:41am

I think? You’d need the sex comparison in the join part instead of the where part then. Not sure. Sorry if I’m being dumb here.

SredniVashtar · June 11, 2021, 12:50am

Nah. Having is for when you have aggregating functions.

dr_t_non-fan · June 11, 2021, 1:03am

Is this fix better?

Marcie · June 11, 2021, 1:16am

In this case, I’m not sure either WHERE or HAVING with a.SEX <> b.SEX will give you the result you seek, since the base fields will be null. But I could be wrong.

knoath · June 11, 2021, 3:08am

I would amend that to say that the best code is the one that gets the right answer every time you run it (months, quarters, years from now).

You might get lucky that first time if your dealing with clean data but if later versions of the data are dirty with duplicate rows, alphas in numeric columns, unexpected values, null values, multiple sexes per policy etc., your code will fail if it doesn’t handle the dirt properly.

magillaG · June 11, 2021, 3:48am

My personal preference is to keep clear what is part of the inner joins as separate from the where clause. So I really don’t like putting either the policy or the sex comparison in the where clause, since this really is part of the join.

To some degree this is personal preference. I see lots of people who are good at sql write the query the way you have. But it would make it harder for me to update later. Although for a query that simple it doesn’t really matter.

SteveGrondin · June 11, 2021, 11:48am

This is an interesting way of thinking about it. I was taught that reducing the “input” number of records to the sides of a JOIN is a good way to reduce the cost of a JOIN (hence the CTEs Bse and Rid). And when I think of JOINs, I think of how the items match, defaulting to a “=” in the field relationship. The <> is between the two sides of the JOIN, so you can’t employ the reduction technique for that condition before the JOIN. My way of thinking was “retain only these records from lots of matches”. Putting the mismatch in the ON statement is more like “only match up if this is true”. It seems as if that might be faster.

This is good advice too, especially when dealing with a data source that’s new to you.

magillaG · June 11, 2021, 2:16pm

The only way i tend to worry about speed is by worrying about how things scale. So I worry about O(N) vs O(N^2), which basically means making sure i have my indices done. Beyond that, though, I tend to think much more about the code being readable to people.

This means that i’ve never really dove into the details of how the engine constructs the procedure of the query, like it sounds like maybe you have. But i’ve always been able to work well enough with large data sets(well as large as you’d want to use relational databases for.) It might not be good enough if I were writing queries for web applications, say, and halving the speed of the query might really matter.

Doesn’t the engine know how to move things between the where and join clauses? If not then I’d worry more about moving the coverage selection to a sub query that is performed before the join. If it was a really huge data set, I’d probably be superstitious and do that even though i didn’t think it would really matter.

I like sql because it is easier to use than some alternatives. But I dislike that most relational databases are not really optimized for analysis.

SredniVashtar · June 11, 2021, 3:18pm

I agree with this. Laziness also features strongly since most of my SQL is ad hoc and I’m just writing the way I think.

SredniVashtar · June 11, 2021, 3:29pm

I just meant it might be easier to read. It’s the same.

SteveGrondin · June 12, 2021, 4:08pm

I haven’t really been able to figure out indices. My typical use case is “load monthly extracts to cumulative table once a month, hit cumulative table for downstream processes once a month”. I use bcp to load since it is fast and simple. I got the impression (could be wrong) bcp doesn’t play well with indices, so you drop and recreate the index. If creating the index is at least as costly as the use of a table without index, do I gain much? The tables don’t have frequent writes, and they aren’t being used by many people (nor are the queries), so much of the optimization I consider isn’t that helpful.

I haven’t done much exploration at all. I know that there are some built in (in MS- SQL at least) tools to look at things, but I don’t know much about them at all.

This is a good question. I might try a simple test and report back. But like I mentioned above, my facility with those tools might be so low I can’t tell.

1695814 · July 28, 2021, 10:38pm

NEW QUESTION.

It’s from “SELECT statement exercises: learning stage”, exercise “7 (2)”.

I have a table called Product with variables maker, model, & type.
There are three other tables. I happen to know that I only need two of them, PC & Laptop. They both have variables model & price.

To join just PRODUCT and PC I’d do this:

SELECT PRODUCT.model, price
FROM PC
INNER JOIN Product
ON PC.MODEL = PRODUCT.MODEL
WHERE maker = 'B'

Two problems:

I don’t know how to also get Laptop joined in there.
There are multiple records on PC that have the same model and price. It appears that I’m only supposed to return one instance of that.

Vorian_Atreides · July 29, 2021, 3:06am

Do a SELECT DISTINCT on the PC table, then join this to the prior table with an additional join statement.

dothemath · July 29, 2021, 3:06am

You could just do 2 queries. Use a UNION to get the results in a singe set.

Or, left join the PRODUCT table to each PC and LAPTOP table. Make your SELECT look like this

SELECT PRODUCT.model,
Price = COALESCE(PC.price, LAPTOP.price)

SteveGrondin · July 29, 2021, 2:48pm

First you bitch and moan about someone creating two separate tables that are identical in structure, plus the one having duplicate records. Then, realizing you are dealing with stupid people, you have to take precautions regarding the implications of duplicate data on matches.

I think I’d do a union on the laptop and PC tables to get a single table with distinct records. Then I’d join the result to the product table to be able to WHERE the maker = B. I haven’t tested the syntax of a UNION within a Common Table Expression, but it might actually work.

Rastiln · July 29, 2021, 3:01pm

Others have given solutions but I think here’s the code. isnull might be unique to SSMS, can’t recall, so the coalesce above might be needed. Curious if I got it right since it’s been a few years since I’ve used SQL.

SELECT PRODUCT.model, isnull(PC.price, Laptop.price)
FROM Product
LEFT JOIN Laptop
ON Product.model = Laptop.model
LEFT JOIN (SELECT DISTINCT model, price FROM PC) AS PC
ON Product.model = PC.model
WHERE PRODUCT.maker = ‘B’

1695814 · July 29, 2021, 3:50pm

Your code is really close. It doesn’t like that “isnull” part. When I take that out I get two columns of price - one from PC and one from LAPTOP. Obviously, the solution wants only one column of prices.

Here’s my current code:

SELECT PRODUCT.model, PC.price AS PRICE, LAPTOP.price AS PRICE
FROM Product
LEFT JOIN Laptop
 ON Product.model = Laptop.model
LEFT JOIN (SELECT DISTINCT model, price FROM PC) AS PC
 ON Product.model = PC.model
WHERE PRODUCT.maker = 'B'

The suggested “Help topics” list UNION…

…so that must be one of the steps.

1695814 · July 29, 2021, 4:00pm

This…

SELECT PRODUCT.model, LAPTOP.price AS PRICE
FROM Product
LEFT JOIN Laptop
 ON Product.model = Laptop.model
WHERE PRODUCT.maker = 'B' AND PRICE IS NOT NULL

UNION

SELECT PRODUCT.model, PC.price AS PRICE
FROM Product
LEFT JOIN (SELECT DISTINCT model, price FROM PC) AS PC
 ON Product.model = PC.model
WHERE PRODUCT.maker = 'B' AND PRICE IS NOT NULL

…gives me the correct bottom-line-result, but I get this comment from the russian machine:

Incorrect.

Your query returned the correct dataset on the first (available) database, but it returned incorrect dataset on the second checking database.

* Wrong number of records (less by 8)

This exercise has FAQ

BruteForce · July 29, 2021, 4:00pm

Are the prices the same for each row between the PC.price and Laptop.price columns?

If so, then I think here’s what I would do (I think this would work):

SELECT DISTINCT Product.model, Laptop.price
From Product
LEFT JOIN Laptop
 ON Product.model = Laptop.model
Where Product.maker = 'B'
UNION
SELECT Product.model, PC.price
From Product
LEFT JOIN PC
 ON Product.model = PC.model
Where Product.maker = 'B'

I think the distinct on the first statement is good for the entire pull. Not positive though.