Beginner SQL help

1695814 · November 23, 2021, 10:34pm

I’m on Exercise 26.

Find out the average price of PCs and laptops produced by maker A.

I figured I’d first grab all of the models & prices for PCs & Laptops & put them into their own dataset/table/whatever you call it, but it seems that it’s only grabbing unique values. I.e., one model has two prices that are the same, but only one of those records is showing up…I think.

Here’s my code just to look at the price data:

WITH PRICEDATA AS (
    SELECT model, price FROM PC
    UNION 
    SELECT model, price FROM Laptop
)

SELECT MODEL, PRICE 
 FROM PRICEDATA

I didn’t specify DISTINCT, so why does it seem to be giving me distinct values of model & price?

My actual attempt to get the average is here, in case you want to look at it:

WITH PRICEDATA AS (
    SELECT model, price FROM PC
    UNION 
    SELECT model, price FROM Laptop
)

SELECT *
FROM PRODUCT
 JOIN PRICEDATA
ON PRODUCT.MODEL = PRICEDATA.MODEL
WHERE MAKER = 'A'

Here’s the solution from the solution manual I found…it makes sense & all, but I was hoping to come up with my own way to get it.

SELECT AVG(price)
FROM (
	SELECT price
	FROM Product JOIN PC
	ON Product.model = PC.model
	WHERE maker = 'A'
	UNION ALL
	SELECT price
	FROM Product JOIN Laptop
	ON Product.model = Laptop.model
	WHERE maker='A'
) AS AVG_price

SteveGrondin · November 24, 2021, 1:14am

In MS version of T-SQL UNION has the effect of giving distinct values (removing duplicates) while UNION ALL preserves duplicates. YourSQLMV.

SredniVashtar · November 24, 2021, 4:27pm

I gotta say, that’s one of the dumber things about SQL.

1695814 · January 12, 2022, 4:30pm

New question.

I’d like to do this in the WHERE clause of my SELECT statement:

SELECT...
WHERE...
    AND 20190000 <= [DATE] <= 20211231
...

…but SQL doesn’t like that.

Am I relegated to splitting that into two lines? a la…

SELECT...
WHERE...
    AND 20190000 <= [DATE]
    AND [DATE] <= 20211231
...

Why waste time write lot code when few word do trick?

knoath · January 12, 2022, 4:37pm

Between

SteveGrondin · January 12, 2022, 5:08pm

TIL, nice.

In case you are using datetimes rather than dates, pay attention to this note in the MS help doc:

WHERE RateChangeDate BETWEEN ‘20011212’ AND ‘20020105’;

…

The query retrieves the expected rows because the date values in the query and the datetime values stored in the RateChangeDate column have been specified without the time part of the date. When the time part is unspecified, it defaults to 12:00 A.M. Note that a row that contains a time part that is after 12:00 A.M. on 2002-01-05 would not be returned by this query because it falls outside the range.

Vorian_Atreides · January 12, 2022, 7:25pm

I do believe you can revise that to:

WHERE TO_DATE(RateChangeDate) BETWEEN '20011212' AND '20020105'

To avoid the “time part” discrepancy.

1695814 · April 5, 2022, 8:32pm

Using Product table, find out the number of makers who produce only one model.

So, as you recall, Product table is made up of maker, model, & type. For this one, we only need to worry about maker & model.

I need to count the number of makers who have a count of model = 1.

My attempt on my own was this:

Select count(maker)
from (
Select maker
from Product
Group by maker
having count(model) = 1
...

…incomplete because I ran out of cares to give today.

Here’s the solution:

WITH total_count
AS (SELECT maker
    FROM product 
    GROUP BY maker 
    HAVING COUNT(model) = 1
    )
SELECT COUNT(maker)
FROM total_count

Oh…maybe I just figured out what that’s saying. Let me know if I’m on the right/wrong track.

Let “total_count” be the name of the set of all makers that have a model-count of 1.
From that set, now named “total_count”, give me the count of makers.

SteveGrondin · April 5, 2022, 11:01pm

Let “total_count” be a the name of a table that has one field, maker. Each row of that table is a maker that has one model in the Product table. Tell me the number of rows in the “total_count” table.

ISQLedYP, otherwise sounds good to me. It would appear the solution uses an implicit definition of total_count. The syntax I use would list the field names in total_count-you can rename things if you want-. It requires that the number of fields defined after the table name and the number of items in the AS (SELECT …) that follows are the same.

WITH total_count (mkr)
AS (SELECT maker
FROM product
GROUP BY maker
HAVING COUNT(model) = 1
)

knoath · April 6, 2022, 1:11am

Select count(*) from
(Select maker
from
product
group by maker having max(model) = min(model)
)

SredniVashtar · April 6, 2022, 3:42pm

Did it tell you that your solution was wrong? Looks fine to me, though I’ve been in SAS land so long that I’ve forgotten what normal sql looks like.

1695814 · April 6, 2022, 3:51pm

What I posted was the provided solution. I just couldn’t understand how that particular syntax worked. As I said after the code, I think I get it now…I might not be able to come up with that on my own but at least I can understand why it works.

SredniVashtar · April 6, 2022, 4:16pm

Ahh, okay. Yeah, I think that syntax is uncommon? I’ve seen it before, but not enough to have an opinion about it.

Of course for actudonk work, we usually just write a bunch of little queries, rather than one super matryoshka query.

SteveGrondin · April 7, 2022, 8:48pm

IMHO CTEs (common table expressions) in T SQL tend to be readable than nested sub queries. You kinda build each part first, then use the results in the next part of the statement.

1695814 · April 11, 2022, 5:12pm

Income_o

Income_o
point	date	inc
1	2001-03-22 00:00:00.000	15,000
1	2001-03-23 00:00:00.000	15,000
1	2001-03-24 00:00:00.000	3,400
1	2001-04-13 00:00:00.000	5,000
1	2001-05-11 00:00:00.000	4,500
2	2001-03-22 00:00:00.000	10,000
2	2001-03-24 00:00:00.000	1,500
3	2001-09-13 00:00:00.000	11,500
3	2001-10-02 00:00:00.000	18,000

Outcome_o

Outcome_o
point	date	out
1	2001-03-14 00:00:00.000	15,348
1	2001-03-24 00:00:00.000	3,663
1	2001-03-26 00:00:00.000	1,221
1	2001-03-28 00:00:00.000	2,075
1	2001-03-29 00:00:00.000	2,004
1	2001-04-11 00:00:00.000	3,195
1	2001-04-13 00:00:00.000	4,490
1	2001-04-27 00:00:00.000	3,110
1	2001-05-11 00:00:00.000	2,530
2	2001-03-22 00:00:00.000	1,440
2	2001-03-29 00:00:00.000	7,848
2	2001-04-02 00:00:00.000	2,040
3	2001-09-13 00:00:00.000	1,500
3	2001-09-14 00:00:00.000	2,300
3	2002-09-16 00:00:00.000	2,150

provided solution

Select i.point, i.date, i.inc, o.out
from Income_o i LEFT JOIN Outcome_o o
on i.point = o.point
and i.date = o.date

UNION

Select o.point, o.date, i.inc, o.out
from Outcome_o o LEFT JOIN Income_o i
on i.point = o.point
and i.date = o.date

output of provided solution

point	date	inc	out
1	2001-03-14 00:00:00.000	NULL	15348.0000
1	2001-03-22 00:00:00.000	15000.0000	NULL
1	2001-03-23 00:00:00.000	15000.0000	NULL
1	2001-03-24 00:00:00.000	3400.0000	3663.0000
1	2001-03-26 00:00:00.000	NULL	1221.0000
1	2001-03-28 00:00:00.000	NULL	2075.0000
1	2001-03-29 00:00:00.000	NULL	2004.0000
1	2001-04-11 00:00:00.000	NULL	3195.0400
1	2001-04-13 00:00:00.000	5000.0000	4490.0000
1	2001-04-27 00:00:00.000	NULL	3110.0000
1	2001-05-11 00:00:00.000	4500.0000	2530.0000
2	2001-03-22 00:00:00.000	10000.0000	1440.0000
2	2001-03-24 00:00:00.000	1500.0000	NULL
2	2001-03-29 00:00:00.000	NULL	7848.0000
2	2001-04-02 00:00:00.000	NULL	2040.0000
3	2001-09-13 00:00:00.000	11500.0000	1500.0000
3	2001-09-14 00:00:00.000	NULL	2300.0000
3	2001-10-02 00:00:00.000	18000.0000	NULL
3	2002-09-16 00:00:00.000	NULL	2150.000

Why do I have to do two left joins with a union in stead of just doing a full outer join? (The answer is “Because a FOJ gives me 6 columns instead of 4”.)

I was thinking of trying to do this…

SELECT i.point, i.date, o.point, o.date, i.inc, o.out
FROM Income_o i
FULL OUTER JOIN Outcome_o o ON i.point = o.point
and i.date = o.date

…but it’s not right because it gives me 6 columns instead of 4…

output of my attempt at it

point	date	point	date	inc	out
NULL	NULL	1	2001-03-14 00:00:00.000	NULL	15348.0000
NULL	NULL	1	2001-03-26 00:00:00.000	NULL	1221.0000
NULL	NULL	1	2001-03-28 00:00:00.000	NULL	2075.0000
NULL	NULL	1	2001-03-29 00:00:00.000	NULL	2004.0000
NULL	NULL	1	2001-04-11 00:00:00.000	NULL	3195.0400
NULL	NULL	1	2001-04-27 00:00:00.000	NULL	3110.0000
NULL	NULL	2	2001-03-29 00:00:00.000	NULL	7848.0000
NULL	NULL	2	2001-04-02 00:00:00.000	NULL	2040.0000
NULL	NULL	3	2001-09-14 00:00:00.000	NULL	2300.0000
NULL	NULL	3	2002-09-16 00:00:00.000	NULL	2150.0000
1	2001-03-22 00:00:00.000	NULL	NULL	15000.0000	NULL
1	2001-03-23 00:00:00.000	NULL	NULL	15000.0000	NULL
1	2001-03-24 00:00:00.000	1	2001-03-24 00:00:00.000	3400.0000	3663.0000
1	2001-04-13 00:00:00.000	1	2001-04-13 00:00:00.000	5000.0000	4490.0000
1	2001-05-11 00:00:00.000	1	2001-05-11 00:00:00.000	4500.0000	2530.0000
2	2001-03-22 00:00:00.000	2	2001-03-22 00:00:00.000	10000.0000	1440.0000
2	2001-03-24 00:00:00.000	NULL	NULL	1500.0000	NULL
3	2001-09-13 00:00:00.000	3	2001-09-13 00:00:00.000	11500.0000	1500.0000
3	2001-10-02 00:00:00.000	NULL	NULL	18000.0000	NULL

…so now my question is "How did that first solution know to drop the i. & o. prefixes and just call “point” and “date” by their generic, non-prefixed names?

knoath · April 11, 2022, 5:36pm

With a Union the column names are taken from the first table/ result set, no matter what you put in the second (assuming they match in data type).

SteveGrondin · April 11, 2022, 7:38pm

I think a full outer join may work if you use a CASE around point and date - horrible field name, possibly bad data type - to not show NULLs.

SELECT CASE WHEN o.point IS NULL THEN i.point ELSE o.point END AS point,
CASE WHEN o.[date] IS NULL THEN i.[date] ELSE o.[date] END AS [date],
inc,
[out]
FROM income_o i FULL OUTER JOIN outcome_o o ON i.point = o.point AND i.[date] = o.[date]

knoath · April 11, 2022, 9:41pm

Coalesce will do the same thing

1695814 · April 12, 2022, 8:46pm

Exercise 32:

Why can’t I multiply by “1/2” but I can multiply by “0.5” or “.5”?

This…

SELECT country, cast(ROUND(AVG(1/2*power(bore,3)),2) AS NUMERIC(10,2)) AS MW

…does not work, but this…

SELECT country, cast(ROUND(AVG(0.5*power(bore,3)),2) AS NUMERIC(10,2)) AS MW

…does.

knoath · April 12, 2022, 9:14pm

It could be a rounding issue. Some versions of SQL I’ve used will round to the number of decimal places that you have used (in your case, 0 decimal places). It might behave differently if you put in 1.0/2.
Alternatively, it might be an order of operations issue in which case you might want to put AVG((1/2) *