Beginner SQL help

I’m on Exercise 26.

Find out the average price of PCs and laptops produced by maker A.

I figured I’d first grab all of the models & prices for PCs & Laptops & put them into their own dataset/table/whatever you call it, but it seems that it’s only grabbing unique values. I.e., one model has two prices that are the same, but only one of those records is showing up…I think.

Here’s my code just to look at the price data:

WITH PRICEDATA AS (
    SELECT model, price FROM PC
    UNION 
    SELECT model, price FROM Laptop
)

SELECT MODEL, PRICE 
 FROM PRICEDATA

I didn’t specify DISTINCT, so why does it seem to be giving me distinct values of model & price?

My actual attempt to get the average is here, in case you want to look at it:

WITH PRICEDATA AS (
    SELECT model, price FROM PC
    UNION 
    SELECT model, price FROM Laptop
)

SELECT *
FROM PRODUCT
 JOIN PRICEDATA
ON PRODUCT.MODEL = PRICEDATA.MODEL
WHERE MAKER = 'A'

Here’s the solution from the solution manual I found…it makes sense & all, but I was hoping to come up with my own way to get it.

SELECT AVG(price)
FROM (
	SELECT price
	FROM Product JOIN PC
	ON Product.model = PC.model
	WHERE maker = 'A'
	UNION ALL
	SELECT price
	FROM Product JOIN Laptop
	ON Product.model = Laptop.model
	WHERE maker='A'
) AS AVG_price

In MS version of T-SQL UNION has the effect of giving distinct values (removing duplicates) while UNION ALL preserves duplicates. YourSQLMV.

4 Likes

I gotta say, that’s one of the dumber things about SQL.

1 Like

New question.

I’d like to do this in the WHERE clause of my SELECT statement:

SELECT...
WHERE...
    AND 20190000 <= [DATE] <= 20211231
...

…but SQL doesn’t like that.

Am I relegated to splitting that into two lines? a la…

SELECT...
WHERE...
    AND 20190000 <= [DATE]
    AND [DATE] <= 20211231
...

Why waste time write lot code when few word do trick?

Between

3 Likes

TIL, nice.

In case you are using datetimes rather than dates, pay attention to this note in the MS help doc:

WHERE RateChangeDate BETWEEN ‘20011212’ AND ‘20020105’;

The query retrieves the expected rows because the date values in the query and the datetime values stored in the RateChangeDate column have been specified without the time part of the date. When the time part is unspecified, it defaults to 12:00 A.M. Note that a row that contains a time part that is after 12:00 A.M. on 2002-01-05 would not be returned by this query because it falls outside the range.

1 Like

I do believe you can revise that to:

WHERE TO_DATE(RateChangeDate) BETWEEN '20011212' AND '20020105'

To avoid the “time part” discrepancy.

Using Product table, find out the number of makers who produce only one model.

So, as you recall, Product table is made up of maker, model, & type. For this one, we only need to worry about maker & model.

I need to count the number of makers who have a count of model = 1.

My attempt on my own was this:

Select count(maker)
from (
Select maker
from Product
Group by maker
having count(model) = 1
...

…incomplete because I ran out of cares to give today.

Here’s the solution:

WITH total_count
AS (SELECT maker
    FROM product 
    GROUP BY maker 
    HAVING COUNT(model) = 1
    )
SELECT COUNT(maker)
FROM total_count

Oh…maybe I just figured out what that’s saying. Let me know if I’m on the right/wrong track.

Let “total_count” be the name of the set of all makers that have a model-count of 1.
From that set, now named “total_count”, give me the count of makers.

Let “total_count” be a the name of a table that has one field, maker. Each row of that table is a maker that has one model in the Product table. Tell me the number of rows in the “total_count” table.

ISQLedYP, otherwise sounds good to me. It would appear the solution uses an implicit definition of total_count. The syntax I use would list the field names in total_count-you can rename things if you want-. It requires that the number of fields defined after the table name and the number of items in the AS (SELECT …) that follows are the same.

WITH total_count (mkr)
AS (SELECT maker
FROM product
GROUP BY maker
HAVING COUNT(model) = 1
)

1 Like

Select count(*) from
(Select maker
from
product
group by maker having max(model) = min(model)
)

Did it tell you that your solution was wrong? Looks fine to me, though I’ve been in SAS land so long that I’ve forgotten what normal sql looks like.

What I posted was the provided solution. I just couldn’t understand how that particular syntax worked. As I said after the code, I think I get it now…I might not be able to come up with that on my own but at least I can understand why it works.

Ahh, okay. Yeah, I think that syntax is uncommon? I’ve seen it before, but not enough to have an opinion about it.

Of course for actudonk work, we usually just write a bunch of little queries, rather than one super matryoshka query.

IMHO CTEs (common table expressions) in T SQL tend to be readable than nested sub queries. You kinda build each part first, then use the results in the next part of the statement.

1 Like
Income_o
Income_o
point date inc
1 2001-03-22 00:00:00.000 15,000
1 2001-03-23 00:00:00.000 15,000
1 2001-03-24 00:00:00.000 3,400
1 2001-04-13 00:00:00.000 5,000
1 2001-05-11 00:00:00.000 4,500
2 2001-03-22 00:00:00.000 10,000
2 2001-03-24 00:00:00.000 1,500
3 2001-09-13 00:00:00.000 11,500
3 2001-10-02 00:00:00.000 18,000
Outcome_o
Outcome_o
point date out
1 2001-03-14 00:00:00.000 15,348
1 2001-03-24 00:00:00.000 3,663
1 2001-03-26 00:00:00.000 1,221
1 2001-03-28 00:00:00.000 2,075
1 2001-03-29 00:00:00.000 2,004
1 2001-04-11 00:00:00.000 3,195
1 2001-04-13 00:00:00.000 4,490
1 2001-04-27 00:00:00.000 3,110
1 2001-05-11 00:00:00.000 2,530
2 2001-03-22 00:00:00.000 1,440
2 2001-03-29 00:00:00.000 7,848
2 2001-04-02 00:00:00.000 2,040
3 2001-09-13 00:00:00.000 1,500
3 2001-09-14 00:00:00.000 2,300
3 2002-09-16 00:00:00.000 2,150
provided solution

Select i.point, i.date, i.inc, o.out
from Income_o i LEFT JOIN Outcome_o o
on i.point = o.point
and i.date = o.date

UNION

Select o.point, o.date, i.inc, o.out
from Outcome_o o LEFT JOIN Income_o i
on i.point = o.point
and i.date = o.date

output of provided solution
point date inc out
1 2001-03-14 00:00:00.000 NULL 15348.0000
1 2001-03-22 00:00:00.000 15000.0000 NULL
1 2001-03-23 00:00:00.000 15000.0000 NULL
1 2001-03-24 00:00:00.000 3400.0000 3663.0000
1 2001-03-26 00:00:00.000 NULL 1221.0000
1 2001-03-28 00:00:00.000 NULL 2075.0000
1 2001-03-29 00:00:00.000 NULL 2004.0000
1 2001-04-11 00:00:00.000 NULL 3195.0400
1 2001-04-13 00:00:00.000 5000.0000 4490.0000
1 2001-04-27 00:00:00.000 NULL 3110.0000
1 2001-05-11 00:00:00.000 4500.0000 2530.0000
2 2001-03-22 00:00:00.000 10000.0000 1440.0000
2 2001-03-24 00:00:00.000 1500.0000 NULL
2 2001-03-29 00:00:00.000 NULL 7848.0000
2 2001-04-02 00:00:00.000 NULL 2040.0000
3 2001-09-13 00:00:00.000 11500.0000 1500.0000
3 2001-09-14 00:00:00.000 NULL 2300.0000
3 2001-10-02 00:00:00.000 18000.0000 NULL
3 2002-09-16 00:00:00.000 NULL 2150.000

Why do I have to do two left joins with a union in stead of just doing a full outer join? (The answer is “Because a FOJ gives me 6 columns instead of 4”.)

I was thinking of trying to do this…

SELECT i.point, i.date, o.point, o.date, i.inc, o.out
FROM Income_o i
FULL OUTER JOIN Outcome_o o ON i.point = o.point
and i.date = o.date

…but it’s not right because it gives me 6 columns instead of 4…

output of my attempt at it
point date point date inc out
NULL NULL 1 2001-03-14 00:00:00.000 NULL 15348.0000
NULL NULL 1 2001-03-26 00:00:00.000 NULL 1221.0000
NULL NULL 1 2001-03-28 00:00:00.000 NULL 2075.0000
NULL NULL 1 2001-03-29 00:00:00.000 NULL 2004.0000
NULL NULL 1 2001-04-11 00:00:00.000 NULL 3195.0400
NULL NULL 1 2001-04-27 00:00:00.000 NULL 3110.0000
NULL NULL 2 2001-03-29 00:00:00.000 NULL 7848.0000
NULL NULL 2 2001-04-02 00:00:00.000 NULL 2040.0000
NULL NULL 3 2001-09-14 00:00:00.000 NULL 2300.0000
NULL NULL 3 2002-09-16 00:00:00.000 NULL 2150.0000
1 2001-03-22 00:00:00.000 NULL NULL 15000.0000 NULL
1 2001-03-23 00:00:00.000 NULL NULL 15000.0000 NULL
1 2001-03-24 00:00:00.000 1 2001-03-24 00:00:00.000 3400.0000 3663.0000
1 2001-04-13 00:00:00.000 1 2001-04-13 00:00:00.000 5000.0000 4490.0000
1 2001-05-11 00:00:00.000 1 2001-05-11 00:00:00.000 4500.0000 2530.0000
2 2001-03-22 00:00:00.000 2 2001-03-22 00:00:00.000 10000.0000 1440.0000
2 2001-03-24 00:00:00.000 NULL NULL 1500.0000 NULL
3 2001-09-13 00:00:00.000 3 2001-09-13 00:00:00.000 11500.0000 1500.0000
3 2001-10-02 00:00:00.000 NULL NULL 18000.0000 NULL

…so now my question is "How did that first solution know to drop the i. & o. prefixes and just call “point” and “date” by their generic, non-prefixed names?

With a Union the column names are taken from the first table/ result set, no matter what you put in the second (assuming they match in data type).

2 Likes

I think a full outer join may work if you use a CASE around point and date - horrible field name, possibly bad data type - to not show NULLs.

SELECT CASE WHEN o.point IS NULL THEN i.point ELSE o.point END AS point,
CASE WHEN o.[date] IS NULL THEN i.[date] ELSE o.[date] END AS [date],
inc,
[out]
FROM income_o i FULL OUTER JOIN outcome_o o ON i.point = o.point AND i.[date] = o.[date]

Coalesce will do the same thing

Exercise 32:

Why can’t I multiply by “1/2” but I can multiply by “0.5” or “.5”?

This…

SELECT country, cast(ROUND(AVG(1/2*power(bore,3)),2) AS NUMERIC(10,2)) AS MW

…does not work, but this…

SELECT country, cast(ROUND(AVG(0.5*power(bore,3)),2) AS NUMERIC(10,2)) AS MW

…does.

It could be a rounding issue. Some versions of SQL I’ve used will round to the number of decimal places that you have used (in your case, 0 decimal places). It might behave differently if you put in 1.0/2.
Alternatively, it might be an order of operations issue in which case you might want to put AVG((1/2) *