Visual studio, database vs schema vs warehouse

In my current role, we’re moving away from snowflake for interacting with data to a snowflake extension in visual studio.

When i think of a general data hierarchy, they highest level is the database. then schema, then table, and that’s the full level of granularity.

Then we have these things called warehouses now? Are they supposed to fit in between database/schema/table? Are they even at a less granular level than database?

I try to run a very elementary query, and get this error message “No active warehouse selected in the current session. Select an active warehouse with the ‘use warehouse’ command” and don’t even know what warehouse i should be selecting with the “use warehouse” command. I can use the “SHOW WAREHOUSES” command, but don’t know which warehouse to select.

Any advice?

Thanks!

Epistemus

Ask someone who knows the data warehouses. No one here will know.

Datawarehouses where I’ve worked have been a central storage area that were loaded with multiple databases from around the organization. In that case, the datawarehouse contained databases, which contained tables (schema is just the way the tables will join to each other).

1 Like

A data warehouse is typically a database with denormalized tables that pulls in data from transactional databases which have normalized tables.

The reason why things are designed this way is that the transactional databases are optimized for capturing and storing data whereas the data warehouse is optimized for analytical tasks like building reports and doing data science. These two activities tend to have conflicting needs and therefore need different architecture. To find out more, search for OLTP vs OLAP databases.

A data warehouse is usually, but not always less granular than the transactional databases that feed into it.

3 Likes

Here are some examples of architectures I pulled from my db textbook to illustrate:

2 Likes

So this is useful for how to think of data. from most to least granular, we have data tables → schemas → databases → warehouses.

Nice!

That IS fascinating. I have some google-fu and youtubing to do.

Sorry to digress. I remember reading this stuff from your reading list for entry-level actuaries on the AO. Excellent reading. Did you ever update it?

What is the theory on that? Most users don’t need the level of detail and/or performance improvement is worth the trade off? The area that I deal with seems to have direct feeds from the admin systems without any loss of detail. Unless you consider the fact that the admin system has dailies (saved at least for a while) and data warehouse has only cumulative.

With very limited experience, this is what I’ve seen as well.

Also have seen a warehouse built that has been largely useless…so it can happen.

To add to what colonelsmoothie said, a data warehouse tends to be less volatile than the corresponding databases. The databases that it loads may only be updated once a day or once a week etc.

Some tables may still have the same granularity if they are transactional tables. If so, the history may only go back for a handful of months/years and there may be another less granular version with more history. These tables may take regular snapshots instead - end of day, end of month etc.

Some of the columns may not be loaded, if there isn’t a need for them on the reporting side.

1 Like