Got BDE?⚡️
Hi BDEs! ⚡️
Hope you’re having an amazing summer so far! I’ve been busy traveling for conferences and focusing on video content, but I’m COMMITTED to getting back into this newsletter weekly again 💪🏼
A few updates before I get into subqueries… 🔥
I did my first keynote at Infoshare in Gdańsk, Poland 🎉 I spoke on Staying Human in a Data-Driven World and explored how data-driven decisions can have real human consequences when made without empathy. From layoffs and firings to opaque performance reviews, these choices don’t just negatively affect individuals, they also shape company culture in lasting ways.
I went to Snowflake Summit as an Insider to get a sneak preview of all of their new data and AI announcements! It was so fun to hang with friends and partner with one of my favorite data companies ⚡️
I’ll be at Vidcon next week— so if you’re going, say hi 👋
I’m launching a data analytics roadmap soon so STAY TUNED 🥳
⚡️If you’re new here:
💁🏽♀️ Who Am I?
I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!
⚡️ Social Highlights:
Subqueries: The basics
Learning about subqueries goes along well with CTEs because almost every CTE can be written as a subquery and vice versa.
Subqueries are a query nested inside of another query. Technically you can have subqueries in subqueries in subqueries in subqueries… and so on… but don’t do this to your coworkers LOL! They can also be turned inside out and written as a CTE (I’ll show you an example of this later!) which greatly increases readability and interpretability. I highly recommend using CTEs for multiple nested subqueries or really complicated/long subqueries for better code organization and readability.
Subqueries vs. CTEs
A few differences between subqueries and CTEs are that:
Subqueries cannot be reused throughout the query unlike CTEs. CTEs can be referred to multiple times within the same query, but subqueries cannot be reused.
You don’t have to name your subquery (although I recommend doing so with an alias) like you do a for CTE.
Subqueries can’t be written recursively like CTEs. If you’re not familiar with this, don’t worry, it’s not used much!
Subqueries are generally harder to interpret since a reader has to start on the inside of the query and work their way out to understand it. I highly recommend using CTEs for more complex queries because too many subqueries can greatly decrease readability and ease of maintenance.
Subqueries in the WHERE clause
A really common example of a subquery is using it in the WHERE clause of the outer query. You use the subquery for conditional logic by using an operator and then filtering the original dataset for that condition. Check out this example below with an aggregate function, min( ):
select *
from orders
where
order_date =
(select min(order_date) from orders) --this is the subquery
-- you can also use other operators here, not just =
Here, the query is being filtered for all orders that have the minimum order date, which is the condition. They can also be used to filter for a column (or a list) in the where clause like this:
select *
from orders
where
order_id IN
(select order_id from payments) --this subquery produces 1 column
Here we’re selecting all the order IDs in the payments table and then filtering the orders table for those orders. This gives us all the orders data for the orders that are in the payments table. Why would we use this instead of manually adding in the IDs we want to search for like this?
select *
from orders
where
order_id IN
('7364', '3820', '9818',...)
Imagine you’re working with a HUGE dataset with thousands or millions of rows… It wouldn’t be convenient to type in or even copy and paste that many IDs into a query. Plus, you could even explode your IDE (not literally) by trying process all those!
Adding in a manual list requires you to figure out which ones to add ahead of time instead of them being added in automatically. This adds in an extra manual step into your process.
Perhaps the most important reason is that we don’t want to hardcode in ID values unless it’s an ad hoc request that’s never going to be used again. It’s better to use a subquery here because the list will dynamically change and refresh every time you run the query. We love fresh data!
Subqueries are great for quick 1-liner queries to use conditional logic in the WHERE clause!
I was thinking the only advantage of a subquery over a CTE is that a subquery can be correlated