Big SQL Energy Beginner Course: Lesson 23 Having (vs. Group By)
Got BDE?⚡️
Hi BDEs! Welcome to Lesson 23 of the free Big SQL Energy Beginner Course⚡️ Today we’re learning about Having vs. Group By w/ Aggregate Functions in SQL!
Return to Big SQL Energy Beginner homepage here to access the other lessons:
If you’re new here:
💁🏽♀️ Who Am I?
I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!
⚡️ What is Big SQL Energy?
🔗Big SQL Energy Intermediate is an intermediate SQL course designed to solve real-world business problems hands-on using realistic data and a modern tech stack. You’ll walk away with 2 SQL portfolio projects, portfolio building guidance, access to the Big Data Energy Discord community, and lots of confidence for you upcoming coding interviews! You also get access to all monthly events including masterclasses, office hours, and guest speakers. This course is all of the most important things I’ve learned on the job as a Senior Data Analyst in tech who grew from a $72K salary to over a $150K salary. And now I’m sharing it with YOU! It’s more than just another course— it’s a challenging program designed to upskill your SQL to the intermediate level, get ready to ace your live coding interviews, and build connections in the data community through the Discord and events.
Lesson 23: Having vs. Group By
We already learned in the last lesson all about group by and how it empowers us to segment our data by specified column values to perform calculations isolated within those segments. Well, Having and Group By go hand-in-hand like peanut butter and jelly which is why one of the MOST COMMON interview questions is:
What’s the difference between group by and having??
This is the #1 way to fail your interview and look like a complete NOOB… But that won’t be you because you’re going to keep reading on and answer this correctly in your interviews…
Having is another SQL clause that goes after Where and Group By and is used to filter groups created by Group By based on aggregated conditions. This means that Having is performed AFTER Group By because it uses those the aggregated conditions for those groupings to filter the data. I know this probably sounds really fancy and hard, but I promise it’s not!
The syntax and setup of Having is JUST LIKE Where. The only difference is that you cannot put aggregate functions/conditions in Where or you’ll get an error. That’s why we have to use Having for aggregate conditions. Write this down or get it tattooed on your body or something:
WHERE FILTERS ROWS, HAVING FILTERS GROUPS!
Let’s start with a basic group by below and sum the total quantity of items ordered for each customer:
select
customer_id,
sum(quantity) as num_items
from
orders
group by
customer_id
All the orders for each customer ID have now been aggregated or summarized into 1 row. So each customer ID now represents a group of orders— all of the ones for that customer. What if we want to filter our output of customers for only those with more than 75 items ordered? We can put that condition in the Having clause like below:
select
customer_id,
sum(quantity) as num_items
from
orders
group by
customer_id
having
num_items > 75 -- using the alias
-- sum(quantity) > 75 -- another method not using alias
Note: You can use EITHER option under Having. I prefer the top method using the alias, but if your SQL environment does not support that, feel free to use the more traditional method (2nd option) I commented out under group by.
Having in this query is filtering the customer IDs based on the aggregate condition of sum(quantity). Try to put the condition in the Where clause instead, and you’ll get an error!
We can put multiple conditions in the Having clause and combine them with Boolean operators (AND & OR) like below:
select
customer_id,
sum(price * quantity) as rev,
count(*) as num_orders
from
orders
group by
customer_id
having
rev >= 10000
and num_orders <= 50
;
This query filters customers that have at least $10,000 in revenue & at least 50 orders. Total revenue and number of orders have to go in Having since they’re both aggregates that were calculated for each customer. They each represent a bunch of rows and values that were summarized and aggregated into 1 value for each customer, so they cannot go in the Where clause!
Here are some key differences between Where, Group By, and Having. They’re also listed in the order SQL processes them:
Where: Filters rows based on conditions
Group By: Creates groups of data by aggregating and summarizing
Having: Filters groups based on aggregate conditions
Alright now that we have Having out of the way, we’re kicking it up a notch in the next lesson! If you’ve enjoyed the free BEGINNER Big SQL Energy course, grab my intermediate course. There are no deadlines or timelines for this course— you start and take as long as you want. It will take your beginner skills to the next level and get you ready to ace your interviews. DM me or email me at courses@bdeanalytics.com if you have questions on the intermediate course!
Return to Big SQL Energy Beginner homepage here to access the other lessons: