Big SQL Energy Beginner Course: Lesson 25 CASE & Categorizing Data
Got BDE?⚡️
Hi BDEs! Welcome to Lesson 25 of the free Big SQL Energy Beginner Course⚡️ Today we’re learning about Case & Categorizing Data in SQL!
Return to Big SQL Energy Beginner homepage here to access the other lessons:
If you’re new here:
💁🏽♀️ Who Am I?
I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!
⚡️ What is Big SQL Energy?
🔗Big SQL Energy Intermediate is an intermediate SQL course designed to solve real-world business problems hands-on using realistic data and a modern tech stack. You’ll walk away with 2 SQL portfolio projects, portfolio building guidance, access to the Big Data Energy Discord community, and lots of confidence for you upcoming coding interviews! You also get access to all monthly events including masterclasses, office hours, and guest speakers. This course is all of the most important things I’ve learned on the job as a Senior Data Analyst in tech who grew from a $72K salary to over a $150K salary. And now I’m sharing it with YOU! It’s more than just another course— it’s a challenging program designed to upskill your SQL to the intermediate level, get ready to ace your live coding interviews, and build connections in the data community through the Discord and events.
Lesson 25: Case & Categorizing Data
Case is a serious SUPER POWER in SQL because it can help you categorize data, replace nulls, create binary columns, and even PIVOT data in SQL. Yes, like making a pivot table summarize your data! We don’t have time to learn alllll of this in the beginner course, but head on over to the 🔗intermediate course if you want to learn more on CASE and become an expert 😉 CASE is one the biggest ROI SQL skills you can learn, so my intermediate course has a WHOLE chapter on them.
Case works like a fancy if then statement where you can define multiple conditions and it will output the result for the first condition met. Here’s the basic syntax:
case
when condition1 then output1
when condition2 then output2
...
when conditionN then outputN
end as column_name
Because it steps through the conditions in order, ORDER MATTERS if your logic is dependent on the previous conditions. That means if you’re relying on waterfall logic and depending on and building off of the previous conditions above, you have to make sure your logic is super tight and there are no holes or unaccounted for edge cases. It’s really easy to accidentally forget an equals sign or mess up your numbers and produce incorrect results. See the query below where we rely on waterfall logic to classify revenue.
select
order_id,
price * quantity as rev,
case
when rev < 100 then 'small'
when rev < 500 then 'med'
when rev < 1000 then 'large'
else 'extra large'
end as rev_amount
from
orders
;
If the first condition isn’t true, we go to the second. If the second condition isn’t true, we go to the third. And so on until a condition is true! For example, the second condition (classifying med) relies on the logic that the number is less than 500, and assumes it’s also greater than or equal to 100 since the first condition was NOT met. And for the 3rd condition (classifying large) relies on the logic that the number is less than 1000, and assumes it’s also greater than or equal to 500 since the first and second conditions were NOT met. This is what I mean by waterfall logic!
To emphasize my point further, see what happens when you switch the order of the conditions and put large on top. Does this change your output? It does, because it’s no longer relying on the small and medium conditions since they’re further down in the logic and CASE returns the FIRST met condition output. This produces INCORRECT results below and accidentally misclassifies all the small and medium values as LARGE:
select
order_id,
price * quantity as rev,
case
when rev < 1000 then 'large'
when rev < 100 then 'small'
when rev < 500 then 'med'
else 'extra large'
end as rev_amount
from
orders
;
This is why order matters when you use dependent waterfall logic and why you have to be so careful when defining logic in CASE. If you want to write a query that does not rely on waterfall logic (and order doesn’t matter), you can clearly define the upper and lower endpoints of your value ranges. Just be careful with your equal signs here and don’t leave any holes in your logic. See the video for an example of this!
You can also use CASE to develop binary columns and classify based on lists of values— we do both of these in the video for this lesson, so check it out. And of course, the 🔗intermediate course goes over the more advanced use cases of CASE like using it to pivot data and wrangle your datasets 🤠 See you in the next lesson!
If you’ve enjoyed the free BEGINNER Big SQL Energy course, grab my intermediate course. There are no deadlines or timelines for this course— you start and take as long as you want. It will take your beginner skills to the next level and get you ready to ace your interviews. DM me or email me at courses@bdeanalytics.com if you have questions on the intermediate course!
Return to Big SQL Energy Beginner homepage here to access the other lessons: