Got BDE?⚡️
Hi BDEs! ⚡️
Hope you’re having an amazing week! I’m currently feeling super inspired making lots of content right now since I have the time with it being my full time job now 😉 I’m also prepping for a KEYNOTE for a conference at the end of May! EEEEEP 😳
Please support my amazing newsletter sponsor to support my free content ⬇️
Quadratic AI🔗 is the COOLEST spreadsheet ever because it integrates with many data sources including databases & APIs, and you can even vibe code your own Python visualizations using the built-in AI. The native AI also writes and debugs your SQL and Python code— it’s actually INSANE!
Quadratic AI empowers all business professionals (not just data people) to perform their own analyses and uncover insights which ultimately helps companies be more data-driven and frees up analytics teams from all the ad hoc requests.
Tutorial below ⬇️
⚡️If you’re new here:
💁🏽♀️ Who Am I?
I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!
⚡️ Social Highlights:
CTEs in SQL
CTEs are one of the most important skills that will allow you to take your SQL skills to the next level. They’re so essential to coding SQL, it’s really hard to advance from a beginner in SQL to intermediate without them.
I’ll fully admit— I used CTEs for years at my first startup job without knowing their name. I remember going to data analytics interviews and being asked if I knew what CTEs were. Obviously, I was confused because I had never heard the formal name, and I likely missed many opportunities because of that slight lack of information. So it’s really important to not only know how to do something, but also the name of it and why you need it. This info will help you out when you’re interviewing.
I HIGHLY recommend my hands-on CTE Chapter in my intermediate course, Big SQL Energy🔗 for a deep dive with a LOT of practice on real business problems.
So what’s a CTE?
A CTE is a common table expression (quick! write that down so you don’t end up like me lol). In simple terms, a CTE is a temporary result set defined within the scope of the query. This means that you can refer to it whenever you want and however many times you want only within the scope of your current query. You can think of a CTE as a table you design with a query— and then you give it a name and tell SQL: “hold onto this and remember it for later!” Then you can use it just like you would a regular table. Here’s a very simple example of the syntax:
with cte_name as(
select *
from table1
)
select *
from cte_name
Benefits of CTEs:
I’ve already hinted at one of the biggest benefits of CTEs: they’re reusable, so they’re really handy to use if you have to refer to an output multiple times in the same query. They’re only usable within that query though— so if you don’t rerun the CTE, you won’t be able to use it.
One of the other major benefits of using CTEs is that they offer major readability benefits. A really important part of coding is making it useful long term, especially if someone else takes over or needs to edit your work. CTEs make it much more easier to understand code (especially compared to subqueries which I also go over in my intermediate course🔗) because you define output and sections one part at a time. This gives the reader the advantage of slowly digesting and interpreting your code one step at a time.
Another great benefit of CTEs is that they can be recursive. Although I haven’t had many use cases for this in the real world, it is a more advanced programming concept and is worth knowing and learning. This is where CTEs can stand out over subqueries since subqueries don’t have this capability.
CTEs vs. Subqueries:
I keep mentioning subqueries because it’s pretty impossible to not talk about CTEs and subqueries together— they’re kind of like peanut butter and jelly. CTEs are kind of the inside out version of a subquery. And you guessed it, there are some instances where you may prefer to use a subquery— but there’s not always a black and white answer. Although I’ll unpack subqueries in further detail in a future newsletter, it’s worth mentioning that almost all CTEs can also be written as a subquery (with recursive CTEs as an exception) and vice versa.
Why do you need to use CTEs?
Sometimes you may choose to use a CTE for better readability even if it offers no real advantages for syntax or optimization. I do this all the time if I’m writing code that will end up in a dashboard or be passed onto a teammate. Using CTEs and making your code readable is a nice thing to do to your coworkers. Please never send me multiple nested subqueries!
However, there are other times when you HAVE to use a CTE because you can’t do something all in one select statement. A really common beginner example is nested aggregate functions. In SQL, you can’t nest aggregate functions, so if you want to take the average of a bunch of sums, and you write avg(sum(column)), you’ll get an error yelling at you about nested aggregate functions. This is where CTEs come into play. You first have to define the summed column in a CTE, and then you can call on the result in the second query like this:
with order_counts as(
select
user_id,
sum(ordered) as num_orders
from
table1
group by 1
)
select avg(num_orders) as avg_per_user
from order_counts
Other common times you need to use a CTE is when you need to create other datasets with mismatching granularities. For example, maybe you have a dataset of orders and the day they were placed on. You may want to also include the monthly total orders for the company in that same row by the order date. You could create a CTE that counts the number of orders per month, which aggregates the dataset on month by summing the number of orders per month. Then in the following select statement, you can join the monthly totals CTE to the original dataset with the day the order was placed. This allows you to have the day the order was placed along with that monthly total side-by-side in the same row.
with monthly as(
select
date_trunc(‘month’, order_date) as month
count (*) as monthly_orders
from orders
group by 1
)
select
o.order_id,
order_date,
monthly_orders
from
orders o
left join
monthly m
on date_trunc(‘month’, o.order_date) = m.month
;
Try to do this without a CTE (or subquery), and I guarantee you you’ll have a hard time. Depending on the complexity of the query, number of datasources, and length, queries can have any number of CTEs. In my everyday job, I frequently write queries with 3-5+ CTEs. Sometimes they’re mandatory, and other times I opt in to use them optionally to make my code more readable and be nice to my coworkers.
CTEs are one of the most useful concepts to know if SQL, so make sure you master them! If you want extra practice on CTEs, I HIGHLY recommend my hands-on CTE chapter in my course: Big SQL Energy🔗.
Helpful knowledge on CTE's