Got BDE?⚡️
Hi BDEs! Welcome to Lesson 19 of the free Big SQL Energy Beginner Course⚡️ Today we’re learning about basic date functions in SQL!
Return to Big SQL Energy Beginner homepage here to access the other lessons:
If you’re new here:
💁🏽♀️ Who Am I?
I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!
⚡️ What is Big SQL Energy?
🔗Big SQL Energy Intermediate is an intermediate SQL course designed to solve real-world business problems hands-on using realistic data and a modern tech stack. You’ll walk away with 2 SQL portfolio projects, portfolio building guidance, access to the Big Data Energy Discord community, and lots of confidence for you upcoming coding interviews! You also get access to all monthly events including masterclasses, office hours, and guest speakers. This course is all of the most important things I’ve learned on the job as a Senior Data Analyst in tech who grew from a $72K salary to over a $150K salary. And now I’m sharing it with YOU! It’s more than just another course— it’s a challenging program designed to upskill your SQL to the intermediate level, get ready to ace your live coding interviews, and build connections in the data community through the Discord and events.
Lesson 19: DateDiff()
DateDiff() is a really useful date function to determine the amount of time between 2 dates. Why would we want to know that as data analysts? Well, sometimes we need to know how much time has passed between 2 events or how long something is taking. Imagine if we have a bunch of orders that are taking weeks to ship out. That would likely impact customer satisfaction and be an area that we’d want to focus on and improve for the business.
Or looking at a more granular level, what if we have a bunch of statuses orders/customers go through in a process, and we have a log that tracks when they get to each step of the process. We can use those timestamps to determine how long a customer is in each step of the process and see if they’re getting stuck or experiencing friction in the product somewhere along the way. We go through this EXACT business problem as a final project in my 🔗intermediate course that you can put in your data analytics portfolio to show off to hiring managers! I highly recommend this project because it helped me land 2 Senior Data Analyst jobs making over $140K/year.
So understanding time differences can support many types of business questions in all areas of the business— including customer satisfaction and product development!
Here’s the basic syntax of DateDiff():
DATEDIFF(unit, start_date, end_date)
You need 3 parts (also called arguments) for the function: a unit, a start date, and an end date. The unit can be in days, months, years, or even time units like minute, hour, or second if you have a datetime or timestamp value to work with.
The syntax can vary a little bit across platforms and SQL dialects (such as needing an underscore for date_diff() or needing single quotes around the unit level), so definitely Google and ChatGPT if you have any syntax issues.
We can use datediff() to produce a new column that tells us how long (in days) it takes for each order to ship:
SELECT
order_id,
customer_id,
order_date,
shipping_date,
DATEDIFF('day', order_date, shipping_date) AS days_to_ship
FROM
orders
;
And we can also calculate how many days ago each order was placed by using currentdate — which we learned in a previous lesson!
SELECT
order_id,
customer_id,
order_date,
shipping_date,
DATEDIFF('day', order_date, current_date) AS days_ago_order_placed
FROM
orders
;
We can also change the unit level to get different granularities of time differences, but it’s really important to consider which level of granularity will be most useful for stakeholders given the business context:
SELECT
order_id,
customer_id,
order_date,
shipping_date,
DATEDIFF('day', order_date, shipping_date) AS days_to_ship,
DATEDIFF('month', order_date, shipping_date) AS months_to_ship,
DATEDIFF('year', order_date, shipping_date) AS years_to_ship,
DATEDIFF('hour', order_date, shipping_date) AS hours_to_ship,
DATEDIFF('minute', order_date, shipping_date) AS minutes_to_ship,
DATEDIFF('second', order_date, shipping_date) AS seconds_to_ship
FROM
orders
;
Note that if the date(s) input into the function are true dates without timestamps, using a time granularity like minute, hour, and second will not be 100% accurate because it will use 00:00:00 as the default timestamp on your date values. We can’t produce accurate calculations without having the data to support them…
And if you ever accidentally get a negative datediff() number, and it doesn’t make sense for your business context, it means you likely accidentally switched your start date and end date in your function, so switching them will likely resolve the issue!
We can also use datediff() to filter datasets in the WHERE clause as shown below:
SELECT
order_id,
customer_id,
order_date,
shipping_date,
DATEDIFF('day', order_date, shipping_date) AS days_to_ship
FROM
orders
where
DATEDIFF('day', order_date, shipping_date) > 1
-- pulls orders that took over a day to ship
;
BONUS:
And it’s even more useful to put this calculation into a CTE and perform additional descriptive statistics on the metric in a separate select statement below. But if you’re not familiar with CTEs yet, that’s okay— we cover them in-depth in my 🔗intermediate course. They’re definitely one of the most-used intermediate SQL skills and worth learning for sure, so don’t skip them! Here’s an example of using CTEs to define the datediff() calculation and then perform descriptive statistics on that column:
with shipping_time as(
SELECT
order_id,
customer_id,
order_date,
shipping_date,
DATEDIFF('day', order_date, shipping_date) AS days_to_ship
FROM
orders
)
select
avg(days_to_ship) as avg_days,
min(days_to_ship) as min_days,
max(days_to_ship) as max_days,
median(days_to_ship) as median_days,
stddev(days_to_ship) as stddev_days
from
shipping_time
;
If you’ve enjoyed the free BEGINNER Big SQL Energy course, grab my intermediate course. There are no deadlines or timelines for this course— you start and take as long as you want. It will take your beginner skills to the next level and get you ready to ace your interviews. DM me or email me at courses@bdeanalytics.com if you have questions on the intermediate course!
Return to Big SQL Energy Beginner homepage here to access the other lessons:
Real business use case examples. That definitely will help those of us who are trying to break into the industry.0