Big SQL Energy Beginner Course: Lesson 3 Primary vs. Foreign Keys & Distinct

Jan 03, 2025

Got BDE?⚡️

Hi BDEs! Welcome to Lesson 3 of the free Big SQL Energy Beginner Course⚡️ Today we’re learning the difference between Primary & Foreign keys and how they produce different results with distinct!

Return to Big SQL Energy Beginner homepage here to access the other lessons:

Return to Homepage 📚

If you’re new here:

💁🏽‍♀️ Who Am I?

I’m Jess Ramos, the founder of Big Data Energy and the creator of the BEST SQL course and community: Big SQL Energy⚡️. Check me out on socials: 🔗LinkedIn, 🔗Instagram, 🔗YouTube, and 🔗TikTok. And of course subscribe to my 🔗newsletter here for all my upcoming lessons and updates— all for free!

⚡️ What is Big SQL Energy?

🔗Big SQL Energy Intermediate is an intermediate SQL course designed to solve real-world business problems hands-on using realistic data and a modern tech stack. You’ll walk away with 2 SQL portfolio projects, portfolio building guidance, access to the Big Data Energy Discord community, and lots of confidence for you upcoming coding interviews! You also get access to all monthly events including masterclasses, office hours, and guest speakers. This course is all of the most important things I’ve learned on the job as a Senior Data Analyst in tech who grew from a $72K salary to over a $150K salary. And now I’m sharing it with YOU! It’s more than just another course— it’s a challenging program designed to upskill your SQL to the intermediate level, get ready to ace your live coding interviews, and build connections in the data community through the Discord and events.

JOIN now (pun intended 😉)

Lesson 3: Primary vs. Foreign Keys w/ Distinct

🎥 Video

Something that can be a huge pain point in data analytics is duplicate data. Duplicate data can happen due to a variety of reasons such as logging errors or data transformations, but many times duplicate values are there for a reason. And very often we need to understand the distinct values in a dataset without any dupes or repeats. We can do this by using distinct in SQL. Distinct goes before a column name, and it will dedupe the column and only return the distinct values. So for example, if you have a list of customer orders and use distinct on customers, it’ll only return each customer once even if some show up in the list of orders multiple times. This can be really useful when it comes to counting in SQL because we don’t want to count repeats, but distinct can be taxing to run, so only use it when necessary, and use it with caution on large datasets.

select
   distinct column
from
   table

What do you notice about using distinct on the order_id and customer_id columns?

select
   distinct order_id
from
   table
;

select
   distinct customer_id
from
   table
;

Did you notice that distinct on the order_id column returns the same number of values as the number of rows in the whole dataset which suggests that there were no duplicates, meanwhile distinct on the customer_id column returns less values which suggests that were duplicates removed? This is because the order_id column is a primary key and the customer_id column is a foreign key.

A primary key is a unique, non-null value to represent a row in a dataset. That means that it is a unique identifier for a single row in the dataset, and by definition, it cannot have any duplicates. It’s an actual restriction put on that column when the table is created in the database (if it’s set up correctly…). An example of a primary key is like a customer_id for a customers table. Each customer_id represents a customer on each row of the table. Each customer has exactly 1 ID and each ID belongs to only 1 customer.

A foreign key is a column in one table that maps to a primary key in another table. It’s a way to link the tables together (which we’ll learn about more with joins at the end of this course). A foreign key can have multiple duplicates or repeats in the column since it doesn’t have to be unique. An example of a foreign key is like a customer_id for an orders table. Each customer_id in the orders table maps a customer in the customers table. Each customer can have multiple orders, so they may show up multiple times in the customer_id column in the orders table. Since each customer can have multiple orders, this is called a one to many (1:M) relationship between customers and orders. These relationships between tables and entities will be really important when combining and transforming different datasets— which we’ll do at the end of the course with joins!

If you’ve enjoyed the free BEGINNER Big SQL Energy course, grab my intermediate course. There are no deadlines or timelines for this course— you start and take as long as you want. It will take your beginner skills to the next level and get you ready to ace your interviews.

JOIN now (pun intended 😉)

DM me or email me at courses@bdeanalytics.com if you have questions on the intermediate course!