In this article, we will explore how to use the SQL QUALIFY
clause with window functions to filter data more precisely. We’ll dive into practical examples, including filtering out duplicate data and retrieving the latest records in business scenarios. Let’s begin with a simple table example and its result, then move into real-world business cases.
If you are unfamiliar with basic SELECT clause, SQL SELECT – Let’s look up data
■ What is QUALIFY?
The QUALIFY clause in SQL is the same syntax for filtering as WHERE or HAVING. However, whereas WHERE filters on a row-by-row basis and HAVING filters on a group-by-group basis, QUALIFY is used to filter on the result of a window function. In other words, QUALIFY is very useful when you want to further condition the result of a window function.
The basic syntax is:
SELECT ColumnName FROM TableName QUALIFY WindowFunction = Condition;
■ QUALIFY and ROW_NUMBER() Example
First, let’s assume we have a mock table of customer orders.
order_id | customer_id | order_date | total_amount |
1 | 101 | 2023-08-01 | 200 |
2 | 102 | 2023-08-05 | 150 |
3 | 101 | 2023-08-10 | 180 |
4 | 103 | 2023-07-20 | 300 |
5 | 102 | 2023-08-15 | 250 |
6 | 101 | 2023-08-20 | 100 |
7 | 103 | 2023-08-25 | 400 |
We want to retrieve the most recent order for each customer. Using ROW_NUMBER()
and QUALIFY
, we can easily get the latest order.
Query: Retrieve the Latest Order for Each Customer
SELECT DISTINCT order_id, customer_id, order_date, total_amount FROM Customer_Orders QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;
Query Result
order_id | customer_id | order_date | total_amount |
6 | 101 | 2023-08-20 | 100 |
5 | 102 | 2023-08-15 | 250 |
7 | 103 | 2023-08-25 | 400 |
Explanation:
ROW_NUMBER()
assigns a unique number to each order based on thecustomer_id
and orders them byorder_date
in descending order.PARTITION BY customer_id
creates a group for each customer, andORDER BY order_date DESC
ensures that the most recent order is assigned a row number of 1.QUALIFY ROW_NUMBER() = 1
filters out only the most recent order for each customer.
If you want to read more about SELECT DISTINCT, SQL SELECT – Removing Duplicates and Excluding Specific Columns
■ QUALIFY and RANK() Example
Now, let’s assume we are managing advertising campaigns. We want to retrieve the most recent campaign for each customer. In this case, we can use RANK()
to achieve that.
campaign_id | customer_id | campaign_name | start_date | status |
1 | 101 | Summer Sale | 2023-06-01 | active |
2 | 102 | Winter Promo | 2023-07-15 | paused |
3 | 101 | Fall Launch | 2023-08-10 | active |
4 | 103 | New Year Sale | 2023-01-05 | inactive |
5 | 102 | Spring Offer | 2023-03-25 | active |
We can use RANK()
to retrieve the most recent campaign for each customer.
SELECT customer_id, campaign_id, campaign_name, start_date, status FROM Campaigns QUALIFY RANK() OVER (PARTITION BY customer_id ORDER BY start_date DESC) = 1;
customer_id | campaign_name | start_date | status |
101 | Fall Launch | 200 | active |
102 | Winter Promo | 150 | paused |
103 | New Year Sale | 300 | inactive |
Explanation:
RANK()
assigns ranks to each campaign based on thestart_date
within eachcustomer_id
group.PARTITION BY customer_id
divides the data into groups by customer, andORDER BY start_date DESC
orders them by the most recent start date.QUALIFY RANK() = 1
ensures that only the most recent campaign for each customer is returned.
■ Real Business Cases
1. Extracting the Latest Purchase Record for Each Customer
In consumer behavior analysis, extracting the latest purchase record for each customer is essential for re-targeting and marketing strategies. With ROW_NUMBER()
and QUALIFY
, you can easily fetch the latest purchase.
SELECT DISTINCT customer_id, purchase_id, purchase_date, total_amount FROM Purchases QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) = 1;
Business Use Case: This query helps identify the most recent purchase for each customer, which can be used to create personalized marketing campaigns.
2. Filtering Duplicate Event Logs
When dealing with event logs, there may be multiple logs for the same event. You may want to keep only the most recent log for analysis. In this case, ROW_NUMBER()
and QUALIFY
can help filter out duplicates.
SELECT DISTINCT event_id, user_id, event_timestamp FROM Event_Logs WHERE event_type = 'login' QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp DESC) = 1;
Business Use Case: This query helps track the latest login information for each user, providing more accurate user activity data for analysis.
Conclusion
The SQL QUALIFY
clause is a powerful tool for filtering data based on window function results. Whether you’re working with ROW_NUMBER()
or RANK()
, QUALIFY
allows you to efficiently extract the most relevant data, such as the latest records or unique entries within a group. In real-world business scenarios like customer behavior analysis or event log processing, QUALIFY
can significantly improve data quality and analysis efficiency.
Now, apply the QUALIFY
clause to your complex datasets to extract only the information you need!