Skip to content

SQL QUALIFY – Filtering Data More Precisely with Window Functions

In this article, we will explore how to use the SQL QUALIFY clause with window functions to filter data more precisely. We’ll dive into practical examples, including filtering out duplicate data and retrieving the latest records in business scenarios. Let’s begin with a simple table example and its result, then move into real-world business cases.

If you are unfamiliar with basic SELECT clause, SQL SELECT – Let’s look up data

■ What is QUALIFY?

The QUALIFY clause in SQL is the same syntax for filtering as WHERE or HAVING. However, whereas WHERE filters on a row-by-row basis and HAVING filters on a group-by-group basis, QUALIFY is used to filter on the result of a window function. In other words, QUALIFY is very useful when you want to further condition the result of a window function.

The basic syntax is:

SELECT ColumnName FROM TableName
QUALIFY WindowFunction = Condition;

■ QUALIFY and ROW_NUMBER() Example

First, let’s assume we have a mock table of customer orders.

order_idcustomer_idorder_datetotal_amount
11012023-08-01200
21022023-08-05150
31012023-08-10180
41032023-07-20300
51022023-08-15250
61012023-08-20100
71032023-08-25400

We want to retrieve the most recent order for each customer. Using ROW_NUMBER() and QUALIFY, we can easily get the latest order.

Query: Retrieve the Latest Order for Each Customer

SELECT DISTINCT order_id, customer_id, order_date, total_amount
FROM Customer_Orders
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY order_date DESC) = 1;

Query Result

order_idcustomer_idorder_datetotal_amount
61012023-08-20100
51022023-08-15250
71032023-08-25400

Explanation:

  • ROW_NUMBER() assigns a unique number to each order based on the customer_id and orders them by order_date in descending order.
  • PARTITION BY customer_id creates a group for each customer, and ORDER BY order_date DESC ensures that the most recent order is assigned a row number of 1.
  • QUALIFY ROW_NUMBER() = 1 filters out only the most recent order for each customer.

If you want to read more about SELECT DISTINCT, SQL SELECT – Removing Duplicates and Excluding Specific Columns

■ QUALIFY and RANK() Example

Now, let’s assume we are managing advertising campaigns. We want to retrieve the most recent campaign for each customer. In this case, we can use RANK() to achieve that.

campaign_idcustomer_idcampaign_namestart_datestatus
1101Summer Sale2023-06-01active
2102Winter Promo2023-07-15paused
3101Fall Launch2023-08-10active
4103New Year Sale2023-01-05inactive
5102Spring Offer2023-03-25active

We can use RANK() to retrieve the most recent campaign for each customer.

SELECT customer_id, campaign_id, campaign_name, start_date, status
FROM Campaigns
QUALIFY RANK() OVER (PARTITION BY customer_id ORDER BY start_date DESC) = 1;
customer_idcampaign_namestart_datestatus
101Fall Launch200active
102Winter Promo150paused
103New Year Sale300inactive

Explanation:

  • RANK() assigns ranks to each campaign based on the start_date within each customer_id group.
  • PARTITION BY customer_id divides the data into groups by customer, and ORDER BY start_date DESC orders them by the most recent start date.
  • QUALIFY RANK() = 1 ensures that only the most recent campaign for each customer is returned.

■ Real Business Cases

1. Extracting the Latest Purchase Record for Each Customer

In consumer behavior analysis, extracting the latest purchase record for each customer is essential for re-targeting and marketing strategies. With ROW_NUMBER() and QUALIFY, you can easily fetch the latest purchase.

SELECT DISTINCT customer_id, purchase_id, purchase_date, total_amount
FROM Purchases
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY purchase_date DESC) = 1;

Business Use Case: This query helps identify the most recent purchase for each customer, which can be used to create personalized marketing campaigns.

2. Filtering Duplicate Event Logs

When dealing with event logs, there may be multiple logs for the same event. You may want to keep only the most recent log for analysis. In this case, ROW_NUMBER() and QUALIFY can help filter out duplicates.

SELECT DISTINCT event_id, user_id, event_timestamp
FROM Event_Logs
WHERE event_type = 'login'
QUALIFY ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_timestamp DESC) = 1;

Business Use Case: This query helps track the latest login information for each user, providing more accurate user activity data for analysis.

Conclusion

The SQL QUALIFY clause is a powerful tool for filtering data based on window function results. Whether you’re working with ROW_NUMBER() or RANK(), QUALIFY allows you to efficiently extract the most relevant data, such as the latest records or unique entries within a group. In real-world business scenarios like customer behavior analysis or event log processing, QUALIFY can significantly improve data quality and analysis efficiency.

Now, apply the QUALIFY clause to your complex datasets to extract only the information you need!

Tags:

Leave a Reply

Your email address will not be published. Required fields are marked *