Skip to content

[SQL Interview] Differences Between CTE and Subquery

In a data analyst interview, various questions can be asked to assess data analysis and SQL query writing skills. This time, we’ll cover a question about the differences between CTEs (Common Table Expressions) and Subqueries.

Question: Explain the differences between CTEs and Subqueries, and provide an example of how these two features can be used in a real business scenario.

CTE (Common Table Expressions)

A CTE is a method for defining a temporary result set that you can reference within another query. It is primarily used to enhance the readability of a query and to simplify complex queries.

WITH cte_name AS (
    SELECT column1, column2
    FROM table_name
    WHERE condition
)
SELECT column1, column2
FROM cte_name
WHERE another_condition;

Subqueries

A subquery is a query nested within another query. It is commonly used in the WHERE, SELECT, or FROM clauses to filter or transform the main query’s data.

SELECT column1, column2
FROM table_name
WHERE column1 IN (
    SELECT column1
    FROM another_table
    WHERE condition
);

Differences Between CTEs and Subqueries

  • Readability: CTEs improve readability by breaking down complex queries into manageable parts. Subqueries are often nested within a query, which can reduce readability.
  • Reusability: CTEs can be reused multiple times within the same query, whereas subqueries are executed separately each time they are referenced.
  • Performance: The performance differences between CTEs and subqueries depend on the database system and the complexity of the query. Generally, CTEs are advantageous for recursive queries.

Note: A “recursive query” refers to a query that references itself to repeatedly process data. Recursive queries are useful for handling hierarchical or tree-structured data.

Example: Analyzing Product Sales

A company is analyzing monthly product sales data. To find the top-selling product and the total sales for each month, we can use both CTEs and subqueries to see their differences.

Example Using CTE

WITH MonthlySales AS (
    SELECT 
        product_name,
        MONTH(sale_date) AS sale_month,
        SUM(sales) AS total_sales
    FROM 
        dataset.sales
    GROUP BY 
        product_name, 
        MONTH(sale_date)
), TopSellingProduct AS (
    SELECT 
        sale_month,
        product_name,
        total_sales,
        ROW_NUMBER() OVER (PARTITION BY sale_month ORDER BY total_sales DESC) AS row_num
    FROM 
        MonthlySales
)
SELECT 
    sale_month,
    product_name,
    total_sales
FROM 
    TopSellingProduct
WHERE 
    row_num = 1;

In this CTE example, the MonthlySales CTE calculates the total sales for each month, and the TopSellingProduct CTE finds the top-selling product for each month. Using CTEs improves the query’s readability by separating it into distinct parts, making it easier to understand and debug.

Example Using Subquery

SELECT 
    sale_month,
    product_name,
    total_sales
FROM (
    SELECT 
        MONTH(sale_date) AS sale_month,
        product_name,
        SUM(sales) AS total_sales,
        ROW_NUMBER() OVER (PARTITION BY MONTH(sale_date) ORDER BY SUM(sales) DESC) AS row_num
    FROM 
        dataset.sales
    GROUP BY 
        product_name, 
        MONTH(sale_date)
) AS Subquery
WHERE 
    row_num = 1;

In this subquery example, the same task is accomplished using a nested query. While subqueries can be less readable due to nesting, they can be more efficient in certain situations.

Comparing Results

sale_monthproduct_nametotal_sales
1Product A1500
2Product B1800
3Product C1700

When Subqueries Are More Efficient

For example, if you want to calculate the average sales of a specific product and find other products with higher sales, subqueries can be more straightforward.

SELECT 
    product_name,
    sales
FROM 
    dataset.sales
WHERE 
    sales > (
        SELECT 
            AVG(sales)
        FROM 
            dataset.sales
        WHERE 
            product_name = 'Product A'
    );

In the above example, the subquery calculates the average sales for ‘Product A’ and returns a single value. This value is then used in the main query’s WHERE clause to filter the dataset. Subqueries are useful for straightforward conditions and filtering that do not require breaking down the query into multiple steps.

How to Approach This in an Interview

  1. Confirm Understanding: Clearly explain the basic definitions and differences between CTEs and subqueries.
  2. Explain Syntax: Describe the syntax for each feature and how they are used in WHERE, SELECT, and FROM clauses.
  3. Apply to Real-World Scenarios: Provide examples of business problems solved using CTEs and subqueries, highlighting the insights gained.
  4. Provide Code Examples: Write and explain SQL queries, discussing expected results.
  5. Prepare for Follow-Up Questions: Be ready to explain why CTEs and subqueries are important, along with their advantages and disadvantages.

Leave a Reply

Your email address will not be published. Required fields are marked *