Skip to content

How to use Aggregate Awareness in Looker

Here’s a blog post on leveraging aggregate awareness in Looker using web traffic data tables as an example. This post explains how to improve query performance and optimize resource usage through the effective use of aggregate tables in Looker.

What is Aggregate Awareness?

In Looker, ‘Aggregate Awareness’ refers to efficiently utilizing pre-aggregated data tables stored in databases to analyze user queries and automatically select the most appropriate aggregate table. This feature boosts query performance and optimizes resource use. For those familiar with PDTs (Persistent Derived Tables), aggregate awareness functions similarly. If you’re new to PDTs, refer to my previous post on creating, configuring, and troubleshooting PDTs in Looker.

Benefits and Reasons to Use Aggregate Awareness

  • Performance Enhancement: Aggregate tables process less data than original datasets, thus improving query speed. For instance, using a traffic table that includes daily counts of visitors, page views, and clicks allows quicker querying by accessing only the aggregated data for each day, rather than the entire table.
  • Resource Optimization: Utilizing pre-calculated aggregates reduces the load on database servers and makes resource use more efficient, which can also lead to reductions in cloud costs.
  • Flexibility: Generating tables at various aggregation levels supports easy navigation and analysis of data at different detail levels. For example, executives might focus on monthly trends, while content managers might be more interested in daily or hourly data.

Defining Aggregate Awareness in Looker

Here’s how you can define an aggregate table within your model:

Include Connection Information and Required View Files:

connection: "bigquery_project"
include: "/**/*.view.lkml"                 # Include all views in this project

Define Explore and Joins for Analyzing Daily Traffic Data:

explore: daily_traffic {
    label: "Daily Traffic Analysis"

    join: site_traffic {
      type: full_outer
      relationship: one_to_one
      sql_on: ${daily_traffic.site_id} = ${site_traffic.site_id}
              AND ${daily_traffic.date_id} = ${site_traffic.date_id};;
    }

    join: sites_info {
      relationship: many_to_one
      sql_on: ${daily_traffic.site_id} = ${sites_info.site_id};;
    }
}

Set Up the Aggregate Table:

aggregate_table: daily_site_traffic_summary {
    query: {
      dimensions: [sites_info.site_name, sites_info.site_id, daily_traffic.date]
      measures: [site_traffic.daily_visitors, site_traffic.page_views, site_traffic.clicks]
      timezone: Europe/Berlin
    }
    materialization: {
      sql_trigger_value: SELECT CURDATE();;  # Update the aggregate table daily with new data
    }
}

Points to Consider

  • Time Zone Handling: If an aggregate table query includes time-based fields, the timezone must be explicitly set. Otherwise, Looker defaults to the database’s timezone (e.g., UTC). If the timezone is not specified, such as for users in Berlin, Looker will not utilize the table for those in different time zones without corresponding aggregate tables.
  • Limitations on Aggregatable Measures: Measures like distinct_count, median, and percentile do not support direct aggregation and thus cannot be used in aggregate tables.

    Also, complex joins, like one-to-many or many-to-many, can complicate the use of aggregate tables. For instance, if daily_site_traffic is joined in a many-to-one relationship with sites_with_traffic, unique handling of visitor counts might necessitate a one-to-one PDT to resolve potential issues.
  • Performance Considerations: Creating separate aggregate tables for different periods (monthly vs. daily) can impact performance due to data volume differences.

    In my tests with monthly and daily tables, although there was a data volume difference (approximately 100 KB monthly vs. 5 MB daily), the actual speed improvement in query performance was not significant, leading to a decision against creating additional aggregate tables for each period.

I hope this post helps you understand how to efficiently use aggregate tables in Looker to improve your data management practices. Also, if you want to know more about, please refer this Google doc: cloud.google.com/looker/docs/aggregate_awareness