A Comprehensive Guide to Rebuilding Persistent Derived Tables (PDTs) in Looker

Exploring Persistent Derived Table (PDT) Rebuilding in Looker

In this article, we delve into the methods for rebuilding Persistent Derived Tables (PDTs) in Looker. We’ll cover various configuration options, including datagroups, max_cache_age, sql_trigger, and interval_trigger, and share real-world business scenarios to help you optimize your data workflows. For an introduction to PDT basics, refer to this guide.

Configuring Datagroups in Your Looker Model

Looker allows you to manage PDT rebuilding schedules using datagroups. By setting datagroups, you can efficiently update PDTs to include new data. Datagroups are typically triggered based on data states or specific events, enabling tailored updates for each PDT.

To configure a datagroup in your model file, insert the definition before your explore statements. For example, the following configuration triggers PDT rebuilding when the _updated_at field in the core.fct_cpa_commissions table changes:

Understanding Max Cache Age, SQL Trigger, and Interval Trigger

max_cache_age:
Specifies the maximum validity period for a PDT before rebuilding. For example, max_cache_age: "24 hours" ensures a PDT remains valid for up to 24 hours after creation. This is ideal for datasets with infrequent updates.
sql_trigger:
Triggers PDT rebuilding based on a specific SQL query result, such as monitoring the last update time in a table. For instance:yaml코드 복사sql_trigger: SELECT MAX(transacted_at) FROM revenue; This method is suitable for scenarios requiring real-time updates but may encounter issues with timezone discrepancies.
interval_trigger:
Executes rebuilding at fixed intervals, such as interval_trigger: "20 minutes". This is useful for datasets with regular update cycles.

Note: SQL triggers and interval triggers cannot be used simultaneously. Validation may not flag this conflict, but it can disrupt trigger functionality.

Key Differences: Max Cache Age vs. Interval Trigger

max_cache_age rebuilds PDTs based on the time elapsed since creation, regardless of data changes.
interval_trigger rebuilds PDTs at a predefined interval, independent of whether they are queried or not.

For example, max_cache_age: "24 hours" and interval_trigger: "1 day" might seem similar, but the former rebuilds on demand, while the latter rebuilds at fixed intervals.

Configuring Datagroup Triggers for Multiple PDTs

To manage multiple PDTs efficiently, assign a shared datagroup_trigger for related PDTs. This setup ensures synchronized updates when a single datagroup’s condition is met.

Example configuration within a view file:

Business Case: Solving Issues with SQL Triggers

In one project, a PDT used the following SQL trigger:

sql_trigger: SELECT MAX(transacted_at) FROM revenue;;

This setup aimed to rebuild the PDT whenever the revenue table’s latest transaction time updated. However, it failed due to timezone inconsistencies—some data was recorded in non-UTC formats. As a result, rebuilding stalled until the timestamp exceeded the max cache period.

Solution: Temporarily replaced the SQL trigger with interval_trigger: "20 minutes" to ensure regular updates while fixing the timezone discrepancies.

Conclusion

Configuring PDT rebuilding in Looker requires careful consideration of your data’s characteristics and update frequency. While options like sql_trigger provide real-time capabilities, they may introduce issues like timezone conflicts. Conversely, interval_trigger ensures stability but may overlook data changes. Select the appropriate option for your scenario to maintain efficient data management.