Introduction
The DBT Composer blueprint is available on GitHub here. The blueprint ships with ready-to-deploy Terraform and a fully operational, integrated dashboard.
Deploying the example DBT job (see the README) instantiates the following architecture, including the example operational dashboard:
[Figure: DBT cloud architecture diagram]
Dashboard
The dashboard is shown below (add your own colors):
[Figure: DBT dashboard]
The dashboard includes working links to the following:
- The Airflow DAG, task, and logs where the DBT job was launched;
- DBT-generated files (logs, target) and DBT-generated documentation;
- Container build and source information;
- BigQuery jobs, destination tables, and audit log information (billed bytes, slot seconds).
Capturing DBT Metadata & Output
Capturing metadata provides actionable links in Looker Studio and requires the following components:
- Cloud Build and Docker are used to bake build and source metadata into the container as environment variables prefixed with `DBT_ENV_CUSTOM_ENV`.
- The Airflow operator sets environment variables, also prefixed with `DBT_ENV_CUSTOM_ENV`, from the Airflow context (Airflow UI link, task ID, DAG ID, and execution date), as sketched after this list.
- A DBT on-run hook is used to create a SQL comment containing the environment variables and other metadata, so that they are captured in the audit logs.
- Audit logs are exported to BigQuery, and materialized views are created over them to power the Looker Studio dashboard (see the query sketch after this list).
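To make the second component concrete, here is a minimal sketch of an Airflow task that forwards context through `DBT_ENV_CUSTOM_ENV_`-prefixed variables (dbt records any variable with that prefix in its artifact metadata). The DAG, task, and variable names are illustrative assumptions, not the blueprint's actual code; the build and source metadata would be baked in similarly at image build time with Docker `ENV` instructions.

```python
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator

with DAG(dag_id="dbt_example", start_date=datetime(2024, 1, 1), schedule=None):
    run_dbt = BashOperator(
        task_id="run_dbt",
        # Hypothetical project location inside the container.
        bash_command="dbt run --project-dir /dbt",
        # `env` is a templated field: the Jinja expressions are rendered
        # from the Airflow context at runtime, and dbt exposes any variable
        # prefixed with DBT_ENV_CUSTOM_ENV_ in its metadata.
        env={
            "DBT_ENV_CUSTOM_ENV_DAG_ID": "{{ dag.dag_id }}",
            "DBT_ENV_CUSTOM_ENV_TASK_ID": "{{ task.task_id }}",
            "DBT_ENV_CUSTOM_ENV_EXECUTION_DATE": "{{ ds }}",
        },
        append_env=True,  # keep the worker's existing environment (Airflow 2.3+)
    )
```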
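Likewise, a sketch of reading that metadata back out of the exported audit logs. The dataset and table names follow the default layout of a BigQuery audit log sink, but the JSON paths and the regular expression over the SQL comment are assumptions about how the metadata was embedded.

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical query: a data_access audit log sink stores the executed SQL,
# including dbt's leading comment, inside protopayload_auditlog.metadataJson.
query = r"""
SELECT
  timestamp,
  REGEXP_EXTRACT(
    JSON_EXTRACT_SCALAR(
      protopayload_auditlog.metadataJson,
      '$.jobChange.job.jobConfig.queryConfig.query'),
    r'DBT_ENV_CUSTOM_ENV_DAG_ID[":\s]*([\w-]+)') AS dag_id
FROM `my-project.audit_logs.cloudaudit_googleapis_com_data_access`
WHERE protopayload_auditlog.serviceName = 'bigquery.googleapis.com'
"""

for row in client.query(query).result():
    print(row.timestamp, row.dag_id)
```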
In addition, capturing the output requires a few other techniques:
- GCS FUSE is used to map DBT's target and logs folders to a GCS path prefixed with the Airflow task ID, DAG ID, and execution date (see the sketch after this list);
- DBT documentation is generated as static HTML so it can be hosted directly from GCS.
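As a sketch of how those two techniques combine, assuming Composer's standard GCS FUSE mount at `/home/airflow/gcs/data` and a dbt version whose CLI accepts the global `--target-path` and `--log-path` flags (the folder layout is an illustrative assumption):

```python
import subprocess

def run_dbt(dag_id: str, task_id: str, ds: str) -> None:
    # Composer exposes the environment bucket's data/ folder at this local
    # path via GCS FUSE, so anything written here lands in GCS.
    base = f"/home/airflow/gcs/data/dbt/{dag_id}/{task_id}/{ds}"

    # Write dbt's artifacts and logs under the run-specific prefix.
    subprocess.run(
        ["dbt", "--target-path", f"{base}/target", "--log-path", f"{base}/logs", "run"],
        check=True,
    )

    # Generate the docs as static HTML next to the other artifacts; the
    # resulting files can then be served directly from the bucket.
    subprocess.run(
        ["dbt", "--target-path", f"{base}/target", "docs", "generate"],
        check=True,
    )
```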
Conclusion
This blueprint provides a ready-to-go operational environment for running DBT in Composer, with traceability from individual BigQuery jobs back through the DBT invocation to the Airflow task and DAG.