A data engineer's migration guide to dbt v1.0.0
The highly anticipated
dbt version 1.0.0 release just came out a few days ago, and your data team may be looking to upgrade soon to try out some of the new features and enhancements. If you are in charge of said migration, you should also be interested in any potential breaking changes that may impact your project, as well as your and your team’s daily dbt workflows. If you feel this applies to you, then this migration guide is for you: I’ll be going over the new changes coming with a particular focus on avoiding risks during the version upgrade.
This guide will take a lot of information from the official migration guide, but aims to be more self-contained by offering practical migration examples and cover more potential breaking changes.
Right from the get go, the way we install
dbt is no longer
pip install dbt, as this has been deprecated and will raise an error. From v1.0.0 onwards, to install the core
dbt CLI you should run
pip install dbt-core.
Before v1.0.0, installing dbt would also install all database adapters (Postgres, Snowflake, Redshift, and BigQuery). But now, after installing
dbt-core, you have to install any adapters that you need by running
pip install dbt-<adapter>.
This has probably brought you issues if you are running Amazon’s MWAA, as the Snowflake adapter has a Rust dependency that not all platforms may support1. This was particularly annoying since the Snowflake adapter was installed even if you were not using Snowflake! As someone who ran into these issues a lot, I’m very happy to see this change, although I would have preferred the inclusion of the adapters as extras of the main
dbt-core package to simplify the installation even more.
On a minor note, support for Python 3.6 has been dropped by
dbt v1.0.0, as we are approaching EOL.
Unfortunately, as of the time of writing, the official Docker images have yet to be updated with the v1.0.0 release. And the official docs state that we should wait for more information coming soon. I suspect that dbt-labs may be migrating to a new DockerHub organization given that they were still using the old fishtownanalytics (just speculating, seems plausible that they would want to clean up things for a 1.0 release).
If you rely on the Docker image for production, development, or CI/CD I recommend either holding off on the upgrade until the new images are available, or building your own using the official Dockerfile:
git clone https://github.com/dbt-labs/dbt-core cd dbt-core docker build -t dbt:1.0.0 \ --build-arg BASE_REQUIREMENTS_SRC_PATH=docker/requirements/requirements.txt \ --build-arg DIST_PATH=dist/ \ --build-arg WHEEL_REQUIREMENTS_SRC_PATH=docker/requirements/requirements.txt -f docker/Dockerfile .
I’m don’t know too much about the Docker image building pipeline used by dbt-labs, as I couldn’t find any relevant GitHub actions, so I just plugged in arguments that work. You could probably write a more straight forward image:
ARG BASE_IMAGE=python:3.8-slim-bullseye FROM $BASE_IMAGE RUN apt-get update \ && apt-get dist-upgrade -y \ && apt-get install -y --no-install-recommends \ git \ ssh-client \ software-properties-common \ make \ build-essential \ ca-certificates \ libpq-dev \ && apt-get clean \ && rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/* RUN pip install --upgrade pip setuptools RUN pip install dbt-core==1.0.0 ENV PYTHONIOENCODING=utf-8 ENV LANG C.UTF-8 WORKDIR /usr/app VOLUME /usr/app ENTRYPOINT ["dbt"]
Just add your adapters to the highlighted line. Although keep in mind I haven’t tested this too much!
If you rely on
airflow-dbt-python to run
dbt operators for
Airflow, support for v1.0.0 of
dbt is planned for version 0.10 projected to be released before December 15th 2021.
Updates to dbt_project.yml
Significant changes have come to the configuration of your
dbt_project.yml. Here is a summary table of the changes:
||Default is now
||Default is now
||Default is now
All these changes should be pretty straight-forward: assuming that your
dbt_project.yml file in v0.21 was:
name: 'my_new_project' version: '1.0.0' config-version: 2 profile: 'default' source-paths: ["models"] analysis-paths: ["analysis"] test-paths: ["test"] data-paths: ["data"] macro-paths: ["macros"] snapshot-paths: ["snapshots"] target-path: "target" modules-path: "dbt_modules" clean-targets: - "target" - "dbt_modules" models: my_new_project: example: +materialized: view seeds: +quote_columns: true
dbt_project.yml now needs attention in the following lines:
name: 'my_new_project' version: '1.0.0' config-version: 2 profile: 'default' model-paths: ["models"] analysis-paths: ["analyses"] # Only default changed test-paths: ["tests"] # Only default changed seed-paths: ["seeds"] macro-paths: ["macros"] snapshot-paths: ["snapshots"] target-path: "target" packages-install-path: "dbt_packages" clean-targets: - "target" - "dbt_packages" models: my_new_project: example: +materialized: view seeds: +quote_columns: true # Now default for all adapters except Snowflake
Workflow changes coming with dbt v1.0.0
This section covers changes that affect the way we invoke
dbt commands, either via the CLI or using Airflow with
Running dbt singular and generic tests
dbt tests came in two flavors:
schema tests. These are now
generic tests. This changes the invocation of the
dbt test command as
--schema flags are now deprecated. This means:
dbt test --data dbt test --schema
dbt test --select test_type:singular dbt test --select test_type:generic
test_type selection method still accepts
schema for backwards compatibility:
dbt test --select test_type:data test_type:schema
But I would still recommend migrating these invocations too as backwards compatibility may be dropped in the future.
In Airflow with airflow-dbt-python
As of version 0.10,
airflow-dbt-python reflects these flag changes by deprecating the
schema attributes of
DbtTestOperator in favor of
generic. This means:
data_tests = DbtTestOperator( task_id="dbt_test", data=True, ) schema_tests = DbtTestOperator( task_id="dbt_test", schema=True, ) all_tests = DbtTestOperator( task_id="dbt_test", select=["test_type:data", "test_type:schema"], )
singular_tests = DbtTestOperator( task_id="dbt_test", singular=True, ) generic_tests = DbtTestOperator( task_id="dbt_test", generic=True, ) all_tests = DbtTestOperator( task_id="dbt_test", select=["test_type:singular", "test_type:generic"], )
Deprecated macros and arguments
Some macros have been deprecated, ensure these are not in use in your models:
On the arguments side:
releasehas been removed from
packageshas been deprecated from
The dbt RPC server is no longer part of dbt-core
Eventually, the RPC is being replaced by a new dbt Server. In the meantime, you will have to install the
dbt-rpc package to continue using the RPC server:
pip install dbt-rpc
Also, instead of using the
dbt rpc command, you will have to call:
New artifact schema versions
dbt following artifacts had their schemas updated:
- manifest: Now in v4. Most notable changes from previous version:
- Added metrics nodes in top level key.
- Renamed schema and data test nodes to reflect their new names.
- run results: Also in v4. No major changes from previous version at least in top level keys.
- sources: Now in v3. No major changes from previous version at least in top level keys.
This could impact any data ingestion pipelines laid in place to consume this information.
New logging for adapter plugins
dbt logging has been migrated to a new structured event interface. If you are an adapter maintainer you will have to either set the environment variable
DBT_ENABLE_LEGACY_LOGGER=True to use legacy logging or migrate to the new one. I’m not an adapter maintainer so I don’t think I can add too much to this section. Check the official README.
For the majority of
dbt users the migration should be very straight-forward: you’ll have to review your
dbt_project.yml, update your
dbt test calls, and ensure you are using the new PyPI packages in any installation processes.
The lack of a Docker image is perhaps the only thing I would consider a blocker for this upgrade, although if you can build your own do so as the new features coming with
dbt v1.0.0 seem very worth it. I’ll make sure to update this post once an official Docker image is published.
#python #data-engineering #dbt