Skip to main content

Snowflake and Apache Iceberg

dbt supports materializing the table in Iceberg table format in two different ways:

  • The model configuration field table_format = 'iceberg' (legacy)
  • Catalog integration in the model, resource, or dbt_project.yml configuration

We recommend that you use the Iceberg catalog configuration and apply the catalog in the model config for ease of use and future-proof your code. Using table_format = 'iceberg' directly on the model configuration is a legacy approach.

Creating Iceberg Tables

dbt supports creating Iceberg tables for three of the Snowflake materializations:

Iceberg catalogs

Snowflake has support for Iceberg tables via built-in and external catalogs, including:

  • Snowflake built-in catalog (metadata managed by Snowflake’s built-in information schema)
  • Polaris/Open Catalog (managed Polaris)*
  • Glue Data Catalog*
  • Iceberg REST Compatible*

*dbt catalog support coming soon.

To use an externally managed catalog (anything outside of the built-in catalog), you must set up a catalog integration. To do so, you must run a SQL command similar to the following.

External catalogs

Example configurations for external catalogs.

You must set up a catalog integration to use Polaris/Open Catalog (managed Polaris).

Example code:


CREATE CATALOG INTEGRATION my_polaris_catalog_int
CATALOG_SOURCE = POLARIS
TABLE_FORMAT = ICEBERG
REST_CONFIG = (
CATALOG_URI = 'https://<org>-<account>.snowflakecomputing.com/polaris/api/catalog'
CATALOG_NAME = '<open_catalog_name>'
)
REST_AUTHENTICATION = (
TYPE = OAUTH
OAUTH_CLIENT_ID = '<client_id>'
OAUTH_CLIENT_SECRET = '<client_secret>'
OAUTH_ALLOWED_SCOPES = ('PRINCIPAL_ROLE:ALL')
)
ENABLED = TRUE;

Executing this will register the external Polaris catalog with Snowflake. Once configured, dbt can create Iceberg tables in Snowflake that register the existence of the new database object with the catalog as metadata and query Polaris-managed tables.

After you have created the external catalog integration, you will be able to do two things:

  • Query an externally managed table: Snowflake can query Iceberg tables whose metadata lives in the external catalog. In this scenario, Snowflake is a "reader" of the external catalog. The table’s data remains in external cloud storage (AWS S3 or GCP Bucket) as defined in the catalog storage configuration. Snowflake will use the catalog integration to fetch metadata via the REST API. Snowflake then reads the data files from cloud storage.

  • Sync Snowflake-managed tables to an external catalog: You can create a Snowflake Iceberg table that Snowflake manages via a cloud storage location and then register/sync that table to the external catalog. This allows other engines to discover the table.

dbt Catalog Integration Configurations for Snowflake

The following table outlines the configuration fields required to set up a catalog integration for Iceberg tables in Snowflake.

FieldRequiredAccepted values
nameyesName of catalog integration
catalog_nameyesThe name of the catalog integration in Snowflake. For example, my_dbt_iceberg_catalog)
external_volumeyes<external_volume_name>
table_formatyesiceberg
catalog_typeyesbuilt_in, iceberg_rest*

*Coming soon! Stay tuned for updates.

Configure catalog integration for managed Iceberg tables

  1. Create a catalogs.yml at the top level of your dbt project.

    An example of Snowflake Horizon as the catalog:

catalogs:
- name: catalog_horizon
active_write_integration: snowflake_write_integration
write_integrations:
- name: snowflake_write_integration
external_volume: dbt_external_volume
table_format: iceberg
catalog_type: built_in

  1. Apply the catalog configuration at either the model, folder, or project level.

    An example of iceberg_model.sql:

{{
config(
materialized='table',
catalog = catalog_horizon

)
}}

select * from {{ ref('jaffle_shop_customers') }}

  1. Execute the dbt model with a dbt run -s iceberg_model.

For more information, refer to our documentation on Snowflake configurations.

Limitations

For external catalogs, Snowflake only supports read, which means it can query the table but cannot insert or modify data.

The syncing experience will be different depending on the catalog you choose. Some catalogs are automatically refreshed, and you can set parameters to do so with your catalog integration. Other catalogs might require a separate job to manage the metadata sync.

0