Files
hunfabric/modules/bigquery-dataset/README.md
Julio Castillo 2eaa0d5e27 Add support for dynamic tags (#3897)
* Allow creation of dynamic tags

* Extend project factory and related modules to support dynamic values

* Extend folder and organization modules

* project and organization readme

* Simplify dynamic tag support and remove unnecessary restrictions

  • Schemas & Validations: Removed the restriction that forbade combining IAM fields with  allowed_values_regex  on tags. Updated validations in  project  and  organization  modules, and
  simplified all relevant JSON schemas.
  • Module Tag Bindings: Simplified the  tag_value  assignment in  folder ,  project ,  gcs ,  bigquery-dataset , and  kms  modules by removing the defensive  can(regex(...))  check and
  calling  templatestring  directly.
  • Outputs: Removed the  tags_dynamic  output from  project  and  organization  modules, as the same information is now available in  tag_keys .
  • Project Factory: Updated  tag_vars_projects  in  projects.tf  to use the native  namespaced_name  attribute and filtered manually for dynamic tags.

* fix(organization, project): fix linting and tests for dynamic tag support

- Align allowed_values_regex and description extraction in _tags_merged
  locals to use lookup() for consistency with other fields.
- Fix spacing in project context variable (alphabetical ordering).
- Update organization tags test to include the new cost_center tag key
  with allowed_values_regex.
- Update project tags test to include the new cost_center tag key and
  reflect the resolved allowed_values_regex on environment.

* refactor(gcs): refine tag bindings and fix context test

- Add _tag_bindings local to pre-resolve context references, enabling
  templatestring to receive a direct map reference (required by Terraform).
- Use var.context.tag_vars instead of the non-existent local.ctx.tag_vars.
- Fix HCL syntax in context.tfvars (escaped inner quotes).
- Update context test inventory to reflect 3 tag bindings including a
  dynamic value resolved via templatestring.

* refactor: align modules with tag binding context pattern

- Add _tag_bindings local + templatestring dance to cloud-run-v2,
  compute-vm, folder, kms modules (bigquery-dataset already had it)
- Exclude tag_vars from local.ctx in cloud-run-v2, compute-vm, folder,
  kms, project modules (bigquery-dataset already had it)
- Add tag_vars to context variable in cloud-run-v2, compute-vm modules
  (others already had it)
- Update all context tests with dynamic tag binding values using
  var.context.tag_vars

* docs: add module-level tftest.yaml test instructions to GEMINI.md

* docs: regenerate READMEs after tag-regex alignment

- Regenerate variable tables in 7 module READMEs to reflect
  line number shifts from prior tag-regex changes
- Add tag_vars exclusion to gcs ctx local
- Fix whitespace alignment in iam-service-account and
  project-factory tag_vars blocks
- Update tftest resource counts for organization and project
- Remove tags_dynamic from organization/project output tables

* fix(project-factory): update test inventory for tag_bindings module split

- Move tag binding address from folder-2 to folder-2-iam in test
  inventory (tag_bindings moved from creation to IAM modules)
- Update module instance count from 34 to 35
- Regenerate README tables after terraform fmt line shifts
- Apply terraform fmt to variables.tf

* refactor(project-factory): remove unnecessary depends_on from folder-iam modules

Folder IAM modules depend on their own folder creation modules, not
on module.projects. The explicit depends_on was leftover from an
earlier design.

* FAST stages

* Address review comments.

- FAST Stages:
  - Added tag_keys to output-files.tf in 0-org-setup to pass org tags via tfvars.
  - Sorted tag_keys and tag_values in output-files.tf.
  - Updated project-factory, networking, and security stages to use tag_keys.
  - Filtered tag_keys for dynamic tags only.
- Modules:
  - Excluded tag_vars from local.ctx in iam-service-account and organization.
  - Simplified tag_value in iam-service-account.
- Tests:
  - Updated test inventories for 0-org-setup and project-factory.

* Fix tf format

* Fix tfdoc

* docs: add ADR for templatestring vars convention and update status of base path ADR

* More tfdoc

* Update schemas

* Use endswith in context loop

* Address review

* Update FAST readmes

* Update last modules

* Terraform fmt

* Revert alloydb

* Fix whitespace

---------

Co-authored-by: Ludovico Magnocavallo <ludo@qix.it>
2026-04-24 20:45:45 +00:00

404 lines
16 KiB
Markdown

# Google Cloud Bigquery Module
This module allows managing a single BigQuery dataset, including access configuration, tables and views.
<!-- BEGIN TOC -->
- [Simple dataset with access configuration](#simple-dataset-with-access-configuration)
- [IAM roles](#iam-roles)
- [Authorized Views, Datasets, and Routines](#authorized-views-datasets-and-routines)
- [Dataset options](#dataset-options)
- [Tables, views and routines](#tables-views-and-routines)
- [Tag bindings](#tag-bindings)
- [TODO](#todo)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->
## Simple dataset with access configuration
Access configuration defaults to using the separate `google_bigquery_dataset_access` resource, so as to leave the default dataset access rules untouched.
You can choose to manage the `google_bigquery_dataset` access rules instead via the `dataset_access` variable, but be sure to always have at least one `OWNER` access and to avoid duplicating accesses, or `terraform apply` will fail.
The access variables are split into `access` and `access_identities` variables, so that dynamic values can be passed in for identities (eg a service account email generated by a different module or resource).
```hcl
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
access = {
reader-group = { role = "READER", type = "group" }
owner = { role = "OWNER", type = "user" }
project_owners = { role = "OWNER", type = "special_group" }
view_1 = { role = "READER", type = "view" }
}
access_identities = {
reader-group = "playground-test@ludomagno.net"
owner = "ludo@ludomagno.net"
project_owners = "projectOwners"
view_1 = "my-project|my_dataset|my-table"
}
}
# tftest modules=1 resources=5 inventory=simple.yaml
```
## IAM roles
Access configuration can also be specified via IAM instead of basic roles via the `iam` variable. When using IAM, basic roles cannot be used via the `access` family variables.
```hcl
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
iam = {
"roles/bigquery.dataOwner" = ["user:user1@example.org"]
}
iam_bindings = {
reader_user = {
role = "roles/bigquery.dataViewer"
members = ["user:user2@example.org"]
}
}
}
# tftest modules=1 resources=3 inventory=iam.yaml
```
## Authorized Views, Datasets, and Routines
You can specify authorized [views](https://cloud.google.com/bigquery/docs/authorized-views), [datasets](https://cloud.google.com/bigquery/docs/authorized-datasets?hl=en), and [routines](https://cloud.google.com/bigquery/docs/authorized-routines) via the `authorized_views`, `authorized_datasets` and `authorized_routines` variables, respectively.
```hcl
// Create private BigQuery dataset that will not be publicly accessible, except via the authorized BigQuery resources
module "bigquery-dataset-private" {
source = "./fabric/modules/bigquery-dataset"
project_id = "private_project"
id = "private_dataset"
authorized_views = [
{
project_id = "auth_view_project"
dataset_id = "auth_view_dataset"
table_id = "auth_view"
}
]
authorized_datasets = [
{
project_id = "auth_dataset_project"
dataset_id = "auth_dataset"
}
]
authorized_routines = [
{
project_id = "auth_routine_project"
dataset_id = "auth_routine_dataset"
routine_id = "auth_routine"
}
]
}
// Create authorized view in a public dataset
module "bigquery-authorized-views-dataset-public" {
source = "./fabric/modules/bigquery-dataset"
project_id = "auth_view_project"
id = "auth_view_dataset"
views = {
auth_view = {
friendly_name = "Public"
labels = {}
query = "SELECT * FROM `private_project.private_dataset.private_table`"
use_legacy_sql = false
deletion_protection = true
}
}
}
// Create public authorized dataset
module "bigquery-authorized-dataset-public" {
source = "./fabric/modules/bigquery-dataset"
project_id = "auth_dataset_project"
id = "auth_dataset"
}
// Create public authorized routine
module "bigquery-authorized-authorized-routine-dataset-public" {
source = "./fabric/modules/bigquery-dataset"
project_id = "auth_routine_project"
id = "auth_routine_dataset"
}
resource "google_bigquery_routine" "public-routine" {
project = "private_project"
dataset_id = module.bigquery-authorized-authorized-routine-dataset-public.dataset_id
routine_id = "auth_routine"
routine_type = "TABLE_VALUED_FUNCTION"
language = "SQL"
definition_body = <<-EOS
SELECT 1 + value AS value
EOS
arguments {
name = "value"
argument_kind = "FIXED_TYPE"
data_type = jsonencode({ "typeKind" = "INT64" })
}
return_table_type = jsonencode({ "columns" = [
{ "name" = "value", "type" = { "typeKind" = "INT64" } },
] })
}
# tftest modules=4 resources=9 inventory=authorized_resources.yaml
```
Authorized views can be specified both using the standard `access` options and the `authorized_views` blocks. The example configuration below uses both blocks, and will create a dataset with three authorized views `view_id_1`, `view_id_2`, and `view_id_3`.
```hcl
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
authorized_views = [
{
project_id = "view_project"
dataset_id = "view_dataset"
table_id = "view_id_1"
},
{
project_id = "view_project"
dataset_id = "view_dataset"
table_id = "view_id_2"
}
]
access = {
view_2 = { role = "READER", type = "view" }
view_3 = { role = "READER", type = "view" }
}
access_identities = {
view_2 = "view_project|view_dataset|view_id_2"
view_3 = "view_project|view_dataset|view_id_3"
}
}
# tftest modules=1 resources=4 inventory=authorized_resources_views.yaml
```
## Dataset options
Dataset options are set via the `options` variable. all options must be specified, but a `null` value can be set to options that need to use defaults.
```hcl
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
options = {
default_table_expiration_ms = 3600000
default_partition_expiration_ms = null
delete_contents_on_destroy = false
max_time_travel_hours = 168
}
}
# tftest modules=1 resources=1 inventory=options.yaml
```
## Tables, views and routines
Tables are created via the `tables` variable. Support for external tables will be added in a future release.
```hcl
locals {
countries_schema = jsonencode([
{ name = "country", type = "STRING" },
{ name = "population", type = "INT64" },
])
}
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
tables = {
countries = {
friendly_name = "Countries"
schema = local.countries_schema
deletion_protection = true
}
}
}
# tftest modules=1 resources=2 inventory=tables.yaml
```
If partitioning is needed, populate the `partitioning` variable using either the `time` or `range` attribute.
```hcl
locals {
countries_schema = jsonencode([
{ name = "country", type = "STRING" },
{ name = "population", type = "INT64" },
])
}
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
tables = {
table_a = {
deletion_protection = true
friendly_name = "Table a"
schema = local.countries_schema
partitioning = {
time = { type = "DAY", expiration_ms = null }
}
}
}
}
# tftest modules=1 resources=2 inventory=partitioning.yaml
```
To create views use the `views` variable. If you're querying a table created by the same module `terraform apply` will initially fail and eventually succeed once the underlying table has been created. You can probably also use the module's output in the view's query to create a dependency on the table.
```hcl
locals {
countries_schema = jsonencode([
{ name = "country", type = "STRING" },
{ name = "population", type = "INT64" },
])
population_schema = [
{
name = "total",
type = "INT64",
description = "Total population"
}
]
}
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
tables = {
countries = {
friendly_name = "Countries"
schema = local.countries_schema
deletion_protection = true
}
}
views = {
population = {
friendly_name = "Population"
query = "SELECT SUM(population) AS total FROM my_dataset.countries"
schema = local.population_schema
use_legacy_sql = false
deletion_protection = true
}
}
}
# tftest modules=1 resources=3 inventory=views.yaml
```
To create routines use the `routines` variable.
```hcl
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
routines = {
custom_masking_routine = {
routine_type = "SCALAR_FUNCTION"
language = "SQL"
data_governance_type = "DATA_MASKING"
definition_body = "SAFE.REGEXP_REPLACE(ssn, '[0-9]', 'X')"
return_type = "{\"typeKind\" : \"STRING\"}"
arguments = {
ssn = {
data_type = "{\"typeKind\" : \"STRING\"}"
}
}
}
}
}
# tftest modules=1 resources=2 inventory=routines.yaml
```
## Tag bindings
Refer to the [Creating and managing tags](https://cloud.google.com/resource-manager/docs/tags/tags-creating-and-managing) documentation for details on usage.
```hcl
module "org" {
source = "./fabric/modules/organization"
organization_id = var.organization_id
tags = {
environment = {
description = "Environment specification."
values = {
dev = {}
prod = {}
sandbox = {}
}
}
}
}
module "bigquery-dataset" {
source = "./fabric/modules/bigquery-dataset"
project_id = "my-project"
id = "my_dataset"
tag_bindings = {
env-sandbox = module.org.tag_values["environment/sandbox"].id
}
}
# tftest modules=2 resources=6
```
## TODO
- [ ] check for dynamic values in tables and views
- [ ] add support for external tables
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [id](variables.tf#L115) | Dataset id. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L179) | Id of the project where datasets will be created. | <code>string</code> | ✓ | |
| [access](variables.tf#L17) | Map of access rules with role and identity type. Keys are arbitrary and must match those in the `access_identities` variable, types are `domain`, `group`, `special_group`, `user`, `view`. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [access_identities](variables.tf#L33) | Map of access identities used for basic access roles. View identities have the format 'project_id\|dataset_id\|table_id'. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [authorized_datasets](variables.tf#L39) | An array of datasets to be authorized on the dataset. | <code>list&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#91;&#93;</code> |
| [authorized_routines](variables.tf#L48) | An array of routines to be authorized on the dataset. | <code>list&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#91;&#93;</code> |
| [authorized_views](variables.tf#L58) | An array of views to be authorized on the dataset. | <code>list&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#91;&#93;</code> |
| [context](variables.tf#L68) | Context-specific interpolations. | <code>object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [dataset_access](variables.tf#L87) | Set access in the dataset resource instead of using separate resources. | <code>bool</code> | | <code>false</code> |
| [description](variables.tf#L93) | Optional description. | <code>string</code> | | <code>&#34;Terraform managed.&#34;</code> |
| [encryption_key](variables.tf#L99) | Self link of the KMS key that will be used to protect destination table. | <code>string</code> | | <code>null</code> |
| [friendly_name](variables.tf#L105) | Dataset friendly name. | <code>string</code> | | <code>null</code> |
| [iam](variables-iam.tf#L17) | IAM bindings in {ROLE => [MEMBERS]} format. Mutually exclusive with the access_* variables used for basic roles. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [iam_bindings](variables-iam.tf#L23) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [iam_bindings_additive](variables-iam.tf#L38) | Individual additive IAM bindings. Keys are arbitrary. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [iam_by_principals](variables-iam.tf#L53) | Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid errors. Merged internally with the `iam` variable. | <code>map&#40;list&#40;string&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [labels](variables.tf#L120) | Dataset labels. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [location](variables.tf#L126) | Dataset location. | <code>string</code> | | <code>&#34;EU&#34;</code> |
| [materialized_views](variables.tf#L132) | Materialized views definitions. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [options](variables.tf#L165) | Dataset options. | <code>object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
| [routines](variables.tf#L184) | Routine definitions. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [tables](variables.tf#L223) | Table definitions. Options and partitioning default to null. Partitioning can only use `range` or `time`, set the unused one to null. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
| [tag_bindings](variables.tf#L308) | Tag bindings for this dataset, in key => tag value id format. | <code>map&#40;string&#41;</code> | | <code>&#123;&#125;</code> |
| [views](variables.tf#L315) | View definitions. | <code>map&#40;object&#40;&#123;&#8230;&#125;&#41;&#41;</code> | | <code>&#123;&#125;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [dataset](outputs.tf#L17) | Dataset resource. | |
| [dataset_id](outputs.tf#L22) | Dataset id. | |
| [id](outputs.tf#L37) | Fully qualified dataset id. | |
| [materialized_view_ids](outputs.tf#L52) | Map of fully qualified materialized view ids keyed by view ids. | |
| [materialized_views](outputs.tf#L57) | Materialized view resources. | |
| [routine_ids](outputs.tf#L62) | Map of fully qualified routine ids keyed by routine ids. | |
| [routines](outputs.tf#L67) | Routine resources. | |
| [self_link](outputs.tf#L72) | Dataset self link. | |
| [table_ids](outputs.tf#L87) | Map of fully qualified table ids keyed by table ids. | |
| [tables](outputs.tf#L92) | Table resources. | |
| [view_ids](outputs.tf#L97) | Map of fully qualified view ids keyed by view ids. | |
| [views](outputs.tf#L102) | View resources. | |
<!-- END TFDOC -->