# Google Cloud BigQuery Module

This module allows managing a single BigQuery dataset, including access configuration, tables, views, materialized views and routines.

<!-- BEGIN TOC -->
- [Simple dataset with access configuration](#simple-dataset-with-access-configuration)
- [IAM roles](#iam-roles)
- [Authorized Views, Datasets, and Routines](#authorized-views-datasets-and-routines)
- [Dataset options](#dataset-options)
- [Tables, views and routines](#tables-views-and-routines)
- [Tag bindings](#tag-bindings)
- [TODO](#todo)
- [Variables](#variables)
- [Outputs](#outputs)
<!-- END TOC -->

## Simple dataset with access configuration

By default, access is configured through separate `google_bigquery_dataset_access` resources, which leaves the dataset's default access rules untouched.

Alternatively, set the `dataset_access` variable to `true` to manage access rules directly in the `google_bigquery_dataset` resource. In that case, always include at least one `OWNER` access and avoid duplicate entries, or `terraform apply` will fail.

Access is split into the `access` and `access_identities` variables so that identities can be passed in as dynamic values (e.g. a service account email generated by a different module or resource), while the map keys stay static.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  access = {
    reader-group   = { role = "READER", type = "group" }
    owner          = { role = "OWNER", type = "user" }
    project_owners = { role = "OWNER", type = "special_group" }
    view_1         = { role = "READER", type = "view" }
  }
  access_identities = {
    reader-group   = "playground-test@ludomagno.net"
    owner          = "ludo@ludomagno.net"
    project_owners = "projectOwners"
    view_1         = "my-project|my_dataset|my-table"
  }
}
# tftest modules=1 resources=5 inventory=simple.yaml
```

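A minimal sketch of the `dataset_access = true` mode described above follows. It is not one of the module's tested examples: the identities are placeholders, and the `domain` entry only illustrates one of the access types listed for the `access` variable. As required in this mode, at least one `OWNER` entry is present and no access is duplicated.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  # manage access rules inside the google_bigquery_dataset resource itself
  dataset_access = true
  access = {
    owner          = { role = "OWNER", type = "user" }
    project_owners = { role = "OWNER", type = "special_group" }
    domain_readers = { role = "READER", type = "domain" }
  }
  access_identities = {
    # placeholder identities, replace with your own
    owner          = "user1@example.org"
    project_owners = "projectOwners"
    domain_readers = "example.org"
  }
}
```
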
## IAM roles

Access can also be configured with IAM roles instead of basic roles, via the `iam` variable and the related `iam_bindings`, `iam_bindings_additive` and `iam_by_principals` variables. When using IAM, basic roles cannot be set via the `access` family of variables.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  iam = {
    "roles/bigquery.dataOwner" = ["user:user1@example.org"]
  }
  iam_bindings = {
    reader_user = {
      role    = "roles/bigquery.dataViewer"
      members = ["user:user2@example.org"]
    }
  }
}
# tftest modules=1 resources=3 inventory=iam.yaml
```

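The example above covers `iam` and `iam_bindings`. The sketch below shows the two remaining IAM variables, based only on their types in the variables table; the principals are placeholders. `iam_by_principals` is authoritative and merged internally with `iam`, while `iam_bindings_additive` adds a single member to a role without taking ownership of the whole binding.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  # authoritative, keyed by principal; merged with the `iam` variable
  iam_by_principals = {
    "group:data-owners@example.org" = ["roles/bigquery.dataOwner"]
  }
  # additive, one member per entry; keys are arbitrary
  iam_bindings_additive = {
    analyst_viewer = {
      member = "user:analyst@example.org"
      role   = "roles/bigquery.dataViewer"
    }
  }
}
```
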
## Authorized Views, Datasets, and Routines

You can specify authorized [views](https://cloud.google.com/bigquery/docs/authorized-views), [datasets](https://cloud.google.com/bigquery/docs/authorized-datasets?hl=en), and [routines](https://cloud.google.com/bigquery/docs/authorized-routines) via the `authorized_views`, `authorized_datasets` and `authorized_routines` variables, respectively.

```hcl
// Create private BigQuery dataset that will not be publicly accessible, except via the authorized BigQuery resources
module "bigquery-dataset-private" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "private_project"
  id         = "private_dataset"
  authorized_views = [
    {
      project_id = "auth_view_project"
      dataset_id = "auth_view_dataset"
      table_id   = "auth_view"
    }
  ]
  authorized_datasets = [
    {
      project_id = "auth_dataset_project"
      dataset_id = "auth_dataset"
    }
  ]
  authorized_routines = [
    {
      project_id = "auth_routine_project"
      dataset_id = "auth_routine_dataset"
      routine_id = "auth_routine"
    }
  ]
}

// Create authorized view in a public dataset
module "bigquery-authorized-views-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_view_project"
  id         = "auth_view_dataset"
  views = {
    auth_view = {
      friendly_name       = "Public"
      labels              = {}
      query               = "SELECT * FROM `private_project.private_dataset.private_table`"
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}

// Create public authorized dataset
module "bigquery-authorized-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_dataset_project"
  id         = "auth_dataset"
}

// Create public authorized routine
module "bigquery-authorized-authorized-routine-dataset-public" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "auth_routine_project"
  id         = "auth_routine_dataset"
}

resource "google_bigquery_routine" "public-routine" {
  // the routine must live in the same project as its dataset,
  // matching the entry in `authorized_routines` above
  project         = "auth_routine_project"
  dataset_id      = module.bigquery-authorized-authorized-routine-dataset-public.dataset_id
  routine_id      = "auth_routine"
  routine_type    = "TABLE_VALUED_FUNCTION"
  language        = "SQL"
  definition_body = <<-EOS
    SELECT 1 + value AS value
  EOS
  arguments {
    name          = "value"
    argument_kind = "FIXED_TYPE"
    data_type     = jsonencode({ "typeKind" = "INT64" })
  }
  return_table_type = jsonencode({ "columns" = [
    { "name" = "value", "type" = { "typeKind" = "INT64" } },
  ] })
}
# tftest modules=4 resources=9 inventory=authorized_resources.yaml
```

Authorized views can be specified both through the standard `access` variables and through the `authorized_views` variable. The example configuration below uses both, and creates a dataset with three authorized views: `view_id_1`, `view_id_2`, and `view_id_3` (`view_id_2` is declared in both places).

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  authorized_views = [
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_1"
    },
    {
      project_id = "view_project"
      dataset_id = "view_dataset"
      table_id   = "view_id_2"
    }
  ]
  access = {
    view_2 = { role = "READER", type = "view" }
    view_3 = { role = "READER", type = "view" }
  }
  access_identities = {
    view_2 = "view_project|view_dataset|view_id_2"
    view_3 = "view_project|view_dataset|view_id_3"
  }
}
# tftest modules=1 resources=4 inventory=authorized_resources_views.yaml
```

## Dataset options

Dataset options are set via the `options` variable. All attributes are optional: omitted attributes, or attributes explicitly set to `null`, fall back to the module or API defaults (for example, `max_time_travel_hours` defaults to `168` and `delete_contents_on_destroy` to `false`).

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  options = {
    default_table_expiration_ms     = 3600000
    default_partition_expiration_ms = null
    delete_contents_on_destroy      = false
    max_time_travel_hours           = 168
  }
}
# tftest modules=1 resources=1 inventory=options.yaml
```

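The remaining `options` attributes can be combined with the dataset-level `location`, `friendly_name` and `labels` variables, as in the sketch below. The values are illustrative assumptions: `und:ci` is BigQuery's case-insensitive collation and `PHYSICAL` selects physical storage billing; check the BigQuery documentation for the values that fit your setup.

```hcl
module "bigquery-dataset" {
  source        = "./fabric/modules/bigquery-dataset"
  project_id    = "my-project"
  id            = "my_dataset"
  location      = "europe-west1"
  friendly_name = "My dataset"
  labels        = { env = "dev" }
  options = {
    # case-insensitive default collation for new tables
    default_collation = "und:ci"
    # make the dataset name case-insensitive
    is_case_insensitive = true
    # bill storage on physical (compressed) bytes
    storage_billing_model = "PHYSICAL"
  }
}
```
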
## Tables, views and routines

Tables are created via the `tables` variable. External tables can be defined through each table's `external_data_configuration` attribute (see the `tables` variable type below).

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
}
# tftest modules=1 resources=2 inventory=tables.yaml
```

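A minimal sketch of an external table backed by CSV files in Cloud Storage, using the `external_data_configuration` attribute from the `tables` variable type. The bucket path is a placeholder, the attribute values follow the `google_bigquery_table` provider resource, and this example is not covered by the module's tests.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries_ext = {
      friendly_name = "Countries (external)"
      external_data_configuration = {
        # infer the schema from the files instead of passing `schema`
        autodetect    = true
        source_format = "CSV"
        # placeholder bucket and path
        source_uris = ["gs://my-bucket/countries/*.csv"]
        csv_options = {
          quote             = "\""
          skip_leading_rows = 1
        }
      }
    }
  }
}
```
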
If partitioning is needed, populate the table's `partitioning` attribute using either the `time` or the `range` attribute.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    table_a = {
      deletion_protection = true
      friendly_name       = "Table a"
      schema              = local.countries_schema
      partitioning = {
        time = { type = "DAY", expiration_ms = null }
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=partitioning.yaml
```

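For integer range partitioning, a sketch along the same lines sets `range` instead of `time`, and uses the table's `options.clustering` list to cluster rows. It assumes that the top-level `partitioning.field` attribute names the INT64 column to partition on; the range bounds are arbitrary illustrative values.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    table_b = {
      deletion_protection = true
      friendly_name       = "Table b"
      schema              = local.countries_schema
      options = {
        # cluster rows within each partition by country
        clustering = ["country"]
      }
      partitioning = {
        # assumed: column used for range partitioning
        field = "population"
        # 100 partitions of 10M each, from 0 to 1B
        range = { start = 0, end = 1000000000, interval = 10000000 }
      }
    }
  }
}
```
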
To create views, use the `views` variable. If a view queries a table created by the same module, `terraform apply` will initially fail, and succeed on a subsequent run once the underlying table has been created. Referencing a module output in the view's query may also work, by creating an explicit dependency on the table.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
  population_schema = [
    {
      name        = "total",
      type        = "INT64",
      description = "Total population"
    }
  ]
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
  views = {
    population = {
      friendly_name       = "Population"
      query               = "SELECT SUM(population) AS total FROM my_dataset.countries"
      schema              = local.population_schema
      use_legacy_sql      = false
      deletion_protection = true
    }
  }
}
# tftest modules=1 resources=3 inventory=views.yaml
```

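Materialized views are defined through the `materialized_views` variable, which the examples above do not cover. The sketch below is based only on the variable's type in the table below; names and query are illustrative, and the same first-apply ordering caveat described for views likely applies when the base table is created by the same module.

```hcl
locals {
  countries_schema = jsonencode([
    { name = "country", type = "STRING" },
    { name = "population", type = "INT64" },
  ])
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tables = {
    countries = {
      friendly_name       = "Countries"
      schema              = local.countries_schema
      deletion_protection = true
    }
  }
  materialized_views = {
    population_by_country = {
      friendly_name = "Population by country"
      # aggregate over the base table; BigQuery refreshes it automatically
      query               = "SELECT country, SUM(population) AS total FROM `my-project.my_dataset.countries` GROUP BY country"
      enable_refresh      = true
      deletion_protection = true
    }
  }
}
```
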
To create routines, use the `routines` variable.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  routines = {
    custom_masking_routine = {
      routine_type         = "SCALAR_FUNCTION"
      language             = "SQL"
      data_governance_type = "DATA_MASKING"
      definition_body      = "SAFE.REGEXP_REPLACE(ssn, '[0-9]', 'X')"
      return_type          = "{\"typeKind\" : \"STRING\"}"
      arguments = {
        ssn = {
          data_type = "{\"typeKind\" : \"STRING\"}"
        }
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=routines.yaml
```

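The table-valued function created with a raw `google_bigquery_routine` resource in the authorized routines example could likely also be declared through the module, since the `routines` variable exposes `return_table_type` and a per-argument `argument_kind`. This is a sketch, assuming that map keys become the routine id and the argument name, as the masking example suggests.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  routines = {
    add_one = {
      routine_type    = "TABLE_VALUED_FUNCTION"
      language        = "SQL"
      definition_body = "SELECT 1 + value AS value"
      arguments = {
        # argument name taken from the map key (assumption)
        value = {
          argument_kind = "FIXED_TYPE"
          data_type     = jsonencode({ "typeKind" = "INT64" })
        }
      }
      # a table-valued function declares the columns it returns
      return_table_type = jsonencode({ "columns" = [
        { "name" = "value", "type" = { "typeKind" = "INT64" } },
      ] })
    }
  }
}
```
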
## Tag bindings

Refer to the [Creating and managing tags](https://cloud.google.com/resource-manager/docs/tags/tags-creating-and-managing) documentation for details on usage.

```hcl
module "org" {
  source          = "./fabric/modules/organization"
  organization_id = var.organization_id
  tags = {
    environment = {
      description = "Environment specification."
      values = {
        dev     = {}
        prod    = {}
        sandbox = {}
      }
    }
  }
}

module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tag_bindings = {
    env-sandbox = module.org.tag_values["environment/sandbox"].id
  }
}
# tftest modules=2 resources=6
```

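When tag values are managed outside this configuration, `tag_bindings` can presumably take tag value ids directly, since the variable only expects a `key => tag value id` map. The id below is a hypothetical placeholder in the `tagValues/...` format used by Resource Manager.

```hcl
module "bigquery-dataset" {
  source     = "./fabric/modules/bigquery-dataset"
  project_id = "my-project"
  id         = "my_dataset"
  tag_bindings = {
    # hypothetical tag value id, replace with a real one
    env-prod = "tagValues/1234567890"
  }
}
```
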
## TODO

- [ ] check for dynamic values in tables and views
- [ ] add support for external tables

<!-- BEGIN TFDOC -->
## Variables

| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [id](variables.tf#L111) | Dataset id. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L175) | Id of the project where datasets will be created. | <code>string</code> | ✓ | |
| [access](variables.tf#L17) | Map of access rules with role and identity type. Keys are arbitrary and must match those in the `access_identities` variable, types are `domain`, `group`, `special_group`, `user`, `view`. | <code title="map(object({ role = string type = string }))">map(object({…}))</code> | | <code>{}</code> |
| [access_identities](variables.tf#L33) | Map of access identities used for basic access roles. View identities have the format 'project_id\|dataset_id\|table_id'. | <code>map(string)</code> | | <code>{}</code> |
| [authorized_datasets](variables.tf#L39) | An array of datasets to be authorized on the dataset. | <code title="list(object({ dataset_id = string, project_id = string, }))">list(object({…}))</code> | | <code>[]</code> |
| [authorized_routines](variables.tf#L48) | An array of routines to be authorized on the dataset. | <code title="list(object({ project_id = string, dataset_id = string, routine_id = string }))">list(object({…}))</code> | | <code>[]</code> |
| [authorized_views](variables.tf#L58) | An array of views to be authorized on the dataset. | <code title="list(object({ dataset_id = string, project_id = string, table_id = string # this is the view id, but we keep table_id to stay consistent as the resource }))">list(object({…}))</code> | | <code>[]</code> |
| [context](variables.tf#L68) | Context-specific interpolations. | <code title="object({ condition_vars = optional(map(map(string)), {}) custom_roles = optional(map(string), {}) kms_keys = optional(map(string), {}) iam_principals = optional(map(string), {}) locations = optional(map(string), {}) project_ids = optional(map(string), {}) tag_values = optional(map(string), {}) })">object({…})</code> | | <code>{}</code> |
| [dataset_access](variables.tf#L83) | Set access in the dataset resource instead of using separate resources. | <code>bool</code> | | <code>false</code> |
| [description](variables.tf#L89) | Optional description. | <code>string</code> | | <code>"Terraform managed."</code> |
| [encryption_key](variables.tf#L95) | Self link of the KMS key that will be used to protect destination table. | <code>string</code> | | <code>null</code> |
| [friendly_name](variables.tf#L101) | Dataset friendly name. | <code>string</code> | | <code>null</code> |
| [iam](variables-iam.tf#L17) | IAM bindings in {ROLE => [MEMBERS]} format. Mutually exclusive with the access_* variables used for basic roles. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam_bindings](variables-iam.tf#L23) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map(object({ members = list(string) role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_bindings_additive](variables-iam.tf#L38) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map(object({ member = string role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_by_principals](variables-iam.tf#L53) | Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid errors. Merged internally with the `iam` variable. | <code>map(list(string))</code> | | <code>{}</code> |
| [labels](variables.tf#L116) | Dataset labels. | <code>map(string)</code> | | <code>{}</code> |
| [location](variables.tf#L122) | Dataset location. | <code>string</code> | | <code>"EU"</code> |
| [materialized_views](variables.tf#L128) | Materialized views definitions. | <code title="map(object({ query = string allow_non_incremental_definition = optional(bool) deletion_protection = optional(bool) description = optional(string, "Terraform managed.") enable_refresh = optional(bool) friendly_name = optional(string) labels = optional(map(string), {}) refresh_interval_ms = optional(bool) require_partition_filter = optional(bool) options = optional(object({ clustering = optional(list(string)) expiration_time = optional(number) }), {}) partitioning = optional(object({ field = optional(string) range = optional(object({ end = number interval = number start = number })) time = optional(object({ type = string expiration_ms = optional(number) field = optional(string) })) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [options](variables.tf#L161) | Dataset options. | <code title="object({ default_collation = optional(string) default_table_expiration_ms = optional(number) default_partition_expiration_ms = optional(number) delete_contents_on_destroy = optional(bool, false) is_case_insensitive = optional(bool) max_time_travel_hours = optional(number, 168) storage_billing_model = optional(string) })">object({…})</code> | | <code>{}</code> |
| [routines](variables.tf#L180) | Routine definitions. | <code title="map(object({ description = optional(string) routine_type = string language = optional(string) definition_body = string imported_libraries = optional(list(string)) determinism_level = optional(string) data_governance_type = optional(string) return_type = optional(string) return_table_type = optional(string) arguments = optional(map(object({ argument_kind = optional(string) mode = optional(string) data_type = optional(string) })), {}) spark_options = optional(object({ archive_uris = optional(list(string), []) connection = string container_image = optional(string) file_uris = optional(list(string), []) jar_uris = optional(list(string), []) main_file_uri = optional(string) main_class = optional(string) properties = optional(map(string), {}) py_file_uris = optional(list(string), []) runtime_version = optional(string) })) remote_function_options = optional(object({ connection = string endpoint = optional(string) max_batching_rows = optional(string) user_defined_context = optional(map(string), {}) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [tables](variables.tf#L219) | Table definitions. Options and partitioning default to null. Partitioning can only use `range` or `time`, set the unused one to null. | <code title="map(object({ deletion_protection = optional(bool) description = optional(string, "Terraform managed.") friendly_name = optional(string) labels = optional(map(string), {}) require_partition_filter = optional(bool) schema = optional(string) external_data_configuration = optional(object({ autodetect = bool source_uris = list(string) avro_logical_types = optional(bool) compression = optional(string) connection_id = optional(string) file_set_spec_type = optional(string) ignore_unknown_values = optional(bool) metadata_cache_mode = optional(string) object_metadata = optional(string) json_options_encoding = optional(string) reference_file_schema_uri = optional(string) schema = optional(string) source_format = optional(string) max_bad_records = optional(number) csv_options = optional(object({ quote = string allow_jagged_rows = optional(bool) allow_quoted_newlines = optional(bool) encoding = optional(string) field_delimiter = optional(string) skip_leading_rows = optional(number) })) google_sheets_options = optional(object({ range = optional(string) skip_leading_rows = optional(number) })) hive_partitioning_options = optional(object({ mode = optional(string) require_partition_filter = optional(bool) source_uri_prefix = optional(string) })) parquet_options = optional(object({ enum_as_string = optional(bool) enable_list_inference = optional(bool) })) })) options = optional(object({ clustering = optional(list(string)) encryption_key = optional(string) expiration_time = optional(number) max_staleness = optional(string) }), {}) partitioning = optional(object({ field = optional(string) range = optional(object({ end = number interval = number start = number })) time = optional(object({ type = string expiration_ms = optional(number) field = optional(string) })) })) table_constraints = optional(object({ primary_key_columns = optional(list(string)) foreign_keys = optional(object({ referenced_table = object({ project_id = string dataset_id = string table_id = string }) column_references = object({ referencing_column = string referenced_column = string }) name = optional(string) })) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [tag_bindings](variables.tf#L304) | Tag bindings for this dataset, in key => tag value id format. | <code>map(string)</code> | | <code>{}</code> |
| [views](variables.tf#L311) | View definitions. | <code title="map(object({ query = string deletion_protection = optional(bool) description = optional(string, "Terraform managed.") friendly_name = optional(string) labels = optional(map(string), {}) use_legacy_sql = optional(bool) schema = optional(list(object({ name = string type = string description = string mode = optional(string) }))) }))">map(object({…}))</code> | | <code>{}</code> |

## Outputs

| name | description | sensitive |
|---|---|:---:|
| [dataset](outputs.tf#L17) | Dataset resource. | |
| [dataset_id](outputs.tf#L22) | Dataset id. | |
| [id](outputs.tf#L37) | Fully qualified dataset id. | |
| [materialized_view_ids](outputs.tf#L52) | Map of fully qualified materialized view ids keyed by view ids. | |
| [materialized_views](outputs.tf#L57) | Materialized view resources. | |
| [routine_ids](outputs.tf#L62) | Map of fully qualified routine ids keyed by routine ids. | |
| [routines](outputs.tf#L67) | Routine resources. | |
| [self_link](outputs.tf#L72) | Dataset self link. | |
| [table_ids](outputs.tf#L87) | Map of fully qualified table ids keyed by table ids. | |
| [tables](outputs.tf#L92) | Table resources. | |
| [view_ids](outputs.tf#L97) | Map of fully qualified view ids keyed by view ids. | |
| [views](outputs.tf#L102) | View resources. | |
<!-- END TFDOC -->