diff --git a/blueprints/README.md b/blueprints/README.md
index 37a6ae979..dbeb28fbd 100644
--- a/blueprints/README.md
+++ b/blueprints/README.md
@@ -6,7 +6,7 @@ Currently available blueprints:
 
 - **apigee** - [Apigee X foundations](./apigee/apigee-x-foundations/). [Apigee Hybrid on GKE](./apigee/hybrid-gke/), [Apigee X analytics in BigQuery](./apigee/bigquery-analytics), [Apigee network patterns](./apigee/network-patterns/)
 - **cloud operations** - [Active Directory Federation Services](./cloud-operations/adfs), [Cloud Asset Inventory feeds for resource change tracking and remediation](./cloud-operations/asset-inventory-feed-remediation), [Fine-grained Cloud DNS IAM via Service Directory](./cloud-operations/dns-fine-grained-iam), [Cloud DNS & Shared VPC design](./cloud-operations/dns-shared-vpc), [Delegated Role Grants](./cloud-operations/iam-delegated-role-grants), [Network Quota Monitoring](./cloud-operations/network-quota-monitoring), [Compute Image builder with Hashicorp Packer](./cloud-operations/packer-image-builder), [Packer example](./cloud-operations/packer-image-builder/packer), [Compute Engine quota monitoring](./cloud-operations/compute-quota-monitoring), [Scheduled Cloud Asset Inventory Export to Bigquery](./cloud-operations/scheduled-asset-inventory-export-bq), [Configuring workload identity federation with Terraform Cloud/Enterprise workflows](./cloud-operations/terraform-cloud-dynamic-credentials), [TCP healthcheck and restart for unmanaged GCE instances](./cloud-operations/unmanaged-instances-healthcheck), [Migrate for Compute Engine (v5) blueprints](./cloud-operations/vm-migration), [Configuring workload identity federation to access Google Cloud resources from apps running on Azure](./cloud-operations/workload-identity-federation)
-- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Data Platform](./data-solutions/data-platform-foundations), [Minimal Data Platform](./data-solutions/data-platform-minimal), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [#SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
+- **data solutions** - [GCE and GCS CMEK via centralized Cloud KMS](./data-solutions/cmek-via-centralized-kms), [Cloud SQL instance with multi-region read replicas](./data-solutions/cloudsql-multiregion), [Minimal Data Platform](./data-solutions/data-platform-minimal), [Spinning up a foundation data pipeline on Google Cloud using Cloud Storage, Dataflow and BigQuery](./data-solutions/gcs-to-bq-with-least-privileges), [#SQL Server Always On Groups blueprint](./data-solutions/sqlserver-alwayson), [Data Playground](./data-solutions/data-playground), [MLOps with Vertex AI](./data-solutions/vertex-mlops), [Shielded Folder](./data-solutions/shielded-folder), [BigQuery ML and Vertex AI Pipeline](./data-solutions/bq-ml)
 - **factories** - [Fabric resource factories](./factories)
 - **GKE** - [Binary Authorization Pipeline Blueprint](./gke/binauthz), [Storage API](./gke/binauthz/image), [Multi-cluster mesh on GKE (fleet API)](./gke/multi-cluster-mesh-gke-fleet-api), [GKE Multitenant](../fast/stages/3-gke-dev), [Shared VPC with GKE support](./networking/shared-vpc-gke/), [GKE Autopilot](./gke/autopilot)
 - **networking** - [Calling a private Cloud Function from On-premises](./networking/private-cloud-function-from-onprem), [HA VPN over Interconnect](./networking/ha-vpn-over-interconnect/), [GLB and multi-regional daisy-chaining through hybrid NEGs](./networking/glb-hybrid-neg-internal), [Hybrid connectivity to on-premise services through PSC](./networking/psc-hybrid), [HTTP Load Balancer with Cloud Armor](./networking/glb-and-armor), On-prem DNS and Google Private Access, [PSC Producer](./networking/psc-hybrid/psc-producer), [PSC Consumer](./networking/psc-hybrid/psc-consumer), [Shared VPC with optional GKE cluster](./networking/shared-vpc-gke), [VPC Connectivity Lab](./networking/vpc-connectivity-lab/)
diff --git a/blueprints/data-solutions/README.md b/blueprints/data-solutions/README.md
index cce860112..a2b518485 100644
--- a/blueprints/data-solutions/README.md
+++ b/blueprints/data-solutions/README.md
@@ -25,13 +25,6 @@ This blueprint is deprecated. To create a Cloud Composer instance please consult
-### Data Platform Foundations
-
-
-This [blueprint](./data-platform-foundations/) implements a robust and flexible Data Platform on GCP that provides opinionated defaults, allowing customers to build and scale out additional data pipelines quickly and reliably.
-
-
 ### Minimal Data Platform
diff --git a/blueprints/data-solutions/data-platform-foundations/01-dropoff.tf b/blueprints/data-solutions/data-platform-foundations/01-dropoff.tf
deleted file mode 100644
index 152d20045..000000000
--- a/blueprints/data-solutions/data-platform-foundations/01-dropoff.tf
+++ /dev/null
@@ -1,138 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description drop off project and resources.
-
-locals {
-  drp_iam = {
-    data_engineers = [
-      "roles/bigquery.dataEditor",
-      "roles/bigquery.user"
-    ]
-    sa_drop_bq = [
-      "roles/bigquery.dataEditor"
-    ]
-    sa_drop_cs = [
-      "roles/storage.objectCreator"
-    ]
-    sa_drop_ps = [
-      "roles/pubsub.publisher"
-    ]
-    sa_load = [
-      "roles/bigquery.user",
-      "roles/pubsub.subscriber",
-      "roles/storage.objectAdmin"
-    ]
-    sa_orch = [
-      "roles/pubsub.subscriber",
-      "roles/storage.objectViewer"
-    ]
-  }
-}
-
-module "drop-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.drop
-    : "${var.project_config.project_ids.drop}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.drp_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.drp_iam_additive
-  services = concat(var.project_services, [
-    "bigquery.googleapis.com",
-    "bigqueryreservation.googleapis.com",
-    "bigquerystorage.googleapis.com",
-    "cloudkms.googleapis.com",
-    "pubsub.googleapis.com",
-    "storage.googleapis.com",
-    "storage-component.googleapis.com",
-  ])
-  service_encryption_key_ids = {
-    "bigquery.googleapis.com" = compact([var.service_encryption_keys.bq])
-    "pubsub.googleapis.com"   = compact([var.service_encryption_keys.pubsub])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-}
-
-module "drop-sa-cs-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.drop-project.project_id
-  prefix       = var.prefix
-  name         = "drp-cs-0"
-  display_name = "Data platform GCS drop off service account."
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [
-      local.groups_iam.data-engineers
-    ]
-  }
-}
-
-module "drop-cs-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.drop-project.project_id
-  prefix         = var.prefix
-  name           = "drp-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-  # retention_policy = {
-  #   retention_period = 7776000 # 90 * 24 * 60 * 60
-  #   is_locked        = false
-  # }
-}
-
-module "drop-sa-ps-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.drop-project.project_id
-  prefix       = var.prefix
-  name         = "drp-ps-0"
-  display_name = "Data platform PubSub drop off service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [
-      local.groups_iam.data-engineers
-    ]
-  }
-}
-
-module "drop-ps-0" {
-  source     = "../../../modules/pubsub"
-  project_id = module.drop-project.project_id
-  name       = "${var.prefix}-drp-ps-0"
-  kms_key    = var.service_encryption_keys.pubsub
-}
-
-module "drop-sa-bq-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.drop-project.project_id
-  prefix       = var.prefix
-  name         = "drp-bq-0"
-  display_name = "Data platform BigQuery drop off service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
-  }
-}
-
-module "drop-bq-0" {
-  source         = "../../../modules/bigquery-dataset"
-  project_id     = module.drop-project.project_id
-  id             = "${replace(var.prefix, "-", "_")}_drp_bq_0"
-  location       = var.location
-  encryption_key = var.service_encryption_keys.bq
-}
diff --git a/blueprints/data-solutions/data-platform-foundations/02-load.tf b/blueprints/data-solutions/data-platform-foundations/02-load.tf
deleted file mode 100644
index 32dfe191e..000000000
--- a/blueprints/data-solutions/data-platform-foundations/02-load.tf
+++ /dev/null
@@ -1,135 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description Load project and VPC.
-
-locals {
-  load_iam = {
-    data_engineers = [
-      "roles/dataflow.admin",
-      "roles/dataflow.developer"
-    ]
-    robots_dataflow_load = [
-      "roles/storage.objectAdmin"
-    ]
-    sa_load = [
-      "roles/bigquery.jobUser",
-      "roles/dataflow.admin",
-      "roles/dataflow.worker",
-      "roles/storage.objectAdmin"
-    ]
-    sa_orch = [
-      "roles/dataflow.admin"
-    ]
-  }
-}
-
-module "load-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.load
-    : "${var.project_config.project_ids.load}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.load_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.load_iam_additive
-  services = concat(var.project_services, [
-    "bigquery.googleapis.com",
-    "bigqueryreservation.googleapis.com",
-    "bigquerystorage.googleapis.com",
-    "cloudkms.googleapis.com",
-    "compute.googleapis.com",
-    "dataflow.googleapis.com",
-    "datalineage.googleapis.com",
-    "dlp.googleapis.com",
-    "pubsub.googleapis.com",
-    "servicenetworking.googleapis.com",
-    "storage.googleapis.com",
-    "storage-component.googleapis.com"
-  ])
-  service_encryption_key_ids = {
-    "pubsub.googleapis.com"   = compact([var.service_encryption_keys.pubsub])
-    "dataflow.googleapis.com" = compact([var.service_encryption_keys.dataflow])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
-    attach       = true
-    host_project = local.shared_vpc_project
-  }
-}
-
-module "load-sa-df-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.load-project.project_id
-  prefix       = var.prefix
-  name         = "load-df-0"
-  display_name = "Data platform Dataflow load service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [
-      local.groups_iam.data-engineers,
-      module.orch-sa-cmp-0.iam_email
-    ],
-    "roles/iam.serviceAccountUser" = [
-      module.orch-sa-cmp-0.iam_email
-    ]
-  }
-}
-
-module "load-cs-df-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.load-project.project_id
-  prefix         = var.prefix
-  name           = "load-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "load-vpc" {
-  source     = "../../../modules/net-vpc"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.load-project.project_id
-  name       = "${var.prefix}-lod"
-  subnets = [
-    {
-      ip_cidr_range = "10.10.0.0/24"
-      name          = "${var.prefix}-lod"
-      region        = var.region
-    }
-  ]
-}
-
-module "load-vpc-firewall" {
-  source     = "../../../modules/net-vpc-firewall"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.load-project.project_id
-  network    = module.load-vpc[0].name
-  default_rules_config = {
-    admin_ranges = ["10.10.0.0/24"]
-  }
-}
-
-module "load-nat" {
-  source         = "../../../modules/net-cloudnat"
-  count          = local.use_shared_vpc ? 0 : 1
-  project_id     = module.load-project.project_id
-  name           = "${var.prefix}-lod"
-  region         = var.region
-  router_network = module.load-vpc[0].name
-}
diff --git a/blueprints/data-solutions/data-platform-foundations/03-composer.tf b/blueprints/data-solutions/data-platform-foundations/03-composer.tf
deleted file mode 100644
index 0d8a7ee6e..000000000
--- a/blueprints/data-solutions/data-platform-foundations/03-composer.tf
+++ /dev/null
@@ -1,152 +0,0 @@
-# Copyright 2023 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description Orchestration Cloud Composer definition.
-
-locals {
-  _env_variables = {
-    BQ_LOCATION                 = var.location
-    DATA_CAT_TAGS               = try(jsonencode(module.common-datacatalog.tags), "{}")
-    DF_KMS_KEY                  = try(var.service_encryption_keys.dataflow, "")
-    DRP_PRJ                     = module.drop-project.project_id
-    DRP_BQ                      = module.drop-bq-0.dataset_id
-    DRP_GCS                     = module.drop-cs-0.url
-    DRP_PS                      = module.drop-ps-0.id
-    DWH_LAND_PRJ                = module.dwh-lnd-project.project_id
-    DWH_LAND_BQ_DATASET         = module.dwh-lnd-bq-0.dataset_id
-    DWH_LAND_GCS                = module.dwh-lnd-cs-0.url
-    DWH_CURATED_PRJ             = module.dwh-cur-project.project_id
-    DWH_CURATED_BQ_DATASET      = module.dwh-cur-bq-0.dataset_id
-    DWH_CURATED_GCS             = module.dwh-cur-cs-0.url
-    DWH_CONFIDENTIAL_PRJ        = module.dwh-conf-project.project_id
-    DWH_CONFIDENTIAL_BQ_DATASET = module.dwh-conf-bq-0.dataset_id
-    DWH_CONFIDENTIAL_GCS        = module.dwh-conf-cs-0.url
-    GCP_REGION                  = var.region
-    LOD_PRJ                     = module.load-project.project_id
-    LOD_GCS_STAGING             = module.load-cs-df-0.url
-    LOD_NET_VPC                 = local.load_vpc
-    LOD_NET_SUBNET              = local.load_subnet
-    LOD_SA_DF                   = module.load-sa-df-0.email
-    ORC_PRJ                     = module.orch-project.project_id
-    ORC_GCS                     = module.orch-cs-0.url
-    ORC_GCS_TMP_DF              = module.orch-cs-df-template.url
-    TRF_PRJ                     = module.transf-project.project_id
-    TRF_GCS_STAGING             = module.transf-cs-df-0.url
-    TRF_NET_VPC                 = local.transf_vpc
-    TRF_NET_SUBNET              = local.transf_subnet
-    TRF_SA_DF                   = module.transf-sa-df-0.email
-    TRF_SA_BQ                   = module.transf-sa-bq-0.email
-  }
-  env_variables = {
-    for k, v in merge(
-      try(var.composer_config.software_config.env_variables, null),
-      local._env_variables
-    ) : "AIRFLOW_VAR_${k}" => v
-  }
-}
-module "orch-sa-cmp-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.orch-project.project_id
-  prefix       = var.prefix
-  name         = "orc-cmp-0"
-  display_name = "Data platform Composer service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
-    "roles/iam.serviceAccountUser"         = [module.orch-sa-cmp-0.iam_email]
-  }
-}
-
-resource "google_composer_environment" "orch-cmp-0" {
-  count    = var.composer_config.disable_deployment == true ? 0 : 1
-  provider = google-beta
-  project  = module.orch-project.project_id
-  name     = "${var.prefix}-orc-cmp-0"
-  region   = var.region
-  config {
-    software_config {
-      airflow_config_overrides = try(var.composer_config.software_config.airflow_config_overrides, null)
-      pypi_packages            = try(var.composer_config.software_config.pypi_packages, null)
-      env_variables            = local.env_variables
-      image_version            = try(var.composer_config.software_config.image_version, null)
-      cloud_data_lineage_integration {
-        enabled = var.composer_config.software_config.cloud_data_lineage_integration
-      }
-    }
-    dynamic "workloads_config" {
-      for_each = (try(var.composer_config.workloads_config, null) != null ? { 1 = 1 } : {})
-
-      content {
-        scheduler {
-          cpu        = try(var.composer_config.workloads_config.scheduler.cpu, null)
-          memory_gb  = try(var.composer_config.workloads_config.scheduler.memory_gb, null)
-          storage_gb = try(var.composer_config.workloads_config.scheduler.storage_gb, null)
-          count      = try(var.composer_config.workloads_config.scheduler.count, null)
-        }
-        web_server {
-          cpu        = try(var.composer_config.workloads_config.web_server.cpu, null)
-          memory_gb  = try(var.composer_config.workloads_config.web_server.memory_gb, null)
-          storage_gb = try(var.composer_config.workloads_config.web_server.storage_gb, null)
-        }
-        worker {
-          cpu        = try(var.composer_config.workloads_config.worker.cpu, null)
-          memory_gb  = try(var.composer_config.workloads_config.worker.memory_gb, null)
-          storage_gb = try(var.composer_config.workloads_config.worker.storage_gb, null)
-          min_count  = try(var.composer_config.workloads_config.worker.min_count, null)
-          max_count  = try(var.composer_config.workloads_config.worker.max_count, null)
-        }
-      }
-    }
-
-    environment_size = var.composer_config.environment_size
-
-    node_config {
-      network              = local.orch_vpc
-      subnetwork           = local.orch_subnet
-      service_account      = module.orch-sa-cmp-0.email
-      enable_ip_masq_agent = "true"
-      tags                 = ["composer-worker"]
-      ip_allocation_policy {
-        cluster_secondary_range_name = try(
-          var.network_config.composer_secondary_ranges.pods, "pods"
-        )
-        services_secondary_range_name = try(
-          var.network_config.composer_secondary_ranges.services, "services"
-        )
-      }
-    }
-    private_environment_config {
-      enable_private_endpoint = "true"
-      cloud_sql_ipv4_cidr_block = try(
-        var.network_config.composer_ip_ranges.cloudsql, "10.20.10.0/24"
-      )
-      master_ipv4_cidr_block = try(
-        var.network_config.composer_ip_ranges.gke_master, "10.20.11.0/28"
-      )
-    }
-    dynamic "encryption_config" {
-      for_each = (
-        try(var.service_encryption_keys[var.region], null) != null
-        ? { 1 = 1 }
-        : {}
-      )
-      content {
-        kms_key_name = try(var.service_encryption_keys[var.region], null)
-      }
-    }
-  }
-  depends_on = [
-    google_project_iam_member.shared_vpc,
-    module.orch-project
-  ]
-}
diff --git a/blueprints/data-solutions/data-platform-foundations/03-orchestration.tf b/blueprints/data-solutions/data-platform-foundations/03-orchestration.tf
deleted file mode 100644
index 5663c3b1d..000000000
--- a/blueprints/data-solutions/data-platform-foundations/03-orchestration.tf
+++ /dev/null
@@ -1,195 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description Orchestration project and VPC.
-
-locals {
-  orch_iam = {
-    data_engineers = [
-      "roles/artifactregistry.admin",
-      "roles/bigquery.dataEditor",
-      "roles/bigquery.jobUser",
-      "roles/cloudbuild.builds.editor",
-      "roles/composer.admin",
-      "roles/composer.user",
-      "roles/composer.environmentAndStorageObjectAdmin",
-      "roles/iam.serviceAccountUser",
-      "roles/iap.httpsResourceAccessor",
-      "roles/serviceusage.serviceUsageConsumer",
-      "roles/storage.objectAdmin"
-    ]
-    robots_cloudbuild = [
-      "roles/storage.objectAdmin"
-    ]
-    robots_composer = [
-      "roles/composer.ServiceAgentV2Ext",
-      "roles/storage.objectAdmin"
-    ]
-    sa_df_build = [
-      "roles/cloudbuild.serviceAgent",
-      "roles/storage.objectAdmin"
-    ]
-    sa_load = [
-      "roles/artifactregistry.reader",
-      "roles/bigquery.dataEditor",
-      "roles/storage.objectViewer"
-    ]
-    sa_orch = [
-      "roles/bigquery.jobUser",
-      "roles/composer.worker",
-      "roles/iam.serviceAccountUser",
-      "roles/storage.objectAdmin"
-    ]
-    sa_transf_df = [
-      "roles/bigquery.dataEditor"
-    ]
-  }
-}
-
-module "orch-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.orc
-    : "${var.project_config.project_ids.orc}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.orch_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.orch_iam_additive
-
-  services = concat(var.project_services, [
-    "artifactregistry.googleapis.com",
-    "bigquery.googleapis.com",
-    "bigqueryreservation.googleapis.com",
-    "bigquerystorage.googleapis.com",
-    "cloudbuild.googleapis.com",
-    "cloudkms.googleapis.com",
-    "composer.googleapis.com",
-    "compute.googleapis.com",
-    "container.googleapis.com",
-    "containerregistry.googleapis.com",
-    "artifactregistry.googleapis.com",
-    "dataflow.googleapis.com",
-    "datalineage.googleapis.com",
-    "orgpolicy.googleapis.com",
-    "pubsub.googleapis.com",
-    "servicenetworking.googleapis.com",
-    "storage.googleapis.com",
-    "storage-component.googleapis.com"
-  ])
-  service_encryption_key_ids = {
-    "composer.googleapis.com" = compact([var.service_encryption_keys.composer])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
-    attach       = true
-    host_project = local.shared_vpc_project
-  }
-}
-
-module "orch-cs-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.orch-project.project_id
-  prefix         = var.prefix
-  name           = "orc-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "orch-vpc" {
-  source     = "../../../modules/net-vpc"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.orch-project.project_id
-  name       = "${var.prefix}-orch"
-  subnets = [
-    {
-      ip_cidr_range = "10.10.0.0/24"
-      name          = "${var.prefix}-orch"
-      region        = var.region
-      secondary_ip_ranges = {
-        pods     = "10.10.8.0/22"
-        services = "10.10.12.0/24"
-      }
-    }
-  ]
-}
-
-module "orch-vpc-firewall" {
-  source     = "../../../modules/net-vpc-firewall"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.orch-project.project_id
-  network    = module.orch-vpc[0].name
-  default_rules_config = {
-    admin_ranges = ["10.10.0.0/24"]
-  }
-}
-
-module "orch-nat" {
-  count          = local.use_shared_vpc ? 0 : 1
-  source         = "../../../modules/net-cloudnat"
-  project_id     = module.orch-project.project_id
-  name           = "${var.prefix}-orch"
-  region         = var.region
-  router_network = module.orch-vpc[0].name
-}
-
-module "orch-artifact-reg" {
-  source      = "../../../modules/artifact-registry"
-  project_id  = module.orch-project.project_id
-  name        = "${var.prefix}-app-images"
-  location    = var.region
-  description = "Docker repository storing application images e.g. Dataflow, Cloud Run etc..."
-  format      = { docker = { standard = {} } }
-}
-
-module "orch-cs-df-template" {
-  source         = "../../../modules/gcs"
-  project_id     = module.orch-project.project_id
-  prefix         = var.prefix
-  name           = "orc-cs-df-template"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "orch-cs-build-staging" {
-  source         = "../../../modules/gcs"
-  project_id     = module.orch-project.project_id
-  prefix         = var.prefix
-  name           = "orc-cs-build-staging"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "orch-sa-df-build" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.orch-project.project_id
-  prefix       = var.prefix
-  name         = "orc-sa-df-build"
-  display_name = "Data platform Dataflow build service account"
-  # Note values below should pertain to the system / group / users who are able to
-  # invoke the build via this service account
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [local.groups_iam.data-engineers]
-    "roles/iam.serviceAccountUser"         = [local.groups_iam.data-engineers]
-  }
-}
diff --git a/blueprints/data-solutions/data-platform-foundations/04-transformation.tf b/blueprints/data-solutions/data-platform-foundations/04-transformation.tf
deleted file mode 100644
index 2a45def1a..000000000
--- a/blueprints/data-solutions/data-platform-foundations/04-transformation.tf
+++ /dev/null
@@ -1,151 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description Transformation project and VPC.
-
-locals {
-  trf_iam = {
-    data_engineers = [
-      "roles/bigquery.jobUser",
-      "roles/dataflow.admin"
-    ]
-    robots_dataflow_trf = [
-      "roles/storage.objectAdmin"
-    ]
-    sa_orch = [
-      "roles/dataflow.admin"
-    ]
-    sa_transf_bq = [
-      "roles/bigquery.jobUser"
-    ]
-    sa_transf_df = [
-      "roles/dataflow.worker",
-      "roles/storage.objectAdmin"
-    ]
-  }
-}
-
-module "transf-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.trf
-    : "${var.project_config.project_ids.trf}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.trf_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.trf_iam_additive
-  services = concat(var.project_services, [
-    "bigquery.googleapis.com",
-    "bigqueryreservation.googleapis.com",
-    "bigquerystorage.googleapis.com",
-    "cloudkms.googleapis.com",
-    "compute.googleapis.com",
-    "dataflow.googleapis.com",
-    "dlp.googleapis.com",
-    "pubsub.googleapis.com",
-    "servicenetworking.googleapis.com",
-    "storage.googleapis.com",
-    "storage-component.googleapis.com"
-  ])
-  service_encryption_key_ids = {
-    "dataflow.googleapis.com" = compact([var.service_encryption_keys.dataflow])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-  shared_vpc_service_config = local.shared_vpc_project == null ? null : {
-    attach       = true
-    host_project = local.shared_vpc_project
-  }
-}
-
-module "transf-sa-df-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.transf-project.project_id
-  prefix       = var.prefix
-  name         = "trf-df-0"
-  display_name = "Data platform Dataflow transformation service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [
-      local.groups_iam.data-engineers,
-      module.orch-sa-cmp-0.iam_email
-    ],
-    "roles/iam.serviceAccountUser" = [
-      module.orch-sa-cmp-0.iam_email
-    ]
-  }
-}
-
-module "transf-cs-df-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.transf-project.project_id
-  prefix         = var.prefix
-  name           = "trf-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "transf-sa-bq-0" {
-  source       = "../../../modules/iam-service-account"
-  project_id   = module.transf-project.project_id
-  prefix       = var.prefix
-  name         = "trf-bq-0"
-  display_name = "Data platform BigQuery transformation service account"
-  iam = {
-    "roles/iam.serviceAccountTokenCreator" = [
-      local.groups_iam.data-engineers,
-      module.orch-sa-cmp-0.iam_email
-    ],
-    "roles/iam.serviceAccountUser" = [
-      module.orch-sa-cmp-0.iam_email
-    ]
-  }
-}
-
-module "transf-vpc" {
-  source     = "../../../modules/net-vpc"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.transf-project.project_id
-  name       = "${var.prefix}-trf"
-  subnets = [
-    {
-      ip_cidr_range = "10.10.0.0/24"
-      name          = "${var.prefix}-trf"
-      region        = var.region
-    }
-  ]
-}
-
-module "transf-vpc-firewall" {
-  source     = "../../../modules/net-vpc-firewall"
-  count      = local.use_shared_vpc ? 0 : 1
-  project_id = module.transf-project.project_id
-  network    = module.transf-vpc[0].name
-  default_rules_config = {
-    admin_ranges = ["10.10.0.0/24"]
-  }
-}
-
-module "transf-nat" {
-  source         = "../../../modules/net-cloudnat"
-  count          = local.use_shared_vpc ? 0 : 1
-  project_id     = module.transf-project.project_id
-  name           = "${var.prefix}-trf"
-  region         = var.region
-  router_network = module.transf-vpc[0].name
-}
diff --git a/blueprints/data-solutions/data-platform-foundations/05-datawarehouse.tf b/blueprints/data-solutions/data-platform-foundations/05-datawarehouse.tf
deleted file mode 100644
index f58dc034a..000000000
--- a/blueprints/data-solutions/data-platform-foundations/05-datawarehouse.tf
+++ /dev/null
@@ -1,182 +0,0 @@
-# Copyright 2024 Google LLC
-#
-# Licensed under the Apache License, Version 2.0 (the "License");
-# you may not use this file except in compliance with the License.
-# You may obtain a copy of the License at
-#
-#     https://www.apache.org/licenses/LICENSE-2.0
-#
-# Unless required by applicable law or agreed to in writing, software
-# distributed under the License is distributed on an "AS IS" BASIS,
-# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
-# See the License for the specific language governing permissions and
-# limitations under the License.
-
-# tfdoc:file:description Data Warehouse projects.
-
-locals {
-  dwh_iam = {
-    data_analysts = [
-      "roles/bigquery.dataViewer",
-      "roles/bigquery.jobUser",
-      "roles/datacatalog.tagTemplateViewer",
-      "roles/datacatalog.viewer",
-      "roles/storage.objectViewer"
-    ]
-    data_engineers = [
-      "roles/bigquery.dataViewer",
-      "roles/bigquery.jobUser",
-      "roles/datacatalog.tagTemplateViewer",
-      "roles/datacatalog.viewer",
-      "roles/storage.objectViewer"
-    ]
-    sa_transf_bq = [
-      "roles/bigquery.dataOwner",
-      "roles/bigquery.jobUser"
-    ]
-    sa_transf_df = [
-      "roles/bigquery.dataOwner",
-      "roles/storage.objectAdmin"
-    ]
-  }
-  lnd_iam = {
-    data_engineers = [
-      "roles/bigquery.dataViewer",
-      "roles/bigquery.jobUser",
-      "roles/datacatalog.tagTemplateViewer",
-      "roles/datacatalog.viewer",
-      "roles/storage.objectViewer"
-    ]
-    sa_load = [
-      "roles/bigquery.dataOwner",
-      "roles/bigquery.jobUser",
-      "roles/storage.objectCreator"
-    ]
-    sa_transf_bq = [
-      "roles/bigquery.dataViewer",
-      "roles/datacatalog.categoryAdmin"
-    ]
-    sa_transf_df = [
-      "roles/bigquery.dataViewer"
-    ]
-  }
-}
-
-# Project
-
-module "dwh-lnd-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.dwh-lnd
-    : "${var.project_config.project_ids.dwh-lnd}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.lnd_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.lnd_iam_additive
-  services              = local.dwh_services
-  service_encryption_key_ids = {
-    "bigquery.googleapis.com" = compact([var.service_encryption_keys.bq])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-}
-
-module "dwh-cur-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.dwh-cur
-    : "${var.project_config.project_ids.dwh-cur}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.dwh_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.dwh_iam_additive
-  services              = local.dwh_services
-  service_encryption_key_ids = {
-    "bigquery.googleapis.com" = compact([var.service_encryption_keys.bq])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-}
-
-module "dwh-conf-project" {
-  source          = "../../../modules/project"
-  parent          = var.project_config.parent
-  billing_account = var.project_config.billing_account_id
-  project_reuse   = var.project_config.project_create ? null : {}
-  prefix          = local.use_projects ? null : var.prefix
-  name = (
-    local.use_projects
-    ? var.project_config.project_ids.dwh-conf
-    : "${var.project_config.project_ids.dwh-conf}${local.project_suffix}"
-  )
-  iam                   = local.use_projects ? {} : local.dwh_iam_auth
-  iam_bindings_additive = !local.use_projects ? {} : local.dwh_iam_additive
-  services              = local.dwh_services
-  service_encryption_key_ids = {
-    "bigquery.googleapis.com" = compact([var.service_encryption_keys.bq])
-    "storage.googleapis.com"  = compact([var.service_encryption_keys.storage])
-  }
-}
-
-module "dwh-lnd-bq-0" {
-  source         = "../../../modules/bigquery-dataset"
-  project_id     = module.dwh-lnd-project.project_id
-  id             = "${replace(var.prefix, "-", "_")}_dwh_lnd_bq_0"
-  location       = var.location
-  encryption_key = var.service_encryption_keys.bq
-}
-
-module "dwh-cur-bq-0" {
-  source         = "../../../modules/bigquery-dataset"
-  project_id     = module.dwh-cur-project.project_id
-  id             = "${replace(var.prefix, "-", "_")}_dwh_cur_bq_0"
-  location       = var.location
-  encryption_key = var.service_encryption_keys.bq
-}
-
-module "dwh-conf-bq-0" {
-  source         = "../../../modules/bigquery-dataset"
-  project_id     = module.dwh-conf-project.project_id
-  id             = "${replace(var.prefix, "-", "_")}_dwh_conf_bq_0"
-  location       = var.location
-  encryption_key = var.service_encryption_keys.bq
-}
-
-module "dwh-lnd-cs-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.dwh-lnd-project.project_id
-  prefix         = var.prefix
-  name           = "dwh-lnd-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "dwh-cur-cs-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.dwh-cur-project.project_id
-  prefix         = var.prefix
-  name           = "dwh-cur-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
-
-module "dwh-conf-cs-0" {
-  source         = "../../../modules/gcs"
-  project_id     = module.dwh-conf-project.project_id
-  prefix         = var.prefix
-  name           = "dwh-conf-cs-0"
-  location       = var.location
-  storage_class  = "MULTI_REGIONAL"
-  encryption_key = var.service_encryption_keys.storage
-  force_destroy  = !var.deletion_protection
-}
diff --git
a/blueprints/data-solutions/data-platform-foundations/06-common.tf b/blueprints/data-solutions/data-platform-foundations/06-common.tf deleted file mode 100644 index 6ebf2adf1..000000000 --- a/blueprints/data-solutions/data-platform-foundations/06-common.tf +++ /dev/null @@ -1,115 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# tfdoc:file:description common project. - -locals { - cmn_iam = { - data_analysts = [ - # uncomment if access to all tagged columns is needed - # "roles/datacatalog.categoryFineGrainedReader", - "roles/datacatalog.viewer" - ] - data_engineers = [ - "roles/dlp.estimatesAdmin", - "roles/dlp.reader", - "roles/dlp.user" - ] - data_security = [ - "roles/datacatalog.admin", - "roles/dlp.admin" - ] - sa_load = [ - "roles/datacatalog.viewer", - "roles/dlp.user" - ] - sa_transf_bq = [ - "roles/datacatalog.categoryFineGrainedReader", - "roles/datacatalog.viewer" - ] - sa_transf_df = [ - "roles/datacatalog.categoryFineGrainedReader", - "roles/datacatalog.viewer", - "roles/dlp.user" - ] - } -} - -module "common-project" { - source = "../../../modules/project" - parent = var.project_config.parent - billing_account = var.project_config.billing_account_id - project_reuse = var.project_config.project_create ? null : {} - prefix = local.use_projects ? null : var.prefix - name = ( - local.use_projects - ? 
var.project_config.project_ids.common - : "${var.project_config.project_ids.common}${local.project_suffix}" - ) - iam = local.use_projects ? {} : local.cmn_iam_auth - iam_bindings_additive = !local.use_projects ? {} : local.cmn_iam_additive - services = concat(var.project_services, [ - "datacatalog.googleapis.com", - "dlp.googleapis.com", - ]) -} - -module "common-datacatalog" { - source = "../../../modules/data-catalog-policy-tag" - project_id = module.common-project.project_id - name = "${var.prefix}-datacatalog-policy-tags" - location = var.location - tags = var.data_catalog_tags -} - -# To create KMS keys in the common project: uncomment this section -# and assign key links accondingly in local.service_encryption_keys variable - -# module "cmn-kms-0" { -# source = "../../../modules/kms" -# project_id = module.common-project.project_id -# keyring = { -# name = "${var.prefix}-kr-global", -# location = "global" -# } -# keys = { -# pubsub = null -# } -# } - -# module "cmn-kms-1" { -# source = "../../../modules/kms" -# project_id = module.common-project.project_id -# keyring = { -# name = "${var.prefix}-kr-mregional", -# location = var.location -# } -# keys = { -# bq = null -# storage = null -# } -# } - -# module "cmn-kms-2" { -# source = "../../../modules/kms" -# project_id = module.cmn-prj.project_id -# keyring = { -# name = "${var.prefix}-kr-regional", -# location = var.region -# } -# keys = { -# composer = null -# dataflow = null -# } -# } diff --git a/blueprints/data-solutions/data-platform-foundations/07-exposure.tf b/blueprints/data-solutions/data-platform-foundations/07-exposure.tf deleted file mode 100644 index 8418080ff..000000000 --- a/blueprints/data-solutions/data-platform-foundations/07-exposure.tf +++ /dev/null @@ -1,28 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# tfdoc:file:description common project. - -module "exp-project" { - source = "../../../modules/project" - parent = var.project_config.parent - billing_account = var.project_config.billing_account_id - project_reuse = var.project_config.project_create ? null : {} - prefix = local.use_projects ? null : var.prefix - name = ( - local.use_projects - ? var.project_config.project_ids.exp - : "${var.project_config.project_ids.exp}${local.project_suffix}" - ) -} diff --git a/blueprints/data-solutions/data-platform-foundations/IAM.md b/blueprints/data-solutions/data-platform-foundations/IAM.md deleted file mode 100644 index ffc680837..000000000 --- a/blueprints/data-solutions/data-platform-foundations/IAM.md +++ /dev/null @@ -1,89 +0,0 @@ -# IAM bindings reference - -Legend: + additive, conditional. - -## Project cmn - -| members | roles | -|---|---| -|gcp-data-analysts
group|[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer) | -|gcp-data-engineers
group|[roles/dlp.estimatesAdmin](https://cloud.google.com/iam/docs/understanding-roles#dlp.estimatesAdmin)
[roles/dlp.reader](https://cloud.google.com/iam/docs/understanding-roles#dlp.reader)
[roles/dlp.user](https://cloud.google.com/iam/docs/understanding-roles#dlp.user) | -|gcp-data-security
group|[roles/datacatalog.admin](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.admin)
[roles/dlp.admin](https://cloud.google.com/iam/docs/understanding-roles#dlp.admin) | -|load-df-0
serviceAccount|[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/dlp.user](https://cloud.google.com/iam/docs/understanding-roles#dlp.user) | -|trf-bq-0
serviceAccount|[roles/datacatalog.categoryFineGrainedReader](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.categoryFineGrainedReader)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer) | -|trf-df-0
serviceAccount|[roles/datacatalog.categoryFineGrainedReader](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.categoryFineGrainedReader)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/dlp.user](https://cloud.google.com/iam/docs/understanding-roles#dlp.user) | - -## Project drp - -| members | roles | -|---|---| -|gcp-data-engineers
group|[roles/bigquery.dataEditor](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor)
[roles/bigquery.user](https://cloud.google.com/iam/docs/understanding-roles#bigquery.user) | -|drp-bq-0
serviceAccount|[roles/bigquery.dataEditor](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor) | -|drp-cs-0
serviceAccount|[roles/storage.objectCreator](https://cloud.google.com/iam/docs/understanding-roles#storage.objectCreator) | -|drp-ps-0
serviceAccount|[roles/pubsub.publisher](https://cloud.google.com/iam/docs/understanding-roles#pubsub.publisher) | -|load-df-0
serviceAccount|[roles/bigquery.user](https://cloud.google.com/iam/docs/understanding-roles#bigquery.user)
[roles/pubsub.subscriber](https://cloud.google.com/iam/docs/understanding-roles#pubsub.subscriber)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|orc-cmp-0
serviceAccount|[roles/pubsub.subscriber](https://cloud.google.com/iam/docs/understanding-roles#pubsub.subscriber)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | - -## Project dwh-conf - -| members | roles | -|---|---| -|gcp-data-analysts
group|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/datacatalog.tagTemplateViewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.tagTemplateViewer)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|gcp-data-engineers
group|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/datacatalog.tagTemplateViewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.tagTemplateViewer)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|trf-bq-0
serviceAccount|[roles/bigquery.dataOwner](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataOwner)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser) | -|trf-df-0
serviceAccount|[roles/bigquery.dataOwner](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataOwner)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | - -## Project dwh-cur - -| members | roles | -|---|---| -|gcp-data-analysts
group|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/datacatalog.tagTemplateViewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.tagTemplateViewer)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|gcp-data-engineers
group|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/datacatalog.tagTemplateViewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.tagTemplateViewer)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|trf-bq-0
serviceAccount|[roles/bigquery.dataOwner](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataOwner)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser) | -|trf-df-0
serviceAccount|[roles/bigquery.dataOwner](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataOwner)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | - -## Project dwh-lnd - -| members | roles | -|---|---| -|gcp-data-engineers
group|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/datacatalog.tagTemplateViewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.tagTemplateViewer)
[roles/datacatalog.viewer](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.viewer)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|load-df-0
serviceAccount|[roles/bigquery.dataOwner](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataOwner)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/storage.objectCreator](https://cloud.google.com/iam/docs/understanding-roles#storage.objectCreator) | -|trf-bq-0
serviceAccount|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer)
[roles/datacatalog.categoryAdmin](https://cloud.google.com/iam/docs/understanding-roles#datacatalog.categoryAdmin) | -|trf-df-0
serviceAccount|[roles/bigquery.dataViewer](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataViewer) | - -## Project lod - -| members | roles | -|---|---| -|gcp-data-engineers
group|[roles/dataflow.admin](https://cloud.google.com/iam/docs/understanding-roles#dataflow.admin)
[roles/dataflow.developer](https://cloud.google.com/iam/docs/understanding-roles#dataflow.developer) | -|SERVICE_IDENTITY_dataflow-service-producer-prod
serviceAccount|[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|load-df-0
serviceAccount|[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/dataflow.admin](https://cloud.google.com/iam/docs/understanding-roles#dataflow.admin)
[roles/dataflow.worker](https://cloud.google.com/iam/docs/understanding-roles#dataflow.worker)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|orc-cmp-0
serviceAccount|[roles/dataflow.admin](https://cloud.google.com/iam/docs/understanding-roles#dataflow.admin) | - -## Project orc - -| members | roles | -|---|---| -|gcp-data-engineers
group|[roles/artifactregistry.admin](https://cloud.google.com/iam/docs/understanding-roles#artifactregistry.admin)
[roles/bigquery.dataEditor](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor)
[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/cloudbuild.builds.editor](https://cloud.google.com/iam/docs/understanding-roles#cloudbuild.builds.editor)
[roles/composer.admin](https://cloud.google.com/iam/docs/understanding-roles#composer.admin)
[roles/composer.environmentAndStorageObjectAdmin](https://cloud.google.com/iam/docs/understanding-roles#composer.environmentAndStorageObjectAdmin)
[roles/composer.user](https://cloud.google.com/iam/docs/understanding-roles#composer.user)
[roles/iam.serviceAccountUser](https://cloud.google.com/iam/docs/understanding-roles#iam.serviceAccountUser)
[roles/iap.httpsResourceAccessor](https://cloud.google.com/iam/docs/understanding-roles#iap.httpsResourceAccessor)
[roles/serviceusage.serviceUsageConsumer](https://cloud.google.com/iam/docs/understanding-roles#serviceusage.serviceUsageConsumer)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|SERVICE_IDENTITY_cloudcomposer-accounts
serviceAccount|[roles/composer.ServiceAgentV2Ext](https://cloud.google.com/iam/docs/understanding-roles#composer.ServiceAgentV2Ext)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|SERVICE_IDENTITY_gcp-sa-cloudbuild
serviceAccount|[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|load-df-0
serviceAccount|[roles/artifactregistry.reader](https://cloud.google.com/iam/docs/understanding-roles#artifactregistry.reader)
[roles/bigquery.dataEditor](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor)
[roles/storage.objectViewer](https://cloud.google.com/iam/docs/understanding-roles#storage.objectViewer) | -|orc-cmp-0
serviceAccount|[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/composer.worker](https://cloud.google.com/iam/docs/understanding-roles#composer.worker)
[roles/iam.serviceAccountUser](https://cloud.google.com/iam/docs/understanding-roles#iam.serviceAccountUser)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|orc-sa-df-build
serviceAccount|[roles/cloudbuild.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#cloudbuild.serviceAgent)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|trf-df-0
serviceAccount|[roles/bigquery.dataEditor](https://cloud.google.com/iam/docs/understanding-roles#bigquery.dataEditor) | - -## Project trf - -| members | roles | -|---|---| -|gcp-data-engineers
group|[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser)
[roles/dataflow.admin](https://cloud.google.com/iam/docs/understanding-roles#dataflow.admin) | -|SERVICE_IDENTITY_dataflow-service-producer-prod
serviceAccount|[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | -|SERVICE_IDENTITY_service-networking
serviceAccount|[roles/servicenetworking.serviceAgent](https://cloud.google.com/iam/docs/understanding-roles#servicenetworking.serviceAgent) +| -|orc-cmp-0
serviceAccount|[roles/dataflow.admin](https://cloud.google.com/iam/docs/understanding-roles#dataflow.admin) | -|trf-bq-0
serviceAccount|[roles/bigquery.jobUser](https://cloud.google.com/iam/docs/understanding-roles#bigquery.jobUser) | -|trf-df-0
serviceAccount|[roles/dataflow.worker](https://cloud.google.com/iam/docs/understanding-roles#dataflow.worker)
[roles/storage.objectAdmin](https://cloud.google.com/iam/docs/understanding-roles#storage.objectAdmin) | diff --git a/blueprints/data-solutions/data-platform-foundations/OWNERS b/blueprints/data-solutions/data-platform-foundations/OWNERS deleted file mode 100644 index 7a8679e69..000000000 --- a/blueprints/data-solutions/data-platform-foundations/OWNERS +++ /dev/null @@ -1 +0,0 @@ -lcaggio diff --git a/blueprints/data-solutions/data-platform-foundations/README.md b/blueprints/data-solutions/data-platform-foundations/README.md deleted file mode 100644 index 9d21e7959..000000000 --- a/blueprints/data-solutions/data-platform-foundations/README.md +++ /dev/null @@ -1,315 +0,0 @@ -# Data Platform - -This module implements an opinionated Data Platform Architecture that creates and setup projects and related resources that compose an end-to-end data environment. - -For a minimal Data Platform, please refer to the [Minimal Data Platform](../data-platform-minimal/) blueprint. - -The code is intentionally simple, as it's intended to provide a generic initial setup and then allow easy customizations to complete the implementation of the intended design. - -The following diagram is a high-level reference of the resources created and managed here: - -![Data Platform architecture overview](./images/overview_diagram.png "Data Platform architecture overview") - -A demo Airflow pipeline is also part of this blueprint: it can be built and run on top of the foundational infrastructure to verify or test the setup quickly. - -## Design overview and choices - -Despite its simplicity, this stage implements the basics of a design that we've seen working well for various customers. 
- -The approach adapts to different high-level requirements: - -- boundaries for each step -- clearly defined actors -- least privilege principle -- rely on service account impersonation - -The code in this blueprint doesn't address Organization-level configurations (Organization policy, VPC-SC, centralized logs). We expect those elements to be managed by automation stages external to this script like those in [FAST](../../../fast). - -### Project structure - -The Data Platform is designed to rely on several projects, one project per data stage. The stages identified are: - -- drop off -- load -- data warehouse -- orchestration -- transformation -- exposure - -This separation into projects allows adhering to the least-privilege principle by using project-level roles. - -The script will create the following projects: - -- **Drop off** Used to store temporary data. Data is pushed to Cloud Storage, BigQuery, or Cloud PubSub. Resources are configured with a customizable lifecycle policy. -- **Load** Used to load data from the drop off zone to the data warehouse. The load is made with minimal to zero transformation logic (mainly `cast`). Anonymization or tokenization of Personally Identifiable Information (PII) can be implemented here or in the transformation stage, depending on your requirements. The use of [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates) is recommended. When you need to handle workloads from different teams, if strong role separation is needed between them, we suggest to customize the script and have separate `Load` projects. 
-- **Data Warehouse** Several projects distributed across 3 separate layers, to host progressively processed and refined data: - - **Landing - Raw data** Structured Data, stored in relevant formats: structured data stored in BigQuery, unstructured data stored on Cloud Storage with additional metadata stored in BigQuery (for example pictures stored in Cloud Storage and analysis of the images for Cloud Vision API stored in BigQuery). - - **Curated - Cleansed, aggregated and curated data** - - **Confidential - Curated and unencrypted layer** -- **Orchestration** Used to host Cloud Composer, which orchestrates all tasks that move data across layers. -- **Transformation** Used to move data between Data Warehouse layers. We strongly suggest relying on BigQuery Engine to perform the transformations. If BigQuery doesn't have the features needed to perform your transformations, you can use Cloud Dataflow with [Cloud Dataflow templates](https://cloud.google.com/dataflow/docs/concepts/dataflow-templates). This stage can also optionally anonymize or tokenize PII. When you need to handle workloads from different teams, if strong role separation is needed between them, we suggest to customize the script and have separate `Transformation` projects. -- **Exposure** Used to host resources that share processed data with external systems. Depending on the access pattern, data can be presented via Cloud SQL, BigQuery, or Bigtable. For BigQuery data, we strongly suggest relying on [Authorized views](https://cloud.google.com/bigquery/docs/authorized-views). - -### Roles - -We assign roles on resources at the project level, granting the appropriate roles via groups (humans) and service accounts (services and applications) according to best practices. - -### Service accounts - -Service account creation follows the least privilege principle, performing a single task which requires access to a defined set of resources. 
The table below shows a high level overview of roles for each service account on each data layer, using `READ` or `WRITE` access patterns for simplicity. For detailed roles please refer to the code. - -|Service Account|Drop off|DWH Landing|DWH Curated|DWH Confidential| -|-|:-:|:-:|:-:|:-:| -|`drop-sa`|`WRITE`|-|-|-| -|`load-sa`|`READ`|`READ`/`WRITE`|-|-| -|`transformation-sa`|-|`READ`/`WRITE`|`READ`/`WRITE`|`READ`/`WRITE`| -|`orchestration-sa`|-|-|-|-| - -A full reference of IAM roles managed by the Data Platform [is available here](./IAM.md). - -Using of service account keys within a data pipeline exposes to several security risks deriving from a credentials leak. This blueprint shows how to leverage impersonation to avoid the need of creating keys. - -### User groups - -User groups provide a stable frame of reference that allows decoupling the final set of permissions from the stage where entities and resources are created, and their IAM bindings defined. - -We use three groups to control access to resources: - -- *Data Engineers* They handle and run the Data Hub, with read access to all resources in order to troubleshoot possible issues with pipelines. This team can also impersonate any service account. -- *Data Analysts*. They perform analysis on datasets, with read access to the Data Warehouse Confidential project, and BigQuery READ/WRITE access to the playground project. -- *Data Security*:. They handle security configurations related to the Data Hub. This team has admin access to the common project to configure Cloud DLP templates or Data Catalog policy tags. - -The table below shows a high level overview of roles for each group on each project, using `READ`, `WRITE` and `ADMIN` access patterns for simplicity. For detailed roles please refer to the code. 
- -|Group|Drop off|Load|Transformation|DHW Landing|DWH Curated|DWH Confidential|Orchestration|Common| -|-|:-:|:-:|:-:|:-:|:-:|:-:|:-:|:-:| -|Data Engineers|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`|`ADMIN`| -|Data Analysts|-|-|-|-|-|`READ`|-|-| -|Data Security|-|-|-|-|-|-|-|-|`ADMIN`| - -You can configure groups via the `groups` variable. - -### Virtual Private Cloud (VPC) design - -As is often the case in real-world configurations, this blueprint accepts as input an existing [Shared-VPC](https://cloud.google.com/vpc/docs/shared-vpc) via the `network_config` variable. Make sure that the GKE API (`container.googleapis.com`) is enabled in the VPC host project. - -If the `network_config` variable is not provided, one VPC will be created in each project that supports network resources (load, transformation and orchestration). - -### IP ranges and subnetting - -To deploy this blueprint with self-managed VPCs you need the following ranges: - -- one /24 for the load project VPC subnet used for Cloud Dataflow workers -- one /24 for the transformation VPC subnet used for Cloud Dataflow workers -- one /24 range for the orchestration VPC subnet used for Composer workers -- one /22 and one /24 ranges for the secondary ranges associated with the orchestration VPC subnet - -If you are using Shared VPC, you need one subnet with one /22 and one /24 secondary range defined for Composer pods and services. - -In both VPC scenarios, you also need these ranges for Composer: - -- one /24 for Cloud SQL -- one /28 for the GKE control plane - -### Resource naming conventions - -Resources follow the naming convention described below. 
- -- `prefix-layer` for projects -- `prefix-layer-product` for resources -- `prefix-layer[2]-gcp-product[2]-counter` for services and service accounts - -### Encryption - -We suggest a centralized approach to key management, where Organization Security is the only team that can access encryption material, and keyrings and keys are managed in a project external to the Data Platform. - -![Centralized Cloud Key Management high-level diagram](./images/kms_diagram.png "Centralized Cloud Key Management high-level diagram") - -To configure the use of Cloud KMS on resources, you have to specify the key id on the `service_encryption_keys` variable. Key locations should match resource locations. Example: - -```tfvars -service_encryption_keys = { - bq = "KEY_URL_MULTIREGIONAL" - composer = "KEY_URL_REGIONAL" - dataflow = "KEY_URL_REGIONAL" - storage = "KEY_URL_MULTIREGIONAL" - pubsub = "KEY_URL_MULTIREGIONAL" -} -``` - -This step is optional and depends on customer policies and security best practices. - -## Data Anonymization - -We suggest using Cloud Data Loss Prevention to identify/mask/tokenize your confidential data. 
- -While implementing a Data Loss Prevention strategy is out of scope for this blueprint, we enable the service in two different projects so that [Cloud Data Loss Prevention templates](https://cloud.google.com/dlp/docs/concepts-templates) can be configured in one of two ways: - -- during the ingestion phase, from Dataflow -- during the transformation phase, from [BigQuery](https://cloud.google.com/bigquery/docs/scan-with-dlp) or [Cloud Dataflow](https://cloud.google.com/architecture/running-automated-dataflow-pipeline-de-identify-pii-dataset) - -Cloud Data Loss Prevention resources and templates should be stored in the security project: - -![Centralized Cloud Data Loss Prevention high-level diagram](./images/dlp_diagram.png "Centralized Cloud Data Loss Prevention high-level diagram") - -You can find more details and best practices on using DLP to De-identification and re-identification of PII in large-scale datasets in the [GCP documentation](https://cloud.google.com/architecture/de-identification-re-identification-pii-using-cloud-dlp). - -## Data Catalog - -[Data Catalog](https://cloud.google.com/data-catalog) helps you to document your data entry at scale. Data Catalog relies on [tags](https://cloud.google.com/data-catalog/docs/tags-and-tag-templates#tags) and [tag template](https://cloud.google.com/data-catalog/docs/tags-and-tag-templates#tag-templates) to manage metadata for all data entries in a unified and centralized service. To implement [column-level security](https://cloud.google.com/bigquery/docs/column-level-security-intro) on BigQuery, we suggest to use `Tags` and `Tag templates`. - -The default configuration will implement 3 tags: - -- `3_Confidential`: policy tag for columns that include very sensitive information, such as credit card numbers. -- `2_Private`: policy tag for columns that include sensitive personal identifiable information (PII) information, such as a person's first name. 
-- `1_Sensitive`: policy tag for columns that include data that cannot be made public, such as the credit limit. - -Anything that is not tagged is available to all users who have access to the data warehouse. - -For the purpose of the blueprint, no group has access to tagged data. You can configure your tags and their associated roles through the `data_catalog_tags` variable. We suggest using the "[Best practices for using policy tags in BigQuery](https://cloud.google.com/bigquery/docs/best-practices-policy-tags)" article as a guide to designing your tag structure and access pattern. - -## How to run this script - -To deploy this blueprint on your GCP organization, you will need: - -- a folder or organization where new projects will be created -- a billing account that will be associated with the new projects -- user groups defined within the organization (whose domain is provided as the `organization_domain` variable): - - gcp-data-analysts - - gcp-data-engineers - - gcp-data-security - -The Data Platform is meant to be executed by a Service Account (or a regular user) having this minimal set of permissions: - -- **Billing account** - - `roles/billing.user` -- **Folder level**: - - `roles/resourcemanager.folderAdmin` - - `roles/resourcemanager.projectCreator` -- **KMS Keys** (if CMEK encryption is in use): - - `roles/cloudkms.admin` or a custom role with `cloudkms.cryptoKeys.getIamPolicy`, `cloudkms.cryptoKeys.list`, `cloudkms.cryptoKeys.setIamPolicy` permissions -- **Shared VPC host project** (if configured): - - `roles/compute.xpnAdmin` on the host project folder or org - - `roles/resourcemanager.projectIamAdmin` on the host project, either with no conditions or with a condition allowing [delegated role grants](https://medium.com/google-cloud/managing-gcp-service-usage-through-delegated-role-grants-a843610f2226#:~:text=Delegated%20role%20grants%20is%20a,setIamPolicy%20permission%20on%20a%20resource.)
for `roles/compute.networkUser`, `roles/composer.sharedVpcAgent`, `roles/container.hostServiceAgentUser` - -## Variable configuration - -There are three sets of variables you will need to fill in: - -```tfvars -prefix = "dat-plat" -project_config = { - parent = "folders/1111111111" - billing_account_id = "1111111-2222222-33333333" -} -organization_domain = "domain.com" -``` - -For more details, check the variables in [`variables.tf`](./variables.tf) and update them according to the desired configuration. Remember to create the team groups described [below](#groups). - -Once the configuration is complete, deploy the Data Platform by running: - -```bash -terraform init -terraform apply -``` - -## How to use this blueprint from Terraform - -While this blueprint can be used as a standalone deployment, it can also be called directly as a Terraform module by providing the variable values as shown below: - -```hcl -module "data-platform" { - source = "./fabric/blueprints/data-solutions/data-platform-foundations" - organization_domain = "example.com" - project_config = { - billing_account_id = "123456-123456-123456" - parent = "folders/12345678" - } - # test 12-chars long prefix for FAST mt compatibility - prefix = "test-0123456" -} -# tftest modules=43 resources=347 -``` - -## Customizations - -### Create Cloud Key Management keys as part of the Data Platform - -To create Cloud Key Management keys in the Data Platform, you can uncomment the Cloud Key Management resources configured in the [`06-common.tf`](./06-common.tf) file and update the Cloud Key Management key references in `local.service_encryption_keys.*` to point to the locally created resources. 
- -### Assign roles at BQ Dataset level - -To handle multiple groups of `data-analysts` accessing the same Data Warehouse layer projects, with each group limited to its own datasets, you may want to assign roles at the BigQuery dataset level instead of at the project level.
-To do this, you need to remove the project-level IAM bindings for the `data-analysts` group and grant roles at the BigQuery dataset level using the `iam` variable of the `bigquery-dataset` modules. - -### Project Configuration - -The solution can be deployed by creating projects on a given parent (organization or folder) or on existing projects. Configure the `project_config` variable accordingly. - -When you rely on existing projects, the blueprint expects a separate project for each layer and configures IAM bindings with an additive approach. For discovery or experimentation purposes, you may also configure `project_config.project_ids` to map several logical projects to the same project, with the granularity you need. For example, you could deploy the resources of the 'load' project in the 'transformation' project. - -Once you have identified the required project granularity for your use case, we suggest adapting the Terraform script accordingly and relying on authoritative IAM bindings. - -## Demo pipeline - -The application layer is out of scope of this script. For demo purposes only, several Cloud Composer DAGs are provided. The demos import data from the `drop off` area to the `Data Warehouse Confidential` dataset using different features. - -You can find examples in the [`demo`](./demo) folder. - -## Cleanup - -If you want to destroy the Data Platform deployment, follow these steps. - -**ATTENTION**: The following procedure will permanently delete all of your data in an irreversible manner. - -```bash -# Remove GCS buckets and BQ datasets from the Terraform state; they are deleted together with their projects -for x in $(terraform state list | grep google_storage_bucket.bucket); do - terraform state rm "$x"; -done - -for x in $(terraform state list | grep google_bigquery_dataset); do - terraform state rm "$x"; -done - -terraform destroy -``` - -## Variables - -| name | description | type | required | default | -|---|---|:---:|:---:|:---:| -| [organization_domain](variables.tf#L166) | Organization domain.
| string | ✓ | | -| [prefix](variables.tf#L171) | Prefix used for resource names. | string | ✓ | | -| [project_config](variables.tf#L180) | Provide 'billing_account_id' value if project creation is needed, uses existing 'project_ids' if null. Parent is in 'folders/nnn' or 'organizations/nnn' format. | object({…}) | ✓ | | -| [composer_config](variables.tf#L17) | Cloud Composer config. | object({…}) | | {…} | -| [data_catalog_tags](variables.tf#L106) | List of Data Catalog Policy tags to be created with optional IAM binding configuration in {tag => {ROLE => [MEMBERS]}} format. | map(object({…})) | | {…} | -| [deletion_protection](variables.tf#L120) | Prevent Terraform from destroying data storage resources (storage buckets, GKE clusters, CloudSQL instances) in this blueprint. When this field is set in Terraform state, a terraform destroy or terraform apply that would delete data storage resources will fail. | bool | | false | -| [groups](variables.tf#L127) | User groups. | map(string) | | {…} | -| [location](variables.tf#L137) | Location used for multi-regional resources. | string | | "eu" | -| [network_config](variables.tf#L143) | Shared VPC network configurations to use. If null, networks will be created in projects with preconfigured values. | object({…}) | | null | -| [project_services](variables.tf#L215) | List of core services enabled on all projects. | list(string) | | […] | -| [project_suffix](variables.tf#L226) | Suffix used only for project ids. | string | | null | -| [region](variables.tf#L232) | Region used for regional resources. | string | | "europe-west1" | -| [service_encryption_keys](variables.tf#L238) | Cloud KMS to use to encrypt different services. Key location should match service region. | object({…}) | | {} | - -## Outputs - -| name | description | sensitive | -|---|---|:---:| -| [bigquery-datasets](outputs.tf#L16) | BigQuery datasets. | | -| [demo_commands](outputs.tf#L26) | Demo commands. Relevant only if Composer is deployed.
| | -| [df_template](outputs.tf#L49) | Dataflow template image and template details. | | -| [gcs-buckets](outputs.tf#L58) | GCS buckets. | | -| [projects](outputs.tf#L71) | GCP Projects information. | | -| [vpc_network](outputs.tf#L97) | VPC network. | | -| [vpc_subnet](outputs.tf#L106) | VPC subnetworks. | | - -## TODOs - -Features to add in future releases: - -- Add an example of how to use Cloud Data Loss Prevention -- Add a solution to handle the lifecycle of Tables, Views, and Authorized Views -- Add a solution to handle the Metadata lifecycle diff --git a/blueprints/data-solutions/data-platform-foundations/backend.tf.sample b/blueprints/data-solutions/data-platform-foundations/backend.tf.sample deleted file mode 100644 index 1e1c012a4..000000000 --- a/blueprints/data-solutions/data-platform-foundations/backend.tf.sample +++ /dev/null @@ -1,30 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# The `impersonate_service_account` option requires the identity launching Terraform to have the -# role `roles/iam.serviceAccountTokenCreator` on the specified Service Account.
- -terraform { - backend "gcs" { - bucket = "BUCKET_NAME" - prefix = "PREFIX" - impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" - } -} -provider "google" { - impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" -} -provider "google-beta" { - impersonate_service_account = "SERVICE_ACCOUNT@PROJECT_ID.iam.gserviceaccount.com" -} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/README.md b/blueprints/data-solutions/data-platform-foundations/demo/README.md deleted file mode 100644 index 639549fca..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/README.md +++ /dev/null @@ -1,33 +0,0 @@ -# Data ingestion Demo - -In this folder, you can find an example of how to ingest data into the `data platform` instantiated [here](../). - -The example is not intended to be production-ready code. - -## Demo use case -The demo imports purchase data generated by a store. - -## Input files -Data are uploaded to the `drop off` GCS bucket. File structure: - - `customers.csv`: Comma-separated values with customer information in the following format: Customer ID, Name, Surname, Registration Timestamp - - `purchases.csv`: Comma-separated values with purchase information in the following format: Item ID, Customer ID, Item, Item price, Purchase Timestamp - -## Data processing pipelines -Different data pipelines are provided to highlight different features and patterns. For the purpose of the example, a single pipeline handles the whole data lifecycle. When adapting them to your real use case, you may want to evaluate handling each functional step in a separate pipeline or with a dedicated tool. For example, you may want to use `Dataform` to handle the data schema lifecycle.
-Below you can find a description of each example: - - Simple import data: [`datapipeline.py`](./datapipeline.py) is a simple pipeline to import provided data from the `drop off` Google Cloud Storage bucket to the Data Hub Confidential layer, joining the `customers` and `purchases` tables into the `customerpurchase` table. - - Import data with Policy Tags: [`datapipeline_dc_tags.py`](./datapipeline_dc_tags.py) imports provided data from the `drop off` bucket to the Data Hub Confidential layer, protecting sensitive data using Data Catalog policy tags. - - Delete tables: [`delete_table.py`](./delete_table.py) deletes BigQuery tables created by the import pipelines. - -## Running the demo -To run the demo examples, follow these steps: - -- 01: Copy sample data to the `drop off` Cloud Storage bucket impersonating the `load` service account. -- 02: Copy the sample data structure definitions to the `orchestration` Cloud Storage bucket impersonating the `orchestration` service account. -- 03: Copy the Cloud Composer DAG to the Cloud Composer Storage bucket impersonating the `orchestration` service account. -- 04: Build the Dataflow Flex template and image via a Cloud Build pipeline. -- 05: Open the Cloud Composer Airflow UI and run the imported DAG. -- 06: Run the BigQuery query to see results. - -You can find pre-computed commands in the `demo_commands` output of the deployed Terraform [data platform](../).
diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/customer_purchase.json b/blueprints/data-solutions/data-platform-foundations/demo/data/customer_purchase.json deleted file mode 100644 index d97e228f1..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/customer_purchase.json +++ /dev/null @@ -1,50 +0,0 @@ -[ - { - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "purchase_id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "customer_name", - "type": "STRING", - "description": "Name" - }, - { - "mode": "REQUIRED", - "name": "customer_surname", - "type": "STRING", - "description": "Surname" - }, - { - "mode": "REQUIRED", - "name": "purchase_item", - "type": "STRING", - "description": "Item Name" - }, - { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, - { - "mode": "REQUIRED", - "name": "purchase_timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - } -] \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/customers.csv b/blueprints/data-solutions/data-platform-foundations/demo/data/customers.csv deleted file mode 100644 index ea6aa7533..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/customers.csv +++ /dev/null @@ -1,12 +0,0 @@ -1,Name1,Surname1,1636972001 -2,Name2,Surname2,1636972002 -3,Name3,Surname3,1636972003 -4,Name4,Surname4,1636972004 -5,Name5,Surname5,1636972005 -6,Name6,Surname6,1636972006 -7,Name7,Surname7,1636972007 -8,Name8,Surname8,1636972008 -9,Name9,Surname9,1636972009 -10,Name11,Surname11,1636972010 -11,Name12,Surname12,1636972011 -12,Name13,Surname13,1636972012 \ No newline at end of file diff --git 
a/blueprints/data-solutions/data-platform-foundations/demo/data/customers.json b/blueprints/data-solutions/data-platform-foundations/demo/data/customers.json deleted file mode 100644 index c685279d1..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/customers.json +++ /dev/null @@ -1,26 +0,0 @@ -[ - { - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name" - }, - { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname" - }, - { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - } -] \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/customers_schema.json b/blueprints/data-solutions/data-platform-foundations/demo/data/customers_schema.json deleted file mode 100644 index b751a5114..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/customers_schema.json +++ /dev/null @@ -1,28 +0,0 @@ -{ - "BigQuery Schema": [ - { - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name" - }, - { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname" - }, - { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - } - ] -} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/customers_udf.js b/blueprints/data-solutions/data-platform-foundations/demo/data/customers_udf.js deleted file mode 100644 index 11e1cfe4b..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/customers_udf.js +++ /dev/null @@ -1,12 +0,0 @@ -function transform(line) { - var values = line.split(','); - - var obj = new Object(); - obj.id = values[0] 
- obj.name = values[1]; - obj.surname = values[2]; - obj.timestamp = values[3]; - var jsonString = JSON.stringify(obj); - - return jsonString; -} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.csv b/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.csv deleted file mode 100644 index 0b0de75cd..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.csv +++ /dev/null @@ -1,20 +0,0 @@ -1,1,Car1,5000,1636972012 -1,1,Car1,7000,1636972045 -1,2,Car1,6000,1636972088 -1,2,Car1,8000,16369720099 -1,3,Car1,10000,1636972102 -1,3,Car1,50000,1636972180 -1,4,Car1,13000,1636972260 -1,4,Car1,5000,1636972302 -1,5,Car1,2000,1636972408 -1,1,Car1,77000,1636972501 -1,1,Car1,64000,1636975001 -1,8,Car1,2000,1636976001 -1,9,Car1,4000,1636977001 -1,10,Car1,18000,1636982001 -1,11,Car1,21000,1636992001 -1,11,Car1,33000,1636932001 -1,11,Car1,37000,1636872001 -1,11,Car1,26000,1636772001 -1,12,Car1,22000,1636672001 -1,4,Car1,11000,1636952001 \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.json b/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.json deleted file mode 100644 index 78eb024ce..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases.json +++ /dev/null @@ -1,32 +0,0 @@ -[ - { - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, - { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, - { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - } -] \ No newline at end of file diff --git 
a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_schema.json b/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_schema.json deleted file mode 100644 index 68731743a..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_schema.json +++ /dev/null @@ -1,34 +0,0 @@ -{ - "BigQuery Schema": [ - { - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, - { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, - { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, - { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - } - ] -} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_udf.js b/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_udf.js deleted file mode 100644 index d6ffde53c..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/data/purchases_udf.js +++ /dev/null @@ -1,13 +0,0 @@ -function transform(line) { - var values = line.split(','); - - var obj = new Object(); - obj.id = values[0]; - obj.customer_id = values[1]; - obj.item = values[2]; - obj.price = values[3]; - obj.timestamp = values[4]; - var jsonString = JSON.stringify(obj); - - return jsonString; -} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/.gitignore b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/.gitignore deleted file mode 100644 index 68bc17f9f..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/.gitignore +++ /dev/null @@ -1,160 +0,0 @@ -# Byte-compiled / optimized / DLL files -__pycache__/ -*.py[cod] -*$py.class - -# C 
extensions -*.so - -# Distribution / packaging -.Python -build/ -develop-eggs/ -dist/ -downloads/ -eggs/ -.eggs/ -lib/ -lib64/ -parts/ -sdist/ -var/ -wheels/ -share/python-wheels/ -*.egg-info/ -.installed.cfg -*.egg -MANIFEST - -# PyInstaller -# Usually these files are written by a python script from a template -# before PyInstaller builds the exe, so as to inject date/other infos into it. -*.manifest -*.spec - -# Installer logs -pip-log.txt -pip-delete-this-directory.txt - -# Unit test / coverage reports -htmlcov/ -.tox/ -.nox/ -.coverage -.coverage.* -.cache -nosetests.xml -coverage.xml -*.cover -*.py,cover -.hypothesis/ -.pytest_cache/ -cover/ - -# Translations -*.mo -*.pot - -# Django stuff: -*.log -local_settings.py -db.sqlite3 -db.sqlite3-journal - -# Flask stuff: -instance/ -.webassets-cache - -# Scrapy stuff: -.scrapy - -# Sphinx documentation -docs/_build/ - -# PyBuilder -.pybuilder/ -target/ - -# Jupyter Notebook -.ipynb_checkpoints - -# IPython -profile_default/ -ipython_config.py - -# pyenv -# For a library or package, you might want to ignore these files since the code is -# intended to run in multiple environments; otherwise, check them in: -# .python-version - -# pipenv -# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control. -# However, in case of collaboration, if having platform-specific dependencies or dependencies -# having no cross-platform support, pipenv may install dependencies that don't work, or not -# install all needed dependencies. -#Pipfile.lock - -# poetry -# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control. -# This is especially recommended for binary packages to ensure reproducibility, and is more -# commonly ignored for libraries. -# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control -#poetry.lock - -# pdm -# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control. 
-#pdm.lock -# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it -# in version control. -# https://pdm.fming.dev/#use-with-ide -.pdm.toml - -# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm -__pypackages__/ - -# Celery stuff -celerybeat-schedule -celerybeat.pid - -# SageMath parsed files -*.sage.py - -# Environments -.env -.venv -env/ -venv/ -ENV/ -env.bak/ -venv.bak/ - -# Spyder project settings -.spyderproject -.spyproject - -# Rope project settings -.ropeproject - -# mkdocs documentation -/site - -# mypy -.mypy_cache/ -.dmypy.json -dmypy.json - -# Pyre type checker -.pyre/ - -# pytype static type analyzer -.pytype/ - -# Cython debug symbols -cython_debug/ - -# PyCharm -# JetBrains specific template is maintained in a separate JetBrains.gitignore that can -# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore -# and can be added to the global gitignore or merged into this file. For a more nuclear -# option (not recommended) you can uncomment the following to ignore the entire idea folder. -#.idea/ diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/Dockerfile b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/Dockerfile deleted file mode 100644 index 69c6d2eef..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/Dockerfile +++ /dev/null @@ -1,29 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
-# See the License for the specific language governing permissions and -# limitations under the License. - -FROM gcr.io/dataflow-templates-base/python39-template-launcher-base - -ENV FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE="/template/requirements.txt" -ENV FLEX_TEMPLATE_PYTHON_PY_FILE="/template/csv2bq.py" - -COPY ./src/ /template - -RUN apt-get update \ - && apt-get install -y libffi-dev git \ - && rm -rf /var/lib/apt/lists/* \ - && pip install --no-cache-dir --upgrade pip \ - && pip install --no-cache-dir -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE \ - && pip download --no-cache-dir --dest /tmp/dataflow-requirements-cache -r $FLEX_TEMPLATE_PYTHON_REQUIREMENTS_FILE - -ENV PIP_NO_DEPS=True diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/README.md b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/README.md deleted file mode 100644 index b052fab05..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/README.md +++ /dev/null @@ -1,122 +0,0 @@ -## Pipeline summary -This demo serves as a simple example of building and launching a Flex Template Dataflow pipeline. The code mainly focuses on reading a CSV file as input along with a JSON schema file as side input. The pipeline parses both inputs and writes the data to the relevant BigQuery table while applying the schema passed as input. - -![Dataflow pipeline overview](../../images/df_demo_pipeline.png "Dataflow pipeline overview") - - -## Local development run - -For local development, the pipeline can be launched from the local machine for testing purposes using different runners depending on the scope of the test. - -### Using the Beam DirectRunner -The example below uses the Beam DirectRunner. This runner is mainly intended for quick local tests in a development environment with low volumes of data.
- -``` -CSV_FILE=gs://[TEST-BUCKET]/customers.csv -JSON_SCHEMA=gs://[TEST-BUCKET]/customers_schema.json -OUTPUT_TABLE=[TEST-PROJ].[TEST-DATASET].customers -PIPELINE_STAGIN_PATH="gs://[TEST-STAGING-BUCKET]" - -python src/csv2bq.py \ ---runner="DirectRunner" \ ---csv_file=$CSV_FILE \ ---json_schema=$JSON_SCHEMA \ ---output_table=$OUTPUT_TABLE \ ---temp_location=$PIPELINE_STAGIN_PATH/tmp -``` - -*Note:* All paths mentioned can be local paths or GCS paths. For cloud resources referenced (GCS and BigQuery), make sure that the user launching the command is authenticated to GCP via `gcloud auth application-default login` and has the required access privileges to those resources. - -### Using the DataflowRunner with a local CLI launch - -The example below triggers the pipeline on Dataflow from your local development environment. This is useful for running tests on larger volumes of test data and verifying that the pipeline runs well on Dataflow before compiling it into a template. - -``` -PROJECT_ID=[TEST-PROJECT] -REGION=[REGION] -SUBNET=[SUBNET-NAME] -DEV_SERVICE_ACCOUNT=[DEV-SA] - -PIPELINE_STAGIN_PATH="gs://[TEST-STAGING-BUCKET]" -CSV_FILE=gs://[TEST-BUCKET]/customers.csv -JSON_SCHEMA=gs://[TEST-BUCKET]/customers_schema.json -OUTPUT_TABLE=[TEST-PROJ].[TEST-DATASET].customers - -python src/csv2bq.py \ ---runner="Dataflow" \ ---project=$PROJECT_ID \ ---region=$REGION \ ---csv_file=$CSV_FILE \ ---json_schema=$JSON_SCHEMA \ ---output_table=$OUTPUT_TABLE \ ---temp_location=$PIPELINE_STAGIN_PATH/tmp \ ---staging_location=$PIPELINE_STAGIN_PATH/stage \ ---subnetwork="regions/$REGION/subnetworks/$SUBNET" \ ---impersonate_service_account=$DEV_SERVICE_ACCOUNT \ ---no_use_public_ips -``` - -In terms of resource access privileges, you can choose to impersonate another service account, which could be defined for development resource access. The authenticated user launching this pipeline will need the role `roles/iam.serviceAccountTokenCreator` on that service account.
If you choose to launch the pipeline without service account impersonation, it will use the default compute service account of the target project. - -## Dataflow Flex Template run - -For production, and as outlined in the Data Platform demo, we build and launch the pipeline as a Flex Template, making it available for other cloud services (such as Apache Airflow) and users to launch on demand. - -### Build launch - -Below is an example of triggering the Dataflow Flex Template build pipeline defined in `cloudbuild.yaml`. The Terraform output also provides an example, filled in with parameter values based on the resources generated in the data platform. - -``` -GCP_PROJECT="[ORCHESTRATION-PROJECT]" -TEMPLATE_IMAGE="[REGION]-docker.pkg.dev/[ORCHESTRATION-PROJECT]/[REPOSITORY]/csv2bq:latest" -TEMPLATE_PATH="gs://[DATAFLOW-TEMPLATE-BUCKET]/csv2bq.json" -STAGIN_PATH="gs://[ORCHESTRATION-STAGING-BUCKET]/build" -LOG_PATH="gs://[ORCHESTRATION-LOGS-BUCKET]/logs" -REGION="[REGION]" -BUILD_SERVICE_ACCOUNT=orc-sa-df-build@[SERVICE_PROJECT_ID].iam.gserviceaccount.com - -gcloud builds submit \ - --config=cloudbuild.yaml \ - --project=$GCP_PROJECT \ - --region=$REGION \ - --gcs-log-dir=$LOG_PATH \ - --gcs-source-staging-dir=$STAGIN_PATH \ - --substitutions=_TEMPLATE_IMAGE=$TEMPLATE_IMAGE,_TEMPLATE_PATH=$TEMPLATE_PATH,_DOCKER_DIR="." \ - --impersonate-service-account=$BUILD_SERVICE_ACCOUNT -``` - -**Note:** For the scope of the demo, this build is launched manually, but in production it would be launched by a configured Cloud Build trigger when new changes are merged into the code branch of the Dataflow template. - -### Dataflow Flex Template run - -After the build step succeeds, you can launch the Dataflow pipeline from the CLI (as outlined in this example) or through the API, for example via an Airflow operator.
For the use case of the data platform, the Dataflow pipeline would be launched via the orchestration service account, which is what the Airflow DAG is also using in the scope of this demo. - -**Note:** In the data platform demo, the launch of this Dataflow pipeline is handled by the airflow operator (DataflowStartFlexTemplateOperator). - -``` -#!/bin/bash - -PROJECT_ID=[LOAD-PROJECT] -REGION=[REGION] -ORCH_SERVICE_ACCOUNT=orchestrator@[SERVICE_PROJECT_ID].iam.gserviceaccount.com -SUBNET=[SUBNET-NAME] - -PIPELINE_STAGIN_PATH="gs://[LOAD-STAGING-BUCKET]/build" -CSV_FILE=gs://[DROP-ZONE-BUCKET]/customers.csv -JSON_SCHEMA=gs://[ORCHESTRATION-BUCKET]/customers_schema.json -OUTPUT_TABLE=[DESTINATION-PROJ].[DESTINATION-DATASET].customers -TEMPLATE_PATH=gs://[ORCHESTRATION-DF-GCS]/csv2bq.json - - -gcloud dataflow flex-template run "csv2bq-`date +%Y%m%d-%H%M%S`" \ - --template-file-gcs-location $TEMPLATE_PATH \ - --parameters temp_location="$PIPELINE_STAGIN_PATH/tmp" \ - --parameters staging_location="$PIPELINE_STAGIN_PATH/stage" \ - --parameters csv_file=$CSV_FILE \ - --parameters json_schema=$JSON_SCHEMA\ - --parameters output_table=$OUTPUT_TABLE \ - --region $REGION \ - --project $PROJECT_ID \ - --subnetwork="regions/$REGION/subnetworks/$SUBNET" \ - --service-account-email=$ORCH_SERVICE_ACCOUNT -``` diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/cloudbuild.yaml b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/cloudbuild.yaml deleted file mode 100644 index 11354c2ed..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/cloudbuild.yaml +++ /dev/null @@ -1,30 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -steps: -- name: gcr.io/cloud-builders/gcloud - id: "Build docker image" - args: ['builds', 'submit', '--tag', '$_TEMPLATE_IMAGE', '.'] - dir: '$_DOCKER_DIR' - waitFor: ['-'] -- name: gcr.io/cloud-builders/gcloud - id: "Build template" - args: ['dataflow', - 'flex-template', - 'build', - '$_TEMPLATE_PATH', - '--image=$_TEMPLATE_IMAGE', - '--sdk-language=PYTHON' - ] - waitFor: ['Build docker image'] diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/csv2bq.py b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/csv2bq.py deleted file mode 100644 index c162f9241..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/csv2bq.py +++ /dev/null @@ -1,74 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -import apache_beam as beam -from apache_beam.io import ReadFromText, Read, WriteToBigQuery, \ - BigQueryDisposition -from apache_beam.options.pipeline_options import PipelineOptions, SetupOptions -from apache_beam.io.filesystems import FileSystems -import json -import argparse - - -class ParseRow(beam.DoFn): - """ - Splits a given csv row by a separator, validates fields and returns a dict - structure compatible with the BigQuery transform - """ - - def process(self, element: str, table_fields: list, delimiter: str): - split_row = element.split(delimiter) - parsed_row = {} - - for i, field in enumerate(table_fields['BigQuery Schema']): - parsed_row[field['name']] = split_row[i] - - yield parsed_row - - -def run(argv=None, save_main_session=True): - parser = argparse.ArgumentParser() - parser.add_argument('--csv_file', type=str, required=True, - help='Path to the CSV file') - parser.add_argument('--json_schema', type=str, required=True, - help='Path to the JSON schema') - parser.add_argument('--output_table', type=str, required=True, - help='BigQuery path for the output table') - - args, pipeline_args = parser.parse_known_args(argv) - pipeline_options = PipelineOptions(pipeline_args) - pipeline_options.view_as(SetupOptions).save_main_session = save_main_session - - with beam.Pipeline(options=pipeline_options) as p: - - def get_table_schema(table_path, table_schema): - return {'fields': table_schema['BigQuery Schema']} - - csv_input = p | 'Read CSV' >> ReadFromText(args.csv_file) - schema_input = p | 'Load Schema' >> beam.Create( - json.loads(FileSystems.open(args.json_schema).read())) - - table_fields = beam.pvalue.AsDict(schema_input) - parsed = csv_input | 'Parse and validate rows' >> beam.ParDo( - ParseRow(), table_fields, ',') - - parsed | 'Write to BigQuery' >> WriteToBigQuery( - args.output_table, schema=get_table_schema, - create_disposition=BigQueryDisposition.CREATE_IF_NEEDED, - write_disposition=BigQueryDisposition.WRITE_TRUNCATE, - 
schema_side_inputs=(table_fields,)) - - -if __name__ == "__main__": - run() diff --git a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/requirements.txt b/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/requirements.txt deleted file mode 100644 index 21c569a0d..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/dataflow-csv2bq/src/requirements.txt +++ /dev/null @@ -1 +0,0 @@ -apache-beam==2.44.0 diff --git a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline.py b/blueprints/data-solutions/data-platform-foundations/demo/datapipeline.py deleted file mode 100644 index 124982536..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline.py +++ /dev/null @@ -1,213 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# -------------------------------------------------------------------------------- -# Load The Dependencies -# -------------------------------------------------------------------------------- - -import datetime - -from airflow import models -from airflow.models.variable import Variable -from airflow.providers.google.cloud.operators.dataflow import \ - DataflowTemplatedJobStartOperator -from airflow.operators import empty -from airflow.providers.google.cloud.operators.bigquery import \ - BigQueryInsertJobOperator - -# -------------------------------------------------------------------------------- -# Set variables - Needed for the DEMO -# -------------------------------------------------------------------------------- -BQ_LOCATION = Variable.get("BQ_LOCATION") -DATA_CAT_TAGS = Variable.get("DATA_CAT_TAGS", deserialize_json=True) -DWH_LAND_PRJ = Variable.get("DWH_LAND_PRJ") -DWH_LAND_BQ_DATASET = Variable.get("DWH_LAND_BQ_DATASET") -DWH_LAND_GCS = Variable.get("DWH_LAND_GCS") -DWH_CURATED_PRJ = Variable.get("DWH_CURATED_PRJ") -DWH_CURATED_BQ_DATASET = Variable.get("DWH_CURATED_BQ_DATASET") -DWH_CURATED_GCS = Variable.get("DWH_CURATED_GCS") -DWH_CONFIDENTIAL_PRJ = Variable.get("DWH_CONFIDENTIAL_PRJ") -DWH_CONFIDENTIAL_BQ_DATASET = Variable.get("DWH_CONFIDENTIAL_BQ_DATASET") -DWH_CONFIDENTIAL_GCS = Variable.get("DWH_CONFIDENTIAL_GCS") -GCP_REGION = Variable.get("GCP_REGION") -DRP_PRJ = Variable.get("DRP_PRJ") -DRP_BQ = Variable.get("DRP_BQ") -DRP_GCS = Variable.get("DRP_GCS") -DRP_PS = Variable.get("DRP_PS") -LOD_PRJ = Variable.get("LOD_PRJ") -LOD_GCS_STAGING = Variable.get("LOD_GCS_STAGING") -LOD_NET_VPC = Variable.get("LOD_NET_VPC") -LOD_NET_SUBNET = Variable.get("LOD_NET_SUBNET") -LOD_SA_DF = Variable.get("LOD_SA_DF") -ORC_PRJ = Variable.get("ORC_PRJ") -ORC_GCS = Variable.get("ORC_GCS") -TRF_PRJ = Variable.get("TRF_PRJ") -TRF_GCS_STAGING = Variable.get("TRF_GCS_STAGING") -TRF_NET_VPC = Variable.get("TRF_NET_VPC") -TRF_NET_SUBNET = Variable.get("TRF_NET_SUBNET") 
-TRF_SA_DF = Variable.get("TRF_SA_DF") -TRF_SA_BQ = Variable.get("TRF_SA_BQ") -DF_KMS_KEY = Variable.get("DF_KMS_KEY", "") -DF_REGION = Variable.get("GCP_REGION") -DF_ZONE = Variable.get("GCP_REGION") + "-b" - -# -------------------------------------------------------------------------------- -# Set default arguments -# -------------------------------------------------------------------------------- - -# If you are running Airflow in more than one time zone -# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html -# for best practices -yesterday = datetime.datetime.now() - datetime.timedelta(days=1) - -default_args = { - 'owner': 'airflow', - 'start_date': yesterday, - 'depends_on_past': False, - 'email': [''], - 'email_on_failure': False, - 'email_on_retry': False, - 'retries': 1, - 'retry_delay': datetime.timedelta(minutes=5), - 'dataflow_default_options': { - 'location': DF_REGION, - 'zone': DF_ZONE, - 'stagingLocation': LOD_GCS_STAGING, - 'tempLocation': LOD_GCS_STAGING + "/tmp", - 'serviceAccountEmail': LOD_SA_DF, - 'subnetwork': LOD_NET_SUBNET, - 'ipConfiguration': "WORKER_IP_PRIVATE", - 'kmsKeyName': DF_KMS_KEY - }, -} - -# -------------------------------------------------------------------------------- -# Main DAG -# -------------------------------------------------------------------------------- - -with models.DAG('data_pipeline_dag', default_args=default_args, - schedule_interval=None) as dag: - start = empty.EmptyOperator(task_id='start', trigger_rule='all_success') - - end = empty.EmptyOperator(task_id='end', trigger_rule='all_success') - - # Bigquery Tables automatically created for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. 
- customers_import = DataflowTemplatedJobStartOperator( - task_id="dataflow_customers_import", - template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery", - project_id=LOD_PRJ, - location=DF_REGION, - parameters={ - "javascriptTextTransformFunctionName": - "transform", - "JSONPath": - ORC_GCS + "/customers_schema.json", - "javascriptTextTransformGcsPath": - ORC_GCS + "/customers_udf.js", - "inputFilePattern": - DRP_GCS + "/customers.csv", - "outputTable": - DWH_LAND_PRJ + ":" + DWH_LAND_BQ_DATASET + ".customers", - "bigQueryLoadingTemporaryDirectory": - LOD_GCS_STAGING + "/tmp/bq/", - }, - ) - - purchases_import = DataflowTemplatedJobStartOperator( - task_id="dataflow_purchases_import", - template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery", - project_id=LOD_PRJ, - location=DF_REGION, - parameters={ - "javascriptTextTransformFunctionName": - "transform", - "JSONPath": - ORC_GCS + "/purchases_schema.json", - "javascriptTextTransformGcsPath": - ORC_GCS + "/purchases_udf.js", - "inputFilePattern": - DRP_GCS + "/purchases.csv", - "outputTable": - DWH_LAND_PRJ + ":" + DWH_LAND_BQ_DATASET + ".purchases", - "bigQueryLoadingTemporaryDirectory": - LOD_GCS_STAGING + "/tmp/bq/", - }, - ) - - join_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_join_customer_purchase', gcp_conn_id='bigquery_default', - project_id=TRF_PRJ, location=BQ_LOCATION, configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN `{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CURATED_PRJ, - 'datasetId': DWH_CURATED_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_TRUNCATE', - "useLegacySql": - False - } - }, 
impersonation_chain=[TRF_SA_BQ]) - - confidential_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_confidential_customer_purchase', - gcp_conn_id='bigquery_default', project_id=TRF_PRJ, location=BQ_LOCATION, - configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - c.name as name, - c.surname as surname, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN `{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CONFIDENTIAL_PRJ, - 'datasetId': DWH_CONFIDENTIAL_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_TRUNCATE', - "useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - - start >> [customers_import, purchases_import - ] >> join_customer_purchase >> confidential_customer_purchase >> end diff --git a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags.py b/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags.py deleted file mode 100644 index 55d57a093..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags.py +++ /dev/null @@ -1,419 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# -------------------------------------------------------------------------------- -# Load The Dependencies -# -------------------------------------------------------------------------------- - -import datetime - -from airflow import models -from airflow.models.variable import Variable -from airflow.providers.google.cloud.operators.dataflow import \ - DataflowTemplatedJobStartOperator -from airflow.operators import empty -from airflow.providers.google.cloud.operators.bigquery import \ - BigQueryInsertJobOperator, BigQueryUpsertTableOperator, \ - BigQueryUpdateTableSchemaOperator -from airflow.utils.task_group import TaskGroup - -# -------------------------------------------------------------------------------- -# Set variables - Needed for the DEMO -# -------------------------------------------------------------------------------- -BQ_LOCATION = Variable.get("BQ_LOCATION") -DATA_CAT_TAGS = Variable.get("DATA_CAT_TAGS", deserialize_json=True) -DWH_LAND_PRJ = Variable.get("DWH_LAND_PRJ") -DWH_LAND_BQ_DATASET = Variable.get("DWH_LAND_BQ_DATASET") -DWH_LAND_GCS = Variable.get("DWH_LAND_GCS") -DWH_CURATED_PRJ = Variable.get("DWH_CURATED_PRJ") -DWH_CURATED_BQ_DATASET = Variable.get("DWH_CURATED_BQ_DATASET") -DWH_CURATED_GCS = Variable.get("DWH_CURATED_GCS") -DWH_CONFIDENTIAL_PRJ = Variable.get("DWH_CONFIDENTIAL_PRJ") -DWH_CONFIDENTIAL_BQ_DATASET = Variable.get("DWH_CONFIDENTIAL_BQ_DATASET") -DWH_CONFIDENTIAL_GCS = Variable.get("DWH_CONFIDENTIAL_GCS") -GCP_REGION = Variable.get("GCP_REGION") -DRP_PRJ = Variable.get("DRP_PRJ") -DRP_BQ = Variable.get("DRP_BQ") -DRP_GCS = Variable.get("DRP_GCS") -DRP_PS = Variable.get("DRP_PS") -LOD_PRJ = Variable.get("LOD_PRJ") -LOD_GCS_STAGING = Variable.get("LOD_GCS_STAGING") -LOD_NET_VPC = Variable.get("LOD_NET_VPC") -LOD_NET_SUBNET = Variable.get("LOD_NET_SUBNET") -LOD_SA_DF = Variable.get("LOD_SA_DF") -ORC_PRJ = Variable.get("ORC_PRJ") -ORC_GCS = Variable.get("ORC_GCS") -TRF_PRJ = Variable.get("TRF_PRJ") -TRF_GCS_STAGING = 
Variable.get("TRF_GCS_STAGING") -TRF_NET_VPC = Variable.get("TRF_NET_VPC") -TRF_NET_SUBNET = Variable.get("TRF_NET_SUBNET") -TRF_SA_DF = Variable.get("TRF_SA_DF") -TRF_SA_BQ = Variable.get("TRF_SA_BQ") -DF_KMS_KEY = Variable.get("DF_KMS_KEY", "") -DF_REGION = Variable.get("GCP_REGION") -DF_ZONE = Variable.get("GCP_REGION") + "-b" - -# -------------------------------------------------------------------------------- -# Set default arguments -# -------------------------------------------------------------------------------- - -# If you are running Airflow in more than one time zone -# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html -# for best practices -yesterday = datetime.datetime.now() - datetime.timedelta(days=1) - -default_args = { - 'owner': 'airflow', - 'start_date': yesterday, - 'depends_on_past': False, - 'email': [''], - 'email_on_failure': False, - 'email_on_retry': False, - 'retries': 1, - 'retry_delay': datetime.timedelta(minutes=5), - 'dataflow_default_options': { - 'location': DF_REGION, - 'zone': DF_ZONE, - 'stagingLocation': LOD_GCS_STAGING, - 'tempLocation': LOD_GCS_STAGING + "/tmp", - 'serviceAccountEmail': LOD_SA_DF, - 'subnetwork': LOD_NET_SUBNET, - 'ipConfiguration': "WORKER_IP_PRIVATE", - 'kmsKeyName': DF_KMS_KEY - }, -} - -# -------------------------------------------------------------------------------- -# Main DAG -# -------------------------------------------------------------------------------- - -with models.DAG('data_pipeline_dc_tags_dag', default_args=default_args, - schedule_interval=None) as dag: - start = empty.EmptyOperator(task_id='start', trigger_rule='all_success') - - end = empty.EmptyOperator(task_id='end', trigger_rule='all_success') - - # Bigquery Tables created here for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. 
- with TaskGroup('upsert_table') as upsert_table: - upsert_table_customers = BigQueryUpsertTableOperator( - task_id="upsert_table_customers", - project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, - impersonation_chain=[LOD_SA_DF], - table_resource={ - "tableReference": { - "tableId": "customers" - }, - }, - ) - - upsert_table_purchases = BigQueryUpsertTableOperator( - task_id="upsert_table_purchases", - project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, - impersonation_chain=[LOD_SA_DF], - table_resource={"tableReference": { - "tableId": "purchases" - }}, - ) - - upsert_table_customer_purchase_curated = BigQueryUpsertTableOperator( - task_id="upsert_table_customer_purchase_curated", - project_id=DWH_CURATED_PRJ, - dataset_id=DWH_CURATED_BQ_DATASET, - impersonation_chain=[TRF_SA_BQ], - table_resource={"tableReference": { - "tableId": "customer_purchase" - }}, - ) - - upsert_table_customer_purchase_confidential = BigQueryUpsertTableOperator( - task_id="upsert_table_customer_purchase_confidential", - project_id=DWH_CONFIDENTIAL_PRJ, - dataset_id=DWH_CONFIDENTIAL_BQ_DATASET, - impersonation_chain=[TRF_SA_BQ], - table_resource={"tableReference": { - "tableId": "customer_purchase" - }}, - ) - - # Bigquery Tables schema defined here for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. 
- with TaskGroup('update_schema_table') as update_schema_table: - update_table_schema_customers = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customers", project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, table_id="customers", - impersonation_chain=[LOD_SA_DF], include_policy_tags=True, - schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_customers = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_purchases", project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, table_id="purchases", - impersonation_chain=[LOD_SA_DF], include_policy_tags=True, - schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_customer_purchase_curated = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customer_purchase_curated", - project_id=DWH_CURATED_PRJ, dataset_id=DWH_CURATED_BQ_DATASET, - table_id="customer_purchase", impersonation_chain=[TRF_SA_BQ], - include_policy_tags=True, schema_fields_updates=[{ - "mode": "REQUIRED", - "name": 
"customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "purchase_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_customer_purchase_confidential = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customer_purchase_confidential", - project_id=DWH_CONFIDENTIAL_PRJ, dataset_id=DWH_CONFIDENTIAL_BQ_DATASET, - table_id="customer_purchase", impersonation_chain=[TRF_SA_BQ], - include_policy_tags=True, schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "purchase_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - customers_import = 
DataflowTemplatedJobStartOperator( - task_id="dataflow_customers_import", - template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery", - project_id=LOD_PRJ, - location=DF_REGION, - parameters={ - "javascriptTextTransformFunctionName": - "transform", - "JSONPath": - ORC_GCS + "/customers_schema.json", - "javascriptTextTransformGcsPath": - ORC_GCS + "/customers_udf.js", - "inputFilePattern": - DRP_GCS + "/customers.csv", - "outputTable": - DWH_LAND_PRJ + ":" + DWH_LAND_BQ_DATASET + ".customers", - "bigQueryLoadingTemporaryDirectory": - LOD_GCS_STAGING + "/tmp/bq/", - }, - ) - - purchases_import = DataflowTemplatedJobStartOperator( - task_id="dataflow_purchases_import", - template="gs://dataflow-templates/latest/GCS_Text_to_BigQuery", - project_id=LOD_PRJ, - location=DF_REGION, - parameters={ - "javascriptTextTransformFunctionName": - "transform", - "JSONPath": - ORC_GCS + "/purchases_schema.json", - "javascriptTextTransformGcsPath": - ORC_GCS + "/purchases_udf.js", - "inputFilePattern": - DRP_GCS + "/purchases.csv", - "outputTable": - DWH_LAND_PRJ + ":" + DWH_LAND_BQ_DATASET + ".purchases", - "bigQueryLoadingTemporaryDirectory": - LOD_GCS_STAGING + "/tmp/bq/", - }, - ) - - join_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_join_customer_purchase', gcp_conn_id='bigquery_default', - project_id=TRF_PRJ, location=BQ_LOCATION, configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - c.name as name, - c.surname as surname, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN `{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CURATED_PRJ, - 'datasetId': DWH_CURATED_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_APPEND', - "useLegacySql": - False - } - 
}, impersonation_chain=[TRF_SA_BQ]) - - confidential_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_confidential_customer_purchase', - gcp_conn_id='bigquery_default', project_id=TRF_PRJ, location=BQ_LOCATION, - configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - customer_id, - purchase_id, - name, - surname, - item, - price, - timestamp - FROM `{dwh_cur_prj}.{dwh_cur_dataset}.customer_purchase` - """.format( - dwh_cur_prj=DWH_CURATED_PRJ, - dwh_cur_dataset=DWH_CURATED_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CONFIDENTIAL_PRJ, - 'datasetId': DWH_CONFIDENTIAL_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_APPEND', - "useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - start >> upsert_table >> update_schema_table >> [ - customers_import, purchases_import - ] >> join_customer_purchase >> confidential_customer_purchase >> end diff --git a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags_flex.py b/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags_flex.py deleted file mode 100644 index 57a28c12e..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_dc_tags_flex.py +++ /dev/null @@ -1,432 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# -------------------------------------------------------------------------------- -# Load The Dependencies -# -------------------------------------------------------------------------------- - -import datetime -import time - -from airflow import models -from airflow.models.variable import Variable -from airflow.operators import empty -from airflow.providers.google.cloud.operators.dataflow import \ - DataflowStartFlexTemplateOperator -from airflow.providers.google.cloud.operators.bigquery import \ - BigQueryInsertJobOperator, BigQueryUpsertTableOperator, \ - BigQueryUpdateTableSchemaOperator -from airflow.utils.task_group import TaskGroup - -# -------------------------------------------------------------------------------- -# Set variables - Needed for the DEMO -# -------------------------------------------------------------------------------- -BQ_LOCATION = Variable.get("BQ_LOCATION") -DATA_CAT_TAGS = Variable.get("DATA_CAT_TAGS", deserialize_json=True) -DWH_LAND_PRJ = Variable.get("DWH_LAND_PRJ") -DWH_LAND_BQ_DATASET = Variable.get("DWH_LAND_BQ_DATASET") -DWH_LAND_GCS = Variable.get("DWH_LAND_GCS") -DWH_CURATED_PRJ = Variable.get("DWH_CURATED_PRJ") -DWH_CURATED_BQ_DATASET = Variable.get("DWH_CURATED_BQ_DATASET") -DWH_CURATED_GCS = Variable.get("DWH_CURATED_GCS") -DWH_CONFIDENTIAL_PRJ = Variable.get("DWH_CONFIDENTIAL_PRJ") -DWH_CONFIDENTIAL_BQ_DATASET = Variable.get("DWH_CONFIDENTIAL_BQ_DATASET") -DWH_CONFIDENTIAL_GCS = Variable.get("DWH_CONFIDENTIAL_GCS") -GCP_REGION = Variable.get("GCP_REGION") -DRP_PRJ = Variable.get("DRP_PRJ") -DRP_BQ = Variable.get("DRP_BQ") -DRP_GCS = Variable.get("DRP_GCS") -DRP_PS = Variable.get("DRP_PS") -LOD_PRJ = Variable.get("LOD_PRJ") -LOD_GCS_STAGING = Variable.get("LOD_GCS_STAGING") -LOD_NET_VPC = Variable.get("LOD_NET_VPC") -LOD_NET_SUBNET = Variable.get("LOD_NET_SUBNET") -LOD_SA_DF = Variable.get("LOD_SA_DF") -ORC_PRJ = Variable.get("ORC_PRJ") -ORC_GCS = Variable.get("ORC_GCS") -ORC_GCS_TMP_DF = Variable.get("ORC_GCS_TMP_DF") 
-TRF_PRJ = Variable.get("TRF_PRJ") -TRF_GCS_STAGING = Variable.get("TRF_GCS_STAGING") -TRF_NET_VPC = Variable.get("TRF_NET_VPC") -TRF_NET_SUBNET = Variable.get("TRF_NET_SUBNET") -TRF_SA_DF = Variable.get("TRF_SA_DF") -TRF_SA_BQ = Variable.get("TRF_SA_BQ") -DF_KMS_KEY = Variable.get("DF_KMS_KEY", "") -DF_REGION = Variable.get("GCP_REGION") -DF_ZONE = Variable.get("GCP_REGION") + "-b" - -# -------------------------------------------------------------------------------- -# Set default arguments -# -------------------------------------------------------------------------------- - -# If you are running Airflow in more than one time zone -# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html -# for best practices -yesterday = datetime.datetime.now() - datetime.timedelta(days=1) - -default_args = { - 'owner': 'airflow', - 'start_date': yesterday, - 'depends_on_past': False, - 'email': [''], - 'email_on_failure': False, - 'email_on_retry': False, - 'retries': 1, - 'retry_delay': datetime.timedelta(minutes=5), -} - -dataflow_environment = { - 'serviceAccountEmail': LOD_SA_DF, - 'workerZone': DF_ZONE, - 'stagingLocation': f'{LOD_GCS_STAGING}/staging', - 'tempLocation': f'{LOD_GCS_STAGING}/tmp', - 'subnetwork': LOD_NET_SUBNET, - 'kmsKeyName': DF_KMS_KEY, - 'ipConfiguration': 'WORKER_IP_PRIVATE' -} - -# -------------------------------------------------------------------------------- -# Main DAG -# -------------------------------------------------------------------------------- - -with models.DAG('data_pipeline_dc_tags_dag_flex', default_args=default_args, - schedule_interval=None) as dag: - start = empty.EmptyOperator(task_id='start', trigger_rule='all_success') - - end = empty.EmptyOperator(task_id='end', trigger_rule='all_success') - - # Bigquery Tables created here for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. 
- with TaskGroup('upsert_table') as upsert_table: - upsert_table_customers = BigQueryUpsertTableOperator( - task_id="upsert_table_customers", - project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, - impersonation_chain=[TRF_SA_DF], - table_resource={ - "tableReference": { - "tableId": "customers" - }, - }, - ) - - upsert_table_purchases = BigQueryUpsertTableOperator( - task_id="upsert_table_purchases", - project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, - impersonation_chain=[TRF_SA_BQ], - table_resource={"tableReference": { - "tableId": "purchases" - }}, - ) - - upsert_table_customer_purchase_curated = BigQueryUpsertTableOperator( - task_id="upsert_table_customer_purchase_curated", - project_id=DWH_CURATED_PRJ, - dataset_id=DWH_CURATED_BQ_DATASET, - impersonation_chain=[TRF_SA_BQ], - table_resource={"tableReference": { - "tableId": "customer_purchase" - }}, - ) - - upsert_table_customer_purchase_confidential = BigQueryUpsertTableOperator( - task_id="upsert_table_customer_purchase_confidential", - project_id=DWH_CONFIDENTIAL_PRJ, - dataset_id=DWH_CONFIDENTIAL_BQ_DATASET, - impersonation_chain=[TRF_SA_BQ], - table_resource={"tableReference": { - "tableId": "customer_purchase" - }}, - ) - - # Bigquery Tables schema defined here for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. 
- with TaskGroup('update_schema_table') as update_schema_table: - update_table_schema_customers = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customers", project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, table_id="customers", - impersonation_chain=[TRF_SA_BQ], include_policy_tags=True, - schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_purchases = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_purchases", project_id=DWH_LAND_PRJ, - dataset_id=DWH_LAND_BQ_DATASET, table_id="purchases", - impersonation_chain=[TRF_SA_BQ], include_policy_tags=True, - schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_customer_purchase_curated = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customer_purchase_curated", - project_id=DWH_CURATED_PRJ, dataset_id=DWH_CURATED_BQ_DATASET, - table_id="customer_purchase", impersonation_chain=[TRF_SA_BQ], - include_policy_tags=True, schema_fields_updates=[{ - "mode": "REQUIRED", - "name": 
"customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "purchase_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - update_table_schema_customer_purchase_confidential = BigQueryUpdateTableSchemaOperator( - task_id="update_table_schema_customer_purchase_confidential", - project_id=DWH_CONFIDENTIAL_PRJ, dataset_id=DWH_CONFIDENTIAL_BQ_DATASET, - table_id="customer_purchase", impersonation_chain=[TRF_SA_BQ], - include_policy_tags=True, schema_fields_updates=[{ - "mode": "REQUIRED", - "name": "customer_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "purchase_id", - "type": "INTEGER", - "description": "ID" - }, { - "mode": "REQUIRED", - "name": "name", - "type": "STRING", - "description": "Name", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "surname", - "type": "STRING", - "description": "Surname", - "policyTags": { - "names": [DATA_CAT_TAGS.get('2_Private', None)] - } - }, { - "mode": "REQUIRED", - "name": "item", - "type": "STRING", - "description": "Item Name" - }, { - "mode": "REQUIRED", - "name": "price", - "type": "FLOAT", - "description": "Item Price" - }, { - "mode": "REQUIRED", - "name": "timestamp", - "type": "TIMESTAMP", - "description": "Timestamp" - }]) - - customers_import = 
DataflowStartFlexTemplateOperator( - task_id='dataflow_customers_import', project_id=LOD_PRJ, - location=DF_REGION, body={ - 'launchParameter': { - 'jobName': f'dataflow-customers-import-{round(time.time())}', - 'containerSpecGcsPath': f'{ORC_GCS_TMP_DF}/csv2bq.json', - 'environment': { - 'serviceAccountEmail': LOD_SA_DF, - 'workerZone': DF_ZONE, - 'stagingLocation': f'{LOD_GCS_STAGING}/staging', - 'tempLocation': f'{LOD_GCS_STAGING}/tmp', - 'subnetwork': LOD_NET_SUBNET, - 'kmsKeyName': DF_KMS_KEY, - 'ipConfiguration': 'WORKER_IP_PRIVATE' - }, - 'parameters': { - 'csv_file': - f'{DRP_GCS}/customers.csv', - 'json_schema': - f'{ORC_GCS}/customers_schema.json', - 'output_table': - f'{DWH_LAND_PRJ}:{DWH_LAND_BQ_DATASET}.customers', - } - } - }) - - purchases_import = DataflowStartFlexTemplateOperator( - task_id='dataflow_purchases_import', project_id=LOD_PRJ, - location=DF_REGION, body={ - 'launchParameter': { - 'jobName': f'dataflow-purchases-import-{round(time.time())}', - 'containerSpecGcsPath': f'{ORC_GCS_TMP_DF}/csv2bq.json', - 'environment': { - 'serviceAccountEmail': LOD_SA_DF, - 'workerZone': DF_ZONE, - 'stagingLocation': f'{LOD_GCS_STAGING}/staging', - 'tempLocation': f'{LOD_GCS_STAGING}/tmp', - 'subnetwork': LOD_NET_SUBNET, - 'kmsKeyName': DF_KMS_KEY, - 'ipConfiguration': 'WORKER_IP_PRIVATE' - }, - 'parameters': { - 'csv_file': - f'{DRP_GCS}/purchases.csv', - 'json_schema': - f'{ORC_GCS}/purchases_schema.json', - 'output_table': - f'{DWH_LAND_PRJ}:{DWH_LAND_BQ_DATASET}.purchases', - } - } - }) - - join_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_join_customer_purchase', gcp_conn_id='bigquery_default', - project_id=TRF_PRJ, location=BQ_LOCATION, configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - c.name as name, - c.surname as surname, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN 
`{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CURATED_PRJ, - 'datasetId': DWH_CURATED_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_APPEND', - "useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - - confidential_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_confidential_customer_purchase', - gcp_conn_id='bigquery_default', project_id=TRF_PRJ, location=BQ_LOCATION, - configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - customer_id, - purchase_id, - name, - surname, - item, - price, - timestamp - FROM `{dwh_cur_prj}.{dwh_cur_dataset}.customer_purchase` - """.format( - dwh_cur_prj=DWH_CURATED_PRJ, - dwh_cur_dataset=DWH_CURATED_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CONFIDENTIAL_PRJ, - 'datasetId': DWH_CONFIDENTIAL_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_APPEND', - "useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - -start >> upsert_table >> update_schema_table >> [ - customers_import, purchases_import -] >> join_customer_purchase >> confidential_customer_purchase >> end diff --git a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_flex.py b/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_flex.py deleted file mode 100644 index e00af82ac..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/datapipeline_flex.py +++ /dev/null @@ -1,209 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# -------------------------------------------------------------------------------- -# Load The Dependencies -# -------------------------------------------------------------------------------- - -import datetime -import time - -from airflow import models -from airflow.models.variable import Variable -from airflow.providers.google.cloud.operators.dataflow import \ - DataflowStartFlexTemplateOperator -from airflow.operators import empty -from airflow.providers.google.cloud.operators.bigquery import \ - BigQueryInsertJobOperator - -# -------------------------------------------------------------------------------- -# Set variables - Needed for the DEMO -# -------------------------------------------------------------------------------- -BQ_LOCATION = Variable.get("BQ_LOCATION") -DATA_CAT_TAGS = Variable.get("DATA_CAT_TAGS", deserialize_json=True) -DWH_LAND_PRJ = Variable.get("DWH_LAND_PRJ") -DWH_LAND_BQ_DATASET = Variable.get("DWH_LAND_BQ_DATASET") -DWH_LAND_GCS = Variable.get("DWH_LAND_GCS") -DWH_CURATED_PRJ = Variable.get("DWH_CURATED_PRJ") -DWH_CURATED_BQ_DATASET = Variable.get("DWH_CURATED_BQ_DATASET") -DWH_CURATED_GCS = Variable.get("DWH_CURATED_GCS") -DWH_CONFIDENTIAL_PRJ = Variable.get("DWH_CONFIDENTIAL_PRJ") -DWH_CONFIDENTIAL_BQ_DATASET = Variable.get("DWH_CONFIDENTIAL_BQ_DATASET") -DWH_CONFIDENTIAL_GCS = Variable.get("DWH_CONFIDENTIAL_GCS") -GCP_REGION = Variable.get("GCP_REGION") -DRP_PRJ = Variable.get("DRP_PRJ") -DRP_BQ = Variable.get("DRP_BQ") -DRP_GCS = Variable.get("DRP_GCS") -DRP_PS = Variable.get("DRP_PS") -LOD_PRJ = Variable.get("LOD_PRJ") 
-LOD_GCS_STAGING = Variable.get("LOD_GCS_STAGING") -LOD_NET_VPC = Variable.get("LOD_NET_VPC") -LOD_NET_SUBNET = Variable.get("LOD_NET_SUBNET") -LOD_SA_DF = Variable.get("LOD_SA_DF") -ORC_PRJ = Variable.get("ORC_PRJ") -ORC_GCS = Variable.get("ORC_GCS") -ORC_GCS_TMP_DF = Variable.get("ORC_GCS_TMP_DF") -TRF_PRJ = Variable.get("TRF_PRJ") -TRF_GCS_STAGING = Variable.get("TRF_GCS_STAGING") -TRF_NET_VPC = Variable.get("TRF_NET_VPC") -TRF_NET_SUBNET = Variable.get("TRF_NET_SUBNET") -TRF_SA_DF = Variable.get("TRF_SA_DF") -TRF_SA_BQ = Variable.get("TRF_SA_BQ") -DF_KMS_KEY = Variable.get("DF_KMS_KEY", "") -DF_REGION = Variable.get("GCP_REGION") -DF_ZONE = Variable.get("GCP_REGION") + "-b" - -# -------------------------------------------------------------------------------- -# Set default arguments -# -------------------------------------------------------------------------------- - -# If you are running Airflow in more than one time zone -# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html -# for best practices -yesterday = datetime.datetime.now() - datetime.timedelta(days=1) - -default_args = { - 'owner': 'airflow', - 'start_date': yesterday, - 'depends_on_past': False, - 'email': [''], - 'email_on_failure': False, - 'email_on_retry': False, - 'retries': 1, - 'retry_delay': datetime.timedelta(minutes=5), -} - -dataflow_environment = { - 'serviceAccountEmail': LOD_SA_DF, - 'workerZone': DF_ZONE, - 'stagingLocation': f'{LOD_GCS_STAGING}/staging', - 'tempLocation': f'{LOD_GCS_STAGING}/tmp', - 'subnetwork': LOD_NET_SUBNET, - 'kmsKeyName': DF_KMS_KEY, - 'ipConfiguration': 'WORKER_IP_PRIVATE' -} - -# -------------------------------------------------------------------------------- -# Main DAG -# -------------------------------------------------------------------------------- - -with models.DAG('data_pipeline_dag_flex', default_args=default_args, - schedule_interval=None) as dag: - start = empty.EmptyOperator(task_id='start', trigger_rule='all_success') - - end 
= empty.EmptyOperator(task_id='end', trigger_rule='all_success') - - # Bigquery Tables automatically created for demo purposes. - # Consider a dedicated pipeline or tool for a real life scenario. - customers_import = DataflowStartFlexTemplateOperator( - task_id='dataflow_customers_import', project_id=LOD_PRJ, - location=DF_REGION, body={ - 'launchParameter': { - 'jobName': f'dataflow-customers-import-{round(time.time())}', - 'containerSpecGcsPath': f'{ORC_GCS_TMP_DF}/csv2bq.json', - 'environment': dataflow_environment, - 'parameters': { - 'csv_file': - f'{DRP_GCS}/customers.csv', - 'json_schema': - f'{ORC_GCS}/customers_schema.json', - 'output_table': - f'{DWH_LAND_PRJ}:{DWH_LAND_BQ_DATASET}.customers', - } - } - }) - - purchases_import = DataflowStartFlexTemplateOperator( - task_id='dataflow_purchases_import', project_id=LOD_PRJ, - location=DF_REGION, body={ - 'launchParameter': { - 'jobName': f'dataflow-purchases-import-{round(time.time())}', - 'containerSpecGcsPath': f'{ORC_GCS_TMP_DF}/csv2bq.json', - 'environment': dataflow_environment, - 'parameters': { - 'csv_file': - f'{DRP_GCS}/purchases.csv', - 'json_schema': - f'{ORC_GCS}/purchases_schema.json', - 'output_table': - f'{DWH_LAND_PRJ}:{DWH_LAND_BQ_DATASET}.purchases', - } - } - }) - - join_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_join_customer_purchase', gcp_conn_id='bigquery_default', - project_id=TRF_PRJ, location=BQ_LOCATION, configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN `{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CURATED_PRJ, - 'datasetId': DWH_CURATED_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_TRUNCATE', - 
"useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - - confidential_customer_purchase = BigQueryInsertJobOperator( - task_id='bq_confidential_customer_purchase', - gcp_conn_id='bigquery_default', project_id=TRF_PRJ, location=BQ_LOCATION, - configuration={ - 'jobType': 'QUERY', - 'query': { - 'query': - """SELECT - c.id as customer_id, - p.id as purchase_id, - c.name as name, - c.surname as surname, - p.item as item, - p.price as price, - p.timestamp as timestamp - FROM `{dwh_0_prj}.{dwh_0_dataset}.customers` c - JOIN `{dwh_0_prj}.{dwh_0_dataset}.purchases` p ON c.id = p.customer_id - """.format( - dwh_0_prj=DWH_LAND_PRJ, - dwh_0_dataset=DWH_LAND_BQ_DATASET, - ), - 'destinationTable': { - 'projectId': DWH_CONFIDENTIAL_PRJ, - 'datasetId': DWH_CONFIDENTIAL_BQ_DATASET, - 'tableId': 'customer_purchase' - }, - 'writeDisposition': - 'WRITE_TRUNCATE', - "useLegacySql": - False - } - }, impersonation_chain=[TRF_SA_BQ]) - - start >> [customers_import, purchases_import - ] >> join_customer_purchase >> confidential_customer_purchase >> end diff --git a/blueprints/data-solutions/data-platform-foundations/demo/delete_table.py b/blueprints/data-solutions/data-platform-foundations/demo/delete_table.py deleted file mode 100644 index cc841e38e..000000000 --- a/blueprints/data-solutions/data-platform-foundations/demo/delete_table.py +++ /dev/null @@ -1,133 +0,0 @@ -# Copyright 2023 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. 
- -# -------------------------------------------------------------------------------- -# Load The Dependencies -# -------------------------------------------------------------------------------- - -import csv -import datetime -import io -import json -import logging -import os - -from airflow import models -from airflow.models.variable import Variable -from airflow.operators import empty -from airflow.providers.google.cloud.operators.bigquery import \ - BigQueryDeleteTableOperator -from airflow.utils.task_group import TaskGroup - -# -------------------------------------------------------------------------------- -# Set variables - Needed for the DEMO -# -------------------------------------------------------------------------------- -BQ_LOCATION = Variable.get("BQ_LOCATION") -DATA_CAT_TAGS = Variable.get("DATA_CAT_TAGS", deserialize_json=True) -DWH_LAND_PRJ = Variable.get("DWH_LAND_PRJ") -DWH_LAND_BQ_DATASET = Variable.get("DWH_LAND_BQ_DATASET") -DWH_LAND_GCS = Variable.get("DWH_LAND_GCS") -DWH_CURATED_PRJ = Variable.get("DWH_CURATED_PRJ") -DWH_CURATED_BQ_DATASET = Variable.get("DWH_CURATED_BQ_DATASET") -DWH_CURATED_GCS = Variable.get("DWH_CURATED_GCS") -DWH_CONFIDENTIAL_PRJ = Variable.get("DWH_CONFIDENTIAL_PRJ") -DWH_CONFIDENTIAL_BQ_DATASET = Variable.get("DWH_CONFIDENTIAL_BQ_DATASET") -DWH_CONFIDENTIAL_GCS = Variable.get("DWH_CONFIDENTIAL_GCS") -GCP_REGION = Variable.get("GCP_REGION") -DRP_PRJ = Variable.get("DRP_PRJ") -DRP_BQ = Variable.get("DRP_BQ") -DRP_GCS = Variable.get("DRP_GCS") -DRP_PS = Variable.get("DRP_PS") -LOD_PRJ = Variable.get("LOD_PRJ") -LOD_GCS_STAGING = Variable.get("LOD_GCS_STAGING") -LOD_NET_VPC = Variable.get("LOD_NET_VPC") -LOD_NET_SUBNET = Variable.get("LOD_NET_SUBNET") -LOD_SA_DF = Variable.get("LOD_SA_DF") -ORC_PRJ = Variable.get("ORC_PRJ") -ORC_GCS = Variable.get("ORC_GCS") -TRF_PRJ = Variable.get("TRF_PRJ") -TRF_GCS_STAGING = Variable.get("TRF_GCS_STAGING") -TRF_NET_VPC = Variable.get("TRF_NET_VPC") -TRF_NET_SUBNET = 
Variable.get("TRF_NET_SUBNET") -TRF_SA_DF = Variable.get("TRF_SA_DF") -TRF_SA_BQ = Variable.get("TRF_SA_BQ") -DF_KMS_KEY = Variable.get("DF_KMS_KEY", "") -DF_REGION = Variable.get("GCP_REGION") -DF_ZONE = Variable.get("GCP_REGION") + "-b" - -# -------------------------------------------------------------------------------- -# Set default arguments -# -------------------------------------------------------------------------------- - -# If you are running Airflow in more than one time zone -# see https://airflow.apache.org/docs/apache-airflow/stable/timezone.html -# for best practices -yesterday = datetime.datetime.now() - datetime.timedelta(days=1) - -default_args = { - 'owner': 'airflow', - 'start_date': yesterday, - 'depends_on_past': False, - 'email': [''], - 'email_on_failure': False, - 'email_on_retry': False, - 'retries': 1, - 'retry_delay': datetime.timedelta(minutes=5), - 'dataflow_default_options': { - 'location': DF_REGION, - 'zone': DF_ZONE, - 'stagingLocation': LOD_GCS_STAGING, - 'tempLocation': LOD_GCS_STAGING + "/tmp", - 'serviceAccountEmail': LOD_SA_DF, - 'subnetwork': LOD_NET_SUBNET, - 'ipConfiguration': "WORKER_IP_PRIVATE", - 'kmsKeyName': DF_KMS_KEY - }, -} - -# -------------------------------------------------------------------------------- -# Main DAG -# -------------------------------------------------------------------------------- - -with models.DAG('delete_tables_dag', default_args=default_args, - schedule_interval=None) as dag: - start = empty.EmptyOperator(task_id='start', trigger_rule='all_success') - - end = empty.EmptyOperator(task_id='end', trigger_rule='all_success') - - # Bigquery Tables deleted here for demo porpuse. - # Consider a dedicated pipeline or tool for a real life scenario. - with TaskGroup('delete_table') as delete_table: - delete_table_customers = BigQueryDeleteTableOperator( - task_id="delete_table_customers", deletion_dataset_table=DWH_LAND_PRJ + - "." 
+ DWH_LAND_BQ_DATASET + ".customers", - impersonation_chain=[LOD_SA_DF]) - - delete_table_purchases = BigQueryDeleteTableOperator( - task_id="delete_table_purchases", deletion_dataset_table=DWH_LAND_PRJ + - "." + DWH_LAND_BQ_DATASET + ".purchases", - impersonation_chain=[LOD_SA_DF]) - - delete_table_customer_purchase_curated = BigQueryDeleteTableOperator( - task_id="delete_table_customer_purchase_curated", - deletion_dataset_table=DWH_CURATED_PRJ + "." + DWH_CURATED_BQ_DATASET + - ".customer_purchase", impersonation_chain=[TRF_SA_DF]) - - delete_table_customer_purchase_confidential = BigQueryDeleteTableOperator( - task_id="delete_table_customer_purchase_confidential", - deletion_dataset_table=DWH_CONFIDENTIAL_PRJ + "." + - DWH_CONFIDENTIAL_BQ_DATASET + ".customer_purchase", - impersonation_chain=[TRF_SA_DF]) - - start >> delete_table >> end diff --git a/blueprints/data-solutions/data-platform-foundations/images/df_demo_pipeline.png b/blueprints/data-solutions/data-platform-foundations/images/df_demo_pipeline.png deleted file mode 100644 index 541532b41..000000000 Binary files a/blueprints/data-solutions/data-platform-foundations/images/df_demo_pipeline.png and /dev/null differ diff --git a/blueprints/data-solutions/data-platform-foundations/images/dlp_diagram.png b/blueprints/data-solutions/data-platform-foundations/images/dlp_diagram.png deleted file mode 100644 index 65467b2be..000000000 Binary files a/blueprints/data-solutions/data-platform-foundations/images/dlp_diagram.png and /dev/null differ diff --git a/blueprints/data-solutions/data-platform-foundations/images/kms_diagram.png b/blueprints/data-solutions/data-platform-foundations/images/kms_diagram.png deleted file mode 100644 index e2ef9a6e4..000000000 Binary files a/blueprints/data-solutions/data-platform-foundations/images/kms_diagram.png and /dev/null differ diff --git a/blueprints/data-solutions/data-platform-foundations/images/overview_diagram.png 
b/blueprints/data-solutions/data-platform-foundations/images/overview_diagram.png deleted file mode 100644 index 073ec870c..000000000 Binary files a/blueprints/data-solutions/data-platform-foundations/images/overview_diagram.png and /dev/null differ diff --git a/blueprints/data-solutions/data-platform-foundations/locals-01-dropoff.tf b/blueprints/data-solutions/data-platform-foundations/locals-01-dropoff.tf deleted file mode 100644 index 02b533d5f..000000000 --- a/blueprints/data-solutions/data-platform-foundations/locals-01-dropoff.tf +++ /dev/null @@ -1,37 +0,0 @@ -/** - * Copyright 2023 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - _drp_iam = flatten([ - for principal, roles in local.drp_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - drp_iam_additive = { - for binding in local._drp_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - drp_iam_auth = { - for binding in local._drp_iam : - binding.role => local.iam_principals[binding.principal]... 
- } -} diff --git a/blueprints/data-solutions/data-platform-foundations/locals-02-load.tf b/blueprints/data-solutions/data-platform-foundations/locals-02-load.tf deleted file mode 100644 index 743883393..000000000 --- a/blueprints/data-solutions/data-platform-foundations/locals-02-load.tf +++ /dev/null @@ -1,47 +0,0 @@ -/** - * Copyright 2023 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - _load_iam = flatten([ - for principal, roles in local.load_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - load_iam_additive = { - for binding in local._load_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - load_iam_auth = { - for binding in local._load_iam : - binding.role => local.iam_principals[binding.principal]... - } - load_subnet = ( - local.use_shared_vpc - ? var.network_config.subnet_self_links.orchestration - : values(module.load-vpc[0].subnet_self_links)[0] - ) - load_vpc = ( - local.use_shared_vpc - ? 
var.network_config.network_self_link - : module.load-vpc[0].self_link - ) -} diff --git a/blueprints/data-solutions/data-platform-foundations/locals-03-orchestration.tf b/blueprints/data-solutions/data-platform-foundations/locals-03-orchestration.tf deleted file mode 100644 index 11777c821..000000000 --- a/blueprints/data-solutions/data-platform-foundations/locals-03-orchestration.tf +++ /dev/null @@ -1,50 +0,0 @@ -/** - * Copyright 2023 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - _orch_iam = flatten([ - for principal, roles in local.orch_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - orch_iam_additive = { - for binding in local._orch_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - orch_iam_auth = { - for binding in local._orch_iam : - binding.role => local.iam_principals[binding.principal]... - } - orch_subnet = ( - local.use_shared_vpc - ? var.network_config.subnet_self_links.orchestration - : values(module.orch-vpc[0].subnet_self_links)[0] - ) - orch_vpc = ( - local.use_shared_vpc - ? 
var.network_config.network_self_link - : module.orch-vpc[0].self_link - ) - # TODO: use new artifact registry module output - orch_docker_path = format("%s-docker.pkg.dev/%s/%s", - var.region, module.orch-project.project_id, module.orch-artifact-reg.name) -} diff --git a/blueprints/data-solutions/data-platform-foundations/locals-04-transformation.tf b/blueprints/data-solutions/data-platform-foundations/locals-04-transformation.tf deleted file mode 100644 index e6c92d8ef..000000000 --- a/blueprints/data-solutions/data-platform-foundations/locals-04-transformation.tf +++ /dev/null @@ -1,47 +0,0 @@ -/** - * Copyright 2023 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - _trf_iam = flatten([ - for principal, roles in local.trf_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - trf_iam_additive = { - for binding in local._trf_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - trf_iam_auth = { - for binding in local._trf_iam : - binding.role => local.iam_principals[binding.principal]... - } - transf_subnet = ( - local.use_shared_vpc - ? var.network_config.subnet_self_links.orchestration - : values(module.transf-vpc[0].subnet_self_links)[0] - ) - transf_vpc = ( - local.use_shared_vpc - ? 
var.network_config.network_self_link - : module.transf-vpc[0].self_link - ) -} diff --git a/blueprints/data-solutions/data-platform-foundations/locals-05-datawarehouse.tf b/blueprints/data-solutions/data-platform-foundations/locals-05-datawarehouse.tf deleted file mode 100644 index 47c91b1ac..000000000 --- a/blueprints/data-solutions/data-platform-foundations/locals-05-datawarehouse.tf +++ /dev/null @@ -1,69 +0,0 @@ -/** - * Copyright 2023 Google LLC - * - * Licensed under the Apache License, Version 2.0 (the "License"); - * you may not use this file except in compliance with the License. - * You may obtain a copy of the License at - * - * http://www.apache.org/licenses/LICENSE-2.0 - * - * Unless required by applicable law or agreed to in writing, software - * distributed under the License is distributed on an "AS IS" BASIS, - * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. - * See the License for the specific language governing permissions and - * limitations under the License. - */ - -locals { - _dwh_iam = flatten([ - for principal, roles in local.dwh_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - _lnd_iam = flatten([ - for principal, roles in local.lnd_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - dwh_iam_additive = { - for binding in local._dwh_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - dwh_iam_auth = { - for binding in local._dwh_iam : - binding.role => local.iam_principals[binding.principal]... 
- } - dwh_services = concat(var.project_services, [ - "bigquery.googleapis.com", - "bigqueryreservation.googleapis.com", - "bigquerystorage.googleapis.com", - "cloudkms.googleapis.com", - "compute.googleapis.com", - "dataflow.googleapis.com", - "datalineage.googleapis.com", - "pubsub.googleapis.com", - "servicenetworking.googleapis.com", - "storage.googleapis.com", - "storage-component.googleapis.com" - ]) - lnd_iam_additive = { - for binding in local._lnd_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - lnd_iam_auth = { - for binding in local._lnd_iam : - binding.role => local.iam_principals[binding.principal]... - } -} diff --git a/blueprints/data-solutions/data-platform-foundations/main.tf b/blueprints/data-solutions/data-platform-foundations/main.tf deleted file mode 100644 index d3004ee4b..000000000 --- a/blueprints/data-solutions/data-platform-foundations/main.tf +++ /dev/null @@ -1,85 +0,0 @@ -# Copyright 2024 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. -# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# tfdoc:file:description Core locals. 
- -locals { - # we cannot reference service accounts directly as they are dynamic - _shared_vpc_bindings = { - "roles/compute.networkUser" = [ - "load-robot-df", "load-sa-df-worker", - "orch-cloudservices", "orch-robot-df", "orch-robot-gke", - "transf-robot-df", "transf-sa-df-worker", - ] - "roles/composer.sharedVpcAgent" = [ - "orch-robot-cs" - ] - "roles/container.hostServiceAgentUser" = [ - "orch-robot-df", "orch-robot-gke" - ] - } - groups = { - for k, v in var.groups : k => "${v}@${var.organization_domain}" - } - groups_iam = { - for k, v in local.groups : k => "group:${v}" - } - iam_principals = { - data_analysts = "group:${local.groups.data-analysts}" - data_engineers = "group:${local.groups.data-engineers}" - data_security = "group:${local.groups.data-security}" - robots_cloudbuild = module.orch-project.service_agents.cloudbuild.iam_email - robots_composer = module.orch-project.service_agents.composer.iam_email - robots_dataflow_load = module.load-project.service_agents.dataflow.iam_email - robots_dataflow_trf = module.transf-project.service_agents.dataflow.iam_email - sa_df_build = module.orch-sa-df-build.iam_email - sa_drop_bq = module.drop-sa-bq-0.iam_email - sa_drop_cs = module.drop-sa-cs-0.iam_email - sa_drop_ps = module.drop-sa-ps-0.iam_email - sa_load = module.load-sa-df-0.iam_email - sa_orch = module.orch-sa-cmp-0.iam_email - sa_transf_bq = module.transf-sa-bq-0.iam_email, - sa_transf_df = module.transf-sa-df-0.iam_email, - } - project_suffix = var.project_suffix == null ? 
"" : "-${var.project_suffix}" - shared_vpc_project = try(var.network_config.host_project, null) - # this is needed so that for_each only uses static values - shared_vpc_role_members = { - load-robot-df = module.load-project.service_agents.dataflow.iam_email - load-sa-df-worker = module.load-sa-df-0.iam_email - orch-cloudservices = module.orch-project.service_agents.cloudservices.iam_email - orch-robot-cs = module.orch-project.service_agents.composer.iam_email - orch-robot-df = module.orch-project.service_agents.dataflow.iam_email - orch-robot-gke = module.orch-project.service_agents.container-engine.iam_email - transf-robot-df = module.transf-project.service_agents.dataflow.iam_email - transf-sa-df-worker = module.transf-sa-df-0.iam_email - } - # reassemble in a format suitable for for_each - shared_vpc_bindings_map = { - for binding in flatten([ - for role, members in local._shared_vpc_bindings : [ - for member in members : { role = role, member = member } - ] - ]) : "${binding.role}-${binding.member}" => binding - } - use_projects = !var.project_config.project_create - use_shared_vpc = var.network_config != null -} - -resource "google_project_iam_member" "shared_vpc" { - for_each = local.use_shared_vpc ? local.shared_vpc_bindings_map : {} - project = var.network_config.host_project - role = each.value.role - member = lookup(local.shared_vpc_role_members, each.value.member) -} diff --git a/blueprints/data-solutions/data-platform-foundations/outputs.tf b/blueprints/data-solutions/data-platform-foundations/outputs.tf deleted file mode 100644 index ad0f9c4cd..000000000 --- a/blueprints/data-solutions/data-platform-foundations/outputs.tf +++ /dev/null @@ -1,114 +0,0 @@ -# Copyright 2024 Google LLC -# -# Licensed under the Apache License, Version 2.0 (the "License"); -# you may not use this file except in compliance with the License. 
-# You may obtain a copy of the License at -# -# https://www.apache.org/licenses/LICENSE-2.0 -# -# Unless required by applicable law or agreed to in writing, software -# distributed under the License is distributed on an "AS IS" BASIS, -# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. -# See the License for the specific language governing permissions and -# limitations under the License. - -# tfdoc:file:description Output variables. -output "bigquery-datasets" { - description = "BigQuery datasets." - value = { - drop-bq-0 = module.drop-bq-0.dataset_id, - dwh-landing-bq-0 = module.dwh-lnd-bq-0.dataset_id, - dwh-curated-bq-0 = module.dwh-cur-bq-0.dataset_id, - dwh-confidential-bq-0 = module.dwh-conf-bq-0.dataset_id, - } -} - -output "demo_commands" { - description = "Demo commands. Relevant only if Composer is deployed." - value = { - 01 = "gsutil -i ${module.drop-sa-cs-0.email} cp demo/data/*.csv gs://${module.drop-cs-0.name}" - 02 = try("gsutil -i ${module.orch-sa-cmp-0.email} cp demo/data/*.j* gs://${module.orch-cs-0.name}", "Composer not deployed.") - 03 = try("gsutil -i ${module.orch-sa-cmp-0.email} cp demo/*.py ${google_composer_environment.orch-cmp-0[0].config[0].dag_gcs_prefix}/", "Composer not deployed") - 04 = < diff --git a/fast/stages/1-resman/data/stage-2/networking.yaml b/fast/stages/1-resman/data/stage-2/networking.yaml index 24b472de1..08c9216ea 100644 --- a/fast/stages/1-resman/data/stage-2/networking.yaml +++ b/fast/stages/1-resman/data/stage-2/networking.yaml @@ -50,8 +50,53 @@ folder_config: 'roles/compute.networkUser', 'roles/composer.sharedVpcAgent', 'roles/container.hostServiceAgentUser', 'roles/vpcaccess.user' ]) - # iam_bindings_additive for stage 3 are added here when needed - # refer to each stage 3 documentation for snippets and examples + # example conditional grants for stage 3s + iam_bindings_additive: {} + # Data Platform (dev) + # dp_dev_net_admin: + # role: service_project_network_admin + # member: 
data-platform-dev-rw + # condition: + # title: Data platform dev service project admin. + # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + # dp_dev_net_viewer: + # role: roles/compute.networkViewer + # member: data-platform-dev-ro + # condition: + # title: Data platform dev network viewer. + # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + # GCVE (dev) + # gcve_dev_net_admin: + # role: gcve_network_admin + # member: gcve-dev-rw + # condition: + # title: GCVE dev network admin. + # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + # gcve_dev_net_viewer: + # role: gcve_network_viewer + # member: gcve-dev-ro + # condition: + # title: GCVE dev network viewer. + # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + # GKE (dev) + # gke_dns_admin: + # role: roles/dns.admin + # member: gke-dev-ro + # condition: + # title: GKE dev DNS admin. + # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + # gke_dns_reader: + # role: roles/dns.reader + # member: gke-dev-ro + # condition: + # title: GKE dev DNS reader. 
+ # expression: | + # resource.matchTag('${organization.id}/${tag_names.environment}', 'development') organization_config: iam_bindings_additive: sa_net_rw_fw_policy_admin: @@ -69,5 +114,15 @@ organization_config: sa_net_ro_ngfw_enterprise_viewer: member: ro role: ngfw_enterprise_viewer -# stage_3_config for IAM delegation are added here when needed -# refer to each stage 3 documentation for snippets and examples +# example configuration for stage 3s needing environment-level conditional grants +# stage3_config: +# iam_admin_delegated: +# - environment: dev +# principal: gcve-dev-rw +# - environment: dev +# principal: data-platform-dev-rw +# iam_viewer: +# - environment: dev +# principal: gcve-dev-ro +# - environment: dev +# principal: data-platform-dev-ro diff --git a/fast/stages/1-resman/data/stage-2/security.yaml b/fast/stages/1-resman/data/stage-2/security.yaml index b0c42fc2c..1c7531989 100644 --- a/fast/stages/1-resman/data/stage-2/security.yaml +++ b/fast/stages/1-resman/data/stage-2/security.yaml @@ -35,7 +35,6 @@ folder_config: - project_iam_viewer gcp-security-admins: - roles/editor - # project factory delegated IAM grant iam_bindings: project_factory: @@ -53,3 +52,13 @@ organization_config: sa_sec_cloudasset: member: rw role: roles/cloudasset.viewer +# example configuration for stage 3s needing environment-level conditional grants +# stage3_config: + # iam_admin_delegated: + # - environment: dev + # principal: data-platform-dev-rw + # iam_viewer: + # - environment: dev + # principal: data-platform-dev-ro + # - environment: dev + # principal: data-platform-dev-rw diff --git a/fast/stages/1-resman/data/stage-3/data-platform-dev.yaml b/fast/stages/1-resman/data/stage-3/data-platform-dev.yaml new file mode 100644 index 000000000..246f381ee --- /dev/null +++ b/fast/stages/1-resman/data/stage-3/data-platform-dev.yaml @@ -0,0 +1,21 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file 
except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# yaml-language-server: $schema=../../schemas/fast-stage3.schema.json + +short_name: dp +environment: dev +folder_config: + name: Development + parent_id: data-platform diff --git a/fast/stages/1-resman/data/top-level-folders/data-platform.yaml b/fast/stages/1-resman/data/top-level-folders/data-platform.yaml new file mode 100644 index 000000000..850273d3e --- /dev/null +++ b/fast/stages/1-resman/data/top-level-folders/data-platform.yaml @@ -0,0 +1,17 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# yaml-language-server: $schema=../../schemas/top-level-folder.schema.json + +name: Data Platform diff --git a/fast/stages/1-resman/outputs.tf b/fast/stages/1-resman/outputs.tf index ca55dc9be..27f3abdd7 100644 --- a/fast/stages/1-resman/outputs.tf +++ b/fast/stages/1-resman/outputs.tf @@ -91,6 +91,11 @@ output "service_accounts" { value = local.service_accounts } +output "tag_values" { + description = "Tag values." 
+ value = local.tfvars.tag_values +} + # ready to use variable values for subsequent stages output "tfvars" { description = "Terraform variable files for the following stages." diff --git a/fast/stages/2-networking-a-simple/README.md b/fast/stages/2-networking-a-simple/README.md index 55a57fbd7..25e48401c 100644 --- a/fast/stages/2-networking-a-simple/README.md +++ b/fast/stages/2-networking-a-simple/README.md @@ -513,7 +513,7 @@ DNS configurations are centralised in the `dns-*.tf` files. Spokes delegate DNS | [regions](variables.tf#L106) | Region definitions. | object({…}) | | {…} | | | [security_profile_groups](variables-fast.tf#L86) | Security profile group ids used for policy rule substitutions. | map(string) | | {} | 2-networking-ngfw | | [spoke_configs](variables.tf#L118) | Spoke connectivity configurations. | object({…}) | | {…} | | -| [stage_config](variables-fast.tf#L94) | FAST stage configuration. | object({…}) | | {} | 1-resman | +| [stage_configs](variables-fast.tf#L94) | FAST stage configuration. | object({…}) | | {} | 1-resman | | [tag_values](variables-fast.tf#L108) | Root-level tag values. | map(string) | | {} | 1-resman | | [vpc_configs](variables.tf#L187) | Optional VPC network configurations. | object({…}) | | {} | | | [vpn_onprem_primary_config](variables.tf#L240) | VPN gateway configuration for onprem interconnection in the primary region. 
| object({…}) | | null | | diff --git a/fast/stages/2-networking-a-simple/main.tf b/fast/stages/2-networking-a-simple/main.tf index d3666731b..2e2d75e64 100644 --- a/fast/stages/2-networking-a-simple/main.tf +++ b/fast/stages/2-networking-a-simple/main.tf @@ -31,10 +31,10 @@ locals { "roles/vpcaccess.user", ])) iam_admin_delegated = try( - var.stage_config["networking"].iam_admin_delegated, {} + var.stage_configs["networking"].iam_admin_delegated, {} ) iam_viewer = try( - var.stage_config["networking"].iam_viewer, {} + var.stage_configs["networking"].iam_viewer, {} ) # combine all regions from variables and subnets regions = distinct(concat( diff --git a/fast/stages/2-networking-a-simple/variables-fast.tf b/fast/stages/2-networking-a-simple/variables-fast.tf index 56c29440b..aeb17e92e 100644 --- a/fast/stages/2-networking-a-simple/variables-fast.tf +++ b/fast/stages/2-networking-a-simple/variables-fast.tf @@ -91,7 +91,7 @@ variable "security_profile_groups" { default = {} } -variable "stage_config" { +variable "stage_configs" { # tfdoc:variable:source 1-resman description = "FAST stage configuration." type = object({ diff --git a/fast/stages/2-networking-b-nva/README.md b/fast/stages/2-networking-b-nva/README.md index b085e9249..d75bcfece 100644 --- a/fast/stages/2-networking-b-nva/README.md +++ b/fast/stages/2-networking-b-nva/README.md @@ -575,7 +575,7 @@ DNS configurations are centralised in the `dns-*.tf` files. Spokes delegate DNS | [psa_ranges](variables.tf#L114) | IP ranges used for Private Service Access (e.g. CloudSQL). Ranges is in name => range format. | object({…}) | | {} | | | [regions](variables.tf#L134) | Region definitions. | object({…}) | | {…} | | | [security_profile_groups](variables-fast.tf#L86) | Security profile group ids used for policy rule substitutions. | map(string) | | {} | 2-networking-ngfw | -| [stage_config](variables-fast.tf#L94) | FAST stage configuration. 
| object({…}) | | {} | 1-resman | +| [stage_configs](variables-fast.tf#L94) | FAST stage configuration. | object({…}) | | {} | 1-resman | | [tag_values](variables-fast.tf#L108) | Root-level tag values. | map(string) | | {} | 1-resman | | [vpc_configs](variables.tf#L146) | Optional VPC network configurations. | object({…}) | | {} | | | [vpn_onprem_primary_config](variables.tf#L229) | VPN gateway configuration for onprem interconnection in the primary region. | object({…}) | | null | | diff --git a/fast/stages/2-networking-b-nva/main.tf b/fast/stages/2-networking-b-nva/main.tf index 447bb3b1a..c500bbc1d 100644 --- a/fast/stages/2-networking-b-nva/main.tf +++ b/fast/stages/2-networking-b-nva/main.tf @@ -30,10 +30,10 @@ locals { "roles/vpcaccess.user", ])) iam_admin_delegated = try( - var.stage_config["networking"].iam_admin_delegated, {} + var.stage_configs["networking"].iam_admin_delegated, {} ) iam_viewer = try( - var.stage_config["networking"].iam_viewer, {} + var.stage_configs["networking"].iam_viewer, {} ) # select the NVA ILB as next hop for spoke VPC routing depending on net mode nva_load_balancers = (var.network_mode == "ncc_ra") ? null : { diff --git a/fast/stages/2-networking-b-nva/variables-fast.tf b/fast/stages/2-networking-b-nva/variables-fast.tf index 56c29440b..aeb17e92e 100644 --- a/fast/stages/2-networking-b-nva/variables-fast.tf +++ b/fast/stages/2-networking-b-nva/variables-fast.tf @@ -91,7 +91,7 @@ variable "security_profile_groups" { default = {} } -variable "stage_config" { +variable "stage_configs" { # tfdoc:variable:source 1-resman description = "FAST stage configuration." 
type = object({ diff --git a/fast/stages/2-networking-c-separate-envs/README.md b/fast/stages/2-networking-c-separate-envs/README.md index 2bd8365c4..b4450a51a 100644 --- a/fast/stages/2-networking-c-separate-envs/README.md +++ b/fast/stages/2-networking-c-separate-envs/README.md @@ -371,7 +371,7 @@ Regions are defined via the `regions` variable which sets up a mapping between t | [psa_ranges](variables.tf#L87) | IP ranges used for Private Service Access (e.g. CloudSQL). | object({…}) | | {} | | | [regions](variables.tf#L107) | Region definitions. | object({…}) | | {…} | | | [security_profile_groups](variables-fast.tf#L86) | Security profile group ids used for policy rule substitutions. | map(string) | | {} | 2-networking-ngfw | -| [stage_config](variables-fast.tf#L94) | FAST stage configuration. | object({…}) | | {} | 1-resman | +| [stage_configs](variables-fast.tf#L94) | FAST stage configuration. | object({…}) | | {} | 1-resman | | [tag_values](variables-fast.tf#L108) | Root-level tag values. | map(string) | | {} | 1-resman | | [vpc_configs](variables.tf#L117) | Optional VPC network configurations. | object({…}) | | {} | | | [vpn_onprem_dev_primary_config](variables.tf#L155) | VPN gateway configuration for onprem interconnection from dev in the primary region. 
| object({…}) | | null | | diff --git a/fast/stages/2-networking-c-separate-envs/main.tf b/fast/stages/2-networking-c-separate-envs/main.tf index 5422cfaa4..8861a798f 100644 --- a/fast/stages/2-networking-c-separate-envs/main.tf +++ b/fast/stages/2-networking-c-separate-envs/main.tf @@ -31,10 +31,10 @@ locals { "roles/vpcaccess.user", ])) iam_admin_delegated = try( - var.stage_config["networking"].iam_admin_delegated, {} + var.stage_configs["networking"].iam_admin_delegated, {} ) iam_viewer = try( - var.stage_config["networking"].iam_viewer, {} + var.stage_configs["networking"].iam_viewer, {} ) # combine all regions from variables and subnets regions = distinct(concat( diff --git a/fast/stages/2-networking-c-separate-envs/variables-fast.tf b/fast/stages/2-networking-c-separate-envs/variables-fast.tf index 56c29440b..aeb17e92e 100644 --- a/fast/stages/2-networking-c-separate-envs/variables-fast.tf +++ b/fast/stages/2-networking-c-separate-envs/variables-fast.tf @@ -91,7 +91,7 @@ variable "security_profile_groups" { default = {} } -variable "stage_config" { +variable "stage_configs" { # tfdoc:variable:source 1-resman description = "FAST stage configuration." type = object({ diff --git a/fast/stages/2-security/README.md b/fast/stages/2-security/README.md index d158cd162..b6b589356 100644 --- a/fast/stages/2-security/README.md +++ b/fast/stages/2-security/README.md @@ -285,16 +285,15 @@ tls_inspection = { |---|---|:---:|:---:|:---:|:---:| | [automation](variables-fast.tf#L17) | Automation resources created by the bootstrap stage. | object({…}) | ✓ | | 0-bootstrap | | [billing_account](variables-fast.tf#L25) | Billing account id. If billing account is not part of the same org set `is_org_level` to false. | object({…}) | ✓ | | 0-bootstrap | -| [environments](variables-fast.tf#L47) | Environment names. | map(object({…})) | ✓ | | 0-globals | -| [folder_ids](variables-fast.tf#L65) | Folder name => id mappings, the 'security' folder name must exist. 
| object({…}) | ✓ | | 1-resman | -| [prefix](variables-fast.tf#L75) | Prefix used for resources that need unique names. Use a maximum of 9 chars for organizations, and 11 chars for tenants. | string | ✓ | | 0-bootstrap | +| [environments](variables-fast.tf#L38) | Environment names. | map(object({…})) | ✓ | | 0-globals | +| [folder_ids](variables-fast.tf#L56) | Folder name => id mappings, the 'security' folder name must exist. | object({…}) | ✓ | | 1-resman | +| [prefix](variables-fast.tf#L66) | Prefix used for resources that need unique names. Use a maximum of 9 chars for organizations, and 11 chars for tenants. | string | ✓ | | 0-bootstrap | | [certificate_authorities](variables.tf#L17) | Certificate Authority Service pool and CAs. If environments is null identical pools and CAs are created in all environments. | map(object({…})) | | {} | | -| [custom_roles](variables-fast.tf#L38) | Custom roles defined at the org level, in key => id format. | object({…}) | | null | 0-bootstrap | | [essential_contacts](variables.tf#L98) | Email used for essential contacts, unset if null. | string | | null | | | [kms_keys](variables.tf#L104) | KMS keys to create, keyed by name. | map(object({…})) | | {} | | | [outputs_location](variables.tf#L142) | Path where providers, tfvars files, and lists for the following stages are written. Leave empty to disable. | string | | null | | -| [stage_configs](variables-fast.tf#L85) | FAST stage configuration. | object({…}) | | {} | 1-resman | -| [tag_values](variables-fast.tf#L99) | Root-level tag values. | map(string) | | {} | 1-resman | +| [stage_configs](variables-fast.tf#L76) | FAST stage configuration. | object({…}) | | {} | 1-resman | +| [tag_values](variables-fast.tf#L90) | Root-level tag values. 
| map(string) | | {} | 1-resman | ## Outputs diff --git a/fast/stages/2-security/main.tf b/fast/stages/2-security/main.tf index 139e2992e..bbb653925 100644 --- a/fast/stages/2-security/main.tf +++ b/fast/stages/2-security/main.tf @@ -63,18 +63,18 @@ module "project" { } # optionally delegate a fixed set of IAM roles to selected principals iam = { - (var.custom_roles.project_iam_viewer) = try( + "roles/iam.securityReviewer" = try( local.iam_viewer[each.key], [] ) } iam_bindings = ( lookup(local.iam_admin_delegated, each.key, null) == null ? {} : { sa_delegated_grants = { - role = "roles/resourcemanager.projectIamAdmin" + role = "roles/cloudkms.admin" members = try(local.iam_admin_delegated[each.key], []) condition = { title = "${each.key}_stage3_sa_delegated_grants" - description = "${var.environments[each.key].name} project delegated grants." + description = "${var.environments[each.key].name} KMS delegated grants." expression = format( "api.getAttribute('iam.googleapis.com/modifiedGrantsByRole', []).hasOnly([%s])", local.iam_delegated diff --git a/fast/stages/2-security/variables-fast.tf b/fast/stages/2-security/variables-fast.tf index 406c449ab..0930cbfec 100644 --- a/fast/stages/2-security/variables-fast.tf +++ b/fast/stages/2-security/variables-fast.tf @@ -35,15 +35,6 @@ variable "billing_account" { } } -variable "custom_roles" { - # tfdoc:variable:source 0-bootstrap - description = "Custom roles defined at the org level, in key => id format." - type = object({ - project_iam_viewer = string - }) - default = null -} - variable "environments" { # tfdoc:variable:source 0-globals description = "Environment names." 
diff --git a/fast/stages/3-data-platform-dev/.fast-stage.env b/fast/stages/3-data-platform-dev/.fast-stage.env new file mode 100644 index 000000000..c9f86b414 --- /dev/null +++ b/fast/stages/3-data-platform-dev/.fast-stage.env @@ -0,0 +1,5 @@ +FAST_STAGE_DESCRIPTION="Data Platform (dev)" +FAST_STAGE_LEVEL=3 +FAST_STAGE_NAME=data-platform-dev +FAST_STAGE_DEPS="0-globals 0-bootstrap 1-resman" +FAST_STAGE_OPTIONAL="2-networking 2-security" diff --git a/fast/stages/3-data-platform-dev/README.md b/fast/stages/3-data-platform-dev/README.md new file mode 100644 index 000000000..23ea0952b --- /dev/null +++ b/fast/stages/3-data-platform-dev/README.md @@ -0,0 +1,261 @@ +# Data Platform + +This stage allows creation and management of a Data Platform, which enables the implementation of a reliable, robust, and scalable environment to support the onboarding of new data products (or data workloads) over time. + +The code provided here sets up the foundational design and centralizes sharing patterns for data, leaving the specifics of data handling, computation, and processing to the individual data products. + +This solution implements the [Data Mesh principles on Google Cloud Platform](https://cloud.google.com/architecture/data-mesh) and relies on the higher level FAST stages for the resource hierarchy, networking, and security. It's also possible to run this stage in isolation by providing it the required prerequisites. 
+ + +- [Project Structure](#project-structure) + - [Central Shared Services](#central-shared-services) + - [Data Domains](#data-domains) + - [Data Products](#data-products) +- [Teams and personas](#teams-and-personas) +- [Configuration](#configuration) + - [FAST prerequisites](#fast-prerequisites) + - [Stage Variables](#stage-variables) + - [Data Domain and Product Data Files](#data-domain-and-product-data-files) + - [Context replacements](#context-replacements) +- [Files](#files) +- [Variables](#variables) +- [Outputs](#outputs) + + +## Project Structure + +The stage manages three separate high level components: + +- a central project, where aspect types, policy tags, and resource manager tags are defined +- one or more data domains, each composed of a folder, a project hosting resources shared at the product level (Composer), and a folder hosting data products +- one or more data products per domain, each composed of a project, and optional exposed resources + +The platform high level approach is represented in the following diagram: + +

+ High level diagram. +

+ +### Central Shared Services + +Central teams manage the data mesh by providing cross-domain oversight, services, and governance. They reduce the operational burden for data domains in producing and consuming data products, and facilitate the cross-domain relationships that are required for the data mesh to operate. + +A central project is created to host resources managed by the central team, which provide core and platform-wide capabilities such as Secure Tags, [Dataplex Catalog aspects](https://cloud.google.com/dataplex/docs/enrich-entries-metadata), and [Policy tags](https://cloud.google.com/bigquery/docs/best-practices-policy-tags). + +### Data Domains + +Data Domains are usually aligned with business or functional units within an enterprise. Common examples of business domains might be the mortgage department in a bank, or the customer, distribution, finance, or HR departments of an enterprise. + +Data Domain creation is centrally managed by this stage, with a dedicated folder sub-hierarchy and project for each logical domain. This provides a clear organizational boundary and allows for IAM and resource separation, which usually maps to an actual line of business. + +A dedicated Data Domain project is created as the primary container for all services and resources specific to each domain. A shared Cloud Composer environment is also created to orchestrate domain-specific workflows, and is provided by default with access to the domain's Data Products via impersonation. + +### Data Products + +One or more Data Products can be mapped to each Data Domain. A dedicated project is created for each product in its domain's hierarchy, enforcing modularity, scalability, flexibility and clear ownership boundaries. + +The per-product BigQuery and Cloud Storage exposure layers can then be deployed in each project, by binding the centrally managed secure tags connected to platform-level IAM bindings.
+ +## Teams and personas + +Clear operational role profiles must be defined for a data mesh to operate well, with each profile mapping to a team archetype or function. These profiles implement the core user journeys for each data mesh actor. This stage comes with three predefined profiles, which are meant as a starting example open to customization. + +> TODO: add folder/project roles + +The three main functions identified here are: + +- **Central data team** + Defines and enforces the data platform structure and data governance policies among data producers, ensuring high data quality and data trustworthiness for consumers. This team is often referred to as the Data Governance team. +- **Data domain teams** + Aligned with specific business domains, these teams are responsible for creating and maintaining data products over their lifecycle. This includes defining the data product's purpose, scope, and boundaries, developing and maintaining a product roadmap, implementing data security measures, ensuring compliance, and monitoring usage and performance. +- **Data Product teams** + Aligned with a specific data product, these teams are responsible for developing, operating and maintaining the data product. + +## Configuration + +### FAST prerequisites + +This stage needs specific permissions granted to its automation service accounts on networking and security resources. + +Network permissions are needed to associate data domain or product projects to Shared VPC hosts and grant network permissions to data platform managed service accounts. They are mandatory when deploying Composer. + +Security permissions are only needed when using CMEK encryption, to grant the relevant IAM roles to data platform service agents on the encryption keys used. + +The networking and security configurations need to be defined in the resource management stage via specific YAML code blocks: two are needed for networking, and one for security.
+ +The first networking code block grants the relevant roles on the Networking folder to the Data Platform service accounts, with a condition on the environment tag. + +```yaml +# make sure this block exists in the data/stage-2/networking.yaml file + iam_bindings_additive: + # Data Platform (dev) + dp_dev_net_admin: + role: service_project_network_admin + member: data-platform-dev-rw + condition: + title: Data platform dev service project admin. + expression: | + resource.matchTag('${organization.id}/${tag_names.environment}', 'development') + dp_dev_net_viewer: + role: roles/compute.networkViewer + member: data-platform-dev-ro + condition: + title: Data platform dev network viewer. + expression: | + resource.matchTag('${organization.id}/${tag_names.environment}', 'development') +``` + +The second networking code block signals the networking stage that the Data Platform service accounts need delegated IAM grants on the dev network project, in order to be able to assign specific roles on it. + +```yaml +# make sure this block exists in the data/stage-2/networking.yaml file +stage3_config: + iam_admin_delegated: + - environment: dev + principal: data-platform-dev-rw + iam_viewer: + - environment: dev + principal: data-platform-dev-ro +``` + +For security, a block similar to the one above is needed. + +```yaml +# make sure this block exists in the data/stage-2/security.yaml file +stage3_config: + iam_admin_delegated: + - environment: dev + principal: data-platform-dev-rw + iam_viewer: + - environment: dev + principal: data-platform-dev-ro +``` + +Once the two above configurations are in place, apply the resource management, networking and security stages in succession. Be sure to refresh the tfvars files in the network and security stages if needed (e.g. by re-running `fast-links.sh`). 
+ +### Stage Variables + +The default data files provided as an example make a few assumptions that need to be matched by corresponding variables configured for the stage: + +- the `location` variable needs to be explicitly configured, as it's used as a default location for buckets, datasets, and Composer; locations can be individually overridden but a default needs to be in place +- the domain `deploy_config.composer.node_config.subnetwork` attribute needs to match the location defined above; Composer network and subnetwork use interpolation from FAST networking outputs, but explicit IDs can be used instead if needed +- IAM roles for the domain and product refer to generic `dp-product-a-0` and `data-consumer-bi` groups; these need to be defined via the `factories_config.context.iam_principals` variable, or changed to explicit IAM principals (e.g. `group:foo@example.com`) + +### Data Domain and Product Data Files + +The formats for both types of data files are controlled via [schemas](./schemas/), which can generally be used directly in development environments to provide error checking and autocompletion. + +### Context replacements + +This stage is designed so that factory files are, as much as possible, organization- and resource-agnostic, so that they can be portable across installations (e.g. for different environments, or partner/customer organizations). + +This is mostly achieved via context replacements in factory files, where IAM principals and a few other attributes can use short names from the `factories_config.context` variable or from internally managed resources, which are then expanded to full principals at runtime.
+ +For example, configuring the `factories_config.context` variable: + +```hcl +factories_config = { + context = { + iam_principals = { + data-consumer-bi = "group:data-consumer-bi@example.com" + } + } +} +``` + +Allows using the group short name in templates: + +```yaml +folder_config: + iam_by_principals: + data-consumer-bi: + - roles/datacatalog.viewer + - roles/dataplex.catalogViewer + - roles/datalineage.viewer +``` + +Or within a data domain definition, service accounts can be referenced in project-level IAM via their short name: + +```yaml +service_accounts: + rw: + description: Automation (rw). +project_config: + iam: + roles/owner: + - rw +``` + +The following table lists the available substitutions. + +| resource | attributes | context expansions | +| --------------- | ----------------------- | -------------------------------------------------------------------------------------- | +| central project | IAM principals | `var.factories_config.context.iam_principals` | +| central project | tag IAM principals | `var.factories_config.context.iam_principals` | +| domain folder | IAM principals | `var.factories_config.context.iam_principals` | +| domain project | shared VPC host project | FAST VPC hosts | +| domain project | IAM principals | `var.factories_config.context.iam_principals` | +| domain sa | IAM principals | `var.factories_config.context.iam_principals`
domain service accounts | +| product project | shared VPC host project | FAST VPC hosts | +| product project | IAM principals | `var.factories_config.context.iam_principals`
product service accounts | +| product project | IAM conditions | `var.factories_config.context.iam_tag_values`
FAST tag values
exposure tag value | +| product sa | IAM principals | `var.factories_config.context.iam_principals` | +| composer | shared VPC network | FAST VPCs | +| composer | shared VPC subnetwork | FAST subnets | +| composer | encryption key | `var.factories_config.context.encryption_keys`
FAST KMS keys | +| exposed bucket | encryption key | `var.factories_config.context.encryption_keys`
FAST KMS keys | +| exposed dataset | encryption key | `var.factories_config.context.encryption_keys`
FAST KMS keys | + + + +## Files + +| name | description | modules | resources | +|---|---|---|---| +| [data-domains-automation.tf](./data-domains-automation.tf) | Data domain automation resources. | gcs · iam-service-account | | +| [data-domains-composer.tf](./data-domains-composer.tf) | None | iam-service-account | google_composer_environment | +| [data-domains.tf](./data-domains.tf) | None | folder · iam-service-account · project | | +| [data-products-automation.tf](./data-products-automation.tf) | Data product automation resources. | gcs · iam-service-account | | +| [data-products-exposure.tf](./data-products-exposure.tf) | Data product exposure layer resources. | bigquery-dataset · gcs | | +| [data-products.tf](./data-products.tf) | Data product project, service account and exposed resources. | iam-service-account · project | | +| [factory.tf](./factory.tf) | None | | | +| [main.tf](./main.tf) | Locals and project-level resources. | data-catalog-policy-tag · dataplex-aspect-types · project | | +| [outputs.tf](./outputs.tf) | Stage outputs. | | google_storage_bucket_object · local_file | +| [variables-fast.tf](./variables-fast.tf) | None | | | +| [variables.tf](./variables.tf) | Module variables. | | | + +## Variables + +| name | description | type | required | default | producer | +|---|---|:---:|:---:|:---:|:---:| +| [automation](variables-fast.tf#L17) | Automation resources created by the bootstrap stage. | object({…}) | ✓ | | 0-bootstrap | +| [billing_account](variables-fast.tf#L25) | Billing account id. If billing account is not part of the same org set `is_org_level` to false. | object({…}) | ✓ | | 0-bootstrap | +| [environments](variables-fast.tf#L33) | Environment names. | object({…}) | ✓ | | 1-resman | +| [prefix](variables-fast.tf#L68) | Prefix used for resources that need unique names. Use a maximum of 9 chars for organizations, and 11 chars for tenants. | string | ✓ | | 0-bootstrap | +| [aspect_types](variables.tf#L17) | Aspect templates.
Merged with those defined via the factory. | map(object({…})) | | {} | | +| [central_project_config](variables.tf#L48) | Configuration for the top-level central project. | object({…}) | | {} | | +| [encryption_keys](variables.tf#L84) | Default encryption keys for services, in service => { region => key id } format. Overridable on a per-object basis. | object({…}) | | {} | | +| [exposure_config](variables.tf#L95) | Data exposure configuration. | object({…}) | | {} | | +| [factories_config](variables.tf#L113) | Configuration for the resource factories. | object({…}) | | {} | | +| [folder_ids](variables-fast.tf#L44) | Folder name => id mappings. | map(string) | | {} | 1-resman | +| [host_project_ids](variables-fast.tf#L52) | Shared VPC host project name => id mappings. | map(string) | | {} | 2-networking | +| [kms_keys](variables-fast.tf#L60) | KMS key ids. | map(string) | | {} | 2-security | +| [location](variables.tf#L128) | Default location used when no location is specified. | string | | "europe-west1" | | +| [outputs_location](variables.tf#L135) | Enable writing provider, tfvars and CI/CD workflow files to local filesystem. Leave null to disable. | string | | null | | +| [regions](variables-fast.tf#L78) | Region mappings. | map(string) | | {} | 2-networking | +| [secure_tags](variables.tf#L141) | Resource manager tags created in the central project. | map(object({…})) | | {} | | +| [stage_config](variables.tf#L162) | Stage configuration used to find environment and resource ids, and to generate names. | object({…}) | | {…} | | +| [subnet_self_links](variables-fast.tf#L86) | Subnet VPC name => { name => self link } mappings. | map(map(string)) | | {} | 2-networking | +| [tag_values](variables-fast.tf#L94) | FAST-managed resource manager tag values. | map(string) | | {} | 1-resman | +| [vpc_self_links](variables-fast.tf#L102) | Shared VPC name => self link mappings.
| map(string) | | {} | 2-networking | + +## Outputs + +| name | description | sensitive | consumers | +|---|---|:---:|---| +| [aspect_types](outputs.tf#L191) | Aspect types defined in central project. | | | +| [central_project](outputs.tf#L196) | Central project attributes. | | | +| [data_domains](outputs.tf#L201) | Data domain attributes. | | | +| [policy_tags](outputs.tf#L206) | Policy tags defined in central project. | | | +| [secure_tags](outputs.tf#L211) | Secure tags defined in central project. | | | + diff --git a/fast/stages/3-data-platform-dev/data-domains-automation.tf b/fast/stages/3-data-platform-dev/data-domains-automation.tf new file mode 100644 index 000000000..841363a66 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-domains-automation.tf @@ -0,0 +1,63 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +# tfdoc:file:description Data domain automation resources.
+ +module "dd-automation-bucket" { + source = "../../../modules/gcs" + for_each = { + for k, v in local.data_domains : + k => v if v.automation != null + } + project_id = module.dd-projects[each.key].project_id + prefix = local.prefix + name = "${each.value.short_name}-state" + location = try( + each.value.automation.location, + var.location + ) + iam = { + "roles/storage.admin" = [ + module.dd-automation-sa["${each.key}/rw"].iam_email + ] + "roles/storage.objectViewer" = concat( + [ + module.dd-automation-sa["${each.key}/ro"].iam_email + ], + [ + for m in each.value.automation.impersonation_principals : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + ) + } +} + +module "dd-automation-sa" { + source = "../../../modules/iam-service-account" + for_each = { for v in local.dd_automation_sa : v.key => v } + project_id = module.dd-projects[each.value.dd].project_id + prefix = each.value.prefix + name = each.value.name + description = each.value.description + iam = { + "roles/iam.serviceAccountTokenCreator" = [ + for m in each.value.impersonation_principals : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } +} diff --git a/fast/stages/3-data-platform-dev/data-domains-composer.tf b/fast/stages/3-data-platform-dev/data-domains-composer.tf new file mode 100644 index 000000000..5d4d2df40 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-domains-composer.tf @@ -0,0 +1,123 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+ * See the License for the specific language governing permissions and + * limitations under the License. + */ + +locals { + dd_composer = { + for k, v in local.data_domains : k => merge( + { region = var.location, short_name = v.short_name }, + try(v.deploy_config.composer, {}) + ) + if( + try(v.deploy_config.composer.node_config.network, null) != null && + try(v.deploy_config.composer.node_config.subnetwork, null) != null + ) + } + dd_composer_keys = { + for k, v in local.dd_composer : k => try( + v.encryption_key, + var.encryption_keys.composer[v.region], + null + ) + } +} + +module "dd-composer-sa" { + source = "../../../modules/iam-service-account" + for_each = local.dd_composer + project_id = module.dd-projects[each.key].project_id + prefix = local.prefix + name = "${each.value.short_name}-cmp-sa" + description = "Composer Service Account." +} + +resource "google_composer_environment" "default" { + for_each = local.dd_composer + project = module.dd-projects-iam[each.key].project_id + name = "${var.prefix}-${each.key}" + region = each.value.region + config { + enable_private_builds_only = try(each.value.private_builds, true) + enable_private_environment = try(each.value.private_environment, true) + environment_size = try( + each.value.environment_size, + "ENVIRONMENT_SIZE_SMALL" + ) + dynamic "encryption_config" { + for_each = local.dd_composer_keys[each.key] == null ? 
[] : [""] + content { + kms_key_name = lookup( + local.kms_keys, + local.dd_composer_keys[each.key], + local.dd_composer_keys[each.key] + ) + } + } + # TODO: implement the same context fail mode used in the project factory + node_config { + service_account = try( + each.value.node_config.service_account, + module.dd-composer-sa[each.key].email + ) + network = try( + var.vpc_self_links[each.value.node_config.network], + each.value.node_config.network, + "-" + ) + subnetwork = try( + var.subnet_self_links[each.value.node_config.network][each.value.node_config.subnetwork], + each.value.node_config.subnetwork, + "-" + ) + } + software_config { + image_version = "composer-3-airflow-2" + cloud_data_lineage_integration { + enabled = true + } + } + workloads_config { + dag_processor { + cpu = try(each.value.workloads_config.dag_processor.cpu, 0.5) + memory_gb = try(each.value.workloads_config.dag_processor.memory_gb, 2) + storage_gb = try(each.value.workloads_config.dag_processor.storage_gb, 1) + count = try(each.value.workloads_config.dag_processor.count, 1) + } + scheduler { + cpu = try(each.value.workloads_config.scheduler.cpu, 0.5) + memory_gb = try(each.value.workloads_config.scheduler.memory_gb, 2) + storage_gb = try(each.value.workloads_config.scheduler.storage_gb, 1) + count = try(each.value.workloads_config.scheduler.count, 1) + } + triggerer { + cpu = try(each.value.workloads_config.triggerer.cpu, 0.5) + memory_gb = try(each.value.workloads_config.triggerer.memory_gb, 2) + count = try(each.value.workloads_config.triggerer.count, 1) + } + web_server { + cpu = try(each.value.workloads_config.web_server.cpu, 0.5) + memory_gb = try(each.value.workloads_config.web_server.memory_gb, 2) + storage_gb = try(each.value.workloads_config.web_server.storage_gb, 1) + } + worker { + cpu = try(each.value.workloads_config.worker.cpu, 0.5) + memory_gb = try(each.value.workloads_config.worker.memory_gb, 2) + storage_gb = try(each.value.workloads_config.worker.storage_gb, 1) + 
min_count = try(each.value.workloads_config.worker.min_count, 1) + max_count = try(each.value.workloads_config.worker.max_count, 1) + } + } + } +} diff --git a/fast/stages/3-data-platform-dev/data-domains.tf b/fast/stages/3-data-platform-dev/data-domains.tf new file mode 100644 index 000000000..64837b061 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-domains.tf @@ -0,0 +1,246 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +locals { + dd_services = { + for k, v in local.data_domains : k => distinct(concat( + v.project_config.services, + lookup(local.dd_composer, k, null) == null ? [] : [ + "composer.googleapis.com", + "storage.googleapis.com" + ] + )) + } +} + +module "dd-folders" { + source = "../../../modules/folder" + for_each = local.data_domains + parent = var.folder_ids[var.stage_config.name] + name = each.value.name + iam = { + for k, v in each.value.folder_config.iam : k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + iam_bindings = { + for k, v in each.value.folder_config.iam_bindings : k => merge(v, { + members = [ + for m in v.members : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + condition = try(v.condition, null) == null ? 
null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + } + iam_bindings_additive = { + for k, v in each.value.folder_config.iam_bindings_additive : k => merge(v, { + member = lookup( + var.factories_config.context.iam_principals, v.member, v.member + ) + condition = try(v.condition, null) == null ? null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + } + iam_by_principals = { + for k, v in each.value.folder_config.iam_by_principals : + lookup(var.factories_config.context.iam_principals, k, k) => v + } +} + +module "dd-dp-folders" { + source = "../../../modules/folder" + for_each = local.data_domains + parent = module.dd-folders[each.key].id + name = "Data Products" + iam = try(each.value.deploy_config.composer, null) == null ? {} : { + "roles/iam.serviceAccountTokenCreator" = [ + module.dd-composer-sa[each.key].iam_email + ] + } +} + +module "dd-projects" { + source = "../../../modules/project" + for_each = local.data_domains + billing_account = var.billing_account.id + name = "${each.value.short_name}-shared-0" + parent = module.dd-folders[each.key].id + prefix = local.prefix + labels = { + data_domain = each.key + } + services = local.dd_services[each.key] + service_encryption_key_ids = ( + lookup(local.dd_composer, each.key, null) == null ? {} : { + "composer.googleapis.com" = compact([ + try(local.dd_composer_keys[each.key], null) == null + ? 
null + : lookup( + local.kms_keys, + local.dd_composer_keys[each.key], + local.dd_composer_keys[each.key] + ) + ]) + } + ) +} + +module "dd-projects-iam" { + source = "../../../modules/project" + for_each = local.data_domains + name = module.dd-projects[each.key].project_id + project_reuse = { + use_data_source = false + project_attributes = { + name = module.dd-projects[each.key].name + number = module.dd-projects[each.key].number + services_enabled = local.dd_services[each.key] + } + } + iam = { + for k, v in each.value.project_config.iam : k => [ + for m in v : try( + var.factories_config.context.iam_principals[m], + module.dd-automation-sa["${each.key}/${m}"].iam_email, + module.dd-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + } + iam_bindings = { + for k, v in each.value.project_config.iam_bindings : k => merge(v, { + members = [ + for m in v.members : try( + var.factories_config.context.iam_principals[m], + module.dd-automation-sa["${each.key}/${m}"].iam_email, + module.dd-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + condition = try(v.condition, null) == null ? null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + } + iam_bindings_additive = merge( + { + for k, v in each.value.project_config.iam_bindings_additive : k => merge(v, { + member = try( + var.factories_config.context.iam_principals[v.member], + module.dd-automation-sa["${each.key}/${v.member}"].iam_email, + module.dd-service-accounts["${each.key}/${v.member}"].iam_email, + v.member + ) + condition = try(v.condition, null) == null ? null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + }, + try(each.value.deploy_config.composer, null) == null ? 
{} : { + composer_worker = { + member = module.dd-composer-sa[each.key].iam_email + role = "roles/composer.worker" + } + } + ) + iam_by_principals = { + for k, v in each.value.project_config.iam_by_principals : + lookup(var.factories_config.context.iam_principals, k, k) => v + } + shared_vpc_service_config = ( + each.value.project_config.shared_vpc_service_config == null + ? null + : { + host_project = lookup( + var.host_project_ids, + each.value.project_config.shared_vpc_service_config.host_project, + each.value.project_config.shared_vpc_service_config.host_project + ) + network_users = [ + for m in try(each.value.project_config.shared_vpc_service_config.network_users, []) : + try( + var.factories_config.context.iam_principals[m], + module.dd-automation-sa["${each.key}/${m}"].iam_email, + module.dd-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + service_agent_iam = try( + each.value.project_config.shared_vpc_service_config.service_agent_iam, + {} + ) + service_iam_grants = try( + each.value.project_config.shared_vpc_service_config.service_iam_grants, + [] + ) + } + ) +} + +module "dd-service-accounts" { + source = "../../../modules/iam-service-account" + for_each = { for v in local.dd_service_accounts : v.key => v } + project_id = module.dd-projects[each.value.dd].project_id + prefix = local.prefix + name = each.value.name + description = each.value.description + iam = { + for k, v in each.value.iam : k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + iam_bindings = { + for k, v in each.value.iam_bindings : k => merge(v, { + members = [ + for m in v.members : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + }) + } + iam_bindings_additive = { + for k, v in each.value.iam_bindings_additive : k => merge(v, { + member = lookup( + var.factories_config.context.iam_principals, v.member, v.member + ) + }) + } + iam_storage_roles = each.value.iam_storage_roles +} diff --git
a/fast/stages/3-data-platform-dev/data-products-automation.tf b/fast/stages/3-data-platform-dev/data-products-automation.tf new file mode 100644 index 000000000..884f626d9 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-products-automation.tf @@ -0,0 +1,63 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +# tfdoc:file:description Data product automation resources. + +module "dp-automation-bucket" { + source = "../../../modules/gcs" + for_each = { + for k, v in local.data_products : + k => v if v.automation != null + } + project_id = module.dd-projects[each.value.dd].project_id + prefix = local.prefix + name = "${each.value.short_name}-state" + location = try( + each.value.automation.location, + var.location + ) + iam = { + "roles/storage.admin" = [ + module.dp-automation-sa["${each.key}/rw"].iam_email + ] + "roles/storage.objectViewer" = concat( + [ + module.dp-automation-sa["${each.key}/ro"].iam_email + ], + [ + for m in each.value.automation.impersonation_principals : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + ) + } +} + +module "dp-automation-sa" { + source = "../../../modules/iam-service-account" + for_each = { for v in local.dp_automation_sa : v.key => v } + project_id = module.dp-projects[each.value.dp].project_id + prefix = each.value.prefix + name = each.value.name + description = each.value.description + iam = { + "roles/iam.serviceAccountTokenCreator" = [ + for m in 
each.value.impersonation_principals : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } +} diff --git a/fast/stages/3-data-platform-dev/data-products-exposure.tf b/fast/stages/3-data-platform-dev/data-products-exposure.tf new file mode 100644 index 000000000..f5c2e0956 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-products-exposure.tf @@ -0,0 +1,86 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +# tfdoc:file:description Data product exposure layer resources. + +module "dp-buckets" { + source = "../../../modules/gcs" + for_each = { + for v in local.dp_buckets : "${v.dp}/${v.key}" => v + } + project_id = module.dp-projects[each.value.dp].project_id + prefix = local.prefix + name = "${each.value.dps}-${each.value.short_name}-0" + location = each.value.location + encryption_key = ( + local.dp_bucket_keys[each.key] == null + ? 
null + : lookup( + local.kms_keys, + local.dp_bucket_keys[each.key], + local.dp_bucket_keys[each.key] + ) + ) + iam = { + for k, v in each.value.iam : k => [ + for m in v : try( + var.factories_config.context.iam_principals[m], + module.dp-automation-sa["${each.value.dp}/${m}"].iam_email, + module.dp-service-accounts["${each.value.dp}/${m}"].iam_email, + m + ) + ] + } + tag_bindings = { + exposure = ( + module.central-project.tag_values[var.exposure_config.tag_name].id + ) + } +} + +module "dp-datasets" { + source = "../../../modules/bigquery-dataset" + for_each = { + for v in local.dp_datasets : "${v.dp}/${v.key}" => v + } + project_id = module.dp-projects[each.value.dp].project_id + id = "${local.prefix_bq}_${each.value.dps}_${each.value.short_name}_0" + location = each.value.location + encryption_key = ( + local.dp_dataset_keys[each.key] == null + ? null + : lookup( + local.kms_keys, + local.dp_dataset_keys[each.key], + local.dp_dataset_keys[each.key] + ) + ) + iam = { + for k, v in each.value.iam : k => [ + for m in v : try( + var.factories_config.context.iam_principals[m], + module.dp-automation-sa["${each.value.dp}/${m}"].iam_email, + module.dp-service-accounts["${each.value.dp}/${m}"].iam_email, + m + ) + ] + } + tag_bindings = { + exposure = ( + module.central-project.tag_values[var.exposure_config.tag_name].id + ) + } +} diff --git a/fast/stages/3-data-platform-dev/data-products.tf b/fast/stages/3-data-platform-dev/data-products.tf new file mode 100644 index 000000000..5095f4a8a --- /dev/null +++ b/fast/stages/3-data-platform-dev/data-products.tf @@ -0,0 +1,172 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +# tfdoc:file:description Data product project, service account and exposed resources. + +module "dp-projects" { + source = "../../../modules/project" + for_each = local.data_products + billing_account = var.billing_account.id + name = "${each.value.dds}-${each.value.short_name}-0" + parent = module.dd-dp-folders[each.value.dd].id + prefix = local.prefix + labels = { + data_domain = each.value.dd + data_product = replace(each.key, "/", "_") + } + services = each.value.services + service_encryption_key_ids = { + "bigquery.googleapis.com" = distinct([ + for k, v in local.dp_dataset_keys : + lookup(local.kms_keys, v, v) + if startswith(k, "${each.key}/") && v != null + ]) + "storage.googleapis.com" = distinct([ + for k, v in local.dp_bucket_keys : + lookup(local.kms_keys, v, v) + if startswith(k, "${each.key}/") && v != null + ]) + } +} + +module "dp-projects-iam" { + source = "../../../modules/project" + for_each = local.data_products + name = module.dp-projects[each.key].project_id + project_reuse = { + use_data_source = false + project_attributes = { + name = module.dp-projects[each.key].name + number = module.dp-projects[each.key].number + services_enabled = each.value.services + } + } + iam = { + for k, v in each.value.iam : k => [ + for m in v : try( + var.factories_config.context.iam_principals[m], + module.dp-automation-sa["${each.key}/${m}"].iam_email, + module.dp-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + } + iam_bindings = { + for k, v in each.value.iam_bindings : k => merge(v, { + members = [ + for m in v.members : try(
var.factories_config.context.iam_principals[m], + module.dp-automation-sa["${each.key}/${m}"].iam_email, + module.dp-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + condition = try(v.condition, null) == null ? null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + } + iam_bindings_additive = { + for k, v in each.value.iam_bindings_additive : k => merge(v, { + member = try( + var.factories_config.context.iam_principals[v.member], + module.dp-automation-sa["${each.key}/${v.member}"].iam_email, + module.dp-service-accounts["${each.key}/${v.member}"].iam_email, + v.member + ) + condition = try(v.condition, null) == null ? null : { + title = v.condition.title + description = try(v.condition.description, null) + expression = templatestring(v.condition.expression, { + tag_values = local.tag_values + }) + } + }) + } + iam_by_principals = { + for k, v in each.value.iam_by_principals : try( + var.factories_config.context.iam_principals[k], + module.dp-automation-sa["${each.key}/${k}"].iam_email, + module.dp-service-accounts["${each.key}/${k}"].iam_email, + k + ) => v + } + shared_vpc_service_config = ( + each.value.shared_vpc_service_config == null + ? 
null + : { + host_project = lookup( + var.host_project_ids, + each.value.shared_vpc_service_config.host_project, + each.value.shared_vpc_service_config.host_project + ) + network_users = [ + for m in try(each.value.shared_vpc_service_config.network_users, []) : + try( + var.factories_config.context.iam_principals[m], + module.dp-automation-sa["${each.key}/${m}"].iam_email, + module.dp-service-accounts["${each.key}/${m}"].iam_email, + m + ) + ] + service_agent_iam = try( + each.value.shared_vpc_service_config.service_agent_iam, + {} + ) + service_iam_grants = try( + each.value.shared_vpc_service_config.service_iam_grants, + [] + ) + } + ) +} + +module "dp-service-accounts" { + source = "../../../modules/iam-service-account" + for_each = { for v in local.dp_service_accounts : v.key => v } + project_id = module.dp-projects[each.value.dp].project_id + prefix = each.value.prefix + name = each.value.name + description = each.value.description + iam = { + for k, v in each.value.iam : k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + iam_bindings = { + for k, v in each.value.iam_bindings : k => merge(v, { + members = [ + for m in v.members : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + }) + } + iam_bindings_additive = { + for k, v in each.value.iam_bindings_additive : k => merge(v, { + member = lookup( + var.factories_config.context.iam_principals, v.member, v.member + ) + }) + } + iam_storage_roles = each.value.iam_storage_roles +} diff --git a/fast/stages/3-data-platform-dev/data/aspect-types/test-0.yaml b/fast/stages/3-data-platform-dev/data/aspect-types/test-0.yaml new file mode 100644 index 000000000..d81db5678 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data/aspect-types/test-0.yaml @@ -0,0 +1,46 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# yaml-language-server: $schema=../../schemas/aspect-type.schema.json + +display_name: "Basic template" +metadata_template: | + { + "name": "tf-basic-template", + "type": "record", + "recordFields": [ + { + "name": "source", + "type": "string", + "annotations": { + "displayName": "Source", + "description": "Specifies the source of data." + }, + "index": 1, + "constraints": { + "required": true + } + }, + { + "name": "owner", + "type": "string", + "annotations": { + "displayName": "Owner", + "description": "Specifies the data owner." + }, + "index": 2, + "constraints": {} + } + ] + } diff --git a/fast/stages/3-data-platform-dev/data/data-domains/domain-0/_config.yaml b/fast/stages/3-data-platform-dev/data/data-domains/domain-0/_config.yaml new file mode 100644 index 000000000..1dad385b7 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data/data-domains/domain-0/_config.yaml @@ -0,0 +1,68 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +# yaml-language-server: $schema=../../../schemas/data-domain.schema.json + +name: Domain 0 +short_name: d0 + +automation: + impersonation_principals: + - dp-product-a-0 + +deploy_config: + composer: + # region defaults to var.location + # encryption_key: composer-dev-europe-west8 + node_config: + network: dev-spoke-0 + subnetwork: europe-west8/dev-dataplatform + +project_config: + iam: + roles/owner: + - rw + roles/viewer: + - ro + roles/composer.environmentAndStorageObjectAdmin: + - dp-product-a-0 + roles/monitoring.viewer: + - dp-product-a-0 + services: + - composer.googleapis.com + - datacatalog.googleapis.com + - dataplex.googleapis.com + - datalineage.googleapis.com + shared_vpc_service_config: + host_project: dev-spoke-0 + service_agent_iam: + roles/composer.sharedVpcAgent: + - composer + +folder_config: + iam_bindings: + bigquery_metadata_viewer: + members: + - data-consumer-bi + role: roles/dataplex.catalogViewer #roles/bigquery.metadataViewer + condition: + title: exposure + description: Expose via secure tag. + expression: resource.matchTag('exposure', 'allow') + iam_by_principals: + data-consumer-bi: + - roles/datalineage.viewer + dp-product-a-0: + - "roles/logging.viewer" + - "roles/monitoring.viewer" diff --git a/fast/stages/3-data-platform-dev/data/data-domains/domain-0/product-0.yaml b/fast/stages/3-data-platform-dev/data/data-domains/domain-0/product-0.yaml new file mode 100644 index 000000000..efbcfef45 --- /dev/null +++ b/fast/stages/3-data-platform-dev/data/data-domains/domain-0/product-0.yaml @@ -0,0 +1,78 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. 
+# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# yaml-language-server: $schema=../../../schemas/data-product.schema.json + +short_name: p0 +services: + - bigquery.googleapis.com + - cloudaicompanion.googleapis.com + - cloudresourcemanager.googleapis.com + - composer.googleapis.com + - datacatalog.googleapis.com + - dataplex.googleapis.com + - datalineage.googleapis.com + - storage.googleapis.com +automation: + impersonation_principals: + - dp-product-a-0 +exposure_layer: + bigquery: + datasets: + exposure: {} + iam: + "roles/bigquery.dataViewer": + - data-consumer-bi + storage: + buckets: + exposed-ew8: {} + iam: + "roles/storage.objectViewer": + - data-consumer-bi +iam_by_principals: + rw: + - roles/editor + ro: + - roles/viewer + dp-product-a-0: + - "roles/dataplex.catalogEditor" + - "roles/bigquery.dataOwner" + - "roles/bigquery.jobUser" + - "roles/datalineage.viewer" + - "roles/dataplex.dataScanCreator" + - "roles/logging.viewer" + - "roles/monitoring.viewer" + - "roles/serviceusage.serviceUsageViewer" + - "roles/storage.bucketViewer" + - "roles/storage.objectAdmin" + processing: + - "roles/bigquery.dataEditor" + - "roles/bigquery.jobUser" + - "roles/dataflow.admin" + - "roles/dataproc.editor" + - "roles/dataproc.worker" + - "roles/iam.serviceAccountUser" + - "roles/storage.bucketViewer" + - "roles/storage.objectAdmin" +# iam_bindings_additive: +# test-tag: +# member: rw +# role: roles/storage.objectViewer +# condition: +# title: Storage viewer on exposed resources. 
+# expression: | +# resource.matchTag('${tag_values["exposure/allow"]}') +service_accounts: + processing: + description: Processing service account. diff --git a/fast/stages/3-data-platform-dev/diagram.png b/fast/stages/3-data-platform-dev/diagram.png new file mode 100644 index 000000000..c01a73853 Binary files /dev/null and b/fast/stages/3-data-platform-dev/diagram.png differ diff --git a/fast/stages/3-data-platform-dev/factory.tf b/fast/stages/3-data-platform-dev/factory.tf new file mode 100644 index 000000000..34fe9dfc4 --- /dev/null +++ b/fast/stages/3-data-platform-dev/factory.tf @@ -0,0 +1,197 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ + +locals { + _dd_path = try(pathexpand(var.factories_config.data_domains), null) + _dd_raw = { + for f in try(fileset(local._dd_path, "**/_config.yaml"), []) : + dirname(f) => yamldecode(file("${local._dd_path}/${f}")) + } + _dp = flatten([ + for k, v in local.data_domains : [ + for f in try(fileset("${local._dd_path}/${k}", "**/*.yaml"), []) : merge( + yamldecode(file("${local._dd_path}/${k}/${f}")), + { + dd = k + dds = v.short_name + key = trimsuffix(basename(f), ".yaml") + } + ) if !endswith(f, "_config.yaml") + ] + ]) + data_domains = { + for k, v in local._dd_raw : k => { + name = v.name + short_name = lookup(v, "short_name", reverse(split("/", k))[0]) + automation = try(v.automation, null) + deploy_config = { + composer = try(v.deploy_config.composer, null) + } + folder_config = { + iam = try(v.folder_config.iam, {}) + iam_bindings = try(v.folder_config.iam_bindings, {}) + iam_bindings_additive = try(v.folder_config.iam_bindings_additive, {}) + iam_by_principals = try(v.folder_config.iam_by_principals, {}) + } + project_config = { + name = try(v.project_config.name, k) + deploy = merge( + { composer = null }, try(v.project_config.deploy, {}) + ) + services = try(v.project_config.services, []) + iam = try(v.project_config.iam, {}) + iam_bindings = try(v.project_config.iam_bindings, {}) + iam_bindings_additive = try(v.project_config.iam_bindings_additive, {}) + iam_by_principals = try(v.project_config.iam_by_principals, {}) + shared_vpc_service_config = try( + v.project_config.shared_vpc_service_config, null + ) + } + service_accounts = lookup(v, "service_accounts", {}) + } + } + data_products = { + for v in local._dp : "${v.dd}/${v.key}" => merge(v, { + short_name = lookup(v, "short_name", v.key) + services = distinct(concat( + lookup(v, "services", []), + try(v.exposure_layer.storage.buckets, null) == null ? [] : [ + "storage.googleapis.com" + ], + try(v.exposure_layer.bigquery.datasets, null) == null ?
[] : [ + "bigquery.googleapis.com" + ] + )) + automation = try(v.automation, null) + exposure_layer = { + bigquery = { + datasets = try(v.exposure_layer.bigquery.datasets, {}) + iam = try(v.exposure_layer.bigquery.iam, {}) + } + storage = { + buckets = try(v.exposure_layer.storage.buckets, {}) + iam = try(v.exposure_layer.storage.iam, {}) + } + } + iam = lookup(v, "iam", {}) + iam_bindings = lookup(v, "iam_bindings", {}) + iam_bindings_additive = lookup(v, "iam_bindings_additive", {}) + iam_by_principals = lookup(v, "iam_by_principals", {}) + service_accounts = lookup(v, "service_accounts", {}) + shared_vpc_service_config = try( + v.shared_vpc_service_config, null + ) + }) + } + dd_automation_sa = flatten([ + for k, v in local.data_domains : [ + for n in ["ro", "rw"] : { + dd = k + key = "${k}/${n}" + name = "iac-${n}" + prefix = v.short_name + description = "Automation for ${v.short_name} (${n}.)" + impersonation_principals = lookup( + v.automation, "impersonation_principals", [] + ) + } + ] if v.automation != null + ]) + dd_service_accounts = flatten([ + for k, v in local.data_domains : [ + for sk, sv in v.service_accounts : { + dd = k + key = "${k}/${sk}" + name = lookup(sv, "name", "${v.short_name}-${sk}") + description = lookup(sv, "description", null) + iam = lookup(sv, "iam", {}) + iam_bindings = lookup(sv, "iam_bindings", {}) + iam_bindings_additive = lookup(sv, "iam_bindings_additive", {}) + iam_storage_roles = lookup(sv, "iam_storage_roles", {}) + } + ] + ]) + dp_automation_sa = flatten([ + for k, v in local.data_products : [ + for n in ["ro", "rw"] : { + dp = k + key = "${k}/${n}" + name = "iac-${n}" + prefix = "${v.dds}-${v.short_name}" + description = "Automation for ${k} (${n}.)" + impersonation_principals = lookup( + v.automation, "impersonation_principals", [] + ) + } + ] if v.automation != null + ]) + dp_bucket_keys = { + for v in local.dp_buckets : "${v.dp}/${v.key}" => ( + v.encryption_key != null + ?
v.encryption_key + : try(var.encryption_keys.storage[v.location], null) + ) + } + dp_buckets = flatten([ + for k, v in local.data_products : [ + for bk, bv in v.exposure_layer.storage.buckets : { + dp = k + dps = "${v.dds}-${v.short_name}" + iam = v.exposure_layer.storage.iam + key = bk + encryption_key = lookup(bv, "encryption_key", null) + short_name = lookup(bv, "short_name", bk) + location = lookup(bv, "location", var.location) + storage_class = lookup(bv, "storage_class", null) + } + ] + ]) + dp_dataset_keys = { + for v in local.dp_datasets : "${v.dp}/${v.key}" => ( + v.encryption_key != null + ? v.encryption_key + : try(var.encryption_keys.bigquery[v.location], null) + ) + } + dp_datasets = flatten([ + for k, v in local.data_products : [ + for dk, dv in v.exposure_layer.bigquery.datasets : { + dp = k + dps = replace("${v.dds}-${v.short_name}", "-", "_") + encryption_key = lookup(dv, "encryption_key", null) + iam = v.exposure_layer.bigquery.iam + key = dk + short_name = replace(lookup(dv, "short_name", dk), "-", "_") + location = lookup(dv, "location", var.location) + } + ] + ]) + dp_service_accounts = flatten([ + for k, v in local.data_products : [ + for sk, sv in v.service_accounts : { + dp = k + key = "${k}/${sk}" + name = lookup(sv, "name", sk) + prefix = "${v.dds}-${v.short_name}" + description = lookup(sv, "description", null) + iam = lookup(sv, "iam", {}) + iam_bindings = lookup(sv, "iam_bindings", {}) + iam_bindings_additive = lookup(sv, "iam_bindings_additive", {}) + iam_storage_roles = lookup(sv, "iam_storage_roles", {}) + } + ] + ]) +} diff --git a/fast/stages/3-data-platform-dev/main.tf b/fast/stages/3-data-platform-dev/main.tf new file mode 100644 index 000000000..8edcb0af0 --- /dev/null +++ b/fast/stages/3-data-platform-dev/main.tf @@ -0,0 +1,135 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +# tfdoc:file:description Locals and project-level resources. + +locals { + environment = var.environments[var.stage_config.environment] + exp_tag = { + key = split("/", var.exposure_config.tag_name)[0] + value = split("/", var.exposure_config.tag_name)[1] + } + kms_keys = merge( + var.kms_keys, var.factories_config.context.kms_keys + ) + location = lookup(var.regions, var.location, var.location) + prefix = ( + "${var.prefix}-${local.environment.short_name}-${var.stage_config.short_name}" + ) + prefix_bq = replace(local.prefix, "-", "_") + tag_values = merge( + var.tag_values, + var.factories_config.context.tag_values, + { for k, v in module.central-project.tag_values : k => v.id } + ) +} + +module "central-project" { + source = "../../../modules/project" + billing_account = var.billing_account.id + name = var.central_project_config.short_name + parent = var.folder_ids[var.stage_config.name] + prefix = local.prefix + iam = { + for k, v in var.central_project_config.iam : k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + iam_bindings = { + for k, v in var.central_project_config.iam_bindings : k => merge(v, { + members = [ + for m in v.members : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + }) + } + iam_bindings_additive = { + for k, v in var.central_project_config.iam_bindings_additive : k => merge(v, { + member = lookup( + var.factories_config.context.iam_principals, v.member, v.member + ) + }) + } + iam_by_principals = { + for k, v in 
var.central_project_config.iam_by_principals : + lookup(var.factories_config.context.iam_principals, k, k) => v + } + labels = { + environment = var.stage_config.environment + } + services = var.central_project_config.services + tags = merge(var.secure_tags, { + (local.exp_tag.key) = { + description = try( + var.secure_tags[local.exp_tag.key].description, + "Managed by the Terraform project module." + ) + iam = { + for k, v in try(var.secure_tags[local.exp_tag.key].iam, {}) : + k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + values = merge( + try(var.secure_tags[local.exp_tag.key].values, {}), + { + (local.exp_tag.value) = { + description = try( + var.secure_tags[local.exp_tag.key].values[local.exp_tag.value].description, + "Managed by the Terraform project module." + ) + iam = { + for k, v in try(var.secure_tags[local.exp_tag.key].values[local.exp_tag.value].iam, {}) : + k => [ + for m in v : lookup( + var.factories_config.context.iam_principals, m, m + ) + ] + } + } + } + ) + } + }) +} + +module "central-aspect-types" { + source = "../../../modules/dataplex-aspect-types" + project_id = module.central-project.project_id + location = local.location + factories_config = { + aspect_types = var.factories_config.aspect_types + } + aspect_types = var.aspect_types +} + +# TODO: Migrate to new Policy Tag on BQ. 
+module "central-policy-tags" { + source = "../../../modules/data-catalog-policy-tag" + project_id = module.central-project.project_id + name = "tags" + location = var.location + tags = { + low = {} + medium = {} + high = {} + } +} diff --git a/fast/stages/3-data-platform-dev/outputs.tf b/fast/stages/3-data-platform-dev/outputs.tf new file mode 100644 index 000000000..e486411da --- /dev/null +++ b/fast/stages/3-data-platform-dev/outputs.tf @@ -0,0 +1,214 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# https://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. + +# tfdoc:file:description Stage outputs. + +locals { + central_project = { + id = module.central-project.project_id + number = module.central-project.number + } + dd_attrs = { + for k, v in local.data_domains : k => { + automation = v.automation == null ? null : { + bucket = module.dd-automation-bucket[k].name + service_accounts = { + ro = module.dd-automation-sa["${k}/ro"].email + rw = module.dd-automation-sa["${k}/rw"].email + } + } + deployments = { + composer = lookup(local.dd_composer, k, null) == null ? 
null : { + airflow_uri = try( + google_composer_environment.default[k].config[0].airflow_uri, null + ) + dag_gcs_prefix = try( + google_composer_environment.default[k].config[0].dag_gcs_prefix, null + ) + } + } + data_products = { + for pk in lookup(local.dp_by_dd, k, []) : + split("/", pk)[1] => { + for kk, kv in local.dp_attrs[pk] : kk => kv if kk != "automation" + } + } + folder_ids = { + domain = module.dd-folders[k].id + products = module.dd-dp-folders[k].id + } + project = { + id = module.dd-projects[k].project_id + number = module.dd-projects[k].number + } + service_accounts = { + for sk in keys(v.service_accounts) : + sk => module.dd-service-accounts["${k}/${sk}"].email + } + } + } + dp_attrs = { + for k, v in local.data_products : k => { + automation = local.data_products[k].automation == null ? null : { + bucket = module.dp-automation-bucket[k].name + service_accounts = { + ro = module.dp-automation-sa["${k}/ro"].email + rw = module.dp-automation-sa["${k}/rw"].email + } + } + exposure = { + bigquery = { + for vv in lookup(local.exp_datasets_by_dp, k, []) : + split("/", vv)[2] => module.dp-datasets[vv].id + } + storage = { + for vv in lookup(local.exp_buckets_by_dp, k, []) : + split("/", vv)[2] => module.dp-buckets[vv].id + } + } + project = { + id = module.dp-projects[k].project_id + number = module.dp-projects[k].number + } + service_accounts = { + for sk in keys(v.service_accounts) : + sk => module.dp-service-accounts["${k}/${sk}"].email + } + } + } + dp_by_dd = { + for k, v in local.data_products : + v.dd => k... + } + exp_buckets_by_dp = { + for k, v in module.dp-buckets : + join("/", slice(split("/", k), 0, 2)) => k... + } + exp_datasets_by_dp = { + for k, v in module.dp-datasets : + join("/", slice(split("/", k), 0, 2)) => k... 
+ } + files_prefix = "3-${var.stage_config.name}" + providers = merge( + { + for k, v in local.dd_attrs : + "${k}-providers.tf" => templatefile("templates/providers.tf.tpl", { + backend_extra = null + bucket = v.automation.bucket + name = k + sa = v.automation.service_accounts.rw + }) if v.automation != null + }, + { + for k, v in local.dd_attrs : + "${k}-r-providers.tf" => templatefile("templates/providers.tf.tpl", { + backend_extra = null + bucket = v.automation.bucket + name = k + sa = v.automation.service_accounts.ro + }) if v.automation != null + }, + { + for k, v in local.dp_attrs : + "${replace(k, "/", "-")}-providers.tf" => templatefile("templates/providers.tf.tpl", { + backend_extra = null + bucket = v.automation.bucket + name = k + sa = v.automation.service_accounts.rw + }) if v.automation != null + }, + { + for k, v in local.dp_attrs : + "${replace(k, "/", "-")}-r-providers.tf" => templatefile("templates/providers.tf.tpl", { + backend_extra = null + bucket = v.automation.bucket + name = k + sa = v.automation.service_accounts.ro + }) if v.automation != null + } + ) + tfvars = { + aspect_types = module.central-aspect-types.ids + central_project = local.central_project + policy_tags = module.central-policy-tags.tags + secure_tags = { + for k, v in module.central-project.tag_values : k => v.id + } + } + tfvars_dd = { + for k, v in local.data_domains : k => merge(local.tfvars, { + for kk, vv in local.dd_attrs[k] : + kk => vv if kk != "automation" + }) + } +} + +# tfvars files for data domains and products + +resource "local_file" "tfvars" { + for_each = var.outputs_location == null ? 
{} : local.tfvars_dd + file_permission = "0644" + filename = "${try(pathexpand(var.outputs_location), "")}/tfvars/${local.files_prefix}/${each.key}.auto.tfvars.json" + content = jsonencode(each.value) +} + +resource "google_storage_bucket_object" "tfvars" { + for_each = local.tfvars_dd + bucket = var.automation.outputs_bucket + name = "tfvars/${local.files_prefix}/${each.key}.auto.tfvars.json" + content = jsonencode(each.value) +} + +# provider files for data domains and products + +resource "local_file" "providers" { + for_each = var.outputs_location == null ? {} : local.providers + file_permission = "0644" + filename = "${try(pathexpand(var.outputs_location), "")}/providers/${local.files_prefix}/${each.key}" + content = each.value +} + +resource "google_storage_bucket_object" "providers" { + for_each = local.providers + bucket = var.automation.outputs_bucket + name = "providers/${local.files_prefix}/${each.key}" + content = each.value +} + +# regular outputs + +output "aspect_types" { + description = "Aspect types defined in central project." + value = local.tfvars.aspect_types +} + +output "central_project" { + description = "Central project attributes." + value = local.central_project +} + +output "data_domains" { + description = "Data domain attributes." + value = local.dd_attrs +} + +output "policy_tags" { + description = "Policy tags defined in central project." + value = local.tfvars.policy_tags +} + +output "secure_tags" { + description = "Secure tags defined in central project." 
+ value = local.tfvars.secure_tags +} diff --git a/fast/stages/3-data-platform-dev/schemas/aspect-type.schema.json b/fast/stages/3-data-platform-dev/schemas/aspect-type.schema.json new file mode 120000 index 000000000..3813d7b7a --- /dev/null +++ b/fast/stages/3-data-platform-dev/schemas/aspect-type.schema.json @@ -0,0 +1 @@ +../../../../modules/dataplex-aspect-types/schemas/aspect-type.schema.json \ No newline at end of file diff --git a/fast/stages/3-data-platform-dev/schemas/data-domain.schema.json b/fast/stages/3-data-platform-dev/schemas/data-domain.schema.json new file mode 100644 index 000000000..d8002c94d --- /dev/null +++ b/fast/stages/3-data-platform-dev/schemas/data-domain.schema.json @@ -0,0 +1,378 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Data Domain", + "type": "object", + "additionalProperties": false, + "required": [ + "name" + ], + "properties": { + "name": { + "type": "string" + }, + "short_name": { + "type": "string" + }, + "automation": { + "additionalProperties": false, + "properties": { + "location": { + "type": "string" + }, + "impersonation_principals": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + } + } + }, + "deploy_config": { + "type": "object", + "additionalProperties": false, + "properties": { + "composer": { + "type": "object", + "additionalProperties": false, + "required": [ + "node_config" + ], + "properties": { + "encryption_key": { + "type": "string" + }, + "environment_size": { + "type": "string", + "enum": [ + "ENVIRONMENT_SIZE_SMALL", + "ENVIRONMENT_SIZE_MEDIUM", + "ENVIRONMENT_SIZE_LARGE" + ], + "default": "ENVIRONMENT_SIZE_SMALL" + }, + "node_config": { + "type": "object", + "additionalProperties": false, + "required": [ + "network", + "subnetwork" + ], + "properties": { + "service_account": { + "type": "string" + }, + "network": { + "type": "string" + }, + "subnetwork": { + "type": "string" + 
} + } + }, + "private_builds": { + "type": "boolean", + "default": true + }, + "private_environment": { + "type": "boolean", + "default": true + }, + "region": { + "type": "string" + }, + "workloads_config": { + "type": "object", + "additionalProperties": false, + "properties": { + "dag_processor": { + "$ref": "#/$defs/composer_workload" + }, + "triggerer": { + "$ref": "#/$defs/composer_workload" + }, + "scheduler": { + "$ref": "#/$defs/composer_workload" + }, + "web_server": { + "$ref": "#/$defs/composer_workload" + }, + "worker": { + "type": "object", + "additionalProperties": false, + "properties": { + "cpu": { + "type": "number" + }, + "memory_gb": { + "type": "number" + }, + "storage_gb": { + "type": "number" + }, + "min_count": { + "type": "integer" + }, + "max_count": { + "type": "integer" + } + } + } + } + } + } + } + } + }, + "folder_config": { + "type": "object", + "additionalProperties": false, + "properties": { + "iam": { + "$ref": "#/$defs/iam" + }, + "iam_bindings": { + "$ref": "#/$defs/iam_bindings" + }, + "iam_bindings_additive": { + "$ref": "#/$defs/iam_bindings_additive" + }, + "iam_by_principals": { + "$ref": "#/$defs/iam_by_principals" + } + } + }, + "project_config": { + "type": "object", + "additionalProperties": false, + "properties": { + "name": { + "type": "string" + }, + "iam": { + "$ref": "#/$defs/iam" + }, + "iam_bindings": { + "$ref": "#/$defs/iam_bindings" + }, + "iam_bindings_additive": { + "$ref": "#/$defs/iam_bindings_additive" + }, + "iam_by_principals": { + "$ref": "#/$defs/iam_by_principals" + }, + "services": { + "type": "array", + "items": { + "type": "string" + } + }, + "shared_vpc_service_config": { + "type": "object", + "additionalProperties": false, + "required": [ + "host_project" + ], + "properties": { + "host_project": { + "type": "string" + }, + "network_users": { + "type": "array", + "items": { + "type": "string" + } + }, + "service_agent_iam": { + "type": "object", + "additionalProperties": false, + "patternProperties": {
+ "^[a-z0-9_-]+$": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "service_iam_grants": { + "type": "array", + "items": { + "type": "string" + } + } + } + } + } + }, + "service_accounts": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "description": { + "type": "string" + }, + "iam": { + "$ref": "#/$defs/iam" + }, + "iam_bindings": { + "$ref": "#/$defs/iam_bindings" + }, + "iam_bindings_additive": { + "$ref": "#/$defs/iam_bindings_additive" + }, + "iam_storage_roles": { + "$ref": "#/$defs/iam_storage_roles" + }, + "name": { + "type": "string" + } + } + } + } + } + }, + "$defs": { + "composer_workload": { + "type": "object", + "additionalProperties": false, + "properties": { + "cpu": { + "type": "number" + }, + "memory_gb": { + "type": "number" + }, + "storage_gb": { + "type": "number" + }, + "count": { + "type": "integer" + } + } + }, + "iam": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^(?:roles/|[a-z_]+)": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + } + } + }, + "iam_bindings": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9_-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "members": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + }, + "role": { + "type": "string", + "pattern": "^(?:roles/|[a-z])" + }, + "condition": { + "type": "object", + "additionalProperties": false, + "required": [ + "expression", + "title" + ], + "properties": { + "expression": { + "type": "string" + }, + "title": { + "type": "string" + }, + "description": { + "type": "string" + } + } + } + } + } + } + }, + 
"iam_bindings_additive": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9_-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "member": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + }, + "role": { + "type": "string", + "pattern": "^(?:roles/|[a-z])" + }, + "condition": { + "type": "object", + "additionalProperties": false, + "required": [ + "expression", + "title" + ], + "properties": { + "expression": { + "type": "string" + }, + "title": { + "type": "string" + }, + "description": { + "type": "string" + } + } + } + } + } + } + }, + "iam_by_principals": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z]+[a-z0-9-]+$": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:roles/|[a-z_]+)" + } + } + } + }, + "iam_storage_roles": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9-]+$": { + "type": "array", + "items": { + "type": "string" + } + } + } + } + } +} \ No newline at end of file diff --git a/fast/stages/3-data-platform-dev/schemas/data-product.schema.json b/fast/stages/3-data-platform-dev/schemas/data-product.schema.json new file mode 100644 index 000000000..453f81e4a --- /dev/null +++ b/fast/stages/3-data-platform-dev/schemas/data-product.schema.json @@ -0,0 +1,290 @@ +{ + "$schema": "http://json-schema.org/draft-07/schema#", + "title": "Data Product", + "type": "object", + "additionalProperties": false, + "properties": { + "automation": { + "additionalProperties": false, + "properties": { + "location": { + "type": "string" + }, + "impersonation_principals": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + } + } + }, + "exposure_layer": { + "type": "object", + "additionalProperties": false, + "properties": { + "bigquery": { + 
"type": "object", + "additionalProperties": false, + "properties": { + "datasets": { + "patternProperties": { + "^[a-z][a-z0-9_]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "encryption_key": { + "type": "string" + }, + "location": { + "type": "string" + } + } + } + } + }, + "iam": { + "$ref": "#/$defs/iam" + } + } + }, + "storage": { + "type": "object", + "additionalProperties": false, + "properties": { + "buckets": { + "patternProperties": { + "^[a-z][a-z0-9-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "encryption_key": { + "type": "string" + }, + "location": { + "type": "string" + }, + "storage_class": { + "type": "string" + } + } + } + } + }, + "iam": { + "$ref": "#/$defs/iam" + } + } + } + } + }, + "iam": { + "$ref": "#/$defs/iam" + }, + "iam_bindings": { + "$ref": "#/$defs/iam_bindings" + }, + "iam_bindings_additive": { + "$ref": "#/$defs/iam_bindings_additive" + }, + "iam_by_principals": { + "$ref": "#/$defs/iam_by_principals" + }, + "service_accounts": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "description": { + "type": "string" + }, + "iam": { + "$ref": "#/$defs/iam" + }, + "iam_bindings": { + "$ref": "#/$defs/iam_bindings" + }, + "iam_bindings_additive": { + "$ref": "#/$defs/iam_bindings_additive" + }, + "iam_storage_roles": { + "$ref": "#/$defs/iam_storage_roles" + }, + "name": { + "type": "string" + } + } + } + } + }, + "services": { + "type": "array", + "items": { + "type": "string" + } + }, + "shared_vpc_service_config": { + "type": "object", + "additionalProperties": false, + "required": [ + "host_project" + ], + "properties": { + "host_project": { + "type": "string" + }, + "network_users": { + "type": "array", + "items": { + "type": "string" + } + }, + "service_agent_iam": { + "type": "object", + "additionalProperties": false, + "patternProperties": {
"^[a-z0-9_-]+$": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "service_iam_grants": { + "type": "array", + "items": { + "type": "string" + } + } + } + }, + "short_name": { + "type": "string" + } + }, + "$defs": { + "iam": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^(?:roles/|[a-z_]+)": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + } + } + }, + "iam_bindings": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9_-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "members": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + } + }, + "role": { + "type": "string", + "pattern": "^(?:roles/|[a-z])" + }, + "condition": { + "type": "object", + "additionalProperties": false, + "required": [ + "expression", + "title" + ], + "properties": { + "expression": { + "type": "string" + }, + "title": { + "type": "string" + }, + "description": { + "type": "string" + } + } + } + } + } + } + }, + "iam_bindings_additive": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9_-]+$": { + "type": "object", + "additionalProperties": false, + "properties": { + "member": { + "type": "string", + "pattern": "^(?:domain:|group:|serviceAccount:|user:|principal:|principalSet:|[a-z])" + }, + "role": { + "type": "string", + "pattern": "^(?:roles/|[a-z])" + }, + "condition": { + "type": "object", + "additionalProperties": false, + "required": [ + "expression", + "title" + ], + "properties": { + "expression": { + "type": "string" + }, + "title": { + "type": "string" + }, + "description": { + "type": "string" + } + } + } + } + } + } + }, + "iam_by_principals": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + 
"^[a-z]+[a-z0-9-]+$": { + "type": "array", + "items": { + "type": "string", + "pattern": "^(?:roles/|[a-z_]+)" + } + } + } + }, + "iam_storage_roles": { + "type": "object", + "additionalProperties": false, + "patternProperties": { + "^[a-z0-9-]+$": { + "type": "array", + "items": { + "type": "string" + } + } + } + } + } +} \ No newline at end of file diff --git a/blueprints/data-solutions/data-platform-foundations/locals-06-common.tf b/fast/stages/3-data-platform-dev/templates/providers.tf.tpl similarity index 51% rename from blueprints/data-solutions/data-platform-foundations/locals-06-common.tf rename to fast/stages/3-data-platform-dev/templates/providers.tf.tpl index 4089af3dc..d1c224c5c 100644 --- a/blueprints/data-solutions/data-platform-foundations/locals-06-common.tf +++ b/fast/stages/3-data-platform-dev/templates/providers.tf.tpl @@ -1,5 +1,5 @@ /** - * Copyright 2023 Google LLC + * Copyright 2022 Google LLC * * Licensed under the Apache License, Version 2.0 (the "License"); * you may not use this file except in compliance with the License. @@ -14,24 +14,20 @@ * limitations under the License. */ -locals { - _cmn_iam = flatten([ - for principal, roles in local.cmn_iam : [ - for role in roles : { - key = "${principal}-${role}" - principal = principal - role = role - } - ] - ]) - cmn_iam_additive = { - for binding in local._cmn_iam : binding.key => { - role = binding.role - member = local.iam_principals[binding.principal] - } - } - cmn_iam_auth = { - for binding in local._cmn_iam : - binding.role => local.iam_principals[binding.principal]... 
+terraform { + backend "gcs" { + bucket = "${bucket}" + impersonate_service_account = "${sa}" + %{~ if backend_extra != null ~} + ${indent(4, backend_extra)} + %{~ endif ~} } } +provider "google" { + impersonate_service_account = "${sa}" +} +provider "google-beta" { + impersonate_service_account = "${sa}" +} + +# end provider.tf for ${name} diff --git a/fast/stages/3-data-platform-dev/variables-fast.tf b/fast/stages/3-data-platform-dev/variables-fast.tf new file mode 100644 index 000000000..7aa193024 --- /dev/null +++ b/fast/stages/3-data-platform-dev/variables-fast.tf @@ -0,0 +1,108 @@ +/** + * Copyright 2024 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. + * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +variable "automation" { + # tfdoc:variable:source 0-bootstrap + description = "Automation resources created by the bootstrap stage." + type = object({ + outputs_bucket = string + }) + nullable = false +} +variable "billing_account" { + # tfdoc:variable:source 0-bootstrap + description = "Billing account id. If billing account is not part of the same org set `is_org_level` to false." + type = object({ + id = string + }) +} + +variable "environments" { + # tfdoc:variable:source 1-resman + description = "Environment names." + type = object({ + dev = object({ + name = string + short_name = string + }) + }) +} + +variable "folder_ids" { + # tfdoc:variable:source 1-resman + description = "Folder name => id mappings." 
+ type = map(string) + nullable = false + default = {} +} + +variable "host_project_ids" { + # tfdoc:variable:source 2-networking + description = "Shared VPC host project name => id mappings." + type = map(string) + nullable = false + default = {} +} + +variable "kms_keys" { + # tfdoc:variable:source 2-security + description = "KMS key ids." + type = map(string) + nullable = false + default = {} +} + +variable "prefix" { + # tfdoc:variable:source 0-bootstrap + description = "Prefix used for resources that need unique names. Use a maximum of 9 chars for organizations, and 11 chars for tenants." + type = string + validation { + condition = try(length(var.prefix), 0) < 12 + error_message = "Use a maximum of 9 chars for organizations, and 11 chars for tenants." + } +} + +variable "regions" { + # tfdoc:variable:source 2-networking + description = "Region mappings." + type = map(string) + nullable = false + default = {} +} + +variable "subnet_self_links" { + # tfdoc:variable:source 2-networking + description = "Subnet VPC name => { name => self link } mappings." + type = map(map(string)) + nullable = false + default = {} +} + +variable "tag_values" { + # tfdoc:variable:source 1-resman + description = "FAST-managed resource manager tag values." + type = map(string) + nullable = false + default = {} +} + +variable "vpc_self_links" { + # tfdoc:variable:source 2-networking + description = "Shared VPC name => self link mappings." + type = map(string) + nullable = false + default = {} +} diff --git a/fast/stages/3-data-platform-dev/variables.tf b/fast/stages/3-data-platform-dev/variables.tf new file mode 100644 index 000000000..1c23e1edf --- /dev/null +++ b/fast/stages/3-data-platform-dev/variables.tf @@ -0,0 +1,173 @@ +/** + * Copyright 2025 Google LLC + * + * Licensed under the Apache License, Version 2.0 (the "License"); + * you may not use this file except in compliance with the License. 
+ * You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +variable "aspect_types" { + description = "Aspect templates. Merged with those defined via the factory." + type = map(object({ + description = optional(string) + display_name = optional(string) + labels = optional(map(string), {}) + metadata_template = optional(string) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + })) + nullable = false + default = {} +} + +variable "central_project_config" { + description = "Configuration for the top-level central project." 
+ type = object({ + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_by_principals = optional(map(list(string)), {}) + services = optional(list(string), [ + # TODO: define default list of services + "bigquery.googleapis.com", + "datacatalog.googleapis.com", + "logging.googleapis.com", + "monitoring.googleapis.com" + ]) + short_name = optional(string, "central-0") + }) + nullable = false + default = {} +} + +variable "encryption_keys" { + description = "Default encryption keys for services, in service => { region => key id } format. Overridable on a per-object basis." + type = object({ + bigquery = optional(map(string), {}) + composer = optional(map(string), {}) + storage = optional(map(string), {}) + }) + nullable = false + default = {} +} + +variable "exposure_config" { + description = "Data exposure configuration." + type = object({ + tag_name = optional(string, "exposure/allow") + }) + nullable = false + default = {} + validation { + condition = ( + var.exposure_config.tag_name != null && + length(regexall( + "^[a-z][a-z0-9-]+/[a-z][a-z0-9]+", var.exposure_config.tag_name + )) > 0 + ) + error_message = "Invalid tag name, required format is 'tag_key/tag_value'." + } +} + +variable "factories_config" { + description = "Configuration for the resource factories." 
+ type = object({ + aspect_types = optional(string, "data/aspect-types") + data_domains = optional(string, "data/data-domains") + context = optional(object({ + iam_principals = optional(map(string), {}) + kms_keys = optional(map(string), {}) + tag_values = optional(map(string), {}) + }), {}) + }) + nullable = false + default = {} +} + +variable "location" { + description = "Default location used when no location is specified." + type = string + nullable = false + default = "europe-west1" +} + +variable "outputs_location" { + description = "Enable writing provider, tfvars and CI/CD workflow files to local filesystem. Leave null to disable." + type = string + default = null +} + +variable "secure_tags" { + description = "Resource manager tags created in the central project." + type = map(object({ + description = optional(string, "Managed by the Terraform project module.") + iam = optional(map(list(string)), {}) + values = optional(map(object({ + description = optional(string, "Managed by the Terraform project module.") + iam = optional(map(list(string)), {}) + id = optional(string) + })), {}) + })) + nullable = false + default = {} + validation { + condition = alltrue([ + for k, v in var.secure_tags : v != null + ]) + error_message = "Use an empty map instead of null as value." + } +} + +variable "stage_config" { + description = "Stage configuration used to find environment and resource ids, and to generate names." 
+ type = object({ + environment = string + name = string + short_name = optional(string, "dp") + }) + default = { + environment = "dev" + name = "data-platform-dev" + } +} diff --git a/fast/stages/diagrams.excalidraw.gz b/fast/stages/diagrams.excalidraw.gz index a7b0163b4..62d08addb 100644 Binary files a/fast/stages/diagrams.excalidraw.gz and b/fast/stages/diagrams.excalidraw.gz differ diff --git a/fast/stages/fast-links.sh b/fast/stages/fast-links.sh index eaa620691..0d971e463 100755 --- a/fast/stages/fast-links.sh +++ b/fast/stages/fast-links.sh @@ -69,7 +69,7 @@ if [[ ! -z ${FAST_STAGE_DEPS+x} ]]; then done fi -echo -e "\n# conventional place for stage tfvars (manually created)" +echo -e "\n# conventional location for this stage's tfvars file (manually managed)" echo "$CMD/${FAST_STAGE_LEVEL}-${FAST_STAGE_NAME}.auto.tfvars ./" if [[ ! -z ${FAST_STAGE_OPTIONAL+x} ]]; then diff --git a/modules/data-catalog-tag-template/README.md b/modules/data-catalog-tag-template/README.md index 73246cfc4..b94101cad 100644 --- a/modules/data-catalog-tag-template/README.md +++ b/modules/data-catalog-tag-template/README.md @@ -131,10 +131,10 @@ fields: | name | description | type | required | default | |---|---|:---:|:---:|:---:| -| [project_id](variables.tf#L26) | Id of the project where Tag Templates will be created. | string | ✓ | | -| [region](variables.tf#L31) | Default region for tag templates. | string | ✓ | | -| [factories_config](variables.tf#L17) | Paths to data files and folders that enable factory functionality. | object({…}) | | {} | -| [tag_templates](variables.tf#L36) | Tag templates definitions in the form {TAG_TEMPLATE_ID => TEMPLATE_DEFINITION}. | map(object({…})) | | {} | +| [project_id](variables.tf#L29) | Id of the project where Tag Templates will be created. | string | ✓ | | +| [factories_config](variables.tf#L17) | Paths to data files and folders that enable factory functionality.
| object({…}) | | {} | +| [region](variables.tf#L34) | Default region for tag templates. | string | | null | +| [tag_templates](variables.tf#L40) | Tag templates definitions in the form {TAG_TEMPLATE_ID => TEMPLATE_DEFINITION}. | map(object({…})) | | {} | ## Outputs diff --git a/modules/data-catalog-tag-template/main.tf b/modules/data-catalog-tag-template/main.tf index 1498dcb81..2c3121948 100644 --- a/modules/data-catalog-tag-template/main.tf +++ b/modules/data-catalog-tag-template/main.tf @@ -53,9 +53,13 @@ locals { } resource "google_data_catalog_tag_template" "default" { - for_each = local.tag_templates - project = var.project_id - region = coalesce(each.value.region, var.region) + for_each = local.tag_templates + project = var.project_id + region = lookup( + var.factories_config.context.regions, + coalesce(each.value.region, var.region), + coalesce(each.value.region, var.region) + ) tag_template_id = each.key display_name = each.value.display_name dynamic "fields" { diff --git a/modules/data-catalog-tag-template/variables.tf b/modules/data-catalog-tag-template/variables.tf index 3e72cc3e2..5d17a730d 100644 --- a/modules/data-catalog-tag-template/variables.tf +++ b/modules/data-catalog-tag-template/variables.tf @@ -18,6 +18,9 @@ variable "factories_config" { description = "Paths to data files and folders that enable factory functionality." type = object({ tag_templates = optional(string) + context = optional(object({ + regions = optional(map(string), {}) + }), {}) }) nullable = false default = {} @@ -31,6 +34,7 @@ variable "project_id" { variable "region" { description = "Default region for tag templates."
type = string + default = null } variable "tag_templates" { diff --git a/modules/dataplex-aspect-types/README.md b/modules/dataplex-aspect-types/README.md index 9dba8ba63..488012dad 100644 --- a/modules/dataplex-aspect-types/README.md +++ b/modules/dataplex-aspect-types/README.md @@ -69,12 +69,19 @@ module "aspect-types" { Aspect types can also be defined via a resource factory, where the file name will be used as the aspect type id. The resulting data is then internally combined with the `aspect_types` variable. +IAM attributes can leverage substitutions for principals, which need to be defined via the `factories_config.context.iam_principals` variable as shown in the example below. + ```hcl module "aspect-types" { source = "./fabric/modules/dataplex-aspect-types" project_id = "test-project" factories_config = { aspect_types = "data/aspect-types" + context = { + iam_principals = { + test-sa = "serviceAccount:sa-0@test-project.iam.gserviceaccount.com" + } + } } } # tftest modules=1 resources=4 files=aspect-0,aspect-1 @@ -83,8 +90,8 @@ module "aspect-types" { ```yaml display_name: "Test template 0." iam: - roles/dataplex.aspectTypeOwner: - - "group:data-owners@example.com" + "roles/dataplex.aspectTypeOwner": + - group:data-owners@example.com metadata_template: | { "name": "tf-test-template-0", @@ -117,8 +124,8 @@ metadata_template: | display_name: "Test template 1." iam_bindings_additive: user: - role: "roles/dataplex.aspectTypeUser" - member: "serviceAccount:sa-0@test-project.iam.gserviceaccount.com" + role: roles/dataplex.aspectTypeUser + member: test-sa metadata_template: | { "name": "tf-test-template-1", @@ -151,10 +158,10 @@ metadata_template: | | name | description | type | required | default | |---|---|:---:|:---:|:---:| -| [project_id](variables.tf#L64) | Project id where resources will be created. | string | ✓ | | +| [project_id](variables.tf#L67) | Project id where resources will be created.
| string | ✓ | | | [aspect_types](variables.tf#L17) | Aspect templates. Merged with those defined via the factory. | map(object({…})) | | {} | -| [factories_config](variables.tf#L48) | Paths to folders for the optional factories. | object({…}) | | {} | -| [location](variables.tf#L57) | Location for aspect types. | string | | "global" | +| [factories_config](variables.tf#L48) | Paths to folders for the optional factories. | object({…}) | | {} | +| [location](variables.tf#L60) | Location for aspect types. | string | | "global" | ## Outputs diff --git a/modules/dataplex-aspect-types/iam.tf b/modules/dataplex-aspect-types/iam.tf index 4d1f59c61..2cbfbcb77 100644 --- a/modules/dataplex-aspect-types/iam.tf +++ b/modules/dataplex-aspect-types/iam.tf @@ -55,14 +55,20 @@ resource "google_dataplex_aspect_type_iam_binding" "authoritative" { } role = each.value.role aspect_type_id = google_dataplex_aspect_type.default[each.value.aspect_type_id].id - members = each.value.members + members = [ + for v in each.value.members : + lookup(var.factories_config.context.iam_principals, v, v) + ] } resource "google_dataplex_aspect_type_iam_binding" "bindings" { for_each = local.iam_bindings role = each.value.role aspect_type_id = google_dataplex_aspect_type.default[each.value.aspect_type_id].id - members = each.value.members + members = [ + for v in each.value.members : + lookup(var.factories_config.context.iam_principals, v, v) + ] dynamic "condition" { for_each = each.value.condition == null ? [] : [""] content { @@ -77,7 +83,9 @@ resource "google_dataplex_aspect_type_iam_member" "members" { for_each = local.iam_bindings_additive aspect_type_id = google_dataplex_aspect_type.default[each.value.aspect_type_id].id role = each.value.role - member = each.value.member + member = lookup( + var.factories_config.context.iam_principals, each.value.member, each.value.member + ) dynamic "condition" { for_each = each.value.condition == null ? 
[] : [""] content { diff --git a/modules/dataplex-aspect-types/variables.tf b/modules/dataplex-aspect-types/variables.tf index 9c43b37be..bc52caaaf 100644 --- a/modules/dataplex-aspect-types/variables.tf +++ b/modules/dataplex-aspect-types/variables.tf @@ -49,6 +49,9 @@ variable "factories_config" { description = "Paths to folders for the optional factories." type = object({ aspect_types = optional(string) + context = optional(object({ + iam_principals = optional(map(string), {}) + }), {}) }) nullable = false default = {} diff --git a/modules/project-factory/README.md b/modules/project-factory/README.md index 5b7b4b6a2..b64175908 100644 --- a/modules/project-factory/README.md +++ b/modules/project-factory/README.md @@ -495,9 +495,10 @@ service_accounts: | name | description | type | required | default | |---|---|:---:|:---:|:---:| | [factories_config](variables.tf#L131) | Path to folder with YAML resource description data files. | object({…}) | ✓ | | -| [data_defaults](variables.tf#L17) | Optional default values used when corresponding project data from files are missing. | object({…}) | | {} | +| [data_defaults](variables.tf#L17) | Optional default values used when corresponding project data from files are missing. | object({…}) | | {} | | [data_merges](variables.tf#L73) | Optional values that will be merged with corresponding data from files. Combines with `data_defaults`, file data, and `data_overrides`. | object({…}) | | {} | | [data_overrides](variables.tf#L92) | Optional values that override corresponding data from files. Takes precedence over file data and `data_defaults`. | object({…}) | | {} | +| [factories_data](variables.tf#L155) | Alternate factory data input allowing to use this module as a library. Merged with local YAML data. 
| object({…}) | | {} | ## Outputs diff --git a/modules/project-factory/factory-budgets.tf b/modules/project-factory/factory-budgets.tf index 97c138487..512021d16 100644 --- a/modules/project-factory/factory-budgets.tf +++ b/modules/project-factory/factory-budgets.tf @@ -19,7 +19,8 @@ locals { # reimplement the billing account factory here to interpolate projects _budget_path = try(pathexpand(var.factories_config.budgets.budgets_data_path), null) - _budgets = ( + _budgets = merge( + var.factories_data.budgets, { for f in try(fileset(local._budget_path, "**/*.yaml"), []) : trimsuffix(f, ".yaml") => yamldecode(file("${local._budget_path}/${f}")) diff --git a/modules/project-factory/factory-folders.tf b/modules/project-factory/factory-folders.tf index bc9fb812a..3da56dabd 100644 --- a/modules/project-factory/factory-folders.tf +++ b/modules/project-factory/factory-folders.tf @@ -20,11 +20,14 @@ locals { _folders_path = try( pathexpand(var.factories_config.folders_data_path), null ) - _folders = { - for f in local._hierarchy_files : dirname(f) => yamldecode(file( - "${coalesce(var.factories_config.folders_data_path, "-")}/${f}" - )) - } + _folders = merge( + var.factories_data.hierarchy, + { + for f in local._hierarchy_files : dirname(f) => yamldecode(file( + "${coalesce(var.factories_config.folders_data_path, "-")}/${f}" + )) + } + ) _hierarchy_files = try( fileset(local._folders_path, "**/_config.yaml"), [] diff --git a/modules/project-factory/variables.tf b/modules/project-factory/variables.tf index 7e351bb8a..b65afd253 100644 --- a/modules/project-factory/variables.tf +++ b/modules/project-factory/variables.tf @@ -47,7 +47,7 @@ variable "data_defaults" { service_agent_subnet_iam = optional(map(list(string)), {}) service_iam_grants = optional(list(string), []) network_subnet_users = optional(map(list(string)), {}) - }), { host_project = null }) + })) storage_location = optional(string) tag_bindings = optional(map(string), {}) # non-project resources @@ -151,3 
+151,247 @@ variable "factories_config" { }) nullable = false } + +variable "factories_data" { + description = "Alternate factory data input allowing to use this module as a library. Merged with local YAML data." + type = object({ + budgets = optional(map(object({ + amount = object({ + currency_code = optional(string) + nanos = optional(number) + units = optional(number) + use_last_period = optional(bool) + }) + display_name = optional(string) + filter = optional(object({ + credit_types_treatment = optional(object({ + exclude_all = optional(bool) + include_specified = optional(list(string)) + })) + label = optional(object({ + key = string + value = string + })) + period = optional(object({ + calendar = optional(string) + custom = optional(object({ + start_date = object({ + day = number + month = number + year = number + }) + end_date = optional(object({ + day = number + month = number + year = number + })) + })) + })) + projects = optional(list(string)) + resource_ancestors = optional(list(string)) + services = optional(list(string)) + subaccounts = optional(list(string)) + })) + threshold_rules = optional(list(object({ + percent = number + forecasted_spend = optional(bool) + })), []) + update_rules = optional(map(object({ + disable_default_iam_recipients = optional(bool) + monitoring_notification_channels = optional(list(string)) + pubsub_topic = optional(string) + })), {}) + })), {}) + hierarchy = optional(map(object({ + name = optional(string) + parent = optional(string) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_by_principals = optional(map(list(string)), {}) + 
tag_bindings = optional(map(string), {}) + })), {}) + projects = optional(map(object({ + automation = optional(object({ + project = string + bucket = optional(object({ + location = string + description = optional(string) + prefix = optional(string) + storage_class = optional(string, "STANDARD") + uniform_bucket_level_access = optional(bool, true) + versioning = optional(bool) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + labels = optional(map(string), {}) + })) + service_accounts = optional(map(object({ + description = optional(string) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_billing_roles = optional(map(list(string)), {}) + iam_folder_roles = optional(map(list(string)), {}) + iam_organization_roles = optional(map(list(string)), {}) + iam_project_roles = optional(map(list(string)), {}) + iam_sa_roles = optional(map(list(string)), {}) + iam_storage_roles = optional(map(list(string)), {}) + })), {}) + })) + billing_account = optional(string) + billing_budgets = optional(list(string), []) + buckets = optional(map(object({ + location = string + description = optional(string) + prefix = optional(string) + storage_class = optional(string, "STANDARD") + 
uniform_bucket_level_access = optional(bool, true) + versioning = optional(bool) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + labels = optional(map(string), {}) + })), {}) + contacts = optional(map(list(string)), {}) + iam = optional(map(list(string)), {}) + iam_bindings = optional(map(object({ + members = list(string) + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_bindings_additive = optional(map(object({ + member = string + role = string + condition = optional(object({ + expression = string + title = string + description = optional(string) + })) + })), {}) + iam_by_principals = optional(map(list(string)), {}) + labels = optional(map(string), {}) + metric_scopes = optional(list(string), []) + name = optional(string) + org_policies = optional(map(object({ + inherit_from_parent = optional(bool) # for list policies only. + reset = optional(bool) + rules = optional(list(object({ + allow = optional(object({ + all = optional(bool) + values = optional(list(string)) + })) + deny = optional(object({ + all = optional(bool) + values = optional(list(string)) + })) + enforce = optional(bool) # for boolean policies only. 
+ condition = optional(object({ + description = optional(string) + expression = optional(string) + location = optional(string) + title = optional(string) + }), {}) + parameters = optional(string) + })), []) + })), {}) + parent = optional(string) + prefix = optional(string) + service_accounts = optional(map(object({ + display_name = optional(string) + iam_self_roles = optional(list(string), []) + iam_project_roles = optional(map(list(string)), {}) + })), {}) + service_encryption_key_ids = optional(map(list(string)), {}) + services = optional(list(string), []) + shared_vpc_host_config = optional(object({ + enabled = bool + service_projects = optional(list(string), []) + })) + shared_vpc_service_config = optional(object({ + host_project = string + network_users = optional(list(string), []) + service_agent_iam = optional(map(list(string)), {}) + service_agent_subnet_iam = optional(map(list(string)), {}) + service_iam_grants = optional(list(string), []) + network_subnet_users = optional(map(list(string)), {}) + })) + tag_bindings = optional(map(string), {}) + vpc_sc = optional(object({ + perimeter_name = string + perimeter_bridges = optional(list(string), []) + is_dry_run = optional(bool, false) + })) + })), {}) + }) + nullable = false + default = {} +} diff --git a/modules/project/README.md b/modules/project/README.md index 0380ac2ea..5996311e1 100644 --- a/modules/project/README.md +++ b/modules/project/README.md @@ -1654,12 +1654,12 @@ alerts: | [service_encryption_key_ids](variables.tf#L204) | Service Agents to be granted encryption/decryption permissions over Cloud KMS encryption keys. Format {SERVICE_AGENT => [KEY_ID]}. | map(list(string)) | | {} | | [services](variables.tf#L211) | Service APIs to enable. | list(string) | | [] | | [shared_vpc_host_config](variables.tf#L217) | Configures this project as a Shared VPC host project (mutually exclusive with shared_vpc_service_project). 
| object({…}) | | null | -| [shared_vpc_service_config](variables.tf#L226) | Configures this project as a Shared VPC service project (mutually exclusive with shared_vpc_host_config). | object({…}) | | {…} | -| [skip_delete](variables.tf#L254) | Deprecated. Use deletion_policy. | bool | | null | +| [shared_vpc_service_config](variables.tf#L227) | Configures this project as a Shared VPC service project (mutually exclusive with shared_vpc_host_config). | object({…}) | | {…} | +| [skip_delete](variables.tf#L255) | Deprecated. Use deletion_policy. | bool | | null | | [tag_bindings](variables-tags.tf#L81) | Tag bindings for this project, in key => tag value id format. | map(string) | | null | | [tags](variables-tags.tf#L88) | Tags by key name. If `id` is provided, key or value creation is skipped. The `iam` attribute behaves like the similarly named one at module level. | map(object({…})) | | {} | -| [universe](variables.tf#L266) | GCP universe where to deploy the project. The prefix will be prepended to the project id. | object({…}) | | null | -| [vpc_sc](variables.tf#L275) | VPC-SC configuration for the project, use when `ignore_changes` for resources is set in the VPC-SC module. | object({…}) | | null | +| [universe](variables.tf#L267) | GCP universe where to deploy the project. The prefix will be prepended to the project id. | object({…}) | | null | +| [vpc_sc](variables.tf#L276) | VPC-SC configuration for the project, use when `ignore_changes` for resources is set in the VPC-SC module. 
| object({…}) | | null | ## Outputs diff --git a/modules/project/cmek.tf b/modules/project/cmek.tf index 8bd44a56f..68ab9a325 100644 --- a/modules/project/cmek.tf +++ b/modules/project/cmek.tf @@ -27,10 +27,9 @@ locals { "artifactregistry.googleapis.com" : ["artifactregistry"] "bigtableadmin.googleapis.com" : ["bigtable"] "bigquery.googleapis.com" : ["bigquery-encryption"] - "composer.googleapis.com" : [ - "composer", "artifactregistry", "container-engine", - "compute", "pubsub", "storage" - ] + # the list for composer now tracks Composer 3 + # https://cloud.google.com/composer/docs/composer-3/configure-cmek-encryption#grant-roles-permissions + "composer.googleapis.com" : ["composer", "storage"] "compute.googleapis.com" : ["compute"] "container.googleapis.com" : ["compute"] "dataflow.googleapis.com" : ["dataflow", "compute"] diff --git a/modules/project/variables.tf b/modules/project/variables.tf index d2b1b0ddb..6cff61263 100644 --- a/modules/project/variables.tf +++ b/modules/project/variables.tf @@ -220,7 +220,8 @@ variable "shared_vpc_host_config" { enabled = bool service_projects = optional(list(string), []) }) - default = null + nullable = true + default = null } variable "shared_vpc_service_config" { diff --git a/tests/fast/stages/s0_bootstrap/cicd.yaml b/tests/fast/stages/s0_bootstrap/cicd.yaml index d91dae7c6..54e1d6f0f 100644 --- a/tests/fast/stages/s0_bootstrap/cicd.yaml +++ b/tests/fast/stages/s0_bootstrap/cicd.yaml @@ -343,7 +343,7 @@ counts: google_project_iam_audit_config: 1 google_project_iam_binding: 19 google_project_iam_member: 23 - google_project_service: 32 + google_project_service: 33 google_project_service_identity: 8 google_service_account: 12 google_service_account_iam_binding: 12 @@ -356,4 +356,4 @@ counts: google_tags_tag_value: 2 local_file: 13 modules: 26 - resources: 287 + resources: 288 diff --git a/tests/fast/stages/s0_bootstrap/simple.yaml b/tests/fast/stages/s0_bootstrap/simple.yaml index 21a9a233e..5230b1bef 100644 --- 
a/tests/fast/stages/s0_bootstrap/simple.yaml +++ b/tests/fast/stages/s0_bootstrap/simple.yaml @@ -28,7 +28,7 @@ counts: google_project_iam_audit_config: 1 google_project_iam_binding: 19 google_project_iam_member: 17 - google_project_service: 32 + google_project_service: 33 google_project_service_identity: 8 google_service_account: 6 google_service_account_iam_binding: 6 @@ -41,7 +41,7 @@ counts: google_tags_tag_value: 2 local_file: 8 modules: 20 - resources: 250 + resources: 251 outputs: automation: __missing__ diff --git a/tests/fast/stages/s1_resman/simple.yaml b/tests/fast/stages/s1_resman/simple.yaml index d0d25f9a9..7020270fb 100644 --- a/tests/fast/stages/s1_resman/simple.yaml +++ b/tests/fast/stages/s1_resman/simple.yaml @@ -13,23 +13,23 @@ # limitations under the License. counts: - google_folder: 14 - google_folder_iam_binding: 67 + google_folder: 16 + google_folder_iam_binding: 74 google_org_policy_policy: 2 - google_organization_iam_member: 20 - google_project_iam_member: 17 - google_service_account: 17 - google_service_account_iam_binding: 17 - google_storage_bucket: 8 - google_storage_bucket_iam_binding: 16 - google_storage_bucket_iam_member: 17 - google_storage_bucket_object: 19 - google_tags_tag_binding: 14 + google_organization_iam_member: 21 + google_project_iam_member: 19 + google_service_account: 19 + google_service_account_iam_binding: 19 + google_storage_bucket: 9 + google_storage_bucket_iam_binding: 18 + google_storage_bucket_iam_member: 19 + google_storage_bucket_object: 21 + google_tags_tag_binding: 16 google_tags_tag_key: 2 google_tags_tag_value: 13 google_tags_tag_value_iam_binding: 4 - modules: 40 - resources: 247 + modules: 45 + resources: 272 outputs: cicd_repositories: @@ -40,6 +40,8 @@ outputs: name: cloud-foundation-fabric/1-resman type: github service_accounts: + data-platform-dev-ro: fast2-dev-resman-dp-0r@fast2-prod-automation.iam.gserviceaccount.com + data-platform-dev-rw: 
fast2-dev-resman-dp-0@fast2-prod-automation.iam.gserviceaccount.com gcve-dev-ro: fast2-dev-resman-gcve-0r@fast2-prod-automation.iam.gserviceaccount.com gcve-dev-rw: fast2-dev-resman-gcve-0@fast2-prod-automation.iam.gserviceaccount.com gke-dev-ro: fast2-dev-resman-gke-0r@fast2-prod-automation.iam.gserviceaccount.com diff --git a/tests/fast/stages/s2_security/simple.tfvars b/tests/fast/stages/s2_security/simple.tfvars index f41ca510c..e60698058 100644 --- a/tests/fast/stages/s2_security/simple.tfvars +++ b/tests/fast/stages/s2_security/simple.tfvars @@ -24,10 +24,6 @@ certificate_authorities = { location = "europe-west8" } } -custom_roles = { - project_iam_viewer = "organizations/123456789012/roles/bar" - service_project_network_admin = "organizations/123456789012/roles/foo" -} environments = { dev = { is_default = false diff --git a/tests/fast/stages/s3_data_platform_dev/__init__.py b/tests/fast/stages/s3_data_platform_dev/__init__.py new file mode 100644 index 000000000..c37e93b74 --- /dev/null +++ b/tests/fast/stages/s3_data_platform_dev/__init__.py @@ -0,0 +1,13 @@ +# Copyright 2025 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
diff --git a/tests/fast/stages/s3_data_platform_dev/simple.tfvars b/tests/fast/stages/s3_data_platform_dev/simple.tfvars new file mode 100644 index 000000000..e5dd22f76 --- /dev/null +++ b/tests/fast/stages/s3_data_platform_dev/simple.tfvars @@ -0,0 +1,44 @@ +automation = { + outputs_bucket = "fast2-prod-iac-core-outputs" +} +billing_account = { + id = "000000-111111-222222" +} +environments = { + dev = { + is_default = false + name = "Development" + short_name = "dev" + tag_name = "development" + } +} +factories_config = { + context = { + iam_principals = { + data-consumer-bi = "group:gcp-consumer-bi@example.com" + dp-product-a-0 = "group:gcp-data-product-a-0@example.com" + } + } +} +folder_ids = { + data-platform-dev = "folders/00000000000000" +} +host_project_ids = { + dev-spoke-0 = "fast2-dev-net-spoke-0" +} +organization = { + domain = "fast.example.com" + id = 123456789012 + customer_id = "C00000000" +} +prefix = "fast2" +subnet_self_links = { + dev-spoke-0 = { + "europe-west8/dev-dataplatform" = "projects/fast2-dev-net-spoke-0/regions/europe-west8/subnetworks/dev-dataplatform" + } +} +vpc_self_links = { + dev-spoke-0 = "projects/fast2-dev-net-spoke-0/global/networks/dev-spoke-0" +} + + diff --git a/tests/fast/stages/s3_data_platform_dev/simple.yaml b/tests/fast/stages/s3_data_platform_dev/simple.yaml new file mode 100644 index 000000000..e4319eb6f --- /dev/null +++ b/tests/fast/stages/s3_data_platform_dev/simple.yaml @@ -0,0 +1,41 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
+# See the License for the specific language governing permissions and +# limitations under the License. + +counts: + google_bigquery_dataset: 1 + google_bigquery_dataset_iam_binding: 1 + google_bigquery_default_service_account: 2 + google_composer_environment: 1 + google_compute_shared_vpc_service_project: 1 + google_data_catalog_policy_tag: 3 + google_data_catalog_taxonomy: 1 + google_dataplex_aspect_type: 1 + google_folder: 2 + google_folder_iam_binding: 5 + google_project: 3 + google_project_iam_binding: 21 + google_project_iam_member: 13 + google_project_service: 17 + google_project_service_identity: 6 + google_service_account: 6 + google_service_account_iam_binding: 4 + google_storage_bucket: 3 + google_storage_bucket_iam_binding: 5 + google_storage_bucket_object: 5 + google_storage_project_service_account: 2 + google_tags_location_tag_binding: 2 + google_tags_tag_key: 1 + google_tags_tag_value: 1 + modules: 19 + resources: 107 diff --git a/tests/fast/stages/s3_data_platform_dev/tftest.yaml b/tests/fast/stages/s3_data_platform_dev/tftest.yaml new file mode 100644 index 000000000..6d1d5c567 --- /dev/null +++ b/tests/fast/stages/s3_data_platform_dev/tftest.yaml @@ -0,0 +1,18 @@ +# Copyright 2024 Google LLC +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+ +module: fast/stages/3-data-platform-dev + +tests: + simple: diff --git a/tests/fixtures.py b/tests/fixtures.py index c45d19697..7e9eace58 100644 --- a/tests/fixtures.py +++ b/tests/fixtures.py @@ -191,7 +191,7 @@ def plan_validator(module_path, inventory_paths, basedir, tf_var_files=None, # - put the values coming from user's inventory the right # side of any comparison operators. # - include a descriptive error message to the assert - print(yaml.dump({'values': summary.values})) + # print(yaml.dump({'values': summary.values})) # print("", yaml.dump({'counts': summary.counts})) if 'values' in inventory: