Jason Steenblik 90360c591e Add confidential compute support to google_dataproc_cluster in the da… (#2736)
2024-12-10 16:39:48 +01:00

Google Cloud Dataproc

This module manages a Google Cloud Dataproc cluster resource, including IAM.

TODO

Examples

Simple

module "dataproc-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
}
# tftest modules=1 resources=1

Cluster configuration on GCE

To set the cluster configuration, use the 'dataproc_config.cluster_config' variable. If you don't want to use a dedicated service account, remember to grant roles/dataproc.worker to the Compute Engine default service account.

module "dataproc-service-account" {
  source     = "./fabric/modules/iam-service-account"
  project_id = var.project_id
  name       = "dataproc-worker"
  iam_project_roles = {
    (var.project_id) = ["roles/dataproc.worker"]
  }
}

module "firewall" {
  source     = "./fabric/modules/net-vpc-firewall"
  project_id = var.project_id
  network    = var.vpc.name
  ingress_rules = {
    allow-ingress-dataproc = {
      description = "Allow all traffic between Dataproc nodes."
      targets     = ["dataproc"]
      sources     = ["dataproc"]
    }
  }
}

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        internal_ip_only       = true
        service_account        = module.dataproc-service-account.email
        service_account_scopes = ["cloud-platform"]
        subnetwork             = var.subnet.self_link
        tags                   = ["dataproc"]
        zone                   = "${var.region}-b"
      }
    }
  }
  depends_on = [
    module.dataproc-service-account, # ensure all grants are done before creating the cluster
  ]
}
# tftest modules=3 resources=7 e2e
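If you skip the dedicated service account, the grant mentioned above can be made directly. The sketch below assumes the cluster runs as the Compute Engine default service account, whose email is built from the project number (`PROJECT_NUMBER-compute@developer.gserviceaccount.com`). The resource and data source names (`dataproc_worker_default_sa`, `dataproc`) are illustrative, and the example relies on `var.project_id` and `var.region` as the other examples do.

```hcl
# Look up the project number used in the default service account email.
data "google_project" "dataproc" {
  project_id = var.project_id
}

# Additive grant of the Dataproc Worker role to the Compute Engine default
# service account (illustrative resource name).
resource "google_project_iam_member" "dataproc_worker_default_sa" {
  project = var.project_id
  role    = "roles/dataproc.worker"
  member  = "serviceAccount:${data.google_project.dataproc.number}-compute@developer.gserviceaccount.com"
}

module "dataproc-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  # Make sure the grant exists before the cluster nodes start.
  depends_on = [google_project_iam_member.dataproc_worker_default_sa]
}
```

An additive `google_project_iam_member` is used here so that other holders of roles/dataproc.worker in the project are left untouched.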

Cluster configuration on GCE with CMEK encryption

To encrypt the cluster with a customer-managed encryption key (CMEK), set the 'dataproc_config.encryption_config' variable. The Compute Engine service agent and the Cloud Storage service agent need the CryptoKey Encrypter/Decrypter role on the configured KMS key (Documentation).

module "project" {
  source          = "./fabric/modules/project"
  name            = "dataproc"
  billing_account = var.billing_account_id
  prefix          = var.prefix
  parent          = var.folder_id
  services = [
    "cloudkms.googleapis.com",
    "dataproc.googleapis.com",
    "servicenetworking.googleapis.com",
  ]
}

module "kms" {
  source     = "./fabric/modules/kms"
  project_id = module.project.project_id
  keyring = {
    location = var.region
    name     = "keyring"
  }
  keys = {
    "key-regional" = {}
  }
  iam = {
    "roles/cloudkms.cryptoKeyEncrypterDecrypter" = [
      module.project.service_agents.dataproc.iam_email
    ]
  }
}

module "vpc" {
  source     = "./fabric/modules/net-vpc"
  project_id = module.project.project_id
  name       = "my-network"
  subnets = [
    {
      ip_cidr_range = "10.0.0.0/24"
      name          = "production"
      region        = var.region
    },
  ]
  psa_configs = [{
    ranges = { myrange = "10.0.1.0/24" }
  }]
}

module "dataproc-service-account" {
  source     = "./fabric/modules/iam-service-account"
  project_id = module.project.project_id
  name       = "dataproc-worker"
  iam_project_roles = {
    (module.project.project_id) = ["roles/dataproc.worker", "roles/cloudkms.cryptoKeyEncrypterDecrypter"]
  }
}

module "firewall" {
  source     = "./fabric/modules/net-vpc-firewall"
  project_id = module.project.project_id
  network    = module.vpc.name
  ingress_rules = {
    allow-ingress-dataproc = {
      description = "Allow all traffic between Dataproc nodes."
      targets     = ["dataproc"]
      sources     = ["dataproc"]
    }
  }
}

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = module.project.project_id
  name       = "my-cluster"
  region     = var.region
  dataproc_config = {
    cluster_config = {
      gce_cluster_config = {
        internal_ip_only       = true
        service_account        = module.dataproc-service-account.email
        service_account_scopes = ["cloud-platform"]
        subnetwork             = module.vpc.subnet_self_links["${var.region}/production"]
        tags                   = ["dataproc"]
        zone                   = "${var.region}-b"
      }
    }
    encryption_config = {
      kms_key_name = module.kms.keys.key-regional.id
    }
  }
}
# tftest modules=6 resources=28 e2e
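The example above grants the key role to the Dataproc service agent only. The Compute Engine and Cloud Storage service agents mentioned in the requirements can be granted the role on the same key with a sketch like the following. It assumes a hypothetical `kms_key_id` input holding the full key ID (for instance the value of `module.kms.keys.key-regional.id`) and builds the agent emails from the project number using the standard formats `service-PROJECT_NUMBER@compute-system.iam.gserviceaccount.com` and `service-PROJECT_NUMBER@gs-project-accounts.iam.gserviceaccount.com`.

```hcl
# Hypothetical input: full resource ID of the key set in
# dataproc_config.encryption_config.kms_key_name.
variable "kms_key_id" {
  type = string
}

# Project number, needed to build the service agent emails.
data "google_project" "dataproc" {
  project_id = var.project_id
}

locals {
  cmek_service_agents = {
    compute = "serviceAccount:service-${data.google_project.dataproc.number}@compute-system.iam.gserviceaccount.com"
    storage = "serviceAccount:service-${data.google_project.dataproc.number}@gs-project-accounts.iam.gserviceaccount.com"
  }
}

# One additive binding per service agent on the CMEK key.
resource "google_kms_crypto_key_iam_member" "dataproc_cmek" {
  for_each      = local.cmek_service_agents
  crypto_key_id = var.kms_key_id
  role          = "roles/cloudkms.cryptoKeyEncrypterDecrypter"
  member        = each.value
}
```

Using `_iam_member` keeps these grants additive, so they do not conflict with the authoritative `iam` block already set on the keyring through the kms module.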

Cluster configuration on GKE

To run the Dataproc cluster on GKE, use the 'dataproc_config.virtual_cluster_config' variable. This example shows usage of a dedicated service account.

locals {
  dataproc_namespace = "foobar"
}

module "dataproc-service-account" {
  source     = "./fabric/modules/iam-service-account"
  project_id = var.project_id
  name       = "dataproc-worker"
  iam = {
    "roles/iam.workloadIdentityUser" = [
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/agent]",
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-driver]",
      "serviceAccount:${var.project_id}.svc.id.goog[${local.dataproc_namespace}/spark-executor]"
    ]
  }
  iam_project_roles = {
    (var.project_id) = ["roles/dataproc.worker"]
  }
  depends_on = [
    module.gke-cluster-standard, # granting workloadIdentityUser requires cluster/pool to be created first
  ]
}

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-dataproc-cluster"
  region     = var.region
  dataproc_config = {
    virtual_cluster_config = {
      kubernetes_cluster_config = {
        kubernetes_namespace = local.dataproc_namespace
        kubernetes_software_config = {
          component_version = {
            "SPARK" : "3.1-dataproc-14"
          }
          properties = {
            "dataproc:dataproc.gke.agent.google-service-account"          = module.dataproc-service-account.email
            "dataproc:dataproc.gke.spark.driver.google-service-account"   = module.dataproc-service-account.email
            "dataproc:dataproc.gke.spark.executor.google-service-account" = module.dataproc-service-account.email
          }
        }
        gke_cluster_config = {
          gke_cluster_target = module.gke-cluster-standard.id
          node_pool_target = {
            node_pool = "node-pool-name"
            roles     = ["DEFAULT"]
          }
        }
      }
    }
  }
}
# tftest modules=4 resources=6 fixtures=fixtures/gke-cluster-standard.tf e2e

IAM

IAM is managed via several variables that implement different features and levels of control:

  • iam and iam_by_principals configure authoritative bindings that manage individual roles exclusively, and are internally merged
  • iam_bindings configure authoritative bindings with optional support for conditions, and are not internally merged with the previous two variables
  • iam_bindings_additive configure additive bindings via individual role/member pairs with optional support for conditions

The authoritative and additive approaches can be used together, provided different roles are managed by each. Some care must also be taken with the iam_by_principals variable to ensure that variable keys are static values, so that Terraform is able to compute the dependency graph.
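To illustrate the static-key requirement, the sketch below keeps literal principal strings as `iam_by_principals` keys and places a principal that comes from another module's output in an `iam` member list instead, where an unknown value does not affect the map keys. It assumes the `iam_email` output of the iam-service-account module; the roles are chosen only for illustration.

```hcl
module "dataproc-service-account" {
  source     = "./fabric/modules/iam-service-account"
  project_id = var.project_id
  name       = "dataproc-worker"
}

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  # Keys are literal strings, so they are known at plan time.
  iam_by_principals = {
    "group:gcp-data-engineers@example.net" = ["roles/dataproc.viewer"]
  }
  # A module output may not be known until apply, so it is used as a
  # member value rather than as a map key.
  iam = {
    "roles/dataproc.editor" = [module.dataproc-service-account.iam_email]
  }
}
```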

Refer to the project module for examples of the IAM interface.

Authoritative IAM

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  iam_by_principals = {
    "group:gcp-data-engineers@example.net" = [
      "roles/dataproc.viewer"
    ]
  }
  iam = {
    "roles/dataproc.viewer" = [
      "serviceAccount:service-account@PROJECT_ID.iam.gserviceaccount.com"
    ]
  }
}
# tftest modules=1 resources=2

Additive IAM

module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  iam_bindings_additive = {
    am1-viewer = {
      member = "user:am1@example.com"
      role   = "roles/dataproc.viewer"
    }
  }
}
# tftest modules=1 resources=2
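Neither example above uses iam_bindings. The sketch below shows its keyed shape as described in the variables table: the key is arbitrary and only identifies the binding, while `role` and `members` are managed authoritatively. Since iam_bindings is not merged with iam and iam_by_principals, the same role should not also appear in those variables. The optional `condition` attribute is only mentioned in a comment here; its exact fields should be checked against the module's variable definition.

```hcl
module "processing-dp-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  iam_bindings = {
    # Arbitrary key identifying this binding.
    dataproc-viewers = {
      role    = "roles/dataproc.viewer"
      members = ["group:gcp-data-engineers@example.net"]
      # An optional condition object can be added to this binding.
    }
  }
}
```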

Variables

| name | description | type | required | default |
|---|---|---|:---:|:---:|
| name | Cluster name. | string | ✓ | |
| project_id | Project ID. | string | ✓ | |
| region | Dataproc region. | string | ✓ | |
| dataproc_config | Dataproc cluster config. | object({…}) | | {} |
| iam | IAM bindings in {ROLE => [MEMBERS]} format. | map(list(string)) | | {} |
| iam_bindings | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | map(object({…})) | | {} |
| iam_bindings_additive | Individual additive IAM bindings. Keys are arbitrary. | map(object({…})) | | {} |
| iam_by_principals | Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid cycle errors. Merged internally with the iam variable. | map(list(string)) | | {} |
| labels | The resource labels for instance to use to annotate any related underlying resources, such as Compute Engine VMs. | map(string) | | {} |
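None of the examples sets labels. A minimal sketch, assuming the module passes the map through to the cluster resource; the label keys and values are illustrative and follow Google Cloud label rules (lowercase letters, digits, underscores and dashes).

```hcl
module "dataproc-cluster" {
  source     = "./fabric/modules/dataproc"
  project_id = var.project_id
  name       = "my-cluster"
  region     = var.region
  # Illustrative labels; keys and values must be lowercase.
  labels = {
    environment = "dev"
    team        = "data-platform"
  }
}
```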

Outputs

| name | description | sensitive |
|---|---|:---:|
| id | Fully qualified cluster id. | |
| name | The name of the cluster. | |

Fixtures