GKE nodepool module
This module simplifies the creation and management of individual GKE nodepools by setting sensible defaults (e.g. a service account is created for nodes if none is set) and allowing less verbose usage in most cases.
Example usage
Module defaults
If no specific node configuration is set via variables, the module uses the provider's defaults, only setting OAuth scopes to a minimal working set and the node machine type to n1-standard-1. The service account set by the provider in this case is the GCE default service account.
```hcl
module "cluster-1-nodepool-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west1-b"
  name         = "nodepool-1"
}
# tftest modules=1 resources=1 inventory=basic.yaml
```
Internally managed service account
There are three ways to define the nodes' service account, all driven by the service_account variable: its create attribute controls whether this module creates a new service account, and its email attribute sets the actual service account to use.
If you create a new service account, its resource and email (in both plain and IAM formats) are then available in outputs to reference it in other modules or resources.
GCE default service account
To use the GCE default service account, you can omit the variable, which is equivalent to { create = null, email = null }. This is what the first example in this document does.
Externally defined service account
To use an existing service account, pass in just the email attribute. If you do this, you will most likely want to use the cloud-platform scope.
```hcl
module "cluster-1-nodepool-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west1-b"
  name         = "nodepool-1"
  service_account = {
    email        = "foo-bar@myproject.iam.gserviceaccount.com"
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
# tftest modules=1 resources=1 inventory=external-sa.yaml
```
Auto-created service account
To have the module create a service account, set the create attribute to true and optionally pass the desired account id in email.
```hcl
module "cluster-1-nodepool-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west1-b"
  name         = "nodepool-1"
  service_account = {
    create       = true
    email        = "spam-eggs" # optional
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
}
# tftest modules=1 resources=2 inventory=create-sa.yaml
```
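As a minimal sketch of how the outputs of a module-created service account might be referenced elsewhere, the block below grants a role to the node service account through the service_account_iam_email output. The resource name, project and role are illustrative assumptions, and the sketch assumes the IAM-format output can be used directly as an IAM member.

```hcl
# Hypothetical example: allow the node service account created by the
# module to write logs. The resource name and role are illustrative only.
# The member value is the IAM-format email exposed in the module outputs.
resource "google_project_iam_member" "nodepool_log_writer" {
  project = "myproject"
  role    = "roles/logging.logWriter"
  member  = module.cluster-1-nodepool-1.service_account_iam_email
}
```

Referencing the output rather than hardcoding the email keeps the binding in sync if the account id passed to the module changes.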
Node & node pool configuration
```hcl
module "cluster-1-nodepool-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west1-b"
  name         = "nodepool-1"
  k8s_labels   = { environment = "dev" }
  service_account = {
    create       = true
    email        = "nodepool-1" # optional
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
  node_config = {
    machine_type        = "n2-standard-2"
    disk_size_gb        = 50
    disk_type           = "pd-ssd"
    ephemeral_ssd_count = 1
    gvnic               = true
    spot                = true
  }
  nodepool_config = {
    autoscaling = {
      max_node_count = 10
      min_node_count = 1
    }
    management = {
      auto_repair  = true
      auto_upgrade = false
    }
  }
}
# tftest modules=1 resources=2 inventory=config.yaml
```
GPU Node & node pool configuration
```hcl
module "cluster-1-nodepool-gpu-1" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-gpu-1"
  k8s_labels   = { environment = "dev" }
  service_account = {
    create       = true
    email        = "nodepool-gpu-1" # optional
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
  node_config = {
    machine_type        = "a2-highgpu-1g"
    disk_size_gb        = 50
    disk_type           = "pd-ssd"
    ephemeral_ssd_count = 1
    gvnic               = true
    spot                = true
    guest_accelerator = {
      type  = "nvidia-tesla-a100"
      count = 1
      gpu_driver = {
        version = "LATEST"
      }
    }
  }
}
# tftest modules=1 resources=2 inventory=guest-accelerator.yaml
```
Dynamic Workload Scheduler (DWS) & node pool configuration
This example uses Dynamic Workload Scheduler (DWS) to configure a GPU nodepool.
```hcl
module "cluster-1-nodepool-dws" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-dws"
  k8s_labels   = { environment = "dev" }
  service_account = {
    create       = true
    email        = "nodepool-gpu-1" # optional
    oauth_scopes = ["https://www.googleapis.com/auth/cloud-platform"]
  }
  node_config = {
    machine_type        = "g2-standard-4"
    disk_size_gb        = 50
    disk_type           = "pd-ssd"
    ephemeral_ssd_count = 1
    gvnic               = true
    spot                = true
    guest_accelerator = {
      type  = "nvidia-l4"
      count = 1
      gpu_driver = {
        version = "LATEST"
      }
    }
  }
  nodepool_config = {
    autoscaling = {
      max_node_count = 10
      min_node_count = 0
    }
    queued_provisioning = true
  }
  node_count = {
    initial = 0
  }
  reservation_affinity = {
    consume_reservation_type = "NO_RESERVATION"
  }
}
# tftest modules=1 resources=2 inventory=dws.yaml
```
Hyperdisk Balanced
This example shows how to configure Hyperdisk Balanced with provisioned IOPS and throughput.
```hcl
module "cluster-1-nodepool-hyperdisk" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-hyperdisk"
  node_config = {
    machine_type = "c3-standard-4"
    boot_disk = {
      image_type             = "COS_CONTAINERD"
      type                   = "hyperdisk-balanced"
      size_gb                = 100
      provisioned_iops       = 3000
      provisioned_throughput = 140
    }
  }
}
# tftest modules=1 resources=1 inventory=hyperdisk.yaml
```
Advanced machine features
This example shows how to configure advanced machine features such as disabling hyperthreading (threads_per_core = 1) or enabling nested virtualization, useful for performance-sensitive workloads or VMs that require running nested hypervisors.
```hcl
module "cluster-1-nodepool-advanced-machine-features" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-advanced-machine-features"
  node_config = {
    machine_type = "n2-standard-4"
    advanced_machine_features = {
      threads_per_core = 1
    }
  }
}
# tftest modules=1 resources=1 inventory=advanced-machine-features.yaml
```
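The nested virtualization case mentioned above could look roughly like the following sketch. The module instance name and nodepool name are hypothetical, and the enable_nested_virtualization attribute is assumed to mirror the provider's field of the same name; threads_per_core is kept explicit in case the module or provider requires it.

```hcl
# Hypothetical sketch: node pool whose VMs can run nested hypervisors.
# The instance and nodepool names are illustrative, not from the module.
# threads_per_core = 2 keeps hyperthreading enabled and is set explicitly
# on the assumption that the attribute may be required.
module "cluster-1-nodepool-nested-virt" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-nested-virt"
  node_config = {
    machine_type = "n2-standard-4"
    advanced_machine_features = {
      threads_per_core             = 2
      enable_nested_virtualization = true
    }
  }
}
```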
Containerd registry mirror configuration
This example shows how to configure a private registry mirror for containerd on each node, useful for air-gapped environments or when pulling images through an internal registry proxy.
```hcl
module "cluster-1-nodepool-containerd" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-containerd"
  node_config = {
    machine_type = "n2-standard-4"
    containerd_config = {
      registry_hosts = {
        "registry.example.com" = {
          hosts = {
            "mirror.example.com" = {}
          }
        }
      }
    }
  }
}
# tftest modules=1 resources=1 inventory=containerd-config.yaml
```
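Each mirror host entry can presumably carry per-host options instead of an empty object. The sketch below assumes the host object accepts capabilities, override_path and dial_timeout attributes with the same meaning and value formats as the provider's containerd_config block; the attribute names, the values and the module instance name are assumptions to check against the module's variables.tf, not a confirmed interface.

```hcl
# Hypothetical sketch: mirror host restricted to pull and resolve
# operations, with an explicit connection timeout. All three host
# attributes and their values are assumptions based on the provider schema.
module "cluster-1-nodepool-containerd-options" {
  source       = "./fabric/modules/gke-nodepool"
  project_id   = "myproject"
  cluster_name = "cluster-1"
  location     = "europe-west4-a"
  name         = "nodepool-containerd-options"
  node_config = {
    machine_type = "n2-standard-4"
    containerd_config = {
      registry_hosts = {
        "registry.example.com" = {
          hosts = {
            "mirror.example.com" = {
              capabilities  = ["HOST_CAPABILITY_PULL", "HOST_CAPABILITY_RESOLVE"]
              override_path = false
              dial_timeout  = "30s"
            }
          }
        }
      }
    }
  }
}
```

Keying registry_hosts and hosts by server and host name means adding or removing a mirror does not shift the other entries.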
Variables
| name | description | type | required | default |
|---|---|---|---|---|
| cluster_name | Cluster name. | string | ✓ | |
| location | Cluster location. | string | ✓ | |
| project_id | Cluster project id. | string | ✓ | |
| cluster_id | Cluster id. Optional, but providing cluster_id is recommended to prevent cluster misconfiguration in some edge cases. | string | | null |
| gke_version | Kubernetes nodes version. Ignored if auto_upgrade is set in management_config. | string | | null |
| k8s_labels | Kubernetes labels applied to each node. | map(string) | | {} |
| labels | The resource labels to be applied to each node (VM). | map(string) | | {} |
| max_pods_per_node | Maximum number of pods per node. | number | | null |
| name | Optional nodepool name. | string | | null |
| network_config | Network configuration. | object({…}) | | null |
| node_config | Node-level configuration. | object({…}) | | {} |
| node_count | Number of nodes per instance group. Initial value can only be changed by recreation, current is ignored when autoscaling is used. | object({…}) | | {…} |
| node_locations | Node locations. | list(string) | | null |
| nodepool_config | Nodepool-level configuration. | object({…}) | | null |
| reservation_affinity | Configuration of the desired reservation which instances could take capacity from. | object({…}) | | null |
| resource_manager_tags | A map of resource manager tag keys and values to be attached to the nodes for managing Compute Engine firewalls using Network Firewall Policies. | map(string) | | null |
| service_account | Nodepool service account. If this variable is set to null, the default GCE service account will be used. If set and email is null, a service account will be created. If scopes are null a default will be used. | object({…}) | | {} |
| sole_tenant_nodegroup | Sole tenant node group. | string | | null |
| tags | Network tags applied to nodes. | list(string) | | null |
| taints | Kubernetes taints applied to all nodes. | map(object({…})) | | {} |
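Several of the simpler top-level variables in the table are not covered by the examples in this document. The sketch below shows how they might be combined, using only the types listed above. The module instance name, the referenced cluster module (module.cluster-1 and its id output), and every value are hypothetical.

```hcl
# Hypothetical sketch: regional node pool limited to two zones, with
# resource labels, network tags and a pod limit. All values are illustrative.
# module.cluster-1 is an assumed cluster module exposing an id output.
module "cluster-1-nodepool-2" {
  source            = "./fabric/modules/gke-nodepool"
  project_id        = "myproject"
  cluster_name      = "cluster-1"
  cluster_id        = module.cluster-1.id
  location          = "europe-west1"
  name              = "nodepool-2"
  labels            = { team = "platform" }
  max_pods_per_node = 32
  node_locations    = ["europe-west1-b", "europe-west1-c"]
  tags              = ["gke-nodes"]
}
```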
Outputs
| name | description | sensitive |
|---|---|---|
| id | Fully qualified nodepool id. | |
| name | Nodepool name. | |
| service_account_email | Service account email. | |
| service_account_iam_email | Service account email in IAM format. | |