* var definitions * skeleton, untested * fix errors, test with existing cluster * test vpc creation, todo notes * initial variables for AR and image * initial variables for AR and image * Add support for remote repositories to artifact-registry * Add support for virtual repositories to artifact-registry * Add support for extra config options to artifact-registry * artifact registry module: add validation and precondition, fix tests * ar module id/name * registry * service accoutn and roles * fetch pods, remove image prefix * small changes * use additive IAM at project level * use additive IAM at project level * configmaps * manifests * fix statefulset manifest * service manifest * fix configmap mode * add todo * job (broken) * job * wait on manifest, endpoints datasource * fix job * Fix local * sa * Update README.md * Restructure gke bp * refactor tree and infra variables * no create test * simplify cluster SA * test cluster and vpc creation * project creation fixes * use iam_members variable * nits * readme with examples * readme with examples * outputs * variables, provider configuration * variables, manifests * start cluster job * fix redis cluster creation Co-authored-by: Julio Castillo <juliocc@users.noreply.github.com> * Revert changes in autopilot cluster * Default templates path, use namespace for node names * Update readmes * Fix IAM bindings * Make STABLE the default release channel * Use Cloud DNS as default DNS provider * Allow optional Cloud NAT creation * Allow backup agent and proxy only subnet * Work around terraform not short-circuiting logical operators * Rename create variables to be more consistent with other blueprints * Add basic features * Update variable names * Initial kafka JS * Move providers to a new file * Kafka / Strimzi * First possibily working version for MySQL (with a lot of todo's left) * Explicitly use proxy repo + some other fixes * Strimzi draft * Refactor variables, use CluterIP as pointer for mysql-router for bootstraping * Validate number of replicas, autoscale required number of running nodes to n/2+1 * Use seaprate service for bootstrap, do not recreate all resources on change of replicas count as the config is preserved in PV * Test dual chart kafka * Update chart for kafka * Expose basic kafka configuration options * Remove unused manifest * Added batch blueprint * Added README * switch to kubectl_manifest * Add README and support for static IP address * Move namespace creation to helm * Interpolate kafka variables * Rename kafka-strimzi to kafka * Added TUTORIAL for cloudshell for batch blueprint * deleted tutorial * Remove commented replace trigger * Move to helm chart * WIP of Cloud Shell tutorial for MySQL * Rename folders * Fix rename * Update paths * Unify styles * Update paths * Add Readme links * Update mysql tutorial * Fix path according to self-link * Use relative path to cwd * Fix service_account variable location * Fix tfvars creation * Restore some fixes for helm deployment * Add cluster deletion_prevention * Fixes for tutorial * Update cluster docs * Fixes to batch tutorial * Bare bones readme for batch * Update batch readme * README fixes * Fix README title for redis * Fix Typos * Make it easy to pass variables from autopilot-cluster to other modules * Add connectivity test and bastion host * updates to readme, and gpu fix * Add versions.tf and README updates * Fix typo * Kafka and Redis README updates * Update versions.tf * Fixes * Add boilerplate * Fix linting * Move mysql to separate branch * Update cloud shell links * Fix broken link --------- Co-authored-by: Ludo <ludomagno@google.com> Co-authored-by: Daniel Marzini <44803752+danielmarzini@users.noreply.github.com> Co-authored-by: Wiktor Niesiobędzki <wiktorn@google.com> Co-authored-by: Miren Esnaola <mirene@google.com>
3.8 KiB
Batch Processing on GKE with Kueue
Introduction
This blueprint shows how to deploy a batch system using Kueue to perform job queuing on Google Kubernetes Engine (GKE) using Terraform.
Kueue is a Cloud Native Job scheduler that works with the default Kubernetes scheduler, the Job controller, and the cluster autoscaler to provide an end-to-end batch system. Kueue implements Job queueing, deciding when Jobs should wait and when they should start, based on quotas and a hierarchy for sharing resources fairly among teams.
Requirements
This blueprint assumes the GKE cluster already exists. We recommend using the accompanying Autopilot Cluster Pattern to deploy a cluster according to Google's best practices. Once you have the cluster up-and-running, you can use this blueprint to deploy Kueue in it.
The Kueue manifests use container images hosted by registry.k8s.io, which means that the subnet where the GKE cluster is deployed needs to have Internet connectivity to download the images. If you're using the provided Autopilot Cluster Pattern, you can set the enable_cloud_nat option of the vpc_create variable.
Cluster authentication
Once you have a cluster with Internet connectivity, create a terraform.tfvars and setup the credentials_config variable. We recommend using Anthos Fleet to simplify accessing the control plane.
Kueue Configuration
Only two variables are available to control Kueue's configuration:
teams_namespaceswhich controls the namespaces used by different teams to run jobs.kueue_namespacewhich controls the namepsace to deploy Kueue's own resources.
Any other configuration can be applied by directly modifying the YAML manifests under the manifest-templates directory.
Sample Configuration
The following template as a starting point for your terraform.tfvars
credentials_config = {
kubeconfig = {
path = "~/.kube/config"
}
}
teams_namespaces = [
"team-a",
"team-b"
]
Variables
| name | description | type | required | default |
|---|---|---|---|---|
| credentials_config | Configure how Terraform authenticates to the cluster. | object({…}) |
✓ | |
| kueue_namespace | Namespaces of the teams running jobs in the clusters. | string |
"kueue-system" |
|
| team_namespaces | Namespaces of the teams running jobs in the clusters. | list(string) |
[…] |
|
| templates_path | Path where manifest templates will be read from. Set to null to use the default manifests. | string |
null |