hunfabric/blueprints/data-solutions/bq-ml/README.md
2023-03-04 08:09:29 +01:00

BigQuery ML and Vertex Pipeline

This blueprint creates the infrastructure needed to run a Vertex AI environment in which you can develop and deploy a machine learning model, to be used from a Vertex AI endpoint or from BigQuery.

This is the high-level diagram:

High-level diagram

It also includes the IAM wiring needed to make such scenarios work. Regional resources are used in this example, but the same logic applies to 'dual regional', 'multi regional', or 'global' resources.

The example is designed to match real-world use cases with a minimum amount of resources and be used as a starting point for your scenario.

Managed resources and services

This sample creates several distinct groups of resources:

  • Networking
    • VPC network
    • Subnet
    • Firewall rules for SSH access via IAP and open communication within the VPC
    • Cloud NAT
  • IAM
    • Vertex AI workbench service account
    • Vertex AI pipeline service account
  • Storage
    • GCS bucket
    • BigQuery dataset

Customization

Virtual Private Cloud (VPC) design

As is often the case in real-world configurations, this blueprint accepts an existing Shared-VPC via the network_config variable as input.
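
As a rough sketch, passing an existing Shared VPC might look like the following. The attribute names inside network_config (host_project, network_self_link, subnet_self_link) and all values are assumptions made for illustration, since the object type is elided in the variables table; check the blueprint's variables.tf for the actual schema.

```hcl
module "bq-ml" {
  source     = "./fabric/blueprints/data-solutions/bq-ml/"
  project_id = "project-1"
  prefix     = "prefix"
  # Assumed attribute names and placeholder values; verify against variables.tf.
  network_config = {
    host_project      = "host-project-id"
    network_self_link = "https://www.googleapis.com/compute/v1/projects/host-project-id/global/networks/shared-vpc"
    subnet_self_link  = "https://www.googleapis.com/compute/v1/projects/host-project-id/regions/us-central1/subnetworks/shared-subnet"
  }
}
```

Leaving network_config unset (null) makes the blueprint create its own networking resources instead.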

Customer Managed Encryption Key

As is often the case in real-world configurations, this blueprint accepts as input existing Cloud KMS keys to encrypt resources via the service_encryption_keys variable.
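
A minimal sketch of supplying an existing key is shown below. The attribute name storage and the key id are hypothetical, because the object type is elided in the variables table; the actual attribute names are defined in the blueprint's variables.tf. The key is placed in the same location as region, following the requirement that the key location match the service region.

```hcl
module "bq-ml" {
  source     = "./fabric/blueprints/data-solutions/bq-ml/"
  project_id = "project-1"
  prefix     = "prefix"
  region     = "us-central1"
  # Assumed attribute name and placeholder key id; verify against variables.tf.
  service_encryption_keys = {
    storage = "projects/kms-project/locations/us-central1/keyRings/my-keyring/cryptoKeys/storage-key"
  }
}
```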

Demo

In the repository demo folder, you can find an example that creates a Vertex AI pipeline from a publicly available dataset and deploys the model to be used from a Vertex AI managed endpoint or from within BigQuery.

To run the demo:

  • Connect to the Vertex AI workbench instance
  • Clone this repository
  • Run the demo/bmql_pipeline.ipynb Jupyter Notebook.

Variables

| name | description | type | required | default |
|---|---|---|---|---|
| prefix | Prefix used for resource names. | string | ✓ | |
| project_id | Project id references existing project if project_create is null. | string | ✓ | |
| location | The location where resources will be deployed. | string | | "US" |
| network_config | Shared VPC network configurations to use. If null networks will be created in projects with pre-configured values. | object({…}) | | null |
| project_create | Provide values if project creation is needed, use existing project if null. Parent format: folders/folder_id or organizations/org_id. | object({…}) | | null |
| region | The region where resources will be deployed. | string | | "us-central1" |
| service_encryption_keys | Cloud KMS to use to encrypt different services. The key location should match the service region. | object({…}) | | null |

Outputs

| name | description | sensitive |
|---|---|---|
| bucket | GCS Bucket URL. | |
| dataset | BigQuery dataset. | |
| notebook | Vertex AI notebook details. | |
| project | Project id. | |
| service-account-vertex | Service account to be used for Vertex AI pipelines. | |
| vertex-ai-metadata-store | | |
| vpc | VPC Network. | |
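
The outputs listed above can be consumed from the calling configuration. The snippet below is a small sketch that re-exports two of them; it assumes the blueprint is instantiated as module "test" (as in the Test section), and the output name vertex_service_account is chosen here purely for illustration.

```hcl
# Re-export selected blueprint outputs from the root module.
output "bucket" {
  description = "GCS bucket URL exposed by the blueprint."
  value       = module.test.bucket
}

# Hypothetical output name; the referenced module output comes from the table above.
output "vertex_service_account" {
  description = "Service account to be used for Vertex AI pipelines."
  value       = module.test.service-account-vertex
}
```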

Test

module "test" {
  source = "./fabric/blueprints/data-solutions/bq-ml/"
  project_create = {
    billing_account_id = "123456-123456-123456"
    parent             = "folders/12345678"
  }
  project_id = "project-1"
  prefix     = "prefix"
}
# tftest modules=9 resources=46
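
For comparison, here is a sketch of the same module call against an existing project. It relies only on the behavior described in the variables table: when project_create is left at its null default, project_id must reference a project that already exists. The module label test-existing-project is made up for this example, and location and region are set to their documented defaults only to make them visible.

```hcl
module "test-existing-project" {
  source = "./fabric/blueprints/data-solutions/bq-ml/"
  # project_create is omitted (null), so this project must already exist.
  project_id = "project-1"
  prefix     = "prefix"
  location   = "US"          # documented default
  region     = "us-central1" # documented default
}
```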