Fix readme.

2023-03-04 08:09:29 +01:00
parent 6526dda8c7
commit 98e17bb997
4 changed files with 59 additions and 46 deletions
--- a/blueprints/data-solutions/README.md
+++ b/blueprints/data-solutions/README.md
@@ -69,3 +69,10 @@ This [blueprint](./vertex-mlops/) implements the infrastructure required to have
 This [blueprint](./shielded-folder/) implements an opinionated folder configuration according to GCP best practices. Configurations implemented on the folder would be beneficial to host workloads inheriting constraints from the folder they belong to.

 <br clear="left">
+
+### BigQuery ML and Vertex Pipeline
+
+<a href="./bq-ml/" title="BigQuery ML and Vertex Pipeline"><img src="./bq-ml/images/diagram.png" align="left" width="280px"></a>
+This [blueprint](./bq-ml/) implements the infrastructure required to have a fully functional development environment using BigQuery ML and Vertex AI to develop and deploy a machine learning model to be used from a Vertex AI endpoint or in BigQuery ML.
+
+<br clear="left">
--- a/blueprints/data-solutions/bq-ml/README.md
+++ b/blueprints/data-solutions/bq-ml/README.md
@@ -1,14 +1,14 @@
-# BQ ML and Vertex Pipeline
+# BigQuery ML and Vertex Pipeline

-This blueprint creates the infrastructure needed to deploy and run a Vertex AI environment to develop and deploy a machine learning model to be used from Vertex AI an endpoint or in BigQuery.
+This blueprint creates the infrastructure needed to deploy and run a Vertex AI environment to develop and deploy a machine learning model to be used from a Vertex AI endpoint or in BigQuery.

-This is the high level diagram:
+This is the high-level diagram:

 ![High-level diagram](diagram.png "High-level diagram")

-It also includes the IAM wiring needed to make such scenarios work. Regional resources are used in this example, but the same logic will apply for 'dual regional', 'multi regional' or 'global' resources.
+It also includes the IAM wiring needed to make such scenarios work. Regional resources are used in this example, but the same logic applies to 'dual regional', 'multi regional', or 'global' resources.

-The example is designed to match real-world use cases with a minimum amount of resources, and be used as a starting point for your scenario.
+The example is designed to match real-world use cases with a minimum amount of resources and be used as a starting point for your scenario.

 ## Managed resources and services

@@ -30,15 +30,21 @@ This sample creates several distinct groups of resources:

 ### Virtual Private Cloud (VPC) design

-As is often the case in real-world configurations, this blueprint accepts as input an existing Shared-VPC via the `network_config` variable.
+As is often the case in real-world configurations, this blueprint accepts an existing Shared-VPC via the `network_config` variable as input.

 ### Customer Managed Encryption Key

-As is often the case in real-world configurations, this blueprint accepts as input  existing Cloud KMS keys to encrypt resources via the `service_encryption_keys` variable.
+As is often the case in real-world configurations, this blueprint accepts as input existing Cloud KMS keys to encrypt resources via the `service_encryption_keys` variable.

 ## Demo

-In the repository `demo` folder you can find an example on how to create a Vertex AI pipeline from a publically available dataset and deploy the model to be used from a Vertex AI managed endpoint or from within Bigquery.
+In the repository [`demo`](./demo/) folder, you can find an example of creating a Vertex AI pipeline from a publicly available dataset and deploying the model to be used from a Vertex AI managed endpoint or from within BigQuery.
+
+To run the demo:
+
+- Connect to the Vertex AI workbench instance
+- Clone this repository
+- Run the [`demo/bmql_pipeline.ipynb`](demo/bmql_pipeline.ipynb) Jupyter Notebook

 <!-- BEGIN TFDOC -->

@@ -46,27 +52,27 @@ In the repository `demo` folder you can find an example on how to create a Verte

 | name | description | type | required | default |
 |---|---|:---:|:---:|:---:|
-| [prefix](variables.tf#L32) | Prefix used for resource names. | <code>string</code> | ✓ |  |
-| [project_id](variables.tf#L50) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ |  |
-| [location](variables.tf#L16) | The location where resources will be deployed. | <code>string</code> |  | <code>&#34;EU&#34;</code> |
-| [network_config](variables.tf#L22) | Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values. | <code title="object&#40;&#123;&#10;  host_project      &#61; string&#10;  network_self_link &#61; string&#10;  subnet_self_link  &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> |  | <code>null</code> |
-| [project_create](variables.tf#L41) | Provide values if project creation is needed, uses existing project if null. Parent format:  folders/folder_id or organizations/org_id. | <code title="object&#40;&#123;&#10;  billing_account_id &#61; string&#10;  parent             &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> |  | <code>null</code> |
-| [region](variables.tf#L55) | The region where resources will be deployed. | <code>string</code> |  | <code>&#34;europe-west1&#34;</code> |
+| [prefix](variables.tf#L33) | Prefix used for resource names. | <code>string</code> | ✓ |  |
+| [project_id](variables.tf#L51) | Project id, references existing project if `project_create` is null. | <code>string</code> | ✓ |  |
+| [location](variables.tf#L17) | The location where resources will be deployed. | <code>string</code> |  | <code>&#34;US&#34;</code> |
+| [network_config](variables.tf#L23) | Shared VPC network configurations to use. If null, networks will be created in projects with pre-configured values. | <code title="object&#40;&#123;&#10;  host_project      &#61; string&#10;  network_self_link &#61; string&#10;  subnet_self_link  &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> |  | <code>null</code> |
+| [project_create](variables.tf#L42) | Provide values if project creation is needed, uses existing project if null. Parent format:  folders/folder_id or organizations/org_id. | <code title="object&#40;&#123;&#10;  billing_account_id &#61; string&#10;  parent             &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> |  | <code>null</code> |
+| [region](variables.tf#L56) | The region where resources will be deployed. | <code>string</code> |  | <code>&#34;us-central1&#34;</code> |
+| [service_encryption_keys](variables.tf#L62) | Cloud KMS keys to use to encrypt different services. The key location should match the service region. | <code title="object&#40;&#123;&#10;  bq      &#61; string&#10;  compute &#61; string&#10;  storage &#61; string&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> |  | <code>null</code> |

 ## Outputs

 | name | description | sensitive |
 |---|---|:---:|
-| [bucket](outputs.tf#L15) | GCS Bucket URL. |  |
-| [dataset](outputs.tf#L20) | GCS Bucket URL. |  |
-| [notebook](outputs.tf#L25) | Vertex AI notebook details. |  |
-| [project](outputs.tf#L33) | Project id. |  |
-| [service-account-vertex](outputs.tf#L43) | Service account to be used for Vertex AI pipelines |  |
-| [vertex-ai-metadata-store](outputs.tf#L48) |  |  |
-| [vpc](outputs.tf#L38) | VPC Network. |  |
+| [bucket](outputs.tf#L17) | GCS Bucket URL. |  |
+| [dataset](outputs.tf#L22) | GCS Bucket URL. |  |
+| [notebook](outputs.tf#L27) | Vertex AI notebook details. |  |
+| [project](outputs.tf#L35) | Project id. |  |
+| [service-account-vertex](outputs.tf#L45) | Service account to be used for Vertex AI pipelines |  |
+| [vertex-ai-metadata-store](outputs.tf#L50) |  |  |
+| [vpc](outputs.tf#L40) | VPC Network. |  |

 <!-- END TFDOC -->
-
 ## Test

 ```hcl
--- a/blueprints/data-solutions/bq-ml/demo/bmql_pipeline.ipynb
+++ b/blueprints/data-solutions/bq-ml/demo/bmql_pipeline.ipynb
@@ -53,18 +53,18 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Vertex Pipeline Definition\n",
+    "# Vertex AI Pipeline Definition\n",
    "\n",
-    "In the following code block we are defining our Vertex AI pipeline. It is made up of three main steps:\n",
-    "1. Create a BigQuery dataset which will contains the BQ ML models\n",
-    "2. Train the BQ ML model, in this case a logistic regression\n",
-    "3. Evaluate the BQ ML model with the standard evaluation metrics\n",
+    "In the following code block, we are defining our Vertex AI pipeline. It is made up of three main steps:\n",
+    "1. Create a BigQuery dataset that will contain the BigQuery ML models\n",
+    "2. Train the BigQuery ML model, in this case, a logistic regression\n",
+    "3. Evaluate the BigQuery ML model with the standard evaluation metrics\n",
    "\n",
    "The pipeline takes as input the following variables:\n",
-    "- ```model_name```: the display name of the BQ ML model\n",
-    "- ```split_fraction```: the percentage of data that will be used as evaluation dataset\n",
-    "- ```evaluate_job_conf```: bq dict configuration to define where to store evalution metrics\n",
-    "- ```dataset```: name of dataset where the artifacts will be stored\n",
+    "- ```model_name```: the display name of the BigQuery ML model\n",
+    "- ```split_fraction```: the percentage of data that will be used as an evaluation dataset\n",
+    "- ```evaluate_job_conf```: bq dict configuration to define where to store evaluation metrics\n",
+    "- ```dataset```: name of the dataset where the artifacts will be stored\n",
    "- ```project_id```: the project id where the GCP resources will be created\n",
    "- ```location```: BigQuery location"
   ]
@@ -136,7 +136,7 @@
   "source": [
    "# Create Experiment\n",
    "\n",
-    "We will create an experiment in order to keep track of our trainings and tasks on a specific issue or problem."
+    "We will create an experiment to keep track of our training and tasks on a specific issue or problem."
   ]
  },
  {
@@ -158,11 +158,11 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Running the same training pipeline with different parameters\n",
+    "# Running the same Vertex AI training pipeline with different parameters\n",
    "\n",
-    "One of the main tasks during the training phase is to compare different models or to try the same model with different inputs. We can leverage the power of Vertex Pipelines in order to submit the same steps with different training parameters. Thanks to the experiments artifact it is possible to easily keep track of all the tests that have been done. This simplifies the process to select the best model to deploy.\n",
+    "One of the main tasks during the training phase is to compare different models or to try the same model with different inputs. We can leverage the power of Vertex AI Pipelines to submit the same steps with different training parameters. Thanks to the experiments artifact, it is possible to easily keep track of all the tests that have been done. This simplifies the process of selecting the best model to deploy.\n",
    "\n",
-    "In this demo case, we will run the same training pipeline while changing the data split percentage between training and test data."
+    "In this demo case, we will run the same training pipeline while changing the data split percentage between training and test data."
   ]
  },
  {
@@ -193,9 +193,9 @@
   "cell_type": "markdown",
   "metadata": {},
   "source": [
-    "# Deploy the model to an endpoint\n",
+    "# Deploy the model on a Vertex AI endpoint\n",
    "\n",
-    "Thanks to the integration of Vertex Endpoint, it is very straightforward to create a live endpoint to serve the model which we prefer."
+    "Thanks to the integration of Vertex AI Endpoint, creating a live endpoint to serve the model we prefer is very straightforward."
   ]
  },
  {
@@ -221,7 +221,7 @@
   "metadata": {},
   "outputs": [],
   "source": [
-    "# deploy the BQ ML model on Vertex Endpoint\n",
+    "# deploy the BigQuery ML model on Vertex AI Endpoint\n",
    "# have a coffe - this step can take up 10/15 minutes to finish\n",
    "model.deploy(endpoint=endpoint, deployed_model_display_name='bqml-deployed-model')"
   ]
@@ -265,7 +265,7 @@
  },
  "language_info": {
   "name": "python",
-   "version": "3.10.9"
+   "version": "3.8.9"
  },
  "orig_nbformat": 4,
  "vscode": {
--- a/blueprints/data-solutions/bq-ml/variables.tf
+++ b/blueprints/data-solutions/bq-ml/variables.tf
@@ -17,11 +17,11 @@
 variable "location" {
  description = "The location where resources will be deployed."
  type        = string
-  default     = "EU"
+  default     = "US"
 }

 variable "network_config" {
-  description = "Shared VPC network configurations to use. If null networks will be created in projects with preconfigured values."
+  description = "Shared VPC network configurations to use. If null, networks will be created in projects with pre-configured values."
  type = object({
    host_project      = string
    network_self_link = string
@@ -40,7 +40,7 @@ variable "prefix" {
 }

 variable "project_create" {
-  description = "Provide values if project creation is needed, uses existing project if null. Parent format:  folders/folder_id or organizations/org_id."
+  description = "Provide values if project creation is needed, uses existing project if null. Parent format:  folders/folder_id or organizations/org_id."
  type = object({
    billing_account_id = string
    parent             = string
@@ -49,18 +49,18 @@ variable "project_create" {
 }

 variable "project_id" {
-  description = "Project id, references existing project if `project_create` is null."
+  description = "Project id, references existing project if `project_create` is null."
  type        = string
 }

 variable "region" {
  description = "The region where resources will be deployed."
  type        = string
-  default     = "europe-west1"
+  default     = "us-central1"
 }

-variable "service_encryption_keys" { # service encription key
-  description = "Cloud KMS to use to encrypt different services. Key location should match service region."
+variable "service_encryption_keys" {
+  description = "Cloud KMS keys to use to encrypt different services. The key location should match the service region."
  type = object({
    bq      = string
    compute = string