Add support for bundling net monitoring tool in a Docker image, and deploying via CR Job (#2609)

* dockerfile and reqs update

* deployment via cloud run jobs

* README

* boilerplate
Ludovico Magnocavallo
2024-10-07 14:56:09 +02:00
committed by GitHub
parent bbe84a5ca8
commit 74427386b9
14 changed files with 668 additions and 12 deletions

@@ -0,0 +1,23 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
FROM python:3-slim-bookworm
COPY src /app/
RUN pip install -r /app/requirements.txt
RUN chmod 755 /app/main.py
WORKDIR /app
ENTRYPOINT ["./main.py"]

@@ -8,13 +8,13 @@ The tool tracks several distinct usage types across a variety of resources: proj
The screenshot below is an example of a simple dashboard provided with this blueprint, showing utilization for a specific metric (number of instances per VPC) for multiple VPCs and projects:
<img src="metric.png" width="640px">
<img src="metric.png" width="640px" alt="diagram">
One other example is the IP utilization information per subnet, allowing you to monitor the percentage of used IP addresses in your GCP subnets.
More complex scenarios are possible by leveraging and combining the 50 different timeseries created by this tool, and connecting them to Cloud Operations dashboards and alerts.
Refer to the [Cloud Function deployment instructions](./deploy-cloud-function/) for a high level overview and an end-to-end deployment example, and to the[discovery tool documentation](./src/) to try it as a standalone program or to package it in alternative ways.
Refer to the [Cloud Function](./deploy-cloud-function/) or [Cloud Run Job](./deploy-cloudrun-job/) instructions for a high-level overview and end-to-end deployment examples, and to the [discovery tool documentation](./src/) to try it as a standalone program or to package it in alternative ways.
## Metrics created

@@ -0,0 +1,59 @@
# Network Quota Monitoring via Cloud Run Job
This simple Terraform setup allows deploying the [discovery tool for the Network Dashboard](../src/) to a Cloud Run Job triggered by Cloud Scheduler.
For service configuration, refer to the [Cloud Function deployment](../deploy-cloud-function/), as the underlying monitoring scraper is the same.
## Creating and uploading the Docker container
To build the container, run `docker build` in the parent folder, then tag the image and push it to the URL printed in the `docker_tag` output.
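
A minimal sketch of the tag-and-push step, assuming the registry URL follows the `REGION-docker.pkg.dev/PROJECT/REPOSITORY` layout and that both the repository and the image are named after the `name` variable, as in the test inventory (`europe-west1-docker.pkg.dev/my-project/netmon/netmon`). The helper names `image_tag` and `publish_commands` are hypothetical; in a real deployment, prefer the value printed by the `docker_tag` output.

```python
def image_tag(project_id: str, region: str = "europe-west1",
              name: str = "netmon") -> str:
    """Return the image reference the Cloud Run Job is configured to pull.

    Assumption: REGION-docker.pkg.dev/PROJECT/REPOSITORY/IMAGE, with the
    repository and the image both named after `name`.
    """
    return f"{region}-docker.pkg.dev/{project_id}/{name}/{name}"


def publish_commands(tag: str) -> list[list[str]]:
    """Return the commands (as argv lists) to authenticate, build and push."""
    registry_host = tag.split("/", 1)[0]
    return [
        # Lets the local Docker client authenticate to Artifact Registry.
        ["gcloud", "auth", "configure-docker", registry_host],
        # Run from the parent folder, where the Dockerfile lives.
        ["docker", "build", "-t", tag, "."],
        ["docker", "push", tag],
    ]


if __name__ == "__main__":
    for cmd in publish_commands(image_tag("my-project")):
        print(" ".join(cmd))
```

Running `gcloud auth configure-docker` once per registry host is the usual way to let Docker push to Artifact Registry; skip it if your credentials are already set up.
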
## Example configuration
This is an example of a working configuration, where the discovery root is set at the organization level, but only resources in the hierarchy of two specific folders are used to compute timeseries:
```tfvars
discovery_config = {
  discovery_root    = "organizations/1234567890"
  monitored_folders = ["3456789012", "7890123456"]
}
grant_discovery_iam_roles = true
project_create_config = {
  billing_account_id = "12345-ABCDEF-12345"
  parent_id          = "folders/2345678901"
}
project_id = "my-project"
# tftest modules=5 resources=27
```
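
To see how the example configuration turns into the job's command line, the Python sketch below mirrors the `concat`/`flatten` expression that builds the container `args` in the `cr-job` module. `job_args` is a hypothetical helper, not part of the tool. With the configuration above it yields the same argument list recorded in the test inventory.

```python
from typing import Optional


def job_args(discovery_config: dict, project_id: str,
             monitoring_project: Optional[str] = None) -> list[str]:
    """Rebuild the container args the way the Terraform expression does."""
    args = [
        "-dr", discovery_config["discovery_root"],
        # coalesce(var.monitoring_project, module.project.project_id)
        "-mon", monitoring_project or project_id,
    ]
    # One "-f <folder>" pair per monitored folder, then "-p <project>" pairs.
    for folder in discovery_config.get("monitored_folders") or []:
        args += ["-f", folder]
    for project in discovery_config.get("monitored_projects") or []:
        args += ["-p", project]
    return args


if __name__ == "__main__":
    config = {
        "discovery_root": "organizations/1234567890",
        "monitored_folders": ["3456789012", "7890123456"],
    }
    print(job_args(config, "my-project"))
```

Folders are emitted before projects, and an unset `monitoring_project` falls back to the deployment project.
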
## Monitoring dashboard
A monitoring dashboard can optionally be deployed in the same project by setting the `dashboard_json_path` variable to the path of a dashboard JSON file. A sample dashboard is included, and can be deployed with this variable configuration:
```tfvars
dashboard_json_path = "../dashboards/quotas-utilization.json"
```
<!-- BEGIN TFDOC -->
## Variables
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [discovery_config](variables.tf#L23) | Discovery configuration. Discovery root is the organization or a folder. If monitored folders and projects are empty, every project under the discovery root node will be monitored. | <code title="object&#40;&#123;&#10; discovery_root &#61; string&#10; monitored_folders &#61; optional&#40;list&#40;string&#41;, &#91;&#93;&#41;&#10; monitored_projects &#61; optional&#40;list&#40;string&#41;, &#91;&#93;&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | ✓ | |
| [project_id](variables.tf#L69) | Project id where the tool will be deployed. | <code>string</code> | ✓ | |
| [dashboard_json_path](variables.tf#L17) | Optional monitoring dashboard to deploy. | <code>string</code> | | <code>null</code> |
| [grant_discovery_iam_roles](variables.tf#L41) | Optionally grant required IAM roles to the monitoring tool service account. | <code>bool</code> | | <code>false</code> |
| [monitoring_project](variables.tf#L48) | Project where generated metrics will be written. Default is to use the same project where the Cloud Run Job is deployed. | <code>string</code> | | <code>null</code> |
| [name](variables.tf#L54) | Name used to create resources. | <code>string</code> | | <code>&#34;netmon&#34;</code> |
| [project_create_config](variables.tf#L60) | Optional configuration if project creation is required. | <code title="object&#40;&#123;&#10; billing_account_id &#61; string&#10; parent_id &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>null</code> |
| [region](variables.tf#L74) | Compute region where Cloud Run will be deployed. | <code>string</code> | | <code>&#34;europe-west1&#34;</code> |
| [schedule_config](variables.tf#L80) | Scheduler configuration. Region is only used if different from the one used for Cloud Run. | <code title="object&#40;&#123;&#10; crontab &#61; optional&#40;string, &#34;&#42;&#47;30 &#42; &#42; &#42; &#42;&#34;&#41;&#10; region &#61; optional&#40;string&#41;&#10;&#125;&#41;">object&#40;&#123;&#8230;&#125;&#41;</code> | | <code>&#123;&#125;</code> |
## Outputs
| name | description | sensitive |
|---|---|:---:|
| [docker_tag](outputs.tf#L17) | Docker tag for the container image. | |
| [project_id](outputs.tf#L22) | Project id. | |
| [service_account](outputs.tf#L27) | Cloud Run Job service account. | |
<!-- END TFDOC -->

@@ -0,0 +1,157 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

# TODO: support custom quota file

locals {
  discovery_roles = ["roles/compute.viewer", "roles/cloudasset.viewer"]
}

module "project" {
  source          = "../../../../modules/project"
  name            = var.project_id
  billing_account = try(var.project_create_config.billing_account_id, null)
  parent          = try(var.project_create_config.parent_id, null)
  project_create  = var.project_create_config != null
  services = [
    "artifactregistry.googleapis.com",
    "cloudasset.googleapis.com",
    "cloudscheduler.googleapis.com",
    "compute.googleapis.com",
    "monitoring.googleapis.com",
    "run.googleapis.com"
  ]
}

module "ar" {
  source     = "../../../../modules/artifact-registry"
  project_id = module.project.project_id
  location   = var.region
  name       = var.name
  format     = { docker = { standard = {} } }
}

module "sa" {
  source       = "../../../../modules/iam-service-account"
  project_id   = module.project.project_id
  name         = var.name
  display_name = "Net monitoring service."
  iam_project_roles = {
    (module.project.project_id) = [
      "roles/monitoring.metricWriter"
    ]
  }
}

module "sa-invoker" {
  source       = "../../../../modules/iam-service-account"
  project_id   = module.project.project_id
  name         = "${var.name}-invoker"
  display_name = "Net monitoring service invoker."
}

module "cr-job" {
  source     = "../../../../modules/cloud-run-v2"
  project_id = module.project.project_id
  name       = var.name
  region     = var.region
  create_job = true
  containers = {
    netmon = {
      image = "${module.ar.url}/${var.name}"
      args = concat(
        [
          "-dr",
          var.discovery_config.discovery_root,
          "-mon",
          coalesce(var.monitoring_project, module.project.project_id)
        ],
        flatten([
          for f in var.discovery_config.monitored_folders : [
            "-f", f
          ]
        ]),
        flatten([
          for f in var.discovery_config.monitored_projects : [
            "-p", f
          ]
        ])
      )
    }
  }
  iam = {
    "roles/run.invoker" = [
      module.sa-invoker.iam_email
    ]
  }
  revision = {
    job = {
      max_retries = 0
    }
  }
  service_account     = module.sa.email
  deletion_protection = false
}

resource "google_cloud_scheduler_job" "job" {
  name             = var.name
  description      = "Schedule net monitor job."
  schedule         = var.schedule_config.crontab
  time_zone        = "UTC"
  attempt_deadline = "320s"
  region           = coalesce(var.schedule_config.region, var.region)
  project          = module.project.project_id
  retry_config {
    retry_count = 1
  }
  http_target {
    http_method = "POST"
    uri         = "https://${var.region}-run.googleapis.com/apis/run.googleapis.com/v1/namespaces/${module.project.number}/jobs/${var.name}:run"
    oauth_token {
      service_account_email = module.sa-invoker.email
    }
  }
}

resource "google_organization_iam_member" "discovery" {
  for_each = toset(
    var.grant_discovery_iam_roles &&
    startswith(var.discovery_config.discovery_root, "organizations/")
    ? local.discovery_roles
    : []
  )
  org_id = split("/", var.discovery_config.discovery_root)[1]
  role   = each.key
  member = module.sa.iam_email
}

resource "google_folder_iam_member" "discovery" {
  for_each = toset(
    var.grant_discovery_iam_roles &&
    startswith(var.discovery_config.discovery_root, "folders/")
    ? local.discovery_roles
    : []
  )
  folder = var.discovery_config.discovery_root
  role   = each.key
  member = module.sa.iam_email
}

resource "google_monitoring_dashboard" "dashboard" {
  count          = var.dashboard_json_path == null ? 0 : 1
  project        = module.project.project_id
  dashboard_json = file(var.dashboard_json_path)
}
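
Two parts of this configuration are easy to misread: which region the scheduler runs in versus which regional endpoint it calls, and which IAM resource receives the discovery roles. The Python sketch below restates both, assuming it faithfully mirrors the `coalesce`, `uri` and `for_each` expressions above. `scheduler_target` and `discovery_bindings` are hypothetical helpers, and the project number used in the usage line is a placeholder.

```python
from typing import Optional

DISCOVERY_ROLES = ["roles/compute.viewer", "roles/cloudasset.viewer"]


def scheduler_target(project_number: str, job_name: str, run_region: str,
                     scheduler_region: Optional[str] = None) -> dict:
    """Return the scheduler region and the HTTP call that starts the job."""
    return {
        # coalesce(var.schedule_config.region, var.region)
        "scheduler_region": scheduler_region or run_region,
        "http_method": "POST",
        # The job is always addressed in the Cloud Run region, even when
        # the scheduler itself runs in a different region.
        "uri": (f"https://{run_region}-run.googleapis.com/apis/run.googleapis.com"
                f"/v1/namespaces/{project_number}/jobs/{job_name}:run"),
    }


def discovery_bindings(grant: bool, discovery_root: str) -> dict:
    """Mirror the two for_each filters: where the discovery roles are granted."""
    if not grant:
        return {}
    if discovery_root.startswith("organizations/"):
        # org_id = split("/", discovery_root)[1]
        return {"organization": discovery_root.split("/")[1],
                "roles": set(DISCOVERY_ROLES)}
    if discovery_root.startswith("folders/"):
        return {"folder": discovery_root, "roles": set(DISCOVERY_ROLES)}
    return {}


if __name__ == "__main__":
    print(scheduler_target("123456789012", "netmon", "europe-west1")["uri"])
    print(discovery_bindings(True, "organizations/1234567890"))
```
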

@@ -0,0 +1,30 @@
/**
* Copyright 2024 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

output "docker_tag" {
  description = "Docker tag for the container image."
  value       = "${module.ar.url}/${var.name}"
}

output "project_id" {
  description = "Project id."
  value       = module.project.project_id
}

output "service_account" {
  description = "Cloud Run Job service account."
  value       = module.sa.email
}

@@ -0,0 +1,87 @@
/**
* Copyright 2022 Google LLC
*
* Licensed under the Apache License, Version 2.0 (the "License");
* you may not use this file except in compliance with the License.
* You may obtain a copy of the License at
*
* http://www.apache.org/licenses/LICENSE-2.0
*
* Unless required by applicable law or agreed to in writing, software
* distributed under the License is distributed on an "AS IS" BASIS,
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
* See the License for the specific language governing permissions and
* limitations under the License.
*/

variable "dashboard_json_path" {
  description = "Optional monitoring dashboard to deploy."
  type        = string
  default     = null
}

variable "discovery_config" {
  description = "Discovery configuration. Discovery root is the organization or a folder. If monitored folders and projects are empty, every project under the discovery root node will be monitored."
  type = object({
    discovery_root     = string
    monitored_folders  = optional(list(string), [])
    monitored_projects = optional(list(string), [])
    # custom_quota_file = optional(string)
  })
  nullable = false
  validation {
    condition = (
      var.discovery_config.monitored_folders != null &&
      var.discovery_config.monitored_projects != null
    )
    error_message = "Monitored folders and projects can be empty lists, but they cannot be null."
  }
}

variable "grant_discovery_iam_roles" {
  description = "Optionally grant required IAM roles to the monitoring tool service account."
  type        = bool
  default     = false
  nullable    = false
}

variable "monitoring_project" {
  description = "Project where generated metrics will be written. Default is to use the same project where the Cloud Run Job is deployed."
  type        = string
  default     = null
}

variable "name" {
  description = "Name used to create resources."
  type        = string
  default     = "netmon"
}

variable "project_create_config" {
  description = "Optional configuration if project creation is required."
  type = object({
    billing_account_id = string
    parent_id          = optional(string)
  })
  default = null
}

variable "project_id" {
  description = "Project id where the tool will be deployed."
  type        = string
}

variable "region" {
  description = "Compute region where Cloud Run will be deployed."
  type        = string
  default     = "europe-west1"
}

variable "schedule_config" {
  description = "Scheduler configuration. Region is only used if different from the one used for Cloud Run."
  type = object({
    crontab = optional(string, "*/30 * * * *")
    region  = optional(string)
  })
  default = {}
}

@@ -17,6 +17,7 @@
import base64
import binascii
import collections
import functools
import json
import logging
import os
@@ -29,13 +30,17 @@ import yaml
from google.auth.transport.requests import AuthorizedSession
HTTP = AuthorizedSession(google.auth.default()[0])
LOGGER = logging.getLogger('net-dash')
MONITORING_ROOT = 'netmon/'
Result = collections.namedtuple('Result', 'phase resource data')
@functools.cache
def _http():
  return AuthorizedSession(google.auth.default()[0])
def do_discovery(resources):
  '''Calls discovery plugin functions and collect discovered resources.
@@ -198,10 +203,10 @@ def fetch(request):
  LOGGER.debug(f'fetch {"POST" if request.data else "GET"} {request.url}')
  try:
    if not request.data:
      response = HTTP.get(request.url, headers=request.headers)
      response = _http().get(request.url, headers=request.headers)
    else:
      response = HTTP.post(request.url, headers=request.headers,
                           data=request.data)
      response = _http().post(request.url, headers=request.headers,
                              data=request.data)
  except google.auth.exceptions.RefreshError as e:
    raise SystemExit(e.args[0])
  if response.status_code != 200:

@@ -1,4 +1,4 @@
click==8.1.3
google-auth==2.14.1
PyYAML==6.0
requests==2.32.0
click>=8.1.3
google-auth>=2.14.1
PyYAML>=6.0
requests>=2.32.0

@@ -1,4 +1,4 @@
# Copyright 2023 Google LLC
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.

@@ -0,0 +1,13 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -0,0 +1,13 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -0,0 +1,13 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

@@ -0,0 +1,256 @@
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
values:
  google_cloud_scheduler_job.job:
    app_engine_http_target: []
    attempt_deadline: 320s
    description: Schedule net monitor job.
    http_target:
    - body: null
      headers: null
      http_method: POST
      oauth_token:
      - scope: null
        service_account_email: netmon-invoker@my-project.iam.gserviceaccount.com
      oidc_token: []
    name: netmon
    project: my-project
    pubsub_target: []
    region: europe-west1
    retry_config:
    - retry_count: 1
    schedule: '*/30 * * * *'
    time_zone: UTC
    timeouts: null
  google_organization_iam_member.discovery["roles/cloudasset.viewer"]:
    condition: []
    member: serviceAccount:netmon@my-project.iam.gserviceaccount.com
    org_id: '1234567890'
    role: roles/cloudasset.viewer
  google_organization_iam_member.discovery["roles/compute.viewer"]:
    condition: []
    member: serviceAccount:netmon@my-project.iam.gserviceaccount.com
    org_id: '1234567890'
    role: roles/compute.viewer
  module.ar.google_artifact_registry_repository.registry:
    cleanup_policies: []
    cleanup_policy_dry_run: null
    description: Terraform-managed registry
    docker_config: []
    effective_labels:
      goog-terraform-provisioned: 'true'
    format: DOCKER
    kms_key_name: null
    labels: null
    location: europe-west1
    maven_config: []
    mode: STANDARD_REPOSITORY
    project: my-project
    remote_repository_config: []
    repository_id: netmon
    terraform_labels:
      goog-terraform-provisioned: 'true'
    timeouts: null
    virtual_repository_config: []
  module.cr-job.google_cloud_run_v2_job.job[0]:
    annotations: null
    binary_authorization: []
    client: null
    client_version: null
    deletion_protection: false
    effective_labels:
      goog-terraform-provisioned: 'true'
    labels: null
    location: europe-west1
    name: netmon
    project: my-project
    run_execution_token: null
    start_execution_token: null
    template:
    - annotations: null
      labels: null
      template:
      - containers:
        - args:
          - -dr
          - organizations/1234567890
          - -mon
          - my-project
          - -f
          - '3456789012'
          - -f
          - '7890123456'
          command: null
          env: []
          image: europe-west1-docker.pkg.dev/my-project/netmon/netmon
          name: netmon
          ports: []
          volume_mounts: []
          working_dir: null
        encryption_key: null
        max_retries: 0
        service_account: netmon@my-project.iam.gserviceaccount.com
        volumes: []
        vpc_access: []
    terraform_labels:
      goog-terraform-provisioned: 'true'
    timeouts: null
  module.cr-job.google_cloud_run_v2_job_iam_binding.binding["roles/run.invoker"]:
    condition: []
    location: europe-west1
    members:
    - serviceAccount:netmon-invoker@my-project.iam.gserviceaccount.com
    name: netmon
    project: my-project
    role: roles/run.invoker
  module.project.google_project.project[0]:
    auto_create_network: false
    billing_account: 12345-ABCDEF-12345
    deletion_policy: DELETE
    effective_labels:
      goog-terraform-provisioned: 'true'
    folder_id: '2345678901'
    labels: null
    name: my-project
    org_id: null
    project_id: my-project
    tags: null
    terraform_labels:
      goog-terraform-provisioned: 'true'
    timeouts: null
  module.project.google_project_iam_member.service_agents["artifactregistry"]:
    condition: []
    project: my-project
    role: roles/artifactregistry.serviceAgent
  module.project.google_project_iam_member.service_agents["cloudasset"]:
    condition: []
    project: my-project
    role: roles/cloudasset.serviceAgent
  module.project.google_project_iam_member.service_agents["cloudscheduler"]:
    condition: []
    project: my-project
    role: roles/cloudscheduler.serviceAgent
  module.project.google_project_iam_member.service_agents["compute-system"]:
    condition: []
    project: my-project
    role: roles/compute.serviceAgent
  module.project.google_project_iam_member.service_agents["monitoring-notification"]:
    condition: []
    project: my-project
    role: roles/monitoring.notificationServiceAgent
  module.project.google_project_iam_member.service_agents["serverless-robot-prod"]:
    condition: []
    project: my-project
    role: roles/run.serviceAgent
  module.project.google_project_service.project_services["artifactregistry.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: artifactregistry.googleapis.com
    timeouts: null
  module.project.google_project_service.project_services["cloudasset.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: cloudasset.googleapis.com
    timeouts: null
  module.project.google_project_service.project_services["cloudscheduler.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: cloudscheduler.googleapis.com
    timeouts: null
  module.project.google_project_service.project_services["compute.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: compute.googleapis.com
    timeouts: null
  module.project.google_project_service.project_services["monitoring.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: monitoring.googleapis.com
    timeouts: null
  module.project.google_project_service.project_services["run.googleapis.com"]:
    disable_dependent_services: false
    disable_on_destroy: false
    project: my-project
    service: run.googleapis.com
    timeouts: null
  module.project.google_project_service_identity.default["artifactregistry.googleapis.com"]:
    project: my-project
    service: artifactregistry.googleapis.com
    timeouts: null
  module.project.google_project_service_identity.default["cloudasset.googleapis.com"]:
    project: my-project
    service: cloudasset.googleapis.com
    timeouts: null
  module.project.google_project_service_identity.default["cloudscheduler.googleapis.com"]:
    project: my-project
    service: cloudscheduler.googleapis.com
    timeouts: null
  module.project.google_project_service_identity.default["monitoring.googleapis.com"]:
    project: my-project
    service: monitoring.googleapis.com
    timeouts: null
  module.project.google_project_service_identity.default["run.googleapis.com"]:
    project: my-project
    service: run.googleapis.com
    timeouts: null
  module.sa-invoker.google_service_account.service_account[0]:
    account_id: netmon-invoker
    create_ignore_already_exists: null
    description: null
    disabled: false
    display_name: Net monitoring service invoker.
    project: my-project
    timeouts: null
  module.sa.google_project_iam_member.project-roles["my-project-roles/monitoring.metricWriter"]:
    condition: []
    project: my-project
    role: roles/monitoring.metricWriter
  module.sa.google_service_account.service_account[0]:
    account_id: netmon
    create_ignore_already_exists: null
    description: null
    disabled: false
    display_name: Net monitoring service.
    project: my-project
    timeouts: null
counts:
  google_artifact_registry_repository: 1
  google_cloud_run_v2_job: 1
  google_cloud_run_v2_job_iam_binding: 1
  google_cloud_scheduler_job: 1
  google_organization_iam_member: 2
  google_project: 1
  google_project_iam_member: 7
  google_project_service: 6
  google_project_service_identity: 5
  google_service_account: 2
  modules: 5
  resources: 27
outputs:
  project_id:
    sensitive: false
    type: string
    value: my-project
  service_account:
    sensitive: false
    type: string
    value: netmon@my-project.iam.gserviceaccount.com
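
The per-type `counts` above add up to the `resources: 27` total, which is also the value in the README's `# tftest modules=5 resources=27` directive. A small illustrative check of that arithmetic, with the numbers copied from the inventory (this snippet is not part of the test harness):

```python
# Per-resource-type counts copied from the inventory above.
COUNTS = {
    "google_artifact_registry_repository": 1,
    "google_cloud_run_v2_job": 1,
    "google_cloud_run_v2_job_iam_binding": 1,
    "google_cloud_scheduler_job": 1,
    "google_organization_iam_member": 2,   # the two discovery roles
    "google_project": 1,
    "google_project_iam_member": 7,        # 6 service agents + metricWriter
    "google_project_service": 6,
    "google_project_service_identity": 5,
    "google_service_account": 2,           # job SA and invoker SA
}

total = sum(COUNTS.values())
print(total)  # 27
```
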

@@ -1,4 +1,4 @@
# Copyright 2023 Google LLC
# Copyright 2024 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.