@@ -43,7 +43,7 @@ module "dataplex-datascan" {
To create a Data Quality scan, provide the `data_quality_spec` input arguments as documented in <https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec>.
Documentation for the supported rule types and rule specifications can be found in <https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualityRule>.
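As a rough illustration of how those rule types map onto this module's `rules` list, the sketch below combines a few column-level expectations with a table-level one. Field names follow the `data_quality_spec` type listed in the variables table. The module `source` path, the BigQuery table, and the column names (`station_id`, `council_district`, `status`) are hypothetical placeholders, the thresholds are arbitrary, and `var.project_id` and `var.region` are assumed to be declared by the caller:

```hcl
module "dataplex-datascan" {
  # Assumed module location; adjust to your checkout.
  source     = "./fabric/modules/dataplex-datascan"
  name       = "dq-sketch"
  project_id = var.project_id
  region     = var.region
  data = {
    # Hypothetical BigQuery table to scan.
    resource = "//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table"
  }
  data_quality_spec = {
    sampling_percent = 100
    rules = [
      {
        # Column-level rule: at least 99% of rows must have a non-null value.
        column               = "station_id"
        dimension            = "COMPLETENESS"
        threshold            = 0.99
        non_null_expectation = {}
      },
      {
        # Column-level rule: values must fall between min_value and max_value
        # (bounds are inclusive unless strict_min/max_enabled is set).
        column    = "council_district"
        dimension = "VALIDITY"
        range_expectation = {
          min_value = 1
          max_value = 10
        }
      },
      {
        # Column-level rule: values must come from a fixed set.
        column    = "status"
        dimension = "VALIDITY"
        set_expectation = {
          values = ["open", "closed"]
        }
      },
      {
        # Table-level rule: the aggregate SQL expression must evaluate to true.
        dimension = "VALIDITY"
        table_condition_expectation = {
          sql_expression = "COUNT(*) > 0"
        }
      }
    ]
  }
}
```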
This example shows how to create a Data Quality scan.
@@ -137,6 +137,19 @@ module "dataplex-datascan" {
        table_condition_expectation = {
          sql_expression = "COUNT(*) > 0"
        }
      },
      {
        dimension = "VALIDITY"
        sql_assertion = {
          sql_statement = <<-EOT
            SELECT
              city_asset_number, council_district
            FROM $${data()}
            WHERE city_asset_number IS NOT NULL
            GROUP BY 1,2
            HAVING COUNT(*) > 1
          EOT
        }
      }
    ]
  }
@@ -225,6 +238,15 @@ rules:
  - dimension: VALIDITY
    table_condition_expectation:
      sql_expression: COUNT(*) > 0
  - dimension: VALIDITY
    sql_assertion:
      sql_statement: |
        SELECT
          city_asset_number, council_district
        FROM ${data()}
        WHERE city_asset_number IS NOT NULL
        GROUP BY 1,2
        HAVING COUNT(*) > 1
```
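A YAML rules file like this one is not passed through the `data_quality_spec` variable. Per the variables table, the module takes its path through `factories_config.data_quality_spec`. The following is a minimal sketch that assumes the rules are saved as `data_quality_spec.yaml` in the root module directory. The module `source` path and the BigQuery table are hypothetical placeholders, and `var.project_id` and `var.region` are assumed to be declared by the caller:

```hcl
module "dataplex-datascan" {
  # Assumed module location; adjust to your checkout.
  source     = "./fabric/modules/dataplex-datascan"
  name       = "datascan"
  prefix     = "test"
  project_id = var.project_id
  region     = var.region
  data = {
    # Hypothetical BigQuery table to scan.
    resource = "//bigquery.googleapis.com/projects/my-project/datasets/my_dataset/tables/my_table"
  }
  # No execution_schedule: per the variables table, the scan is created
  # on demand and only runs when the dataScans.run API is called.
  factories_config = {
    # Path to the YAML rules file (file name is an assumption).
    data_quality_spec = "data_quality_spec.yaml"
  }
}
```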
The module's `data_quality_spec` variable only accepts snake_case keys, but the YAML file referenced by `factories_config.data_quality_spec` can use either camelCase or snake_case. The camelCase example below should produce the same DataScan configuration as the previous examples.
@@ -308,6 +330,15 @@ rules:
  - dimension: VALIDITY
    tableConditionExpectation:
      sqlExpression: COUNT(*) > 0
  - dimension: VALIDITY
    sqlAssertion:
      sqlStatement: |
        SELECT
          city_asset_number, council_district
        FROM ${data()}
        WHERE city_asset_number IS NOT NULL
        GROUP BY 1,2
        HAVING COUNT(*) > 1
```
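This section does not show how the module maps camelCase keys onto its snake_case inputs. A common plain-Terraform pattern is to decode the file with `yamldecode()` and look up each key in both spellings with `try()`. The following is a hedged sketch of that pattern, not the module's actual code. The file name `rules.yaml` is hypothetical, and only a few fields are handled:

```hcl
locals {
  # Hypothetical file name; yamldecode() turns the YAML document into an object.
  dq_raw = yamldecode(file("${path.module}/rules.yaml"))

  # Normalized snake_case spec: each lookup tries the camelCase key first,
  # then the snake_case key, then falls back to null.
  dq_spec = {
    sampling_percent = try(local.dq_raw.samplingPercent, local.dq_raw.sampling_percent, null)
    row_filter       = try(local.dq_raw.rowFilter, local.dq_raw.row_filter, null)
    rules = [
      for rule in try(local.dq_raw.rules, []) : {
        column    = try(rule.column, null)
        dimension = rule.dimension
        table_condition_expectation = try(
          { sql_expression = rule.tableConditionExpectation.sqlExpression },
          { sql_expression = rule.table_condition_expectation.sql_expression },
          null
        )
        sql_assertion = try(
          { sql_statement = rule.sqlAssertion.sqlStatement },
          { sql_statement = rule.sql_assertion.sql_statement },
          null
        )
      }
    ]
  }
}

output "normalized_data_quality_spec" {
  value = local.dq_spec
}
```

Because `file()` returns the file contents without template rendering, the YAML can use `${data()}` as-is. The inline HCL example must escape it as `$${data()}` so that Terraform does not attempt an interpolation.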
## Data Source
@@ -431,21 +462,21 @@ module "dataplex-datascan" {
| name | description | type | required | default |
|---|---|:---:|:---:|:---:|
| [data](variables.tf#L17) | The data source for DataScan. The source can be either a Dataplex `entity` or a BigQuery `resource`. | <code title="object({ entity = optional(string) resource = optional(string) })">object({…})</code> | ✓ | |
| [name](variables.tf#L122) | Name of Dataplex Scan. | <code>string</code> | ✓ | |
| [project_id](variables.tf#L133) | The ID of the project where the Dataplex DataScan will be created. | <code>string</code> | ✓ | |
| [region](variables.tf#L138) | Region for the Dataplex DataScan. | <code>string</code> | ✓ | |
| [data_profile_spec](variables.tf#L29) | DataProfileScan related setting. Variable descriptions are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataProfileSpec. | <code title="object({ sampling_percent = optional(number) row_filter = optional(string) })">object({…})</code> | | <code>null</code> |
| [data_quality_spec](variables.tf#L38) | DataQualityScan related setting. Variable descriptions are provided in https://cloud.google.com/dataplex/docs/reference/rest/v1/DataQualitySpec. | <code title="object({ sampling_percent = optional(number) row_filter = optional(string) post_scan_actions = optional(object({ bigquery_export = optional(object({ results_table = optional(string) })) })) rules = list(object({ column = optional(string) ignore_null = optional(bool, null) dimension = string threshold = optional(number) non_null_expectation = optional(object({})) range_expectation = optional(object({ min_value = optional(number) max_value = optional(number) strict_min_enabled = optional(bool) strict_max_enabled = optional(bool) })) regex_expectation = optional(object({ regex = string })) set_expectation = optional(object({ values = list(string) })) uniqueness_expectation = optional(object({})) statistic_range_expectation = optional(object({ statistic = string min_value = optional(number) max_value = optional(number) strict_min_enabled = optional(bool) strict_max_enabled = optional(bool) })) row_condition_expectation = optional(object({ sql_expression = string })) table_condition_expectation = optional(object({ sql_expression = string })) sql_assertion = optional(object({ sql_statement = string })) })) })">object({…})</code> | | <code>null</code> |
| [description](variables.tf#L88) | Custom description for DataScan. | <code>string</code> | | <code>null</code> |
| [execution_schedule](variables.tf#L94) | Schedule DataScan to run periodically based on a cron schedule expression. If not specified, the DataScan is created with `on_demand` schedule, which means it will not run until the user calls the `dataScans.run` API. | <code>string</code> | | <code>null</code> |
| [factories_config](variables.tf#L100) | Paths to data files and folders that enable factory functionality. | <code title="object({ data_quality_spec = optional(string) })">object({…})</code> | | <code>{}</code> |
| [iam](variables-iam.tf#L24) | Dataplex DataScan IAM bindings in {ROLE => [MEMBERS]} format. | <code>map(list(string))</code> | | <code>{}</code> |
| [iam_bindings](variables-iam.tf#L31) | Authoritative IAM bindings in {KEY => {role = ROLE, members = [], condition = {}}}. Keys are arbitrary. | <code title="map(object({ members = list(string) role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_bindings_additive](variables-iam.tf#L46) | Individual additive IAM bindings. Keys are arbitrary. | <code title="map(object({ member = string role = string condition = optional(object({ expression = string title = string description = optional(string) })) }))">map(object({…}))</code> | | <code>{}</code> |
| [iam_by_principals](variables-iam.tf#L17) | Authoritative IAM binding in {PRINCIPAL => [ROLES]} format. Principals need to be statically defined to avoid cycle errors. Merged internally with the `iam` variable. | <code>map(list(string))</code> | | <code>{}</code> |
| [incremental_field](variables.tf#L109) | The unnested field (of type Date or Timestamp) that contains values which monotonically increase over time. If not specified, a data scan will run for all data in the table. | <code>string</code> | | <code>null</code> |
| [labels](variables.tf#L115) | Resource labels. | <code>map(string)</code> | | <code>{}</code> |
| [prefix](variables.tf#L127) | Optional prefix used to generate Dataplex DataScan ID. | <code>string</code> | | <code>null</code> |
## Outputs