Fix README

Lorenzo Caggioni
2022-01-14 21:41:11 +01:00
parent 33ce60cf00
commit 1f063b2f09

@@ -90,13 +90,13 @@ We need to create 3 file:
- A `person_udf.js` containing the UDF javascript file used by the Dataflow template.
- A `person_schema.json` file containing the table schema used to import the CSV.
You can find an example of those file in the folder `./data-demo`. You can copy the example files in the GCS bucket using the command returned in the terraform output as `command-01-gcs`.
You can find an example of those file in the folder `./data-demo`. You can copy the example files in the GCS bucket using the command returned in the terraform output as `command-01-gcs`. Below an example:
```bash
gsutil -i gcs-landing@PROJECT.iam.gserviceaccount.com cp data-demo/* gs://LANDING_BUCKET
```
We can now run the Dataflow pipeline using the `gcloud` returned in the terraform output as `command-03-dataflow`.
We can now run the Dataflow pipeline using the `gcloud` returned in the terraform output as `command-02-dataflow`. Below an example:
```bash
gcloud --impersonate-service-account=orch-test@PROJECT.iam.gserviceaccount.com dataflow jobs run test_batch_01 \
@@ -116,14 +116,12 @@ outputTable=PROJECT:bq_dataset.person,\
bigQueryLoadingTemporaryDirectory=gs://PROJECT-eu-df-tmplocation
```
You can check data imported into Google BigQuery using the command returned in the terraform output as `command-03-bq`:
You can check data imported into Google BigQuery using the command returned in the terraform output as `command-03-bq`. Below an example:
```
bq query --use_legacy_sql=false 'SELECT * FROM `PROJECT.datalake.person` LIMIT 1000'
```
<!-- BEGIN TFDOC -->
## Variables
@@ -151,4 +149,3 @@ bq query --use_legacy_sql=false 'SELECT * FROM `PROJECT.datalake.person` LIMIT 1
| serviceaccount | Service account. | |
<!-- END TFDOC -->