Data ingestion Demo
This folder contains an example that ingests data into the data platform instantiated here.
The example is not intended to be production-ready code.
Demo use case
The demo imports purchase data generated by a store.
Input files
Data are uploaded to the drop off GCS bucket. File structure:
- customers.csv: Comma-separated values with customer information in the following format: Customer ID, Name, Surname, Registration Timestamp
- purchases.csv: Comma-separated values with purchase information in the following format: Item ID, Customer ID, Item, Item price, Purchase Timestamp
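As a rough illustration of the two file layouts, the sketch below parses both files with Python's standard csv module. The snake_case field names, the absence of a header row, the timestamp format, and the sample rows are all assumptions made for this example; only the column order comes from the description above.

```python
import csv
import io

# Hypothetical field names; the column order follows the file structure above.
CUSTOMER_FIELDS = ["customer_id", "name", "surname", "registration_timestamp"]
PURCHASE_FIELDS = ["item_id", "customer_id", "item", "item_price", "purchase_timestamp"]


def read_rows(text, fieldnames):
    """Parse CSV text (assumed to have no header row) into a list of dicts."""
    reader = csv.DictReader(io.StringIO(text), fieldnames=fieldnames)
    return list(reader)


# Made-up sample rows, for illustration only.
customers_csv = "1,Ada,Lovelace,2022-01-01 10:00:00\n"
purchases_csv = "10,1,Notebook,4.50,2022-01-02 09:30:00\n"

customers = read_rows(customers_csv, CUSTOMER_FIELDS)
purchases = read_rows(purchases_csv, PURCHASE_FIELDS)
```

All values are read as strings; converting prices and timestamps to typed values is left out to keep the sketch short.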
Data processing pipelines
Different data pipelines are provided to highlight different features and patterns. For the purpose of the example, a single pipeline handles all data lifecycles. When adapting them to your real use case, consider handling each functional step in a separate pipeline or with a dedicated tool. For example, you may want to use Dataform to handle the lifecycle of data schemas.
Below you can find a description of each example:
- Simple import data: datapipeline.py is a simple pipeline that imports the provided data from the drop off Google Cloud Storage bucket to the Data Hub Confidential layer, joining the customers and purchases tables into the customerpurchase table.
- Import data with Policy Tags: datapipeline_dc_tags.py imports the provided data from the drop off bucket to the Data Hub Confidential layer, protecting sensitive data using Data Catalog policy tags.
- Delete tables: delete_table.py deletes the BigQuery tables created by the import pipelines.
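The join performed by the simple import pipeline can be pictured with a few lines of plain Python. This is only a sketch of the idea, not the code in datapipeline.py: the field names and sample rows are hypothetical, and an inner join on the customer ID is assumed.

```python
def join_customer_purchases(customers, purchases):
    """Inner-join purchases to customers on customer_id (assumed join key)."""
    customers_by_id = {c["customer_id"]: c for c in customers}
    joined = []
    for purchase in purchases:
        customer = customers_by_id.get(purchase["customer_id"])
        if customer is None:
            # Inner-join behaviour: skip purchases with no matching customer.
            continue
        joined.append({**customer, **purchase})
    return joined


# Made-up sample rows, for illustration only.
customers = [
    {"customer_id": "1", "name": "Ada", "surname": "Lovelace"},
    {"customer_id": "2", "name": "Alan", "surname": "Turing"},
]
purchases = [
    {"item_id": "10", "customer_id": "1", "item": "Notebook", "item_price": "4.50"},
    {"item_id": "11", "customer_id": "1", "item": "Pen", "item_price": "1.20"},
    {"item_id": "12", "customer_id": "3", "item": "Ink", "item_price": "7.00"},
]

customer_purchases = join_customer_purchases(customers, purchases)
```

Each resulting row carries both the customer fields and the purchase fields, which is the shape one would expect for a combined customerpurchase table.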
Running the demo
To run the demo examples, follow these steps:
- 01: Copy sample data to the drop off Cloud Storage bucket impersonating the load service account.
- 02: Copy the sample data structure definition to the orchestration Cloud Storage bucket impersonating the orchestration service account.
- 03: Copy the Cloud Composer DAG to the Cloud Composer Storage bucket impersonating the orchestration service account.
- 04: Build the Dataflow Flex template and image via a Cloud Build pipeline.
- 05: Open the Cloud Composer Airflow UI and run the imported DAG.
- 06: Run the BigQuery query to see results.
You can find pre-computed commands in the demo_commands output variable of the deployed Terraform data pipeline.
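If you export the output as JSON (for example with `terraform output -json demo_commands`), a small helper can list the commands in step order. The sketch below assumes the output is a map from step label to command string; that shape is not confirmed here, and the sample commands are placeholders.

```python
import json


def parse_demo_commands(raw_json):
    """Return (step, command) pairs sorted by step label.

    Assumes the JSON is an object mapping step labels to command strings.
    """
    commands = json.loads(raw_json)
    return sorted(commands.items())


# Placeholder JSON standing in for the real Terraform output.
sample = '{"02": "echo step two", "01": "echo step one"}'

for step, command in parse_demo_commands(sample):
    print(f"{step}: {command}")
```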