(WIP) Read-only service accounts for automation and CI/CD (#1899)

* add design doc for the new CI/CD sa

* describe the actual implementation

* specify which files will need to be changed

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Update 0-cicd-plan-sa.md

* Fix typo

* stage 0 read-only service accounts

* stage 0 IAM map

* linting

* cicd read-only service accounts

* tweak workflow templates

* roles and github workflow fixes

* tfdoc

* Ad-hoc custom role factory for FAST bootstrap

* use factory variable for custom roles data path

* custom roles factory in org/project modules

* tfdoc

* rename custom roles factory variable, fix gitlab template

* gitlab workflow fixes

* fix merge

* output plan results on failed assertion

* update stage 0 expected values

* data platform branch

* gke

* networking

* security

* project factory

* outputs

* workflow templates

* resman apply fixes

* tfdoc

* fix stage 1 test fixture

* fix gh workflow

* read-only resman sa roles

* fix test

* read-only resman sa roles

* read-only resman sa roles

* read-only resman sa roles

* read-only resman sa roles

* fix test variables

* rename wif principal attribute names

* rename wif principal variables

* multitenant stages

---------

Co-authored-by: Wiktor Niesiobędzki <wiktorn@google.com>
Co-authored-by: Julio Castillo <jccb@google.com>
Ludovico Magnocavallo
2023-12-27 12:33:16 +01:00
committed by GitHub
parent 70a94eda46
commit 9d6e61428b
56 changed files with 1888 additions and 878 deletions


@@ -0,0 +1,85 @@
# Add new service accounts for CI/CD with plan-only permissions
**authors:** [Ludo](https://github.com/ludoo) \
**date:** December 3, 2023
## Status
In development.
## Context
The current CI/CD workflows are inherently insecure, as the same service account is used to run `terraform plan` in PR checks, and `terraform apply` in merges.
The current repository configuration variable allows setting a branch, which in principle could restrict use of the service account to merges. In practice this only prevents PR checks from working, so it does not achieve the desired result.
## Proposal
The proposal is to create a separate "chain" of less privileged service accounts that can only run `plan`, used only when a repository configuration sets a branch for merges in the `cicd_repositories` variable.
### Use cases
#### Merge branch set in repository configuration
```hcl
cicd_repositories = {
bootstrap = {
branch = "main"
identity_provider = "github-example"
name = "example/bootstrap"
type = "github"
}
}
# tftest skip
```
When a merge branch is set as in the example above, the CI/CD workflow will have two separate flows:
- for PR checks, the OIDC token will be exchanged for credentials of the `plan`-only CI/CD service account, which can only impersonate the `plan`-only automation service account
- for merges, the current flow that enables credential exchange and impersonation of the `apply`-enabled service account will be used
#### No merge branch set in repository configuration
```hcl
cicd_repositories = {
bootstrap = {
identity_provider = "github-example"
name = "example/bootstrap"
type = "github"
}
}
# tftest skip
```
If no merge branch is set in the repository configuration, as in the example above, the current behaviour will be preserved, allowing exchange and impersonation of the `apply`-enabled service account from any branch.
### Implementation
No changes to variables will be needed other than a lightweight refactor with `optional`.
The following resource changes will need to be implemented:
- define the set of read-only roles for each stage
- create a new automation service account in each stage and assign the identified roles
- create a new CI/CD service account with `roles/iam.serviceAccountTokenCreator` on the new automation service account
- if a merge branch is set in the repository configuration
  - grant `roles/iam.workloadIdentityUser` on the new CI/CD service account to the `principalSet:` matching any branch
  - define a new provider file that impersonates the new automation service account and use it in the workflow for checks
  - keep the existing token exchange via `principal:`, impersonation and provider file for the `apply` part of the workflow only matching the specified merge branch
- if a branch is not set the current behaviour will be kept
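The resource changes listed above could look roughly like the sketch below. This is an illustration only, not the actual FAST code: all resource names, account IDs and variables are hypothetical, and the `attribute.repository` mapping is assumed to be configured on the workload identity provider.

```hcl
# Hypothetical inputs, not part of the design doc.
variable "project_id" {
  type = string
}
variable "wif_pool_name" {
  # Assumed format: projects/NUMBER/locations/global/workloadIdentityPools/POOL
  type = string
}
variable "repository" {
  type    = string
  default = "example/bootstrap"
}

# New plan-only automation service account (read-only roles assigned elsewhere).
resource "google_service_account" "automation_ro" {
  project      = var.project_id
  account_id   = "automation-plan-ro"
  display_name = "Plan-only automation service account."
}

# New plan-only CI/CD service account.
resource "google_service_account" "cicd_ro" {
  project      = var.project_id
  account_id   = "cicd-plan-ro"
  display_name = "Plan-only CI/CD service account."
}

# The CI/CD account may impersonate only the plan-only automation account.
resource "google_service_account_iam_member" "cicd_ro_token_creator" {
  service_account_id = google_service_account.automation_ro.name
  role               = "roles/iam.serviceAccountTokenCreator"
  member             = "serviceAccount:${google_service_account.cicd_ro.email}"
}

# Any branch of the repository may exchange its OIDC token for the
# plan-only CI/CD account (principalSet matching on repository only).
resource "google_service_account_iam_member" "cicd_ro_wif" {
  service_account_id = google_service_account.cicd_ro.name
  role               = "roles/iam.workloadIdentityUser"
  member             = "principalSet://iam.googleapis.com/${var.wif_pool_name}/attribute.repository/${var.repository}"
}
# tftest skip
```

The `apply`-enabled chain is left untouched in this sketch; only the plan-only chain is added alongside it.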
The implementation will modify the following files in stages 0 and 1:
- the `automation.tf` files
- any file where IAM roles are assigned to the automation service account
- the `cicd-*.tf` files
- the `templates/workflow-*.yaml` files to implement the new workflow logic
- the `outputs.tf` files to generate the additional provider files
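As a rough idea of what an additional generated provider file might contain, the sketch below configures the Google providers to impersonate the plan-only automation service account. The service account email is a hypothetical placeholder, and the real generated files may include more settings than shown here.

```hcl
# Hypothetical plan-only provider file used by PR checks.
provider "google" {
  # Assumed email of the plan-only automation service account.
  impersonate_service_account = "automation-plan-ro@example-project.iam.gserviceaccount.com"
}

provider "google-beta" {
  impersonate_service_account = "automation-plan-ro@example-project.iam.gserviceaccount.com"
}
# tftest skip
```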
## Decision
This was surfaced a while ago, and implementation was only waiting on available development time. Development has now started.
## Consequences
Existing CI/CD workflows will need to be replaced when a merge branch is already defined in the repository configuration (unlikely to happen as the current workflow would not work).


@@ -16,6 +16,7 @@ This was not an issue when there were only a few networking stages, but as FAST
## Decision
We adopted an IP plan based on regions and environments with the following key points:
- Large ranges for the 3 environments we have out of the box (landing, dev, prod)
- Support for 2 regions
- Leave enough space to easily grow either the number of environments or regions
@@ -31,9 +32,10 @@ The following table summarizes the agreed IP plan:
| Region 2, secondary ranges | 100.80.0.0/12 | 100.80.0.0/14 | 100.84.0.0/16 | 100.88.0.0/14 |
To allocate additional secondary ranges for GKE clusters:
- For the pods range, use the next available /16 in the secondary range of its region/environment pair.
- For the service range, use the next available /24 in the last /16 of its region/environment pair.
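The allocation rule above can be worked through with Terraform's built-in `cidrsubnet` function. This is a minimal sketch that assumes a /14 region/environment range such as `100.80.0.0/14` from the table. The local names and the chosen indexes are illustrative, and the "next available" index depends on what has already been allocated.

```hcl
locals {
  # Assumed region/environment secondary range, taken from the table above.
  env_secondary_range = "100.80.0.0/14"

  # Pods: a /16 inside the /14 (2 extra prefix bits, so 4 possible slots).
  # Index 0 is used purely for illustration.
  pods_range = cidrsubnet(local.env_secondary_range, 2, 0)

  # Services: a /24 carved out of the last /16 (index 3) of the same /14.
  last_16        = cidrsubnet(local.env_secondary_range, 2, 3)
  services_range = cidrsubnet(local.last_16, 8, 0)
}

output "gke_ranges" {
  value = {
    pods     = local.pods_range     # 100.80.0.0/16
    services = local.services_range # 100.83.0.0/24
  }
}
```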
## Consequences
Default subnets for networking stages were updated to reflect to new ranges.
Default subnets for networking stages were updated to reflect the new ranges.