FAST: move organization policies to stage 0 (#1698)

* design doc

* Update 0-org-policies.md

* moved org policies to stage 0, wip

* stage0

* stage 0

* export tag keys and values from stage 0

* rename factory variable

* change org policy outputs

* stage 1

* Update 0-org-policies.md

* make org policy variable not nullable, README changes

* use optionals for tag names

* better factory variable name

* README changes

* ADR
Ludovico Magnocavallo
2023-09-21 16:03:21 +02:00
committed by GitHub
parent 79b0dc9751
commit f628cdbc06
18 changed files with 202 additions and 182 deletions

@@ -1,29 +1,54 @@
-# Support initial org policies in bootstrap stage
+# Move organization policies to bootstrap stage
-**authors:** [Ludo](https://github.com/ludoo), [Roberto](https://github.com/drebes) \
-**date:** July 15, 2023
+**authors:** [Julio](https://github.com/juliocc), [Ludo](https://github.com/ludoo), [Roberto](https://github.com/drebes) \
+**date:** September 13, 2023
 ## Status
-Under discussion.
+Implemented.
 ## Context
-Some organization policies might need to be applied right from the start in stage 0, to better configure defaults for organization resources or to improve security. Two examples are skipping default network creation for projects (which would avoid creating a network in the IaC project), and deprivileging service accounts.
+Three different requirements drive this proposal.
-There are essentially two ways of achieving this
+### Organization policies deployed at bootstrap time
-- moving organization policies from stage 1 to stage 0, which would also mean moving tags or at least the `org-policies` key and its values
-- turning on the organization policy factory in stage 0
+Many organizations take security seriously and would like to have organization policies (for example `constraints/iam.automaticIamGrantsForDefaultServiceAccounts`) deployed right from the beginning at bootstrap time. This is currently extremely cumbersome, as organization policies are managed in stage 1.
-The first approach is complex and results in a "fat" stage 0, which then would need to be applied every time an organization policy or even worse a tag is changed. This is counter to our initial approach where stage 0 is a "decoupling" (from org admin permissions) stage, only taking care of org-level IAM and logging, and the initial IaC resources.
+As an additional benefit, managing some or all organization policies in stage 0 will make it possible to turn off undesired resource configurations for the initial projects (for example `constraints/compute.skipDefaultNetworkCreation`).
-The second approach is lightweight and its only impact is in the need to avoid duplication between the organization policies managed in stage 0, and those managed in stage 1.
+### Simplify and limit delegation of Organization Policy Administrator role
+Automation service accounts are currently assigned the Organization Policy Administrator role at the organization level, scoped via resource management tags. This is cumbersome, as bindings are split between stage 0, which delegates role control to the stage 1 service account, and stage 1, which creates the automation service accounts, tags, and folder bindings used for scoping.
+A more secure approach uses a dedicated resource management tag value hierarchy, combined with conditions on the organization policies that alter behaviour based on tags. This would allow centrally defining allowed exceptions to organization policies, and selectively granting access to specific exceptions to individual automation service accounts via tag values.
+The project factory will need to retain scoped grants, to set policies that enforce lists of resources which would be too cumbersome to maintain in stage 0.
+### Reduce stage 1 complexity to allow simpler creation of hierarchy templates
+Stage 1 is currently too complex to allow easy cloning into different resource hierarchy templates, which are needed to account for all landing zone designs.
+Removing complexity from stage 1 by moving organization policies and their related IAM to stage 0 will be an initial step towards stage 1 simplification.
+## Proposal
+The proposal is to:
+- move management of organization policies to stage 0
+- move management of the `org_policies` tag key and associated values to stage 0
+- remove delegated/conditional grants for the Organization Policy Administrator role from stage 0 and 1
+The approach fattens stage 0 and lessens its decoupling role in the overall FAST design, but looks preferable to the complexity of splitting organization policy management between stage 0 and 1, or, worse, delegating control of specific policies to external commands run before stage 0.
 ## Decision
-No decision yet, this will need to be discussed.
+The decision is to implement this.
 ## Consequences
-TBD
+Organization policies and related tags will need to be moved from stage 1 to stage 0 state. One approach is to:
+- switch both states to local state
+- use `terraform state mv -state-out` to temporarily move resources from stage 1 to stage 0
+- push stage 0 and stage 1 state
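The tag-based exception model described in the ADR can be sketched with the Google provider's `google_tags_tag_key`, `google_tags_tag_value` and `google_org_policy_policy` resources. That model pairs a dedicated tag value hierarchy with conditions on organization policies. This is a minimal illustration under stated assumptions, not the FAST stage 0 code. The `organization_id` variable, the `allow-default-network` tag value and all resource names are hypothetical. The sketch also assumes a provider authenticated with organization-level permissions, and a constraint that supports conditional rules; check each constraint's documentation before relying on this.

```hcl
# Hypothetical sketch: names and variables are illustrative, not the FAST interface.
variable "organization_id" {
  description = "Numeric organization id, for example \"123456789012\"."
  type        = string
}

# Tag key holding organization policy exceptions, created at the org level.
resource "google_tags_tag_key" "org_policies" {
  parent      = "organizations/${var.organization_id}"
  short_name  = "org-policies"
  description = "Exceptions to organization policies."
}

# One tag value per allowed exception (hypothetical value name).
resource "google_tags_tag_value" "allow_default_network" {
  parent     = google_tags_tag_key.org_policies.id
  short_name = "allow-default-network"
}

# Boolean constraint enforced everywhere, except on resources that carry
# (directly or by inheritance) the exception tag value.
resource "google_org_policy_policy" "skip_default_network" {
  name   = "organizations/${var.organization_id}/policies/compute.skipDefaultNetworkCreation"
  parent = "organizations/${var.organization_id}"

  spec {
    # Conditional rule: not enforced where the tag value matches.
    rules {
      condition {
        title      = "allow-default-network"
        expression = "resource.matchTag('${google_tags_tag_key.org_policies.namespaced_name}', '${google_tags_tag_value.allow_default_network.short_name}')"
      }
      enforce = "FALSE"
    }
    # Unconditional rule: enforced everywhere else.
    rules {
      enforce = "TRUE"
    }
  }
}
```

Under this model, granting an automation service account the Tag User role (`roles/resourcemanager.tagUser`) on a single tag value would let it opt resources it manages into that one exception, without holding the Organization Policy Administrator role.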