
Pitfall of upgrading multitenancy Hashicorp Vault Terraform provider

    Hi, it has been a long time since my last post.

    If you use the Terraform Vault provider (terraform-provider-vault) to provision HashiCorp Vault, your provider version is lower than 2.3.0, and you use cloud provider auth methods, you should read this.
    I ran into this pitfall while upgrading Vault and the provider.

    If you hit this pitfall and leave it unresolved, you will never be able to upgrade the provider. So you should understand the details and fix the problem before your provider version becomes stale and unmaintained.

    If you are not using Terraform to provision Vault, you don’t need to read this article.

    Before you read

    You should understand the basics of HashiCorp Vault.
    If you don’t know Vault, please read the official documentation first.

    Which version

    The problem appears when upgrading terraform-provider-vault from 2.2.0 to 2.3.0.

    Multitenancy in Vault

    Multitenancy in Vault is enabled by Vault’s namespace feature.
    Each service has its own namespace, such as serviceA, serviceB, and so on. An API request can address a namespace in one of these ways:

    1. Namespace as a prefix of the request path. Example: serviceA/secret/xxx
    2. Plain request path with an X-Vault-Namespace header. Example: secret/xxx with X-Vault-Namespace: serviceA
    3. (For nested namespaces) A mixture of 1 and 2.

    Either way, Vault can serve multiple tenants.
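
    On the Terraform side, the header form roughly corresponds to the provider's namespace argument. The following is a minimal sketch, not the setup used in this article. It assumes Vault Enterprise namespaces, Terraform 0.12 or later syntax, and a provider version that supports namespace; the alias service_a is a made-up name.

    ```hcl
    # Assumption: a namespace called "serviceA" already exists in Vault.
    provider "vault" {
      alias     = "service_a" # hypothetical alias, one provider per tenant
      namespace = "serviceA"
    }

    # The backend is created inside the serviceA namespace,
    # so its own path can stay a single segment.
    resource "vault_auth_backend" "gcp" {
      provider = vault.service_a
      path     = "gcp"
      type     = "gcp"
    }
    ```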

    Pitfall: One single path for Vault Backend

    Here is an example of a vault_auth_backend resource for GCP.

    
    resource "vault_auth_backend" "gcp" {
        path = "gcp"
        type = "gcp"
    }
    

    The path field is optional, and before 2.3.0 you could set it to any path. If you run Vault with multitenancy, you may want a configuration like this:

    
    resource "vault_auth_backend" "gcp_serviceA" {
        path = "gcp/serviceA"
        type = "gcp"
    }
    
    
    resource "vault_auth_backend" "gcp_serviceB" {
        path = "gcp/serviceB"
        type = "gcp"
    }
    

    And so on. Again, this worked before 2.3.0, but 2.3.0 introduced a path check: the path has to be a single segment like gcp or xxx. (ref: vault_gcp_auth_backend_role: “Expecdted 4 parts in path ‘auth/gcp/foo/role/bar’”, GitHub Issue)
    The worst part is that this validation was introduced in the read operation. Therefore, even if you only bump the provider version as shown below, every Terraform operation will fail.

    
    provider "vault" {
      - version = "2.2.0"
      + version = "2.3.0"
    }
    

    The full path of each cloud provider auth role resource, for example vault_gcp_auth_backend_role, must consist of exactly 4 parts separated by /. The path has the form

    • auth/<path>/role/<rolename>

    so <path> itself must not contain any /.
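
    To see why a nested path fails, the check can be imitated with Terraform's built-in split and length functions. This is only an illustration of the rule, assuming Terraform 0.12 or later; it is not the provider's source code. The path values come from the GitHub issue title above.

    ```hcl
    locals {
      # Role path for a backend mounted at the nested path "gcp/foo".
      nested_role_path = "auth/gcp/foo/role/bar"

      # Role path for a backend mounted at the single-segment path "gcp".
      flat_role_path = "auth/gcp/role/bar"
    }

    output "nested_parts" {
      # Splits into auth, gcp, foo, role, bar: 5 parts, so the check rejects it.
      value = length(split("/", local.nested_role_path))
    }

    output "flat_parts" {
      # Splits into auth, gcp, role, bar: 4 parts, so the check passes.
      value = length(split("/", local.flat_role_path))
    }
    ```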

    How should we solve this?

    Migrating to the new path with small steps

    You can solve this problem by migrating the current resources to a new path.
    However, your applications are still using the legacy path, so you have to migrate gradually, in small steps.

    These are the steps.

    1. Remove the old resource from the Terraform state.
    2. Change the path to the new path and apply the change.
    3. Update the Vault authentication code in the applications.
    4. Delete the resource at the old path manually.

    1. Removing from the Terraform State

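    I removed the resources from the Terraform state because I wanted Terraform to ignore them. terraform state rm only makes Terraform forget a resource; the backend itself stays in Vault, so applications keep working during the migration.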

    Run this command.

    $ terraform state rm vault_auth_backend.gcp
    

    Also remove the resources that refer to this path. For example:

    $ terraform state rm vault_gcp_auth_backend_role.xxxx
    

    2. Update the path

    Set the path argument to a new, single-segment path. You can’t reuse a path that already exists, such as gcp.

    
    resource "vault_auth_backend" "gcp_serviceB" {
        - path = "gcp/serviceB"
        + path = "google-cloud-platform"
        type = "gcp"
    }
    

    Then run terraform apply.
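
    The roles removed from the state in step 1 need to point at the new backend as well. A minimal sketch, assuming Terraform 0.12 or later syntax; the resource name service_b, the role name, and the service account are hypothetical placeholders, and your real role will have its own arguments.

    ```hcl
    resource "vault_auth_backend" "gcp_serviceB" {
      path = "google-cloud-platform" # single segment, passes the 2.3.0 check
      type = "gcp"
    }

    # Hypothetical role; replace the values with your own.
    resource "vault_gcp_auth_backend_role" "service_b" {
      # Referencing the backend resource keeps the role in sync with the path.
      backend                = vault_auth_backend.gcp_serviceB.path
      role                   = "service-b"
      type                   = "iam"
      bound_service_accounts = ["service-b@example-project.iam.gserviceaccount.com"]
    }
    ```

    With this layout the role lives at auth/google-cloud-platform/role/service-b, which has exactly 4 parts.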

    3. Update the application implementation

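    You have to update the Vault request endpoint in ALL microservices. With the path above, for example, the login endpoint changes from auth/gcp/serviceB/login to auth/google-cloud-platform/login. Be careful: if any microservice has not been updated when you move on to the next step, it will no longer be able to authenticate to Vault.

    This requires developer effort.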

    4. Delete the old resource

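    After confirming that all microservices have been updated, you can delete the backends at the old paths. Because they are no longer in the Terraform state, you have to do this by hand, for example with vault auth disable gcp/serviceB.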

    Next

    Maintaining the Terraform provider version and HashiCorp Vault is not easy.
    Even a minor version bump can contain a breaking change like this pitfall. It is not documented, and you will have to solve it (that is, migrate) before any further upgrade.

    This was the first time in my career that I had to read all the changelogs, and even the commits, of an upgrade.
    As an SRE, this was not easy and took time, but I was able to upgrade all the components. It required effort from developers too.

    It was a good experience, because if the migration had failed, no application would have been able to log in, which means the service would have been entirely down.
    I will keep working even harder! Thanks for reading.

    Written by Keisuke Yamashita, Site Reliability Engineer