Wednesday, 16 April 2014

Heat auth model updates - part 2 "Stack Domain Users"

As promised, here's the second part of my updates on the Heat auth model, following on from part 1 describing our use of Keystone trusts.

This post will cover details of the recently implemented instance-users blueprint, which makes use of keystone domains to contain users related to credentials which are deployed inside instances created by heat.  If you just want to know how the new stuff works, you can skip to the last sections :)

So...why does heat create users at all?

Lets start with a bit of context.  Heat has historically needed to do some or all of the following:
  1. Provide metadata to agents inside instances, which poll for changes and apply the configuration expressed in the metadata to the instance.
  2. Signal completion of some action, typically configuration of software on a VM after it is booted (because nova moves the state of a VM to "Active" as soon as it spawns it, not when heat has fully configured it)
  3. Provide application level status or metrics from inside the instance, e.g to allow AutoScaling actions to be performed in response to some measure of performance or quality of service.
Heat provides API's which enable all of these things, but all of those API's require some sort of authentication, e.g credentials so whatever agent is running on the instance is able to access it.  So credentials must be deployed inside the instance, e.g here's how things work if you're using the heat-cfntools agents:

heat-cfntools agents data-flow with CFN-compatible API's


The heat-cfntools agents use signed requests, which requires an ec2 keypair created via keystone, which is then used to sign requests to the heat cloudformation and cloudwatch compatible API's, which are authenticated by heat via signature validation (which uses the keystone ec2tokens extension).

The problem is, ec2 keypairs are associated with a user.  And we don't want to deploy credentials directly related to the stack owner, otherwise any compromise of the (implicitly untrusted) instance could result in a cascading compromise where an attacker could take control of anything the stack-owning user has permission to access.

I've used cfntools/ec2tokens as an example, but the same issue exists if you use any credential available via keystone (token, username/password) which can be used to authenticate with the heat APIs.

So we need separation/isolation of the credentials deployed in the instance, such that we can limit the access allowed to the minimum necessary to make heat work.  Our first attempt at this did the following:
  • Create a new user in keystone, in the same project as the stack owner (either explicitly in the template via User and AccessKey resources, or for some resources such as WaitConditionHandle and ScalingPolicy we do it internally to obtain an ec2 keypair for generation of a pre-signed URL)
  • Add the "heat stack user" to a special role, default "heat_stack_user" (configurable via the heat_stack_user_role in heat.conf)
  • Limit the API surface accessible to the "heat_stack_user" via policy.json, with the expectation that access to other service's will be restricted in a similar way, or denied completely via network separation/firewall rules.
This approach is flawed, and led to this long-standing bug, there are multiple problems:
  • It requires the user creating the stack to have permissions to create users in keystone, which typically requires administrative roles.
  • It doesn't provide complete separation - even with the policy rules, it's possible a compromised stack could abuse the credentials (for example obtaining metadata for some other stack created by the user in the same project)
  • It clutters the user list for the project with spurious (from the user/operator perspective) users who aren't "real" users, the users are a heat implementation detail, and we're exposing them to the end user.

Hmm, that sounds bad, what's the alternative?

Well, we've been considering that for quite some time ;) multiple solutions were discussed:
  • Delegating a subset of user roles via trusts (rejected because token expiry is not optional, and separation from the stack owner is desired, e.g we don't really want to delegate or impersonate them from the instance, we just need an identity which can be verified as related to the stack)
  • Rolling our own auth mechanism based on some random "token" (some folks were in favour of this, but I'm opposed to it, I think we should stick to orchestration and leverage or improve what's in keystone instead of taking on the burden and security risk of maintaining our own auth scheme)
  • Using the keystone OAuth extension to use OAuth keypairs and signed requests.  (This was rejected due to lack of keystoneclient support, e.g client API and auth middleware, maybe we'll revisit enabling this as an option in some future release).
  • Isolating the in-instance users by creating them in a completely separate heat-specific keystone domain.  This idea was first suggested by Adam Young, as is what we ended up implementing for Icehouse.

"Stack Domain Users", the details..

The new approach is, effectively, an optimisation of the existing implementation.  We encapuslate all stack-defined users (ie users created as a result of things contained in a heat template) in a separate domain, which is created specifically to contain things related only to heat stacks.  A user is created which is the "domain admin", and heat uses that user to manage the lifecycle of the users in the "stack user domain".

There are two aspects of this I'll discuss below, firstly what deployers need to do to enable stack domain users in Heat (Icehouse or later), and secondly what actually happens when you create a stack, and how it addresses the previously identified problems:

When deploying heat:

  • A special keystone domain is created, e.g one called "heat" and the ID is set in the "stack_user_domain" option in heat.conf
  • A user with sufficient permissions to create/delete projects and users in the "heat" domain is created, e.g in devstack a user called "heat_domain_admin" is created, and given the admin role on the heat domain.
  • The username/password for the domain admin user is set in heat.conf (stack_domain_admin and stack_domain_admin_password).  This user administers "stack domain users" on behalf of stack owners, so they no longer need to be admins themselves, and the risk of this escalation path is limited because the heat_domain_admin is only given administrative permission for the "heat" domain.
This is all done automatically for you when using recent devstack, but if you're deploying via some other method, you need to use python-openstackclient (which is the only CLI interface to v3 keystone) to create the domain and user:

Create the domain:
$OS_TOKEN refers to a token, e.g the service admin token or some other valid token for a user with sufficient roles to create users and domains.
$KS_ENDPOINT_V3 refers to the v3 keystone endpoint, e.g http://<keystone>:5000/v3 where <keystone> is the IP address or resolvable name for the keystone service.

openstack --os-token $OS_TOKEN --os-url=$KS_ENDPOINT_V3 --os-identity-api-version=3 domain create heat --description "Owns users and projects created by heat"
The domain ID is returned by this command, and is referred to as $HEAT_DOMAIN_ID below.

Create the user:
openstack --os-token $OS_TOKEN --os-url=$KS_ENDPOINT_V3 --os-identity-api-version=3 user create --password $PASSWORD --domain $HEAT_DOMAIN_ID heat_domain_admin --description "Manages users and projects created by heat"
The user ID is returned by this command and is referred to as $DOMAIN_ADMIN_ID below:

Make the user a domain admin:
openstack --os-token $OS_TOKEN --os-url=$KS_ENDPOINT_V3 --os-identity-api-version=3 role add --user $DOMAIN_ADMIN_ID --domain $HEAT_DOMAIN_ID admin

Then you need to add the domain ID, username and password from these steps to heat.conf:

stack_domain_admin_password = <password>
stack_domain_admin = heat_domain_admin
stack_user_domain = <domain id returned from domain create above>

When a user creates a stack:

  • We create a new "stack domain project" in the "heat" domain, if the stack contains any resources which require creation of a "stack domain user"
  • Any resources which require a user, we create the user in the "stack domain project", which is associated with the heat stack in the heat database, but is completely separate and unrelated (from an authentication perspective) to the stack owners project
  • The users created in the stack domain are still assigned the heat_stack_user role, so as before the API surface they can access is limited via policy.json
  • When API requests are processed, we do an internal lookup, and allow stack details for a given stack to be retrieved from the database for both the stack owner's project (the default API path to the stack), and also the "stack domain project", subject to the policy.json restrictions.
To clarify that last point, that means there are now two paths which can result in retrieval of the same data via the heat API, e.g for resource-metadata:

GET v1/​{stack_owner_project_id}​/stacks/​{stack_name}​/​{stack_id}​/resources/​{resource_name}​/metadata

or

GET v1/​{stack_domain_project_id}​/stacks/​{stack_name}​/​{stack_id}​/resources/​{resource_name}​/metadata

The stack owner would use the former (e.g via "heat resource-metadata {stack_name} {resource_name}), and any agents in the instance will use the latter.

This solves all of the problems identified previously:
  • The stack owner no longer requires admin roles, because the heat_domain_admin user administers stack domain users
  • There is complete separation, the users created in the stack domain project cannot access any resources other than those explicitly allowed by heat, any attempt to access other stacks, or any other resource owned by the stack-owner will fail.
  • The list of users in the stack-owner project is unaffected, because we've created a completely different project in another domain.
Hopefully that provides a fairly clear picture of the new feature, and how it works - it should be transparent to users but I'm hoping this information may be useful to deployers when adopting the functionality for Icehouse.

The main gap still to be investigated is how we handle situations where keystone is backed by a read-only directory (e.g LDAP), my expectation is that it can be solved via the keystone capability to have different identity drivers per domain, so you could for example have e.g domains containing human users backed by LDAP, and the heat domain backed by SQL.  My understanding is that there are outstanding issues to be solved for Juno in keystone, but I will post a future update when I've had time to do some testing and figure out what works.

That is all, respect if you managed to read it all! ;)

2 comments:

  1. I should have read this much ealier ;) This is great!

    ReplyDelete