Showing posts with label openstack. Show all posts
Showing posts with label openstack. Show all posts

Monday, 4 June 2018

TripleO Containerized deployments, debugging basics

Containerized deployments, debugging basics

Since the Pike release, TripleO has supported deployments with OpenStack services running in containers.  Currently we use docker to run images based on those maintained by the Kolla project.

We already have some tips and tricks for container deployment debugging in tripleo-docs, but below are some more notes on my typical debug workflows.

Config generation debugging overview

In the TripleO container architecture, we still use Puppet to generate configuration files and do some bootstrapping, but it is run (inside a container) via a script docker-puppet.py

The config generation usage happens at the start of the deployment (step 1) and the configuration files are generated for all services (regardless of which step they are started in).

The input file used is /var/lib/docker-puppet/docker-puppet.json, but you can also filter this (e.g via cut/paste or jq as shown below) to enable debugging for specific services - this is helpful when you need to iterate on debugging a config generation issue for just one service.

[root@overcloud-controller-0 docker-puppet]# jq '[.[]|select(.config_volume | contains("heat"))]' /var/lib/docker-puppet/docker-puppet.json | tee /tmp/heat_docker_puppet.json
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat_api",
  "step_config": "include ::tripleo::profile::base::heat::api\n",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo"
}
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat_api_cfn",
  "step_config": "include ::tripleo::profile::base::heat::api_cfn\n",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api-cfn:current-tripleo"
}
{
  "puppet_tags": "heat_config,file,concat,file_line",
  "config_volume": "heat",
  "step_config": "include ::tripleo::profile::base::heat::engine\n\ninclude ::tripleo::profile::base::database::mysql::client",
  "config_image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo"
}

 

Then we can run the config generation, if necessary changing the tags (or puppet modules, which are consumed from the host filesystem e.g /etc/puppet/modules) until the desired output is achieved:


[root@overcloud-controller-0 docker-puppet]# export NET_HOST='true'
[root@overcloud-controller-0 docker-puppet]# export DEBUG='true'
[root@overcloud-controller-0 docker-puppet]# export PROCESS_COUNT=1
[root@overcloud-controller-0 docker-puppet]# export CONFIG=/tmp/heat_docker_puppet.json
[root@overcloud-controller-0 docker-puppet]# python /var/lib/docker-puppet/docker-puppet.py2018-02-09 16:13:16,978 INFO: 102305 -- Running docker-puppet
2018-02-09 16:13:16,978 DEBUG: 102305 -- CONFIG: /tmp/heat_docker_puppet.json
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_volume heat_api
2018-02-09 16:13:16,978 DEBUG: 102305 -- puppet_tags heat_config,file,concat,file_line
2018-02-09 16:13:16,978 DEBUG: 102305 -- manifest include ::tripleo::profile::base::heat::api
2018-02-09 16:13:16,978 DEBUG: 102305 -- config_image 192.168.24.1:8787/tripleomaster/centos-binary-heat-api:current-tripleo
...

 

When the config generation is completed, configuration files are written out to /var/lib/config-data/heat.

We then compare timestamps against the /var/lib/config-data/heat/heat.*origin_of_time file (touched for each service before we run the config-generating containers), so that only those files modified or created by puppet are copied to /var/lib/config-data/puppet-generated/heat.

Note that we also calculate a checksum for each service (see /var/lib/config-data/puppet-generated/*.md5sum), which means we can detect when the configuration changes - when this happens we need paunch to restart the containers, even though the image did not change.

This checksum is added to the /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json files by docker-puppet.py, and these files are later used by paunch to decide if a container should be restarted (see below).

 

Runtime debugging, paunch 101

Paunch is a tool that orchestrates launching containers for each step, and performing any bootstrapping tasks not handled via docker-puppet.py.

It accepts a json format, which are the /var/lib/tripleo-config/docker-container-startup-config-step_*.json files that are created based on the enabled services (the content is directly derived from the service templates in tripleo-heat-templates)

These json files are then modified via docker-puppet.py (as mentioned above) to add a TRIPLEO_CONFIG_HASH value to the container environment - these modified files are written with a different name, see /var/lib/tripleo-config/hashed-docker-container-startup-config-step_*.json

Note this environment variable isn't used by the container directly, it is used as a salt to trigger restarting containers when the configuration files in the mounted config volumes have changed.

As in the docker-puppet case it's possible to filter the json file with jq and debug e.g mounted volumes or other configuration changes directly.

It's also possible to test configuration changes by manually modifying /var/lib/config-data/puppet-generated/ then either restarting the container via docker restart, or by modifying TRIPLEO_CONFIG_HASH then re-running paunch.

Note paunch will kill any containers tagged for a particular step e.g the --config-id tripleo_step4 --managed-by tripleo-Controller means all containers started during this step for any previous paunch apply will be killed if they are removed from your json during testing.  This is a feature which enables changes to the enabled services on update to your overcloud but it's worth bearing in mind when testing as described here.


[root@overcloud-controller-0]# cd /var/lib/tripleo-config/
[root@overcloud-controller-0 tripleo-config]# jq '{"heat_engine": .heat_engine}' hashed-docker-container-startup-config-step_4.json | tee /tmp/heat_startup_config.json
{
  "heat_engine": {
    "healthcheck": {
      "test": "/openstack/healthcheck"
    },
    "image": "192.168.24.1:8787/tripleomaster/centos-binary-heat-engine:current-tripleo",
    "environment": [
      "KOLLA_CONFIG_STRATEGY=COPY_ALWAYS",
      "TRIPLEO_CONFIG_HASH=14617e6728f5f919b16c74f1e98d0264"
    ],
    "volumes": [
      "/etc/hosts:/etc/hosts:ro",
      "/etc/localtime:/etc/localtime:ro",
      "/etc/pki/ca-trust/extracted:/etc/pki/ca-trust/extracted:ro",
      "/etc/pki/tls/certs/ca-bundle.crt:/etc/pki/tls/certs/ca-bundle.crt:ro",
      "/etc/pki/tls/certs/ca-bundle.trust.crt:/etc/pki/tls/certs/ca-bundle.trust.crt:ro",
      "/etc/pki/tls/cert.pem:/etc/pki/tls/cert.pem:ro",
      "/dev/log:/dev/log",
      "/etc/ssh/ssh_known_hosts:/etc/ssh/ssh_known_hosts:ro",
      "/etc/puppet:/etc/puppet:ro",
      "/var/log/containers/heat:/var/log/heat",
      "/var/lib/kolla/config_files/heat_engine.json:/var/lib/kolla/config_files/config.json:ro",
      "/var/lib/config-data/puppet-generated/heat/:/var/lib/kolla/config_files/src:ro"
    ],
    "net": "host",
    "privileged": false,
    "restart": "always"
  }
}
[root@overcloud-controller-0 tripleo-config]#  paunch --debug apply --file /tmp/heat_startup_config.json --config-id tripleo_step4 --managed-by tripleo-Controller
stdout: dd60546daddd06753da445fd973e52411d0a9031c8758f4bebc6e094823a8b45

stderr: 
[root@overcloud-controller-0 tripleo-config]# docker ps | grep heat
dd60546daddd        192.168.24.1:8787/tripleomaster/centos-binary-heat-engine:current-tripleo          "kolla_start"            9 seconds ago       Up 9 seconds (health: starting)                       heat_engine

 

 

Containerized services, logging

There are a couple of ways to access the container logs:

  • On the host filesystem, the container logs are persisted under /var/log/containers/<service>
  • docker logs <container id or name>
It is also often useful to use docker inspect <container id or name> to verify the container configuration, e.g the image in use and the mounted volumes etc.

 

Debugging containers directly

Sometimes logs are not enough to debug problems, and in this case you must interact with the container directly to diagnose the issue.

When a container is not restarting, you can attach a shell to the running container via docker exec:


[root@openstack-controller-0 ~]# docker exec -ti heat_engine /bin/bash
()[heat@openstack-controller-0 /]$ ps ax
    PID TTY      STAT   TIME COMMAND
      1 ?        Ss     0:00 /usr/local/bin/dumb-init /bin/bash /usr/local/bin/kolla_start
      5 ?        Ss     1:50 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
     25 ?        S      3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
     26 ?        S      3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
     27 ?        S      3:06 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
     28 ?        S      3:05 /usr/bin/python /usr/bin/heat-engine --config-file /usr/share/heat/heat-dist.conf --config-file /etc/heat/heat
   2936 ?        Ss     0:00 /bin/bash
   2946 ?        R+     0:00 ps ax

 

That's all for today, for more information please refer to tripleo-docs,, or feel free to ask questions in #tripleo on Freenode!

Friday, 9 February 2018

Debugging TripleO revisited - Heat, Ansible & Puppet

Some time ago I wrote a post about debugging TripleO heat templates, which contained some details of possible debug workflows when TripleO deployments fail.

In recent releases (since the Pike release) we've made some major changes to the TripleO architecture - we makes more use of Ansible "under the hood", and we now support deploying containerized environments.  I described some of these architectural changes in a talk at the recent OpenStack Summit in Sydney.

In this post I'd like to provide a refreshed tutorial on typical debug workflow, primarily focussing on the configuration phase of a typical TripleO deployment, and with particular focus on interfaces which have changed or are new since my original debugging post.

We'll start by looking at the deploy workflow as a whole, some heat interfaces for diagnosing the nature of the failure, then we'll at how to debug directly via Ansible and Puppet.  In a future post I'll also cover the basics of debugging containerized deployments.

The TripleO deploy workflow, overview

A typical TripleO deployment consists of several discrete phases, which are run in order:

Provisioning of the nodes


  1. A "plan" is created (heat templates and other files are uploaded to Swift running on the undercloud
  2. Some validation checks are performed by Mistral/Heat then a Heat stack create is started (by Mistral on the undercloud)
  3. Heat creates some groups of nodes (one group per TripleO role e.g "Controller"), which results in API calls to Nova
  4. Nova makes scheduling/placement decisions based on your flavors (which can be different per role), and calls Ironic to provision the baremetal nodes
  5. The nodes are provisioned by Ironic

This first phase is the provisioning workflow, after that is complete and the nodes are reported ACTIVE by nova (e.g the nodes are provisioned with an OS and running).

Host preparation

The next step is to configure the nodes in preparation for starting the services, which again has a specific workflow (some optional steps are omitted for clarity):

  1. The node networking is configured, via the os-net-config tool
  2. We write hieradata for puppet to the node filesystem (under /etc/puppet/hieradata/*)
  3. We write some data files to the node filesystem (a puppet manifest for baremetal configuration, and some json files that are used for container configuration)

Service deployment, step-by-step configuration

The final step is to deploy the services, either on the baremetal host or in containers, this consists of several tasks run in a specific order:

  1. We run puppet on the baremetal host (even in the containerized architecture this is still needed, e.g to configure the docker daemon and a few other things)
  2. We run "docker-puppet.py" to generate the configuration files for each enabled service (this only happens once, on step 1, for all services)
  3. We start any containers enabled for this step via the "paunch" tool, which translates some json files into running docker containers, and optionally does some bootstrapping tasks.
  4. We run docker-puppet.py again (with a different configuration, only on one node the "bootstrap host"), this does some bootstrap tasks that are performed via puppet, such as creating keystone users and endpoints after starting the service.

Note that these steps are performed repeatedly with an incrementing step value (e.g step 1, 2, 3, 4, and 5), with the exception of the "docker-puppet.py" config generation which we only need to do once (we just generate the configs for all services regardless of which step they get started in).

Below is a diagram which illustrates this step-by-step deployment workflow:
TripleO Service configuration workflow

The most common deployment failures occur during this service configuration phase of deployment, so the remainder of this post will primarily focus on debugging failures of the deployment steps.

 

Debugging first steps - what failed?

Heat Stack create failed.
 

Ok something failed during your TripleO deployment, it happens to all of us sometimes!  The next step is to understand the root-cause.

My starting point after this is always to run:

openstack stack failures list --long <stackname>

(undercloud) [stack@undercloud ~]$ openstack stack failures list --long overcloud
overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0: 
  resource_type: OS::Heat::StructuredDeployment 
  physical_resource_id: 421c7860-dd7d-47bd-9e12-de0008a4c106 
  status: CREATE_FAILED 
  status_reason: | 
    Error: resources[0]: Deployment to server failed: deploy_status_code : Deployment exited with non-zero status code: 2 
  deploy_stdout: | 
     
    PLAY [localhost] ***************************************************************  
     
    ... 
     
    TASK [Run puppet host configuration for step 1] ********************************  
    ok: [localhost] 
     
    TASK [debug] *******************************************************************  
    fatal: [localhost]: FAILED! => { 
        "changed": false,  
        "failed_when_result": true,  
        "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [ 
            "Debug: Runtime environment: puppet_version=4.8.2, ruby_version=2.0.0, run_mode=user, default_encoding=UTF-8",  
            "Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain" 
        ] 
    } 
          to retry, use: --limit @/var/lib/heat-config/heat-config-ansible/8dd0b23a-acb8-4e11-aef7-12ea1d4cf038_playbook.retry 
     
    PLAY RECAP ********************************************************************* 
    localhost                  : ok=18   changed=12   unreachable=0    failed=1    
 

We can tell several things from the output (which has been edited above for brevity), firstly the name of the failing resource

overcloud.AllNodesDeploySteps.ControllerDeployment_Step1.0
  • The error was on one of the Controllers (ControllerDeployment)
  • The deployment failed during the per-step service configuration phase (the AllNodesDeploySteps part tells us this)
  • The failure was during the first step (Step1.0)
Then we see more clues in the deploy_stdout, ansible failed running the task which runs puppet on the host, it looks like a problem with the puppet code.

With a little more digging we can see which node exactly this failure relates to, e.g we copy the SoftwareDeployment ID from the output above, then run:

(undercloud) [stack@undercloud ~]$ openstack software deployment show 421c7860-dd7d-47bd-9e12-de0008a4c106 --format value --column server_id
29b3c254-5270-42ae-8150-9fc3f67d3d89
(undercloud) [stack@undercloud ~]$ openstack server list | grep 29b3c254-5270-42ae-8150-9fc3f67d3d89
| 29b3c254-5270-42ae-8150-9fc3f67d3d89 | overcloud-controller-0  | ACTIVE | ctlplane=192.168.24.6 | overcloud-full | oooq_control |
 

Ok so puppet failed while running via ansible on overcloud-controller-0.

 

Debugging via Ansible directly

Having identified that the problem was during the ansible-driven configuration phase, one option is to re-run the same configuration directly via ansible-ansible playbook, so you can either increase verbosity or potentially modify the tasks to debug the problem.

Since the Queens release, this is actually very easy, using a combination of the new "openstack overcloud config download" command and the tripleo dynamic ansible inventory.

(undercloud) [stack@undercloud ~]$ openstack overcloud config download
The TripleO configuration has been successfully generated into: /home/stack/tripleo-VOVet0-config
(undercloud) [stack@undercloud ~]$ cd /home/stack/tripleo-VOVet0-config
 (undercloud) [stack@undercloud tripleo-VOVet0-config]$ ls
 common_deploy_steps_tasks.yaml    external_post_deploy_steps_tasks.yaml  templates
 Compute                           global_vars.yaml                       update_steps_playbook.yaml
 Controller                        group_vars                             update_steps_tasks.yaml
 deploy_steps_playbook.yaml        post_upgrade_steps_playbook.yaml       upgrade_steps_playbook.yaml
 external_deploy_steps_tasks.yaml  post_upgrade_steps_tasks.yaml          upgrade_steps_tasks.yaml
 

Here we can see there is a "deploy_steps_playbook.yaml", which is the entry point to run the ansible service configuration steps.  This runs all the common deployment tasks (as outlined above) as well as any service specific tasks (these end up in task include files in the per-role directories, e.g Controller and Compute in this example).

We can run the playbook again on all nodes with the tripleo-ansible-inventory from tripleo-validations, which is installed by default on the undercloud:

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ansible-playbook -i /usr/bin/tripleo-ansible-inventory deploy_steps_playbook.yaml --limit overcloud-controller-0
 ...
TASK [Run puppet host configuration for step 1] ********************************************************************
ok: [192.168.24.6]

TASK [debug] *******************************************************************************************************
fatal: [192.168.24.6]: FAILED! => {
    "changed": false, 
    "failed_when_result": true, 
    "outputs.stdout_lines|default([])|union(outputs.stderr_lines|default([]))": [
        "Notice: hiera(): Cannot load backend module_data: cannot load such file -- hiera/backend/module_data_backend", 
        "exception: connect failed", 
        "Warning: Undefined variable '::deploy_config_name'; ", 
        "   (file & line not available)", 
        "Warning: Undefined variable 'deploy_config_name'; ", 
        "Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile
/base/docker.pp:181:5 on node overcloud-controller-0.localdomain"
    ]
}

NO MORE HOSTS LEFT *************************************************************************************************
 to retry, use: --limit @/home/stack/tripleo-VOVet0-config/deploy_steps_playbook.retry

PLAY RECAP *********************************************************************************************************
192.168.24.6               : ok=56   changed=2    unreachable=0    failed=1   
 

Here we can see the same error is reproduced directly via ansible, and we made use of the --limit option to only run tasks on the overcloud-controller-0 node.  We could also have added --tags to limit the tasks further (see tripleo-heat-templates for which tags are supported).

If the error were ansible related, this would be a good way to debug and test any potential fixes to the ansible tasks, and in the upcoming Rocky release there are plans to switch to this model of deployment by default.

 

Debugging via Puppet directly

Since this error seems to be puppet related, the next step is to reproduce it on the host (obviously the steps above often yield enough information to identify the puppet error, but this assumes you need to do more detailed debugging directly via puppet):

Firstly we log on to the node, and look at the files in the /var/lib/tripleo-config directory.

(undercloud) [stack@undercloud tripleo-VOVet0-config]$ ssh heat-admin@192.168.24.6
Warning: Permanently added '192.168.24.6' (ECDSA) to the list of known hosts.
Last login: Fri Feb  9 14:30:02 2018 from gateway
[heat-admin@overcloud-controller-0 ~]$ cd /var/lib/tripleo-config/
[heat-admin@overcloud-controller-0 tripleo-config]$ ls
docker-container-startup-config-step_1.json  docker-container-startup-config-step_4.json  puppet_step_config.pp
docker-container-startup-config-step_2.json  docker-container-startup-config-step_5.json
docker-container-startup-config-step_3.json  docker-container-startup-config-step_6.json
 

The puppet_step_config.pp file is the manifest applied by ansible on the baremetal host

We can debug any puppet host configuration by running puppet apply manually. Note that hiera is used to control the step value, this will be at the same value as the failing step, but it can also be useful sometimes to manually modify this for development testing of different steps for a particular service.

[root@overcloud-controller-0 tripleo-config]# hiera -c /etc/puppet/hiera.yaml step
1
[root@overcloud-controller-0 tripleo-config]# cat /etc/puppet/hieradata/config_step.json 
{"step": 1}[root@overcloud-controller-0 tripleo-config]# puppet apply --debug puppet_step_config.pp
...
Error: Evaluation Error: Error while evaluating a Resource Statement, Unknown resource type: 'ugeas' at /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp:181:5 on node overcloud-controller-0.localdomain
 

Here we can see the problem is a typo in the /etc/puppet/modules/tripleo/manifests/profile/base/docker.pp file at line 181, I look at the file, fix the problem (ugeas should be augeas) then re-run puppet apply to confirm the fix.

Note that with puppet module fixes you will need to get the fix either into an updated overcloud image, or update the module via deploy artifacts for testing local forks of the modules.

That's all for today, but in a future post, I will cover the new container architecture, and share some debugging approaches I have found helpful when deployment failures are container related.

Wednesday, 27 September 2017

OpenStack Days UK

OpenStack Days UK

Yesterday I attended the OpenStack Days UK event, held in London.  It was a very good day and there were a number of interesting talks, and it provided a great opportunity to chat with folks about OpenStack.

I gave a talk, titled "Deploying OpenStack at scale, with TripleO, Ansible and Containers", where I gave an update of the recent rework in the TripleO project to make more use of Ansible and enable containerized deployments.

I'm planning some future blog posts with more detail on this topic, but for now here's a copy of the slide deck I used, also available on github.



Thursday, 11 May 2017

OpenStack Summit - TripleO Project Onboarding

We've been having a productive week here in Boston at the OpenStack Summit, and one of the sessions I was involved in was a TripleO project Onboarding session.

The project onboarding sessions are a new idea for this summit, and provide the opportunity for new or potential contributors (and/or users/operators) to talk with the existing project developers and get tips on how to get started as well as ask any questions and discuss ideas/issues.

The TripleO session went well, and I'm very happy to report it was well attended and we had some good discussions.  The session was informal with an emphasis on questions and some live demos/examples, but we did also use a few slides which provide an overview and some context for those new to the project.

Here are the slides used (also on my github), unfortunately I can't share the Q+A aspects of the session as it wasn't recorded, but I hope the slides will prove useful - we can be found in #tripleo on Freenode if anyone has questions about the slides or getting started with TripleO in general.

Friday, 3 March 2017

Developing Mistral workflows for TripleO

During the newton/ocata development cycles, TripleO made changes to the architecture so we make use of Mistral (the OpenStack workflow API project) to drive workflows required to deploy your OpenStack cloud.

Prior to this change we had workflow defined inside python-tripleoclient, and most API calls were made directly to Heat.  This worked OK but there was too much "business logic" inside the client, which doesn't work well if non-python clients (such as tripleo-ui) want to interact with TripleO.

To solve this problem, number of mistral workflows and custom actions have been implemented, which are available via the Mistral API on the undercloud.  This can be considered the primary "TripleO API" for driving all deployment tasks now.

Here's a diagram showing how it fits together:

Overview of Mistral integration in TripleO


Mistral workflows and actions

There are two primary interfaces to mistral, workflows which are a yaml definition of a process or series of tasks, and actions which are a concrete definition of how to do a specific task (such as call some OpenStack API).

Workflows and actions can defined directly via the mistral API, or a wrapper called a workbook.  Mistral actions are also defined via a python plugin interface, which TripleO uses to run some tasks such as running jinja2 on tripleo-heat-templates prior to calling Heat to orchestrate the deployment.

Mistral workflows, in detail

Here I'm going to show how to view and interact with the mistral workflows used by TripleO directly, which is useful to understand what TripleO is doing "under the hood" during a deployment, and also for debugging/development.

First we view the mistral workbooks loaded into Mistral - these contain the TripleO specific workflows and are defined in tripleo-common


[stack@undercloud ~]$ . stackrc 
[stack@undercloud ~]$ mistral workbook-list
+----------------------------+--------+---------------------+------------+
| Name                       | Tags   | Created at          | Updated at |
+----------------------------+--------+---------------------+------------+
| tripleo.deployment.v1      | <none> | 2017-02-27 17:59:04 | None       |
| tripleo.package_update.v1  | <none> | 2017-02-27 17:59:06 | None       |
| tripleo.plan_management.v1 | <none> | 2017-02-27 17:59:09 | None       |
| tripleo.scale.v1           | <none> | 2017-02-27 17:59:11 | None       |
| tripleo.stack.v1           | <none> | 2017-02-27 17:59:13 | None       |
| tripleo.validations.v1     | <none> | 2017-02-27 17:59:15 | None       |
| tripleo.baremetal.v1       | <none> | 2017-02-28 19:26:33 | None       |
+----------------------------+--------+---------------------+------------+

The name of the workbook constitutes a namespace for the workflows it contains, so we can view the related workflows using grep (I also grep for tag_node to reduce the number of matches).


[stack@undercloud ~]$ mistral workflow-list | grep "tripleo.baremetal.v1" | grep tag_node
| 75d2566c-13d9-4aa3-b18d-8e8fc0dd2119 | tripleo.baremetal.v1.tag_nodes                            | 660c5ec71ce043c1a43d3529e7065a9d | <none> | tag_node_uuids, untag_nod... | 2017-02-28 19:26:33 | None       |
| 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node                             | 660c5ec71ce043c1a43d3529e7065a9d | <none> | node_uuid, role=None, que... | 2017-02-28 19:26:33 | None       |  

When you know the name of a workflow, you can inspect the required inputs, and run it directly via a mistral execution, in this case we're running the tripleo.baremetal.v1.tag_node workflow, which modifies the profile assigned in the ironic node capabilities (see tripleo-docs for more information about manual tagging of nodes)


[stack@undercloud ~]$ mistral workflow-get tripleo.baremetal.v1.tag_node
+------------+------------------------------------------+
| Field      | Value                                    |
+------------+------------------------------------------+
| ID         | 7a4220cc-f323-44a4-bb0b-5824377af249     |
| Name       | tripleo.baremetal.v1.tag_node            |
| Project ID | 660c5ec71ce043c1a43d3529e7065a9d         |
| Tags       | <none>                                   |
| Input      | node_uuid, role=None, queue_name=tripleo |
| Created at | 2017-02-28 19:26:33                      |
| Updated at | None                                     |
+------------+------------------------------------------+
[stack@undercloud ~]$ ironic node-list
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| UUID                                 | Name      | Instance UUID | Power State | Provisioning State | Maintenance |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
| 30182cb9-eba9-4335-b6b4-d74fe2581102 | control-0 | None          | power off   | available          | False       |
| 19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9 | compute-0 | None          | power off   | available          | False       |
+--------------------------------------+-----------+---------------+-------------+--------------------+-------------+
[stack@undercloud ~]$ mistral execution-create tripleo.baremetal.v1.tag_node '{"node_uuid": "30182cb9-eba9-4335-b6b4-d74fe2581102", "role": "test"}'
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| ID                | 6a141065-ad6e-4477-b1a8-c178e6fcadcb |
| Workflow ID       | 7a4220cc-f323-44a4-bb0b-5824377af249 |
| Workflow name     | tripleo.baremetal.v1.tag_node        |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-03-03 09:53:10                  |
| Updated at        | 2017-03-03 09:53:10                  |
+-------------------+--------------------------------------+

At this point the mistral workflow is running, and it'll either succeed or fail, and also create some output (which in the TripleO model is sometimes returned to the Ux via a Zaqar queue).  We can view the status, and the outputs (truncated for brevity):


[stack@undercloud ~]$ mistral execution-list | grep  6a141065-ad6e-4477-b1a8-c178e6fcadcb
| 6a141065-ad6e-4477-b1a8-c178e6fcadcb | 7a4220cc-f323-44a4-bb0b-5824377af249 | tripleo.baremetal.v1.tag_node                           |                        | <none>                               | SUCCESS | None       | 2017-03-03 09:53:10 | 2017-03-03 09:53:11 |
[stack@undercloud ~]$ mistral execution-get-output 6a141065-ad6e-4477-b1a8-c178e6fcadcb
{
    "status": "SUCCESS", 
    "message": {
...

So that's it - we ran a mistral workflow, it suceeded and we looked at the output, now we can see the result looking at the node in Ironic, it worked! :)


[stack@undercloud ~]$ ironic node-show 30182cb9-eba9-4335-b6b4-d74fe2581102 | grep profile
|                        | u'cpus': u'2', u'capabilities': u'profile:test,cpu_hugepages:true,boot_o |

 

Mistral workflows, create your own!

Here I'll show how to develop your own custom workflows (which isn't something we expect operators to necessarily do, but is now part of many developers workflow during feature development for TripleO).

First, we create a simple yaml definition of the workflow, as defined in the v2 Mistral DSL - this example lists all available ironic nodes, then finds those which match the "test" profile we assigned in the example above:


This example uses the mistral built-in "ironic" action, which is basically a pass-through action exposing the python-ironicclient interfaces.  Similar actions exist for the majority of OpenStack python clients, so this is a pretty flexible interface.

Now we can now upload the workflow (not wrapped in a workbook this time, so we use workflow-create), run it via execution create, then look at the outputs - we can see that  the matching_nodes output matches the ID of the node we tagged in the example above - success! :)

[stack@undercloud tripleo-common]$ mistral workflow-create shtest.yaml 
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| ID                                   | Name                    | Project ID                       | Tags   | Input        | Created at          | Updated at |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
| 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile | 660c5ec71ce043c1a43d3529e7065a9d | <none> | profile=test | 2017-03-03 10:18:48 | None       |
+--------------------------------------+-------------------------+----------------------------------+--------+--------------+---------------------+------------+
[stack@undercloud tripleo-common]$ mistral execution-create test_nodes_with_profile
+-------------------+--------------------------------------+
| Field             | Value                                |
+-------------------+--------------------------------------+
| ID                | 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 |
| Workflow ID       | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d |
| Workflow name     | test_nodes_with_profile              |
| Description       |                                      |
| Task Execution ID | <none>                               |
| State             | RUNNING                              |
| State info        | None                                 |
| Created at        | 2017-03-03 10:19:30                  |
| Updated at        | 2017-03-03 10:19:30                  |
+-------------------+--------------------------------------+
[stack@undercloud tripleo-common]$ mistral execution-list | grep  2392ed1c-96b4-4787-9d11-0f3069e9a7e5
| 2392ed1c-96b4-4787-9d11-0f3069e9a7e5 | 2b8f2bea-f3dd-42f0-ad16-79987c75df4d | test_nodes_with_profile                                 |                        | <none>                               | SUCCESS | None       | 2017-03-03 10:19:30 | 2017-03-03 10:19:31 |
[stack@undercloud tripleo-common]$ mistral execution-get-output 2392ed1c-96b4-4787-9d11-0f3069e9a7e5
{
    "matching_nodes": [
        "30182cb9-eba9-4335-b6b4-d74fe2581102"
    ], 
    "available_nodes": [
        "30182cb9-eba9-4335-b6b4-d74fe2581102", 
        "19fd7ea7-b4a0-4ae9-a06a-2f3d44f739e9"
    ]
}

Using this basic example, you can see how to develop workflows which can then easily be copied into the tripleo-common workbooks, and integrated into the TripleO deployment workflow.

In a future post, I'll dig into the use of custom actions, and how to develop/debug those.

Monday, 10 October 2016

TripleO composable/custom roles

This is a follow-up to my previous post outlining the new composable services interfaces , which covered the basics of the new for Newton composable services model.

The final piece of the composability model we've been developing this cycle is the ability to deploy user-defined custom roles, in addition to (or even instead of) the built in TripleO roles (where a role is a group of servers, e.g "Controller", which runs some combination of services).

What follows is an overview of this new functionality, the primary interfaces, and some usage examples and a summary of future planned work.


Thursday, 1 September 2016

Complex data transformations with nested Heat intrinsic functions

Disclaimer, what follows is either pretty neat, or pure-evil depending your your viewpoint ;)  But it's based on a real use-case and it works, so I'm posting this to document the approach, why it's needed, and hopefully stimulate some discussion around optimizations leading to a improved/simplified implementation in the future.


Friday, 12 August 2016

TripleO Deploy Artifacts (and puppet development workflow)



For a while now, TripleO has supported a "DeployArtifacts" interface, aimed at making it easier to deploy modified/additional files on your overcloud, without the overhead of frequently rebuilding images.

This started out as a way to enable faster iteration on puppet module development (the puppet modules are by default stored inside the images deployed by TripleO, and generally you'll want to do development in a git checkout on the undercloud node), but it is actually a generic interface that can be used for a variety of deployment time customizations.


Friday, 5 August 2016

TripleO Composable Services 101

Over the newton cycle, we've been working very hard on a major refactor of our heat templates and puppet manifiests, such that a much more granular and flexible "Composable Services" pattern is followed throughout our implementation.

It's been a lot of work, but it's been a frequently requested feature for some time, so I'm excited to be in a position to say it's complete for Newton (kudos to everyone involved in making that happen!) :)

This post aims to provide an introduction to this work, an overview of how it works under the hood, some simple usage examples and a roadmap for some related follow-on work.


Thursday, 9 June 2016

TripleO partial stack updates

Recently I was asked if it's possible to do a partial update of a TripleO overcloud - the answer is yes, so I thought I'd write a post showing how to do it.  Much of what follows is basically an update on my old post on nested resource introspection (some interfaces have changed a bit since I wrote that), combined with an introduction to heat PATCH updates.

Partial update?!  Why?

So, the first question is why would you do this - TripleO heat templates are designed to enforce a consistent state for your entire OpenStack deployment, so in most cases you really should update the entire overcloud, and not mess with the underlying nested stacks directly.

However, for some development usage, this creates a long feedback loop - you change something (perhaps one line in a puppet manifest or heat template), then have to wait several minutes for Heat to walk the entire tree of nested stacks, puppet to run all steps on all nodes, etc.

So, while you would probably never do this in production (seriously, please don't!), it can be a useful technique for developers seeking a quicker hack-then-test cycle, and also when attempting to isolate root-causes for some subset of overcloud stack update behavior.

Ok, with that disclaimer clearly stated, here's how you do it:

Step 1 - Find the nested stack to update


Lets take a specific example - I want to update only the ControllerNodesPostDeployment resource which is defined in overcloud.yaml - this is a resource that maps to a nested stack that uses the cluster configuration interfaces I described in this previous post to apply puppet in a series of steps to all controller nodes.

Here's our overcloud (some CLI output removed for brevity):

$ heat stack-list
| 01c51e7e-ad2f-41d3-b056-3c4c84395114 | overcloud  | CREATE_COMPLETE |
2016-06-08T18:07:00 | None         |








Here's the ControllerNodesPostDeployment resource:


$ heat resource-list overcloud | grep ControllerNodesPost
| ControllerNodesPostDeployment             |
e67fff24-8089-4cf8-adf4-9c6064bf01d6          |
OS::TripleO::ControllerPostDeployment             | CREATE_COMPLETE |
2016-06-08T18:07:00 |
e67fff24-8089-4cf8-adf4-9c6064bf01d6 is the resource ID of
ControllerNodesPostDeployment, which is a nested stack - you can confirm
this via:

$ heat stack-list -n | grep "^| e67fff24-8089-4cf8-adf4-9c6064bf01d6"
| e67fff24-8089-4cf8-adf4-9c6064bf01d6 |
overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
| UPDATE_COMPLETE | 2016-06-08T18:10:34 | 2016-06-09T08:52:45 |
01c51e7e-ad2f-41d3-b056-3c4c84395114 |

Note here the first column is the stack ID, and the last is the parent
stack ID (e.g "overcloud" above).

overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 is the name of the stack that implements ControllerNodesPostDeployment - we can refer to it by either that name or the ID (e67fff24-8089-4cf8-adf4-9c6064bf01d6).

Step 2 - Basic update of the stack

Heat supports PATCH updates, so it is possible to trigger a no-op update without passing any template or parameters (the existing data will be used), or to patch in some specific modification.

Here's now it works, we simply use either the name or ID we discovered above, and use heat stack-update (or the new openstack client equivalent commands.

First, however, we want to get the last event ID before triggering the update (or, on recent heatclient versions you can instead use openstack stack event list --follow):

$ heat event-list overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | tac | head -n2
+------------------------------------------------------+--------------------------------------+-------------------------------------+--------------------+---------------------+
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 89e535ef-d414-4121-b726-9924eccb4fc3 | Stack UPDATE completed successfully | UPDATE_COMPLETE    | 2016-06-09T09:09:09 |


So the last event logged by this nested stack has the ID of 89e535ef-d414-4121-b726-9924eccb4fc3 - we can use this as a marker so we hide all previous events for the stack:

 $ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
+----+------------------------+-----------------+------------+
| id | resource_status_reason | resource_status | event_time |
+----+------------------------+-----------------+------------+
+----+------------------------+-----------------+------------+

 Now, we can trigger the update, and use the marker event-list to follow progress:

heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26

<wait a short time>

$ heat event-list -m 89e535ef-d414-4121-b726-9924eccb4fc3 overcloud-ControllerNodesPostDeployment-smy5ygz2lc26
+------------------------------------------------------+
| resource_name | id | resource_status_reason | resource_status | event_time |
+------------------------------------------------------+
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 2e08a022-ce0a-4e57-bf30-719fea6cbb74 | Stack UPDATE started | UPDATE_IN_PROGRESS | 2016-06-09T10:00:52 |
| ControllerArtifactsConfig | a55f9b17-f26c-4664-9ea5-535949c368e8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | 21679c7f-c354-4319-9688-7fa290168664 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:00 |
| ControllerPuppetConfig | f5761452-91dd-45dc-92e8-a5c371fa5004 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsConfig | 01abec3c-f472-4ec2-893d-0fddb8fc1696 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | f8f7a21f-9169-4f8c-ab46-46ecbb141be8 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:02 |
| ControllerArtifactsDeploy | 75937a57-e2f0-4d66-9b4c-2308593e56b1 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | 6058e29c-cded-4ad3-94d9-65909fd4911d | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:04 |
| ControllerLoadBalancerDeployment_Step1 | c9f93f1f-177c-4721-827f-a7d409b2cd50 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | 92409e4c-24f2-4e68-bad9-47ce09107d7a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:06 |
| ControllerServicesBaseDeployment_Step2 | a9203aa1-c438-47c0-977b-8e34669777bc | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | aa7d78dc-d243-4d54-8ea6-3b59a6ed302a | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:08 |
| ControllerOvercloudServicesDeployment_Step3 | 4a1a6885-29d7-4708-a884-01f481ac1b35 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 7afd52c1-cbbc-431a-a22c-dd7459ed2255 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:10 |
| ControllerOvercloudServicesDeployment_Step4 | 0dac2e72-0919-4e91-ac94-100d8d811c67 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | ec57867f-e401-4756-bd30-0a566eced343 | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:13 |
| ControllerOvercloudServicesDeployment_Step5 | 427582fb-acd1-4939-a13c-7b3cbbc7527b | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:15 |
| ExtraConfig | 760fd961-fff6-4f4c-848e-80773e09e04b | state changed | UPDATE_IN_PROGRESS | 2016-06-09T10:01:15 |
| ExtraConfig | caee58b6-01bb-4805-b41f-4c48a8c7d767 | state changed | UPDATE_COMPLETE | 2016-06-09T10:01:16 |
| overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | 35f527a5-0761-46bb-aecb-6eee0e0f083e | Stack UPDATE completed successfully | UPDATE_COMPLETE | 2016-06-09T10:01:25 |


So, we can see that we triggered an update on the nested stack, and it ran to completion in around 30 seconds (much less time than updating the entire overcloud).

Step 3 - Update of the stack with modifications

So, those paying attention may have noticed that 30 seconds is too fast for puppet to run on all the controller nodes, and it is - the reason being that we did a no-op update, and so Heat detects that no inputs have changed, thus it doesn't cause puppet to re-run.

To work around this, and enable puppet to re-assert state on every overcloud update, we have an identifier in the nested stack that is normally updated to a value that changes every update (in includes a timestamp when updates are triggered via python-tripleoclient vs heatclient directly)

We can emulate this behavior in our patch update, and force puppet to re-run through all the deployment steps - lets first look at the NodeConfigIdentifers parameter value:


$ heat stack-show overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 | grep NodeConfigIdentifiers
"NodeConfigIdentifiers": "{u'deployment_identifier': u'1465409217', u'controller_config': {u'0': u'os-apply-config deployment bb67a1d5-f0a5-48ec-9883-1f2ae578a8bd complet ed,Root CA cert injection not enabled.,TLS not enabled.,None,'}, u'allnodes_extra': u'none'}"

Here we can see various data, including a deployment_identifier, which is the timestamp-derived unique identifier normally passed via python-tripleoclient.

We could update just that field, but the content of this mapping isn't important, only that it changes (this data is not currently consumed by puppet on update, it's just used to trigger the SoftwareDeployment to re-apply the config due to an input value changing).

So we can create an environment file that looks like this (note this must use parameters, not parameter_defaults, so that it overrides the value passed from the parent stack) - any value can be used, but you must change it each update if you want the SoftwareDeployment resources to be re-applied to the nodes.

$ cat update_env.yaml
parameters:
  NodeConfigIdentifiers: 123







Then we can trigger another PATCH update including this data:

heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml

This time I'm using the new openstack stack event list --follow approach to monitor progress (if you don't have this, you can repeat the marker event-list approach described above):


$ openstack stack event list --follow2016-06-09 08:52:46 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_IN_PROGRESS  Stack UPDATE started
2016-06-09 08:52:54 [ControllerPuppetConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:54 [ControllerArtifactsConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:56 [ControllerPuppetConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:56 [ControllerArtifactsDeploy]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:52:58 [ControllerArtifactsDeploy]: UPDATE_COMPLETE  state changed
2016-06-09 08:52:58 [ControllerLoadBalancerDeployment_Step1]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:53:32 [ControllerLoadBalancerDeployment_Step1]: UPDATE_COMPLETE  state changed
2016-06-09 08:53:32 [ControllerServicesBaseDeployment_Step2]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:00 [ControllerServicesBaseDeployment_Step2]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:00 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step3]: UPDATE_COMPLETE  state changed
2016-06-09 08:54:57 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step4]: UPDATE_COMPLETE  state changed
2016-06-09 08:56:14 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:16 [ControllerOvercloudServicesDeployment_Step5]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:16 [ExtraConfig]: UPDATE_IN_PROGRESS  state changed
2016-06-09 08:57:17 [ExtraConfig]: UPDATE_COMPLETE  state changed
2016-06-09 08:57:26 [overcloud-ControllerNodesPostDeployment-smy5ygz2lc26]: UPDATE_COMPLETE  Stack UPDATE completed successfully

So, here we can see the update of the stack took a little longer (around 5 minutes in my environment), and if you were to check the os-collect-config logs on each controller node, you would see puppet re-applying on each node, fore every step defined in the template.

This approach can be extended if you want to e.g test changes to the stack template (or files it references such as puppet manifests or scripts), you would do something like:

$ cp -r /usr/share/openstack-tripleo-heat-templates .
$ cd openstack-tripleo-heat-templates/
$ heat stack-update -x overcloud-ControllerNodesPostDeployment-smy5ygz2lc26 -e update_env.yaml -f puppet/controller-post.yaml

Note that if you want to do a final update of the entire overcloud, you would need to point to this copied tree (assuming you want to maintain any changes), e.g

$ openstack overcloud deploy --templates /path/to/copy/openstack-tripleo-heat-templates


Sunday, 17 May 2015

TripleO Heat templates Part 3 - Cluster configuration, introduction/primer

In my previous two posts I covered an overview of TripleO template roles and groups, and specifics of how initial deployment of a node happens.  Today I'm planning to introduce the next step of the deployment process - taking the deployed groups of nodes, and configuring them to work together as clusters running the various OpenStack services encapsulated by each role.

This post will provide an introduction to the patterns and Heat features used to configure the groups of nodes, then in the next instalment I'll dig into the specifics of exactly what configuration takes place in the TripleO heat templates.


Wednesday, 13 May 2015

TripleO Heat templates Part 2 - Node initial deployment & config

In my previous post "TripleO Heat templates Part 1 - roles and groups", I provided an overview of the various TripleO roles, the way the role implementation is abstracted via provider resources, and how they are grouped and scaled via OS::Heat::ResourceGroup.

In this post, I'm aiming to dig into the next level of template implementation, specifically how a role is implemented behind the provider resource alias used in the top-level template.

I'm only going to cover one role type for now OS::TripleO::Controller, because the patterns described are directly applicable to all other role types.  I'm also going to focus on the puppet based implementation (because that's what I'm most familiar with), but again most concepts apply to the element/container/etc based implementations too.

Throughout this post, I'll be referring to templates in the tripleo-heat-templates repo, so if you haven't already, now might be a good time to clone that so you can follow along looking at the templates themselves.

Thursday, 7 May 2015

TripleO Heat templates Part 1 - Roles and Groups

This is the start of a series of posts aiming to de-construct the TripleO heat templates, explaining the abstractions that exist,and the heat features which enable them.

If you're not already a little familiar with ResourceGroups, "Provider Resources" used for template composition, and SoftwareConfig resources, it's probably not a bad idea to check out my previous posts on those topics, as well as our user guide and other documentation - TripleO makes heavy use of all of these features.

Overcloud "Roles"



TripleO typically refers to the deployed OpenStack cloud as the "overcloud", because the tools used to perform that deployment mirror those in the deployed cloud - e.g a small OpenStack is used to bootstrap and manage a bigger one (normally the small OpenStack is called either a "seed" or "undercloud", depending on your environment).

The definition of what is deployed in your overcloud exists in a number of Heat templates, with the top-level one defining a number of groups of different node types, or "roles".

  • Controller: Contains API services, e.g Keystone, Neutron, Heat, Glance, Ceilometer, Horizon, and the API parts of Nova, Cinder & Swift.  It can also optionally host the storage parts for Cinder, Swift and Ceph if these are not deployed separately (see below).
  • Compute: Contains the Nova Hypervisor components
  • BlockStorage: Contains the Cinder storage components (if not hosted on the Controller(s).
  • ObjectStorage: Contains the Swift storage components (if not hosted on the Controllers(s).
  • CephStorage: Contains the Ceph storage components (if not hosted on the Controllers(s).

Roles & resource types


Each of the roles (or node types), are mapped to a a type defined in the resource_registry in the environment passed to Heat.

So, for example, the "Controller" role is defined in the heat template as a type OS::TripleO::Controller, and similar aliases exist for all the other roles.

The resource registry maps this type alias to another heat template, which implements whatever is required to deploy one node with that role.

So to create a node type "OS::TripleO::Controller" Heat may create a stack based on the template in "puppet/controller-puppet.yaml", or some other implementation based on whatever mapping exists in the resource_registry.

This makes it very easy if you want to plug in some alternate implementation, while maintaining the top-level template interfaces and deployment topology.  For example, work is currently in-progress implementing an alternate implementation using docker containers, as an alternative to the existing puppet and element based impelementations.







Roles & ResourceGroups


Each of these roles may be independently scaled - because they are defined in an OS::Heat::ResourceGroup.  The minimum you can deploy is one "Controller" and one "Compute" node (some roles may be deployed with zero nodes in the group).

Here's an example of what that looks like in the top level "overcloud-without-mergepy" template (this is the name of the main template TripleO uses to deploy OpenStack, the "without-mergepy" part is historical and refers to an older, now deprecated, implementation.)

  Controller:
    type: OS::Heat::ResourceGroup
    properties:
      count: {get_param: ControllerCount}
      resource_def:
        type: OS::TripleO::Controller
        properties:
          AdminPassword: {get_param: AdminPassword}
          AdminToken: {get_param: AdminToken}

          ...


Here, you can see we've defined a group of OS::TripleO::Controller resources in an OS::Heat::ResourceGroup, and the number of nodes deployed is controlled via a template parameter, "ControllerCount", and similarly a number of template parameters are referenced to provide input properties to enable configuration of the deployed controller node (I've abbreviated the full list of properties).

This pattern is repeated for all roles, so building a specified number of nodes for a particular role (or adding/removing them via a stack-update), is as simple as passing a different number into Heat as a stack parameter :)

That's all, folks


That's all for today - hopefully it provides an overview of the top-level interfaces provided by the TripleO Heat templates, and illustrates the following:

  • There are clearly defined node "roles", containing the various parts of your OpenStack deployment
  • The patterns used to define and implement these roles are repeated, which helps understand the templates despite them being fairly large.
  • The implementation is modular, and abstractions exist which make implementing different "back end" implementations relatively simple.
  • Deployments can be easily scaled due to using Heat's ResourceGroup functionality.
In future instalments I'll dig further into the individual node implementations, ways to easily plug in site-specific additional configuration, and ways in which you can control and validate the deployments performed via TripleO.

Tuesday, 5 May 2015

Heat SoftwareConfig resources - primer/overview.

In this post, I'm going to provide an overview of Heat's Software Configuration resources, as a preface to digging in more detail into the structure of TripleO heat templates, which leverage SoftwareConfig functionality to install and configure the deployed OpenStack cloud.

Heat has supported SoftwareConfig and SoftwareDeployment resources since the Icehouse release, in an effort to provide flexible and non-opinionated abstractions which enable integration with existing software configuration tools and scripts.

Monday, 20 April 2015

Debugging TripleO Heat templates

Lately, I've been spending increasing amounts of time working with TripleO heat templates, and have noticed some recurring aspects of my workflow whilst debugging them which I thought may be worth sharing.

For the uninitiated, TripleO is an OpenStack deployment project, which aims to deploy and manage OpenStack using standard OpenStack API's.  In practice, this means using Nova and Ironic for baremetal node provisioning, and Heat to orchestrate the deployment and configuration of the nodes.

The TripleO heat templates, unlike most of the heat examples, are pretty complex.  They make extensive use of many "advanced" features, such as nested stacks, using provider resources via the environment and also many software config resources.

This makes TripleO a fairly daunting target to those wishing to debug and modify and/or debug the TripleO templates.

Fortunately TripleO templates, although large, have many repeated patterns, and good levels of abstraction and modularity.  Combined with some recently added heat interfaces, it becomes rapidly less daunting, as I will demonstrate in the worked example below:


Thursday, 11 September 2014

Using Heat ResourceGroup resources


This has come up a few times recently, so I wanted to share a howto showing how (and how not) to use the group resources in Heat, e.g OS::Heat::ResourceGroup and OS::Heat::AutoScalingGroup.

The key thing to remember when dealing with these resources is that they can multiply any number of resources (expressed as a heat stack), not just individual resources. This is a very cool feature when you get your head around it! :)

Lets go through a worked example, where we use ResourceGroup to create 5 identical servers, each with a cinder volume of the same size attached.

Wednesday, 16 April 2014

Heat auth model updates - part 2 "Stack Domain Users"

As promised, here's the second part of my updates on the Heat auth model, following on from part 1 describing our use of Keystone trusts.

This post will cover details of the recently implemented instance-users blueprint, which makes use of keystone domains to contain users related to credentials which are deployed inside instances created by heat.  If you just want to know how the new stuff works, you can skip to the last sections :)

So...why does heat create users at all?

Lets start with a bit of context.  Heat has historically needed to do some or all of the following:
  1. Provide metadata to agents inside instances, which poll for changes and apply the configuration expressed in the metadata to the instance.
  2. Signal completion of some action, typically configuration of software on a VM after it is booted (because nova moves the state of a VM to "Active" as soon as it spawns it, not when heat has fully configured it)
  3. Provide application level status or metrics from inside the instance, e.g to allow AutoScaling actions to be performed in response to some measure of performance or quality of service.
Heat provides API's which enable all of these things, but all of those API's require some sort of authentication, e.g credentials so whatever agent is running on the instance is able to access it.  So credentials must be deployed inside the instance, e.g here's how things work if you're using the heat-cfntools agents:

heat-cfntools agents data-flow with CFN-compatible API's

Wednesday, 9 April 2014

Heat auth model updates - part 1 Trusts

Over the last few months I've spent a lot of my time looking at ways to rework the heat auth model, in an attempt to solve two long-standing issues:


  1. Requirement to pass a password when creating a stack which may perform deferred orchestration actions (for example AutoScaling adjustments)
  2. Requirement for users to have administrative roles when creating certain types of resource.


So, fixes to these issues have been happening (in Havana and Icehouse respectively), but discussions with various folks indicates significant confusion re differentiating the two changes, probably because I've not got around to writing up the documentation yet (it's in progress, honest!) ;)

In an attempt to clear up the confusion, and provide some documentation ahead of the upcoming Icehouse Heat release, I'm planning to cover each feature in this and a subsequent post - below is a discussion of the "Requirement to pass a password" problem, and the method used to solve it.

Wednesday, 2 October 2013

Heat Providers/Environments 101

I've recently been experimenting with some cool new features we've added to Heat over the Havana cycle, testing things out in preparation for the Havana Release.

One potentially very powerful new abstraction is the Provider Resource method of defining nested stack resources.  Combined with the new environments capability to map template resource names to non-default implementations, it provides a very flexible way for both users and those deploying Heat to define custom resources based on Heat templates.

Firstly let me clarify what nested stacks are, following on from my previous post, and why we decided to provide this native interface to the functionality, rather than something similar to the existing Cloudformation compatible resource interface.

So nested stack resources enable you to specify a URL of another heat template, which will be used to create another Heat stack, owned by the stack which defines the AWS::CloudFormation::Stack.  This provides a way to implement composed Heat templates, and to reuse logically related template snippets.

The interface looks like this (using the new native HOT template syntax):

resources:
  nested:
    type: AWS::CloudFormation::Stack
    properties:
      TemplateURL: http://somewhere/something.yaml

There are several disadvantages to this interface:
  • Hard-coded URLs in your template
  • You have to have the nested templates accessible via a web-server somewhere
  • Hard to transparently substitute different versions of the nested implementation (without sedding! ;)
  • Passing Parameters is kind-of awkward (pass a nested map of parameter values)
  • Does not provide a way to define resources based on stack templates.
So we noticed something, which was the interface to stack templates is essentially very similar to the interface to resources, stack parameters map to resource properties, and stack outputs map to resource attributes.  Provider resources leverage this symmetry to provide a more flexible (and arguably easier to use) native interface to nested stack resources.

Ok, so how does it work!  Probably the easiest explanation is a simple worked example.

Say you define a simple template, like this example in our heat-templates repo, and you want to reuse it, as a nested stack via the Providers functionality, you simply need to do this:

Define an environment

The environment is used to map resource names to templates, and optionally can be used to define common stack parameters, so that they don't need to be passed every time you create a stack.

A user may pass an environment when creating (or updating) a stack, and a global environment may also be specified by the deployer (default location is /etc/heat/environment.d), using the same syntax.  The environment may override existing resources.

The resource_registry section is used to map resource names to template files:

resource_registry:
    My::WP::Server: https://raw.github.com/openstack/heat-templates/master/hot/F18/WordPress_Native.yaml

Or you can refer to a local file (local to the user running python-heatclient, which reads the file and attaches the content to the Heat API call creating the stack in the "files" parameter of the request):

resource_registry:
    My::WP::Server: file:///home/shardy/git/heat-templates/hot/F18/WordPress_Native.yaml

There are also some other possibilities, for example aliasing one resource name to another, which are described in our documentation.

Create Stack

Now you can simply create a stack template which references My::WP::Server:

# cat minimal_test.yaml 
heat_template_version: 2013-05-23

description: >
  Heat WordPress template, demonstrating Provider Resource.

parameters:
  user_key:
    type: string
    description : Name of a KeyPair to enable SSH access to the instance

resources:
  wordpress_instance:
    type: My::WP::Server
    properties:
        key_name: {get_param: user_key}

With an environment file:

# cat env_minimal.yaml 
resource_registry:
    My::WP::Server: file:///home/shardy/git/heat-templates/hot/F18/WordPress_Native.yaml

And then create the stack:

# heat stack-create test_stack1 --template-file=./minimal_test.yaml --environment-file=./env_minimal.yaml --parameters="user_key=$USER_key"

Optionally you could also specify the key via the environment too:


# cat env_key.yaml 
parameters:
    user_key: userkey
resource_registry:
    My::WP::Server: file:///home/shardy/git/heat-templates/hot/F18/WordPress_Native.yaml


heat stack-create test_stack2 --template-file=./minimal_test.yaml --environment-file=./env_key.yaml

This would create the nested template with a key_name parameter of "userkey"

So hopefully that provides an overview of this new feature, for more info please see our documentation and we're planning to add some example Provider/Environment examples to our example template repository soon.

Finally, kudos to Angus Salkeld, who implemented the majority of this functionality, thanks Angus! :)