Test Kitchen style testing for Salt

If you are already familiar with Test Kitchen then a lot of this guide should be straightforward.  ChefDK has most of the needed tools bundled up for you already, so I recommend installing ChefDK and then extending it to work with Salt.

In addition to the base Test Kitchen dependencies, you will need to install the following additional gems in order to get Test Kitchen working with Salt:

  • kitchen-vagrant
  • kitchen-salt

Then create a “.kitchen.yml” file in your /srv/salt directory.  This file tells Test Kitchen how to load its configuration so it can test out your Salt states.

Here is a sample of what your .kitchen.yml file might look like.

---
driver:
  name: vagrant

provisioner:
  name: salt_solo
  is_file_root: True
  pillars-from-files:
    base.sls: /srv/pillar/base.sls
  pillars:
    top.sls:
      base:
        '*':
          - base

platforms:
  - name: ubuntu-14.04

suites:
  - name: default

There is a good reference that describes the various options in the kitchen-salt docs.

I had to play around with this config to get things working correctly, so you may need to make your own adjustments.  The key components are described in the “provisioner” section.  “is_file_root” is important because it tells the minion where to look for its configuration; it essentially says to look at the top.sls file on the server that runs Test Kitchen.

Use “pillars-from-files” to manually add in any custom pillar data you have.  I had issues getting the default configuration to automatically add in pillar data, so I used this approach as a workaround.

Another caveat to mention here is that in order to get this method working I had to break the best practice of storing external Salt formulas in /srv/formulas and instead copy them directly into the “root” directory of /srv/salt.  So basically all of the logic and formulas will live in this base location.  If this point isn’t clear let me know and I can post more details.
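
Once the .kitchen.yml and directory structure are in place, the basic workflow is the same as any other Test Kitchen setup, something along these lines:

kitchen list
kitchen converge
kitchen verify
kitchen login
kitchen destroy

“converge” spins up the box and applies your Salt states, “login” drops you into the box to poke around if something fails, and “destroy” tears it back down.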

Vagrant style testing

The next best alternative I have found to using the Salt driver for Test Kitchen is manually spinning up a customized Vagrant box, either to test communication with the salt master or to connect via salt-ssh and run commands that way.

This method is a great complement if you aren’t interested in running Salt in local mode and would rather learn about and test the salt-master and/or salt-ssh.  This method is also straightforward.

Here is what the custom config looks like for Vagrant.

# -*- mode: ruby -*-
# vi: set ft=ruby :

Vagrant.configure(2) do |config|

  # OS config
  config.vm.box = "ubuntu/trusty64"
  config.vm.hostname = "salt-minion"
  config.vm.network :private_network, ip: "192.168.33.10"

  # Copy Salt master files for masterless provisioning
  config.vm.synced_folder "/srv/salt/", "/srv/salt/"

  # Install/config Salt
  config.vm.provision :salt do |salt|
    salt.minion_config = "/etc/salt/minion"
    salt.run_highstate = false
    salt.install_type = "daily"
    salt.colorize = true

    # For remote master preseeding
    salt.minion_key = "salt-minion.pem"
    salt.minion_pub = "salt-minion.pub"

    # Debugging
    #salt.bootstrap_options = "-D"
    #salt.verbose = true
  end

  # Additional configuration
  config.vm.provision "shell", inline: "echo '192.168.1.170 salt' >> /etc/hosts"
  config.vm.provision "shell", inline: "apt-get install -y salt-ssh"

end

This config will do a few different things:

  • Configure a static address to make some testing easier
  • Add a dummy host entry for your salt master
  • Bootstrap the salt installation
  • Copy over a centrally managed minion file (if you want to customize how the minion behaves)
  • Install salt-ssh if you want to play around with ssh functionality

Note:  To use salt-ssh you will need to create an entry in /etc/salt/roster for the Vagrant machine and set up credentials to connect.  All of the configuration options can be found in the Vagrant docs.  Obviously much more can be done in Vagrant but you will have to test the various options yourself to see what suits your needs.
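
A minimal roster entry for the Vagrant box above might look something like the following (the user and password here are just the stock Vagrant defaults, so adjust them for your box):

salt-minion:
  host: 192.168.33.10
  user: vagrant
  passwd: vagrant
  sudo: True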

To check the current Salt keys, run the following command on the master.  This should not list our minion yet since we haven’t created or accepted its keys.

sudo salt-key -L

With this configuration we are generating a key once and reusing it, so we only need to accept the key on the Salt master once.  To generate the needed keys, run the following command from the root Vagrant directory (the directory containing the Vagrantfile, since the key paths above are relative to it).

sudo salt-key --gen-keys=salt-minion

Then to add the new entry on the Master (after bringing up the Vagrant box!):

sudo salt-key -a 'salt-minion'

Once this set of keys has been accepted, you can bring the minion VM up and down without having to worry about adding and deleting keys every time you need to test something.  Obviously this approach should not be taken outside of testing environments into production.
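
To verify that the minion is communicating correctly, you can run a quick test.ping from the master:

sudo salt 'salt-minion' test.ping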

Lastly, use this command to delete an old minion key from the master:

sudo salt-key -d salt-minion

Conclusion

Being new to Salt, I found the combination of the custom Vagrant box and the Test Kitchen provisioner a great way to learn Salt and to test Salt configurations.  The best part about this method is that there is no additional work needed to get the two approaches to work together.  For example, once you have your directory structure set up correctly on the host system (the master config), you will already have everything ready to go for the Test Kitchen method as well as the Vagrant box method of testing.

I have found the combination to be very useful in my own learning of Salt so far.  Obviously this won’t address all of the complexity of a real deployment, but it is a great and easy way to get introduced to many of the concepts and ideas of Salt.

I am really enjoying Salt so far and I hope readers can put some of these findings to use in their own learning as well.


Quicktip: Manage Memory Usage with Supervisord

I have been using Supervisord for process management for quite a while now but had no idea it could manage memory usage (among other things) until just recently.

There is a Python project called Superlance which essentially adds some extra functionality to supervisord for managing processes and memory.  The docs are a little thin, so I thought it would be a good idea to highlight some of the functionality for folks that just want a few examples of how it works and how it can be used.

Obviously you will want to have supervisor installed and configured already.  That can be done with pip or via apt-get.  You will also need to make sure you have a proper [unix_http_server] section in your /etc/supervisor/supervisord.conf file.
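
On Ubuntu the relevant section of /etc/supervisor/supervisord.conf typically looks something like this (the socket path may differ on your system):

[unix_http_server]
file=/var/run/supervisor.sock
chmod=0700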

To install Superlance (on Ubuntu 14.04):

sudo pip install superlance

This will download and install a handful of Python scripts that can then be plugged in to Supervisor.  Check the link above if you are interested in the other plugins.

Then you will need to add a section to your supervisor config for memmon to manage memory usage.

[eventlistener:memmon]
command=memmon -p <program_name>=3GB
events=TICK_60

The “-p <program_name>” corresponds to the program header in your supervisor configuration.  There are other options available to manage group processes, etc. for more advanced use cases but this should cover most basic scenarios.
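
For example, with a hypothetical program section like the one below, the matching flag would be “-p myapp=3GB”:

[program:myapp]
command=/usr/bin/myapp
autostart=true
autorestart=true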

You will need to reload the supervisor configuration after your changes have been made.  Unfortunately the supervisor process needs to be fully reloaded.

sudo supervisorctl reload

If you want to confirm that the new memmon section is picked up before restarting supervisor you can use reread.

sudo supervisorctl reread

I would suggest reading through the Superlance docs and checking out the other scripts.  They add another layer of functionality to supervisord that I didn’t know existed.


DevOps Conferences

I did a post quite a while ago that highlighted some of the cooler system admin and operations oriented conferences that I had on my radar at that time.  Since then I have changed jobs and am now in a DevOps oriented position, so I’d like to revisit the subject and update that list to reflect some of the cool conferences in the DevOps space.

I’d like to start off by saying that even if you can’t make it to the bigger conferences, local groups and meet ups are an excellent way to get out and meet other professionals that do what you do.  Local groups are also an excellent way to stay in the loop on what’s current and learn about what others are doing.  If you are interested in eventually becoming a presenter or speaker, local meet ups and groups can be a great way to get started.  There are numerous opportunities and communities (especially in bigger cities), so check here for information or to see if there is a DevOps meet up near you.  If there is nothing nearby, start one!  If you can’t find any DevOps groups, look for Linux groups or developer groups and network from there; DevOps is becoming popular in broader circles.

After you get your feet wet with meet ups, the next place to start looking is conferences that sound interesting to you.  There are about a million different opportunities to choose from: security conferences, developer conferences, server and network conferences, all the way down the line.  I am sticking with strictly DevOps related conferences because that is currently what I am interested in and know best.

Feel free to comment if I missed any conferences that you think should be on this list.

DevOps Days (Multiple dates)

Perhaps the most DevOps centric conferences on this list.  These conferences are a great way to meet fellow DevOps professionals and network with them.  The space and industry is changing constantly, and staying on top of all of the changes is crucial to being successful.  Another nice thing about DevOps Days is that they are spread out around the country (and world) and throughout the year, so they are very accessible.  WARNING:  DevOps Days are not tied to any one set of DevOps tools but rather to the principles and techniques and how to apply them to different environments.  If you are looking for super in-depth technical talks, this one may not be for you.

ChefConf (March)

The main Chef conference.  There are large conferences for each of the main configuration management tools but I chose to highlight Chef because that’s what we use at my job.  There are lots of good talks with a Chef centered theme that are also great because the practices can be applied with other tools.  For example, there are many DevOps themes at ChefConf including continuous integration and deployment topics, how to scale environments, tying different tools together and just general configuration management techniques.  Highly recommended for Chef users; feel free to substitute the other big configuration management tool conferences here if Chef isn’t your cup of tea (Salt, Puppet, Ansible).

CoreOS Fest (May)

  • 2015 videos haven’t been posted yet

Admittedly, this is a much smaller and niche conference but it is still awesome.  It is the first conference put on by the folks at CoreOS and was designed to help the community keep up with what is going on in the CoreOS and container world.  The venue is pretty small but the content at this year’s conference was very good.  There were some epic announcements and talks, including Tectonic announcements and Kubernetes deep dives, so if container technology is something you’re interested in then this conference would definitely be worth checking out.

Velocity (May)

This one just popped up on my DevOps conference radar.  I have been hearing good things about this conference for a while now but have not had the opportunity to go.  It always has interesting speakers and topics, and a number of the DevOps thought leaders show up for this event.  One cool thing about this conference is that there are a variety of different topics at any one time, so it offers a nice, wide spectrum of information.  For example, there are technical tracks covering different areas of DevOps.

DockerCon (June)

Docker has been growing at a crazy pace so this seems like the big conference to check out if you are in the container space.  This conference is similar to CoreOS Fest but focuses more heavily on Docker (obviously).  I haven’t had a chance to go to one of these yet, but containers and Docker have so much momentum they are very difficult to avoid.  As well, many people believe that container technologies are going to be the path to the future, so it is a good idea to be as close to the action as you can.

Monitorama (June)

This is one of the coolest conferences in my opinion, but that is probably just because I am so obsessed with monitoring and metrics collection.  Monitoring seems to be one of those topics that isn’t always fun to deal with, but the talks and technologies at this conference actually make me excited about monitoring.  To most, monitoring is a necessary evil, and a lot of the content from this conference can help make your life easier and better in all aspects of monitoring, from new trends and tools to topics on how to correctly monitor and scale infrastructures.  Talks can be technical but are well worth it if monitoring is something that interests you.

AWS Re:Invent (November)

This one is a monster.  This is the big conference that AWS puts on every year to announce new products and technologies they have been working on, as well as provide some incredibly helpful technical talks.  I believe this is one of the pricier and more exclusive conferences, but it offers a lot in the way of content and details.  It has some of the best, most technical topics of discussion that I have seen and has been invaluable as a learning resource.  All of the videos from the conference are posted on YouTube so you can get access to this information for free.  Obviously the content is related to AWS but I have found it to be a great way to learn.

Conclusion

Even if you don’t have a lot of time to travel or get out to these conferences, nearly all of them post video from the event so you can watch it whenever you want.  This is an INCREDIBLE learning tool and resource that is FREE.  The only downside to the videos is that you can’t ask any questions, but it is easy to find the presenters’ contact info if you are interested and feel like reaching out.

That being said, you tend to get a lot more out of attending a conference in person.  The main benefit of going over watching the videos alone is that you get to meet and talk to others in the space, get a feel for what everybody else is doing, and check out many cool tools that you might otherwise never hear about.  At every conference I attend, I learn about some new tech that others are using that I have never heard of, and I always run into interesting people that I would otherwise not have the opportunity to meet.

So if you can, definitely get out to these conferences, meet and talk to people, and get as much out of them as you can.  If you can’t make it, check out the videos afterwards for some really great nuggets of information; they are a great way to keep your skills sharp and current.

If you have any more conferences to add to this list I would be happy to update it!  I am always looking for new conferences and DevOps related events.


Composing a Graphite server with Docker

Recently our Graphite server needed to be overhauled, which I was not looking forward to.  Luckily Docker makes the process of building identical and reproducible images for configuring a new server much easier and painless than other methods.

Introduction

If you don’t know what Graphite is you can check out the documentation for more info.  Basically it is a tool to collect and aggregate metrics of pretty much any kind into a central location.  It is a great complement to something like statsd for metric collection and aggregation, which I will go over later.

The setup I will be describing today leverages a handful of components to work.  The first and most important part is Graphite.  This includes all of the parts that make up Graphite, including the carbon aggregator and carbon cache for the collection and processing of metrics as well as the whisper db for storing metrics.

There are several other alternative backends but I don’t have any experience with them so won’t be posting any details.  If you are interested, InfluxDB and OpenTSDB both look like interesting alternative backends to whisper for storing metrics.

The Problem

Graphite is known to be notoriously difficult to install and configure properly.  If you haven’t tried to set up Graphite before, give it a try.

Another argument that I hear quite a bit is that the Graphite workload doesn’t really fit the Docker model.  In a distributed or highly available architecture that might be the case, but in the example I cover here we are taking a different approach.

The design and implementation separates the data onto an EBS volume, which is durable storage, so it doesn’t matter if the server itself has problems.  With our approach and process we can reprovision the server and have everything up and running in less than 5 minutes.

The benefit of doing it this way is obvious.  Another benefit of our approach is that we are leveraging the graphite-api package, so we get access to all of the Graphite goodness without having to run the rest of the bloat, and then proxying it through nginx/uwsgi, which helps with performance.  I will go over this setup in a little bit.  No Graphite server would be complete if it didn’t leverage Grafana, which turns out to be stupidly easy to add using the Docker approach.

If we were ever to try to expand this architecture I think a distributed model using EFS (currently in preview) along with some type of load balancer in front to distribute requests evenly may be a possibility.  If you have experience running Graphite across many nodes I would love to hear what you are doing.

The Solution

There are a few components to our architecture.  The first is a tool I have been writing about recently called Terraform.  We use this with some custom scripting to build the server, configure it and attach our Graphite data volume to the server.

Here is what a sample terraform config might look like to provision the server with the tools we want.  This server is provisioned to an AWS environment and leverages a number of variables.  You can check the docs on how variables work or if there is too much confusion I can post an example.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

resource "aws_instance" "graphite" {
  ami = "${lookup(var.amis, var.region)}"
  availability_zone = "us-east-1e"
  instance_type = "c3.xlarge"
  subnet_id = "${var.public-1e}"
  security_groups = ["${var.graphite}"]
  key_name = "XXX"
  user_data = "${file("../cloud-config/graphite.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "20"
  }

  connection {
    user = "username"
    key_file = "${var.key_path}"
  }

 # mount EBS
  provisioner "local-exec" {
     command = "aws ec2 attach-volume --region=us-east-1 --volume-id=${var.graphite_data_vol} --instance-id=${aws_instance.graphite.id} --device=/dev/xvdf"
  }

  provisioner "remote-exec" {
    inline = [
    "while [ ! -e /dev/xvdf ]; do sleep 1; done",
    "echo '/dev/xvdf /data ext4 defaults 0 0' | sudo tee -a /etc/fstab",
    "sudo mkdir /data && sudo mount -t ext4 /dev/xvdf /data"
  ]
 }

}

And optionally, if you have an Elastic IP to use, you can tack that onto your config:

resource "aws_eip" "graphite" {
  instance = "${aws_instance.graphite.id}"
  vpc = true
}

The Graphite server uses a mostly standard cloud config and installs the few components that we need to run the server: docker, python, pip, docker-compose, etc.  Here is what a sample cloud config for the Graphite server might look like.

#cloud-config

# Make sure OS is up to date
apt_update: true
apt_upgrade: true
disable_root: true

# Connect to private repo
write_files:
 - path: /home/<user>/.dockercfg
   owner: user:group
   permissions: 0755
   content: |
     {
       "https://index.docker.io/v1/": {
         "auth": "XXX",
         "email": "email"
       }
     }

# Capture all subprocess output for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

packages:
 - python-dev
 - python-pip

# Install latest Docker version
runcmd:
 - apt-get -y install linux-image-extra-$(uname -r)
 - curl -sSL https://get.docker.com/ubuntu/ | sudo sh
 - usermod -a -G docker <user>
 - sg docker
 - sudo pip install -U docker-compose

# Reboot for changes to take
power_state:
 mode: reboot
 delay: "+1"

ssh_authorized_keys:
 - <put your ssh public key here>

Docker

This is where most of the magic happens.  As noted above, we are using Docker and a few of its tools to get everything working.  All the logic to get Graphite running is contained in the Dockerfile, which will require some customizing but is similar to the following.

# Building from Ubuntu base
FROM ubuntu:14.04.2

# This suppresses a bunch of annoying warnings from debconf
ENV DEBIAN_FRONTEND noninteractive

# Install all system dependencies
RUN \
 apt-get -qq install -y software-properties-common && \
 add-apt-repository -y ppa:chris-lea/node.js && \
 apt-get -qq update -y && \
 apt-get -qq install -y build-essential curl \
 # Graphite dependencies
 python-dev libcairo2-dev libffi-dev python-pip \
 # Supervisor
 supervisor \
 # nginx + uWSGI
 nginx uwsgi-plugin-python \
 # StatsD
 nodejs

# Install StatsD
RUN \
 mkdir -p /opt && \
 cd /opt && \
 curl -sLo statsd.tar.gz https://github.com/etsy/statsd/archive/v0.7.2.tar.gz && \
 tar -xzf statsd.tar.gz && \
 mv statsd-0.7.2 statsd

# Install Python packages for Graphite
RUN pip install graphite-api[sentry] whisper carbon

# Optionally install graphite-api caching
# http://graphite-api.readthedocs.org/en/latest/installation.html#extra-dependencies
# RUN pip install graphite-api[cache]

# Configuration
# Graphite configs
ADD carbon.conf /opt/graphite/conf/carbon.conf
ADD storage-schemas.conf /opt/graphite/conf/storage-schemas.conf
ADD storage-aggregation.conf /opt/graphite/conf/storage-aggregation.conf
# Supervisord
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# StatsD
ADD statsd_config.js /etc/statsd/config.js
# Graphite API
ADD graphite-api.yaml /etc/graphite-api.yaml
# uwsgi
ADD uwsgi.conf /etc/uwsgi.conf
# nginx
ADD nginx.conf /etc/nginx/nginx.conf
ADD basic_auth /etc/nginx/basic_auth

# nginx
EXPOSE 80 \
# graphite-api
8080 \
# Carbon line receiver
2003 \
# Carbon pickle receiver
2004 \
# Carbon cache query
7002 \
# StatsD UDP
8125 \
# StatsD Admin
8126

# Launch stack
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]
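
The graphite-api.yaml added above is just a small YAML file that tells graphite-api where the whisper data lives.  A minimal sketch might look like the following, with the whisper path matching the volume mounted in docker-compose below (the index path is an assumption and your paths may differ):

search_index: /opt/graphite/storage/index
finders:
  - graphite_api.finders.whisper.WhisperFinder
whisper:
  directories:
    - /opt/graphite/storage/whisper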

The other component we need is Grafana, which we don’t actually build but instead pull from the Docker Hub registry and inject our custom volume into.  This is all captured in our docker-compose.yml file listed below.

graphite:
  build: ./docker-graphite
  restart: always
  ports:
    - "8080:80"
    - "8125:8125/udp"
    - "8126:8126"
    - "2003:2003"
    - "2004:2004"
  volumes:
    - "/data/graphite:/opt/graphite/storage/whisper"

grafana:
  image: grafana/grafana
  restart: always
  ports:
    - "80:3000"
  volumes:
    - "/data/grafana:/var/lib/grafana"
  links:
    - graphite
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=password123
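
With both containers defined, you can bring the stack up by hand to test it (run from the directory containing docker-compose.yml) before wiring up the init script described further down:

docker-compose up -d
docker-compose ps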

We have open sourced our configuration and placed it on github so you can take a look at it to get a better idea of the configs and how everything is working with some working examples.  The github repo is a quick way to try out the stack without having to provision and build an environment to run this on.  If you are just interested in kicking the tires I suggest starting with the github repo.

The build directive above corresponds to the repo on github.

The last component is actually running the Docker containers.  As you can see we use docker-compose, but we also need a way to start the containers automatically after a disruption like a reboot.  That is actually pretty easy.  On Ubuntu (or any system using upstart) you can create an init script to start up docker-compose or restart it automatically if it has problems.  Here I have created a file called /etc/init/graphite.conf with the following configuration.

description "Graphite"
start on filesystem and started docker
stop on runlevel [!2345]
respawn
chdir /home/user
exec docker-compose up

A systemd service would achieve a similar goal but the version of Ubuntu used here doesn’t leverage systemd.
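
With that upstart job in place the stack can be managed like any other service on Ubuntu 14.04:

sudo start graphite
sudo status graphite
sudo stop graphite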

After everything has been dropped in place and configured, you can check your work by hitting the public IP address of your server in a browser.  If you hit the Grafana splash page, everything should be working!
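
You can also push a quick test counter at the StatsD port to confirm metrics are flowing all the way through to Graphite (the metric name here is just an example):

echo "deploys.test:1|c" | nc -u -w1 <your-server-ip> 8125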

Grafana dashboard

Conclusion

There are many pieces to this puzzle, and honestly we don’t have the requirement of having Graphite be 100% available and redundant, so we can get away with a single server for our needs.  A separate EBS volume and Terraform allow us to rebuild the server quickly and automatically if something were to happen to it.  Also, the way we have designed Graphite to run should handle a substantial workload without falling over.  But if you are doing anything cool with Graphite HA or resiliency I would like to hear how you are doing it; there is always room for improvement.

If you are just interested in trying out the Graphite stack I highly suggest going over to the github repo and running the container stack to play around with the components, especially if you are interested in learning about how statsd and graphite collect metrics.  The Grafana interface gives you a nice way to tap into the metrics that get pumped into Graphite.


Create a Kubernetes cluster on AWS and CoreOS with Terraform

Up until my recent discovery of Terraform, the process I had been using to test CoreOS and Kubernetes was somewhat cumbersome and manual.  There are still some manual steps and processes involved in the bootstrap and cluster creation process that need to get sorted out, but now I can bring environments up and down, quickly and automatically.  This is a HUGE time saver and also makes testing easier because these changes can happen in a matter of minutes rather than hours and can all be self documented for others to reference in a Github repo.  Great success.

NOTE:  This method seems to be broken as of the 0.14.2 release of Kubernetes.  The latest version I could get to work reliably was v0.13.1.  I am following the development and looking forward to the v1.0 release but won’t revisit this method until something stable has been shipped because there are still just too many changes going on.  With that said, v0.13.1 has a lot of useful functionality and this method is actually really easy to get working once you have the groundwork laid out.

Another benefit is that as the project develops and matures, the only things that will need to be modified are the cloud configs I am using here.  So if you follow along you can use my configs as a base and modify them to get this working with a newer release.  As I said, I will be revisiting the configs once things slow down a little and a v1 has been released.

Terraform

So the first component that we need to enable in this workflow is Terraform.  From their site, “Terraform is a tool for building, changing, and combining infrastructure safely and efficiently.”  Basically, Terraform is a command line tool that allows you to implement your infrastructure as code across a variety of different infrastructure providers.  It should go without saying that being able to test environments across different platforms and cloud providers is a gigantic benefit.  It doesn’t lock you in to any one vendor and greatly helps simplify the process of creating complex infrastructures across different platforms.

Terraform is still a young project but has been maturing nicely and currently supports most of the functionality needed for this method to work (the missing pieces are in the dev pipeline and will be released in the near future).  Another benefit is that Terraform is much easier to use and understand than CloudFormation, the proprietary cloud provisioning tool available to AWS customers, which could be used if you are in a strictly AWS environment.

The first step is to download and install Terraform.  In this example I am using OSX but the instructions will be similar on Linux or other platforms.

cd /tmp
wget https://dl.bintray.com/mitchellh/terraform/terraform_0.3.7_darwin_amd64.zip
unzip terraform_0.3.7_darwin_amd64.zip
mv terraform* /usr/local/bin

After you have moved the binary you will need to source your shell.  I use zsh so I just ran “source ~/.zshrc” to update the path for terraform.

To test terraform out you can check the version to make sure it works.

terraform version

Now that Terraform is installed you will need to get some terraform files set up.  I suggest making a local terraform directory on your machine so you can create a repo out of it later if desired.  I like to split “services” up by creating different directories.  So within the terraform directory I have created an etcd directory as well as a kubernetes directory, each with its own variables file (which should be very similar).  I don’t know if this approach is a best practice but it has been working well for me so far.  I will gladly update this workflow if there is a better way to do this.

Here is a sample of what the file and directory layout might look like.

cloud-config
  etcd-1.yml
  etcd-2.yml
  etcd-3.yml
  kube-master.yml
  kube-node.yml
etcd
  dns.tf
  etcd.tf
  variables.tf
kubernetes
  dns.tf
  kubernetes.tf
  variables.tf

As you can see there is a directory for Etcd as well as Kubernetes specific configurations.  You may also notice that there is a cloud-config directory.  This will be used as a central place to put configurations for the different services.

Etcd

With Terraform set up, the next component needed for this architecture to work is a functioning etcd cluster.  I chose to use a separate 3 node cluster (spread across 3 AZ’s) for improved performance and resiliency.  If one of the nodes in a 3 node cluster goes down or away, the cluster will still be operational, whereas if a 1 node cluster goes away you will be in much more trouble.  Additionally, if you have other services or servers that need to leverage etcd you can just point them to this etcd cluster.

Luckily, with Terraform it is dead simple to spin up and down new clusters once you have your initial configurations set up and configured correctly.

At the time of this writing I am using the current stable version of CoreOS, which is 633.1.0 and uses version 0.4.8 of etcd.  According to the folks at CoreOS, the cloud configs for old versions of etcd should continue to work once the new version has been released, so moving to the new 2.0 release should be easy once it hits the release channel, though some tweaks or additional changes to the cloud configs may need to occur.

Configuration

Before we get into the details of how all of this works, I would like to point out that many of the settings in these configuration files will be specific to your environment.  For example, I am using an AWS VPC in the “us-east-1” region for this setup, so you may need to adjust some of the settings in these files to match your own scenario.  Other custom components may include security groups, subnet IDs, SSH keys, availability zones, etc.

Terraform offers resources for basically all network components on AWS so you could easily extend these configurations to build out your initial network and environment if you were starting a project like this from scratch.  You can check all the Terraform resources for the AWS provider here.

Warning: This guide assumes a few subtle things in order to work correctly.  The address scheme we are using for this environment is 192.168.x.x, leveraging 3 subnets (b, c, e) in the us-east-1 AWS region to spread the nodes across for additional availability.  Anything in the configuration that has been filled in with “XXX” represents a custom value that you will need to either create or obtain in your own environment and fill in to the configuration files.

Finally, you will need to provide AWS credentials to allow Terraform to communicate with the API for creating and modifying resources.  You can see where these credentials should be filled in below in the variables.tf file.

variables.tf

variable "access_key" { 
 description = "AWS access key"
 default = "XXX"
}

variable "secret_key" { 
 description = "AWS secret access key"
 default = "XXX"
}

variable "region" {
 default = "us-east-1"
}

/* CoreOS AMI - 633.1.0 */

variable "amis" {
 description = "Base CoreOS AMI"
 default = {
 us-east-1 = "ami-d6033bbe" 
 }
}

Here is what an example etcd config looks like.

etcd.tf

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Etcd cluster */

resource "aws_instance" "etcd-01" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.1.10"
 user_data = "${file("../cloud-config/etcd-1.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-02" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.2.10"
 user_data = "${file("../cloud-config/etcd-2.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-03" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.3.10"
 user_data = "${file("../cloud-config/etcd-3.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

Below I have created a configuration file as a simple way to create DNS records dynamically when spinning up the etcd cluster nodes.

dns.tf

 resource "aws_route53_record" "etcd-01" {
 zone_id = "XXX"
 name = "etcd-01.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-01.private_ip}"]
}

resource "aws_route53_record" "etcd-02" {
 zone_id = "XXX"
 name = "etcd-02.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-02.private_ip}"]
}

resource "aws_route53_record" "etcd-03" {
 zone_id = "XXX"
 name = "etcd-03.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-03.private_ip}"]
}

Once all of the configurations have been put in place and look right, you can preview what Terraform will do with the “plan” command:

cd etcd
terraform plan

Make sure to change into your etcd directory first.  This will examine your current configuration and calculate any changes.  If your environment is completely unconfigured then this command will return some output that explains what terraform is planning to do.

If you don’t want the input prompts when you run your plan command you can append the “-input=false” flag to bypass them.

If everything looks okay with the plan you can tell Terraform to “apply” your configs with the following:

terraform apply
OR
terraform apply -input=false

If everything goes accordingly, after a few minutes you should have a new 3 node etcd cluster running on the latest stable version of CoreOS, with DNS records for interacting with the nodes!  To double check that the servers are being created you can check the AWS console to see if your newly defined servers show up.  The console is a great way to double check that things all work okay and that the right values were created.
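
To verify the cluster itself you can hit the etcd client API directly.  This assumes the default client port (4001) for this version of etcd and the DNS records created above:

curl http://etcd-01.example.domain:4001/version
curl http://etcd-01.example.domain:4001/v2/machines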

If you are having trouble with the cloud configs check the end of the post for the link to all of the etcd and Kubernetes cloud configs.

Kubernetes

The Kubernetes configuration is very similar to etcd.  It uses a variables.tf, kubernetes.tf and dns.tf file to configure the Kubernetes cluster.

The following configurations will build a v0.13.1 Kubernetes cluster with 1 master and 3 worker nodes to begin with.  This config can be extended easily to scale the number of worker nodes to basically as many as you want (I could easily imagine hundreds or thousands), simply by changing a few numbers in the configuration, barely adding any overhead to our current process and workflow, which is nice.  Because of these possibilities, Terraform allows for a large amount of flexibility in how you manage your infrastructure.

This configuration is using c3.large instances so be aware that your AWS bill may be affected if you spin nodes up and fail to turn them off when you are done testing.

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Kubernetes cluster */

resource "aws_instance" "kube-master" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.1.100"
 user_data = "${file("../cloud-config/kube-master.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "kube-e" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-b" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-c" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

And our DNS configuration.

resource "aws_route53_record" "kube-master" {
 zone_id = "XXX"
 name = "kube-master.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-master.private_ip}"]
}

resource "aws_route53_record" "kube-e" {
 zone_id = "XXX"
 name = "kube-e-test.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-e.0.private_ip}"]
}

resource "aws_route53_record" "kube-b" {
 zone_id = "XXX"
 name = "kube-b.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-b.0.private_ip}"]
}

resource "aws_route53_record" "kube-c" {
 zone_id = "XXX"
 name = "kube-c.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-c.0.private_ip}"]
}

The variables file for Kubernetes should be identical to the etcd configuration so I have chosen not to place it here.  Just refer to the previous etcd/variables.tf file.
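
Once terraform apply finishes in the kubernetes directory, a quick sanity check is to point kubectl at the master.  This assumes the API server is listening on the insecure 8080 port, which was typical for cloud configs of this era; the exact flags and resource names vary a bit between these early releases:

kubectl -s http://kube-master.example.domain:8080 get minions
kubectl -s http://kube-master.example.domain:8080 get pods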

Resources

Since each cloud-config is slightly different (and would take up a lot more space) I have included those files in the gist below.  You will need to populate the “ssh_authorized_keys:” section with your own SSH public key and update any of the IP addresses to reflect your environment.  I apologize if there are any typos; there was a lot of cut and paste.

Cloud configs – https://gist.github.com/jmreicha/7923c295ab6110151127

Much of the configurations that I am using are based on the Kubernetes docs, as well as some of the specific cloud configs that I have adapted, which can be found here.

Another great place to get help with Kubernetes is the IRC channel which can be found on irc.freenode.net in the #google-containers channel.  The folks that hang out there are super friendly and can almost always answer any questions you have.

As I said, development is still pretty crazy.  You can check the releases page to check out all the latest stuff.

Conclusion

Yes, this can seem very convoluted at first, but if everything works how it should, you now have a quick and easy way to spin identical etcd and/or Kubernetes environments up and down at will, which is pretty powerful.  Also, this method is dramatically easier than most of the methods I have come across so far in my own adventures and testing.

Once you get through the initial confusion and learning curve this workflow becomes a huge timesaver for testing out different versions of Kubernetes and also for experimenting with etcd.  I haven’t quite automated the entire process but I imagine that it would be easy to spin entire environments up and down by gluing all of these pieces together with some simple shell scripts.
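
For example, a minimal wrapper along these lines (the directory names and apply order are just assumptions based on the layout above) could stand up both clusters in one shot:

#!/bin/bash
# Bring up the etcd cluster first, then the Kubernetes cluster on top of it.
set -e

cd etcd
terraform apply -input=false

cd ../kubernetes
terraform apply -input=false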

If you need to make any configuration updates, for example to put a new version of Kubernetes in place, you will need to first update your Kubernetes master/node cloud configs and then rerun terraform apply to have it recreate your environment.

The cloud config changes will destroy any nodes that rely on the old configuration, so you will need to make sure that if you make any changes to your cloud config files you are prepared to deal with the consequences!  Ideally you should get your etcd cluster to a good spot and then leave it alone, and just play around with the Kubernetes components; the two have been separated precisely so they can be changed out independently.

With this workflow you can already start to see the power of Terraform, even with this one example.  Terraform is quickly becoming one of my favorite automation and cloud tools and provides a very easy way to define and build infrastructure through code and configuration.
