xhyve vs vbox driver benchmarks for docker-machine

Getting a usable and productive dev environment working with Docker on OS X is not exactly trivial, although it is getting much better.  If you have spent any time working with docker-machine and Docker on OS X you’ve probably run across some type of roadblock to getting your dev environment working.

If you have used docker-machine you are probably familiar with the Virtualbox driver, the driver that ships as the default.  Obviously it works out of the box, but if you have used Virtualbox for any amount of time you have probably discovered some of its quirks.  My biggest gripe thus far with VBox is that its shared folders technology for syncing files between the host and VM is slooooow.  In fact, I have written about my own workaround here.

I have run into some other performance issues using VBox.  This write-up is a very detailed comparison of the performance between VBox and VMWare.  The tl;dr of the post is that the VMWare hypervisor has better performance.  To Oracle’s credit though, many of these performance issues have actually been addressed in the VBox 5.0.0 release.  So if you aren’t running on 5.x, definitely make the jump.  The Docker Toolbox ships the newer release so there is no reason not to upgrade.

Making the jump to VBox 5.x may, and most likely will, solve your problems, but I have been curious about what other options are out there.  Recently, as of July 2015, the xhyve hypervisor project has been available on OS X.  xhyve is a port of the bhyve project, and aims to bring high performance virtualization with a light footprint to OS X.  It is still very young but shows a lot of potential.

Even younger than the xhyve project itself is the xhyve driver for docker-machine.  It is so young that it is not an officially supported driver yet, though it looks like it is well on its way.  Definitely keep an eye on the xhyve and docker-machine xhyve projects if you are looking for an alternative to either VBox or VMWare.  The xhyve docker-machine driver project has recently closed a ticket to be added to brew, so it is now much less complicated to get working.

Xhyve installation

I will be going over the bare minimum installation instructions to get everything working.  If you are interested in a more in-depth look at how to install and use the docker-machine xhyve driver, I suggest taking a look at this awesome blog post.

Make sure you have brew installed first.  You will also need brew cask.  Once you have brew installed, you can get cask from the command line with the following command.

brew tap caskroom/cask

Once you have cask installed you should be able to install the remaining components.

brew update
brew install xhyve
brew cask install dockertoolbox

This might take a little bit depending on how fast your internet connection is.  After you have the toolbox installed, go grab the docker-machine xhyve driver.

brew install docker-machine-driver-xhyve

If you have the dockertoolbox installed already you might see some errors in the output.  This just means there was a version conflict somewhere.  As of docker-machine version 0.5.6_1, support has been added for the xhyve driver.

There is currently a caveat to using this driver where you need to change some permissions.  This should hopefully be fixed in the future but is at least something to be aware of.

sudo chown root:wheel $(brew --prefix)/opt/docker-machine-driver-xhyve/bin/docker-machine-driver-xhyve
sudo chmod u+s $(brew --prefix)/opt/docker-machine-driver-xhyve/bin/docker-machine-driver-xhyve

You will also need to clean out your /etc/exports file if you have made changes.

sudo mv /etc/exports{,.backup} && touch /etc/exports

Then create the machine.

docker-machine create --driver xhyve --xhyve-experimental-nfs-share test

If you can interact with the Docker daemon you should be in business.
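
To verify, point your shell at the new machine and make sure the daemon responds.  A quick sanity check looks something like this ("test" is the machine name from the create command above).

eval $(docker-machine env test)
docker info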

Benchmark results

The remainder of the post describes the benchmark and performance results of the VBox driver and the xhyve driver.  If you are only interested in getting the xhyve driver working then feel free to skim through the benchmarks, but be sure to take a look at the conclusion for the final verdict.

Below are the specs of the OS X machine that was used to run my benchmarks.

  • OS X 10.10.5
  • Virtualbox 5.0.12
  • xhyve 0.2.0
  • docker-machine-xhyve 0.2.2

Many of the ideas I used for the benchmark tests were taken from the post linked above.  It describes in great detail the methodology used to benchmark each of the drivers, which gives some really good insight into the tools that were used and how the tests were performed.

Benchmarking on the boot2docker VM is tricky because it is mostly a read-only file system and there is no package manager.  Therefore I relied on running the benchmarks inside containers, using a few different methodologies for my testing.  The first was borrowed from the simple-container-benchmarks project on Dockerhub.  This benchmark gives a good idea of the overall write performance and CPU performance of a container running inside the VM.  For network performance I used the iperf3 image located on Dockerhub.
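
As a rough sketch of how the network test runs (assuming the networkstatic/iperf3 image; any iperf3 image on Dockerhub works roughly the same way), you start a server container and then point a client container at its IP.

# start an iperf3 server container
docker run -d --name iperf3-server networkstatic/iperf3 -s
# look up the server IP, then run the client against it
docker inspect --format "{{ .NetworkSettings.IPAddress }}" iperf3-server
docker run -it --rm networkstatic/iperf3 -c <server-ip>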

Below are the results of a few random runs for both the VBox driver and the xhyve driver.  Aside from the quick sketch above, I have left out the specific commands since they are included in the project links; use those for instructions on running the benchmarks yourself if you are interested.  The results were interesting because I was expecting the xhyve driver to outperform the VBox driver.

Virtualbox results

container benchmark results (FS write and CPU)

Client mode...
Target: 172.17.0.2
------------------------------
Performance benchmarks
------------------------------
dockerhost: tcp://192.168.99.100:2376
host: 172.17.0.2 a8b790317264
eth0: 172.17.0.2
date: Sat Jan 23 02:24:20 UTC 2016

------------------------------
FS write performance
------------------------------
1073741824 bytes (1.1 GB) copied, 2.39743 s, 448 MB/s
1073741824 bytes (1.1 GB) copied, 2.35377 s, 456 MB/s
1073741824 bytes (1.1 GB) copied, 1.9075 s, 563 MB/s
1073741824 bytes (1.1 GB) copied, 2.37838 s, 451 MB/s
1073741824 bytes (1.1 GB) copied, 2.03373 s, 528 MB/s
1073741824 bytes (1.1 GB) copied, 1.94024 s, 553 MB/s
1073741824 bytes (1.1 GB) copied, 1.99546 s, 538 MB/s
1073741824 bytes (1.1 GB) copied, 2.00287 s, 536 MB/s
1073741824 bytes (1.1 GB) copied, 1.5292 s, 702 MB/s
1073741824 bytes (1.1 GB) copied, 1.92617 s, 557 MB/s

------------------------------
CPU performance
------------------------------
268435456 bytes (268 MB) copied, 22.6775 s, 11.8 MB/s
268435456 bytes (268 MB) copied, 22.1466 s, 12.1 MB/s
268435456 bytes (268 MB) copied, 30.7552 s, 8.7 MB/s
268435456 bytes (268 MB) copied, 22.2861 s, 12.0 MB/s
268435456 bytes (268 MB) copied, 22.5571 s, 11.9 MB/s
268435456 bytes (268 MB) copied, 21.9901 s, 12.2 MB/s
268435456 bytes (268 MB) copied, 21.8232 s, 12.3 MB/s
268435456 bytes (268 MB) copied, 31.3903 s, 8.6 MB/s
268435456 bytes (268 MB) copied, 28.1219 s, 9.5 MB/s
268435456 bytes (268 MB) copied, 31.0172 s, 8.7 MB/s

------------------------------
System info
------------------------------
             total       used       free     shared    buffers     cached
Mem:       1019960     313288     706672     113104       7808     132532
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               2294.770
BogoMIPS:              4589.54
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

iperf results

Connecting to host 172.17.0.3, port 5201
[  4] local 172.17.0.4 port 39476 connected to 172.17.0.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.09 GBytes  17.9 Gbits/sec  2321   1.03 MBytes
[  4]   1.00-2.00   sec  2.46 GBytes  21.1 Gbits/sec  496    980 KBytes
[  4]   2.00-3.00   sec  2.24 GBytes  19.3 Gbits/sec  339   1.77 MBytes
[  4]   3.00-4.00   sec  2.54 GBytes  21.8 Gbits/sec  1355    389 KBytes
[  4]   4.00-5.00   sec  2.10 GBytes  18.0 Gbits/sec  106    495 KBytes
[  4]   5.00-6.00   sec  3.00 GBytes  25.7 Gbits/sec  217    411 KBytes
[  4]   6.00-7.00   sec  2.60 GBytes  22.4 Gbits/sec  440   1.72 MBytes
[  4]   7.00-8.00   sec  2.06 GBytes  17.7 Gbits/sec    0   1.72 MBytes
[  4]   8.00-9.00   sec  2.07 GBytes  17.8 Gbits/sec    0   1.72 MBytes
[  4]   9.00-10.00  sec  2.51 GBytes  21.6 Gbits/sec  876    713 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  23.7 GBytes  20.3 Gbits/sec  6150             sender
[  4]   0.00-10.00  sec  23.7 GBytes  20.3 Gbits/sec                  receiver

iperf Done.

xhyve results

container benchmark results (FS write and CPU)

Client mode...
Target: 172.17.0.2
------------------------------
Performance benchmarks
------------------------------
dockerhost:
host: 172.17.0.2 2c8d9ba61eae
eth0: 172.17.0.2
date: Sat Jan 23 02:08:15 UTC 2016

------------------------------
FS write performance
------------------------------
1073741824 bytes (1.1 GB) copied, 8.24671 s, 130 MB/s
1073741824 bytes (1.1 GB) copied, 5.89179 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 6.05392 s, 177 MB/s
1073741824 bytes (1.1 GB) copied, 5.37728 s, 200 MB/s
1073741824 bytes (1.1 GB) copied, 4.824 s, 223 MB/s
1073741824 bytes (1.1 GB) copied, 5.90409 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 5.22375 s, 206 MB/s
1073741824 bytes (1.1 GB) copied, 5.07298 s, 212 MB/s
1073741824 bytes (1.1 GB) copied, 5.89058 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 4.80828 s, 223 MB/s

------------------------------
CPU performance
------------------------------
268435456 bytes (268 MB) copied, 25.478 s, 10.5 MB/s
268435456 bytes (268 MB) copied, 31.3984 s, 8.5 MB/s
268435456 bytes (268 MB) copied, 24.698 s, 10.9 MB/s
268435456 bytes (268 MB) copied, 31.1973 s, 8.6 MB/s
268435456 bytes (268 MB) copied, 23.3705 s, 11.5 MB/s
268435456 bytes (268 MB) copied, 23.3973 s, 11.5 MB/s
268435456 bytes (268 MB) copied, 23.7405 s, 11.3 MB/s
268435456 bytes (268 MB) copied, 23.6118 s, 11.4 MB/s
268435456 bytes (268 MB) copied, 23.5606 s, 11.4 MB/s
268435456 bytes (268 MB) copied, 24.3341 s, 11.0 MB/s

------------------------------
System info
------------------------------
             total       used       free     shared    buffers     cached
Mem:       1020028     291632     728396      70356       6420      89824
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               2294.450
BogoMIPS:              4607.99
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

iperf results

Connecting to host 172.17.0.2, port 5201
[  4] local 172.17.0.3 port 49244 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.29 GBytes  19.7 Gbits/sec    0   1.90 MBytes
[  4]   1.00-2.00   sec  2.84 GBytes  24.4 Gbits/sec  567    953 KBytes
[  4]   2.00-3.00   sec  2.16 GBytes  18.6 Gbits/sec  327    667 KBytes
[  4]   3.00-4.00   sec  2.32 GBytes  19.9 Gbits/sec  166   1.52 MBytes
[  4]   4.00-5.00   sec  2.63 GBytes  22.6 Gbits/sec  565    769 KBytes
[  4]   5.00-6.00   sec  2.71 GBytes  23.3 Gbits/sec  608    583 KBytes
[  4]   6.00-7.00   sec  2.67 GBytes  22.9 Gbits/sec  217   1.40 MBytes
[  4]   7.00-8.00   sec  2.98 GBytes  25.6 Gbits/sec  782    498 KBytes
[  4]   8.00-9.00   sec  2.80 GBytes  24.0 Gbits/sec  359   1.01 MBytes
[  4]   9.00-10.00  sec  2.43 GBytes  20.9 Gbits/sec  883    467 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  25.8 GBytes  22.2 Gbits/sec  4474             sender
[  4]   0.00-10.00  sec  25.8 GBytes  22.2 Gbits/sec                  receiver

iperf Done.

Conclusion

I was on the fence about VBox performance but the proof is in the pudding here with the test results.  The VBox driver had significantly better FS write performance (almost 2x).  CPU performance was about equal overall, and network throughput was also very similar.  I suspect CPU performance would be in favor of xhyve if these tests were run using VBox 4.x.  Regardless, equal CPU performance, similar network throughput and significantly better FS writes tip the scale in favor of the VBox driver.

As frustrating as it can be at times to use VBox, many of its past performance issues have been fixed as of the v5.0 release.  The shared folder issue still exists but is largely taken care of by the great, easy-to-use tools that the Docker community has written; docker-machine-nfs is my favorite.

Surprisingly, or maybe not THAT surprisingly, xhyve actually performs worse than Virtualbox at this point.  Both xhyve itself and the docker-machine xhyve driver are still super young projects, so there is definitely room for growth.  That said, it was very straightforward to get xhyve and the docker-machine driver installed and configured, so I believe it is just a matter of time before the xhyve driver matures to a point where it can replace other drivers.  One downside of the xhyve driver is that it also suffers from the host-to-VM shared folder issue, and the current best workaround is to use the --xhyve-experimental-nfs-share flag that the xhyve docker-machine driver offers.

I will definitely have my eye on the xhyve project moving forward because it looks to be a great alternative to other virtualization technologies for OS X once it reaches a point of maturity.  For now, VBox works more than sufficiently, has been around for a long time, is pretty much ubiquitous across platforms and the developers have shown that they are still actively working on improving the project with the recent 5.0 release.


Dockerizing Sentry

I have created a Github project that has basic instructions for getting started.  You can take a look over there to see how all of this works and to get ideas for your own setup.

I used the following links as reference for my approach to Dockerizing Sentry.

https://registry.hub.docker.com/u/slafs/sentry
https://github.com/rchampourlier/docker-sentry

If you have existing configurations to use, it is probably a good idea to start from there.  You can check my Github repo for what a basic configuration looks like.  If you are starting from scratch, or are using version 7.1.x or above, you can use the “sentry init” command to generate a skeleton configuration to work from.
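
For reference, generating the skeleton only requires having the sentry package installed somewhere, e.g. in a throwaway virtualenv or container.  This is just a sketch, pinned to the same version used in the Dockerfile later in this post.

pip install sentry==7.7.4
sentry init
# the skeleton is written to ~/.sentry/sentry.conf.py; copy it into ./sentry/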

For this setup to work you will need the following prebuilt Docker images/containers. I suggest using something simple like docker-compose to stitch the containers together.

  • redis – https://registry.hub.docker.com/_/redis/
  • postgres – https://registry.hub.docker.com/_/postgres/
  • memcached – https://hub.docker.com/_/memcached/
  • nginx – https://hub.docker.com/_/nginx/

NOTE: If you are running this on OS X you may need to do some trickery and grant special permissions at the host (Mac) level, e.g. create a ~/docker/postgres directory and give it the correct permissions (I just used 777 recursively for testing; make sure to lock it down if you put this in production).
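
For example, something like this on the host before bringing the stack up:

mkdir -p ~/docker/postgres
chmod -R 777 ~/docker/postgres  # testing only, lock this down for production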

I wrote a little script in my Github project that will take care of setting up all of the directories on the host OS that need to exist for data to persist.  The script also generates a self-signed cert to use for proxying Sentry through Nginx.  Without the certificate, the statistics pages in the Sentry web interface will be broken.

To run the script, run the following command and follow the prompts.  Also make sure you have docker-compose installed beforehand to run all the needed commands.

sudo ./setup.sh

The certs that get generated are self-signed so you will see the red lock in your browser.  I haven’t tried it yet but I imagine using Let’s Encrypt to create the certificates would be very easy.  Let me know if you have had any success generating Nginx certs for Docker containers; I might write a follow-up post.

Preparing Postgres

After setting up directories and creating certificates, the first step to getting up and going is to add the Sentry superuser to Postgres (version 9.4 or above).  To do this, you will need to fire up the Postgres container.

docker-compose up -d postgres

Then to connect to the Postgres DB you can use the following command.

docker-compose run postgres sh -c 'exec psql -h "$POSTGRES_PORT_5432_TCP_ADDR" -p "$POSTGRES_PORT_5432_TCP_PORT" -U postgres'

Once you are logged in to the Postgres container you will need to set up a few Sentry DB related things.

First, create the role.

CREATE ROLE sentry superuser;

And then allow it to login.

ALTER ROLE sentry WITH LOGIN;

Create the Sentry DB.

CREATE DATABASE sentry;

When you are done in the container, \q will drop out of the postgresql shell.

After you’re done configuring the DB components, you will need to “prime” Sentry by running it for the first time.  This will probably take a little bit of time because it also requires you to build and pull all the other needed Docker images.

docker-compose build
docker-compose up

If you try to browse to the Sentry URL (e.g. the IP/port of your Sentry container, or the docker-machine IP if you’re on OS X), you will quickly notice errors in the logs and 503s from the site.

Repair the database (if needed)

To fix this, if this is the first time you have run through the setup, you will need to run the following command to repair the database.

docker-compose run sentry sentry upgrade

The default Postgres database username and password is sentry in this setup.  As part of the setup, the upgrade prompt will ask you to create a new user and password; make note of what those are.  You will definitely want to change these configs if you use this outside of a test or development environment.

After upgrading/preparing the database, you should be able to bring up the stack again.

docker-compose up -d && docker-compose logs

Now you should be able to get to the Sentry URL and start configuring.  To manage usernames/passwords you can visit the /admin URL and set up the accounts.

Next steps

The Sentry server should come up and allow you in but will likely need more configuration.  Using the power of docker-compose it is easy to add in any custom configurations you have.  For example, if you need to adjust Sentry-level configurations, all you need to do is edit the file in ./sentry/sentry.conf.py and then restart the stack to pick up the changes.  Likewise, if you need to make changes to Nginx or celery, just edit the configuration file and bump the stack using “docker-compose up -d”.
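
For example, a Sentry config change looks something like this:

vi ./sentry/sentry.conf.py
docker-compose stop && docker-compose up -d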

I have attempted to configure as many sane defaults as possible in the base config to make the configuration steps easier.  You will probably want to check some of the following settings in the sentry/sentry.conf.py file.

  • SENTRY_ADMIN_EMAIL – For notifications
  • SENTRY_URL_PREFIX – This is especially important for getting stats working
  • SENTRY_ALLOW_ORIGIN – Where to allow communications from
  • ALLOWED_HOSTS – Which hosts can communicate with Sentry

If you have the SENTRY_URL_PREFIX set up correctly you should see something similar when you visit the /queue page, which indicates statistics are working.

Sentry Queue

If you want to set up any kind of email alerting, make sure to check out the mail server settings.

docker-compose.yml example file

The following configuration shows how the Sentry stack should look.  The meat of the logic is in this configuration but since docker-compose is so flexible, you can modify this to use any custom commands, different ports or any other configurations you may need to make Sentry work in your own environment.

# Caching
redis:
  image: redis:2.8
  hostname: redis
  ports:
    - "6379:6379"
  volumes:
    - "/data/redis:/data"

memcached:
  image: memcached
  hostname: memcached
  ports:
    - "11211:11211"

# Database
postgres:
  image: postgres:9.4
  hostname: postgres
  ports:
    - "5432:5432"
  volumes:
    - "/data/postgres/etc:/etc/postgresql"
    - "/data/postgres/log:/var/log/postgresql"
    - "/data/postgres/lib/data:/var/lib/postgresql/data"

# Customized Sentry configuration
sentry:
  build: ./sentry
  hostname: sentry
  ports:
    - "9000:9000"
    - "9001:9001"
  links:
    - postgres
    - redis
    - celery
    - memcached
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"


# Celery
celery:
  build: ./sentry
  hostname: celery
  environment:
    - C_FORCE_ROOT=true
  command: "sentry celery worker -B -l WARNING"
  links:
    - postgres
    - redis
    - memcached
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"

# Celerybeat
celerybeat:
  build: ./sentry
  hostname: celerybeat
  environment:
    - C_FORCE_ROOT=true
  command: "sentry celery beat -l WARNING"
  links:
    - postgres
    - redis
  volumes:
    - "./sentry/sentry.conf.py:/home/sentry/.sentry/sentry.conf.py"

# Nginx
nginx:
  image: nginx
  hostname: nginx
  ports:
    - "80:80"
    - "443:443"
  links:
    - sentry
  volumes:
    - "./nginx/sentry.conf:/etc/nginx/conf.d/default.conf"
    - "./nginx/sentry.crt:/etc/nginx/ssl/sentry.crt"
    - "./nginx/sentry.key:/etc/nginx/ssl/sentry.key"

The Dockerfiles for each of these components are fairly straightforward.  In fact, the same config can be used for the Sentry, Celery and Celerybeat services.

Sentry

# Kombu breaks in 2.7.11
FROM python:2.7.10

# Set up sentry user
RUN groupadd sentry && useradd --create-home --home-dir /home/sentry -g sentry sentry
WORKDIR /home/sentry

# Sentry dependencies
RUN pip install \
 psycopg2 \
 mysql-python \
 supervisor \
 # Threading
 gevent \
 eventlet \
 # Memcached
 python-memcached \
 # Redis
 redis \
 hiredis \
 nydus

# Sentry
ENV SENTRY_VERSION 7.7.4
RUN pip install sentry==$SENTRY_VERSION

# Set up directories
RUN mkdir -p /home/sentry/.sentry \
 && chown -R sentry:sentry /home/sentry/.sentry \
 && chown -R sentry /var/log

# Configs
COPY sentry.conf.py /home/sentry/.sentry/sentry.conf.py

#USER sentry
EXPOSE 9000/tcp 9001/udp

# Making sentry commands easier to run
RUN ln -s /home/sentry/.sentry /root

CMD sentry --config=/home/sentry/.sentry/sentry.conf.py start

Since the customized Sentry config is rather lengthy, I will point you to the Github repo again.  There are a few values that you will need to provide but they should be pretty self-explanatory.

Once the configs have all been put into place you should be good to go.  A bonus piece would be to add an Upstart service that takes care of managing the stack if the server gets rebooted or the containers get stuck in an unstable state.  The configuration is fairly easy to do and many other guides and posts have been written about how to accomplish this.
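
As a minimal sketch (the job name and paths here are assumptions, adjust them for your environment), such an Upstart job might look like the following.

# /etc/init/sentry.conf
description "Sentry docker-compose stack"
start on started docker
stop on runlevel [!2345]
respawn
chdir /opt/sentry
exec /usr/local/bin/docker-compose up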


ECS cluster turnup with CoreOS and Terraform

Recently I have been evaluating different container clustering tools and technologies.  It has been a fun experience thus far; the tools and community being built around Docker have come a long way since I last looked.  So for today’s post I’d like to go over ECS a little bit.

ECS is essentially the AWS version of container management.  ECS takes care of managing your Docker (container) infrastructure by handling creation, management, destruction and scheduling, as well as providing API integration with other AWS services, which is really powerful.  To get ECS up and running all you need to do is create an ECS cluster, either from the AWS console or from some other AWS integration like the CLI or Terraform, then install the agent on the servers that you would like ECS to schedule work on.  After setting up the agent and cluster name you are basically ready to go: start by creating a task and then create a service to start running containers on the cluster.  Some cool new features were announced at this year’s re:Invent conference but I haven’t had a chance to look at them yet.
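
If you would rather script that flow than click through the console, the same steps map to the standard AWS CLI.  This is just a sketch of the sequence; the task definition JSON (my-task.json here) is a placeholder you would fill in yourself.

aws ecs create-cluster --cluster-name my-cluster
aws ecs register-task-definition --cli-input-json file://my-task.json
aws ecs create-service --cluster my-cluster --service-name my-service --task-definition my-task --desired-count 2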

First impression of ECS

The best part about testing ECS by far has been how easy it is to get set up and running.  It took less than 20 minutes to go from nothing to a fully functioning cluster that was scheduling containers to hosts and receiving load.  I think the most powerful aspect of ECS is its integration with other AWS services.  For example, if you need to attach containers/services to a load balancer, the AWS infrastructure is already there, so the different pieces of the infrastructure really mesh well together.

The biggest downside so far is that the ECS console interface is still clunky.  It is functional, and I have been able to use it to do everything I have needed, but it just feels like it needs some polish; things are nested in menus and usually not easy to find.  I’m sure there are plans to improve the interface and, as mentioned above, some new features were recently announced, so I have a feeling there will be some nice improvements on the way.

I haven’t tried the CLI tool yet but it looks promising for automating containers and services.

Setting things up

Since I am a big fan of CoreOS I decided to try turning up my ECS cluster using CoreOS as the base OS and Terraform to do the heavy lifting and provisioning.

The first step is to create your cluster.  I noticed in the AWS console there was a configuration wizard that guides you through your first cluster, which was annoying because there wasn’t a clean way to just create the cluster.  So you will need to follow the on-screen instructions for getting your first environment set up.  If any of this is unclear there is a good guide for getting started with ECS here.

After your cluster has been created there is a menu that shows your ECS environments.

ECS cluster menu

Next, you will need to turn on the nodes that will be connecting to this cluster.  The first part of this is to get your cloud-config set up to connect to the cluster.  I used the CoreOS docs to set up the ECS agent, making sure to change the ECS_CLUSTER= section in the config.

#cloud-config

coreos:
  units:
    - name: amazon-ecs-agent.service
      command: start
      runtime: true
      content: |
        [Unit]
        Description=Amazon ECS Agent
        After=docker.service
        Requires=docker.service
        Requires=network-online.target
        After=network-online.target

        [Service]
        Environment=ECS_CLUSTER=my-cluster
        Environment=ECS_LOGLEVEL=warn
        Environment=ECS_CHECKPOINT=true
        ExecStartPre=-/usr/bin/docker kill ecs-agent
        ExecStartPre=-/usr/bin/docker rm ecs-agent
        ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
        ExecStart=/usr/bin/docker run --name ecs-agent --env=ECS_CLUSTER=${ECS_CLUSTER} --env=ECS_LOGLEVEL=${ECS_LOGLEVEL} --env=ECS_CHECKPOINT=${ECS_CHECKPOINT} --publish=127.0.0.1:51678:51678 --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/var/lib/aws/ecs:/data amazon/amazon-ecs-agent
        ExecStop=/usr/bin/docker stop ecs-agent

Note the Environment=ECS_CLUSTER=my-cluster line; this is the most important bit for getting the server to check in to your cluster, assuming you named it “my-cluster”.  Feel free to add any other values your infrastructure may need.  Once you have the config how you want it, run it through the CoreOS cloud-config validator to make sure it checks out.  If everything looks okay there, your cloud-config should be ready to go.
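
If you would rather validate locally than use the online validator, I believe the coreos-cloudinit binary that ships with CoreOS can do it as well; something like the following should work on a CoreOS host (the file name is whatever you saved your config as).

coreos-cloudinit -validate -from-file=./ecs-cloud-config.yml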

You can find more info about how to configure the ECS agent in the docs here.

Once you have your cloud-config in order, you will need to get your Terraform “recipe” set up.  I used this awesome Github project as the base for my own project.  The Terraform logic from there basically creates an AWS launch config and autoscaling group (using the cloud-config from above) to launch instances into your cluster.  The ECS agent takes care of the rest once your servers are up and reporting in to the cluster.

launch_config.tf

resource "aws_launch_configuration" "ecs" {
  name = "ECS ${var.cluster_name}"
  image_id = "${var.ami}"
  instance_type = "${var.instance_type}"
  iam_instance_profile = "${var.iam_instance_profile}"
  key_name = "${var.key_name}"
  security_groups = ["${split(",", var.security_group_ids)}"]
  user_data = "${file("../cloud-config/ecs.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "40"
  }
}

Notice the user_data section.  This is where we inject the cloud config from above to provision CoreOS and launch the ECS agent.

autoscaler.tf

resource "aws_autoscaling_group" "ecs-cluster" {
  availability_zones = ["${split(",", var.availability_zones)}"]
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]
  name = "ECS ${var.cluster_name}"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  desired_capacity = "${var.desired_capacity}"
  health_check_type = "EC2"
  launch_configuration = "${aws_launch_configuration.ecs.name}"
  health_check_grace_period = "${var.health_check_grace_period}"

  tag {
    key = "Env"
    value = "${var.environment_name}"
    propagate_at_launch = true
  }

  tag {
    key = "Name"
    value = "ECS ${var.cluster_name}"
    propagate_at_launch = true
  }
}

There are a few caveats I’d like to highlight with this approach.  First, I already had an AWS infrastructure in place that I was testing against.  So I didn’t have to do any of the extra work to create a VPC or a gateway for the VPC.  I didn’t have to create the security groups and subnets either; I just added them to the Terraform code.

The other caveat is that if you want to use the Github project I linked to, you will need to make sure that you populate the variables with your own environment-specific values.  That is why having the VPC, subnets and security groups already in place was handy for me.  Be sure to browse through the variables.tf file and substitute in your own values.  As an example, I had to update the variables to use the CoreOS 766.4.0 image.  This AMI will be specific to your AWS region so make sure to look up the AMI first.

variable "ami" {
  /* CoreOS 766.4.0 */
  default = "ami-dbe71d9f"
  description = "AMI id to launch, must be in the region specified by the region variable"
}
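
A quick way to look up the matching CoreOS AMI for your region, assuming the standard CoreOS AMI naming scheme, is something like this:

aws ec2 describe-images --region us-west-2 --filters "Name=name,Values=CoreOS-stable-766.4.0*" --query "Images[*].[ImageId,Name]" --output text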

Another part I had to modify to get the Github project to work was adding in my AWS credentials, which look similar to the following.  Make sure to update these variables with your own ID and secret.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

variable "access_key" {
  description = "AWS access key"
  default = "XXX"
}

variable "secret_key" {
  description = "AWS secret access key"
  default = "xxx"
}

Make sure to also copy/edit the autoscaling.tf and launch_config.tf files to reflect anything that is specific to your environment (Terraform will complain if there are issues).

After you have combed through the variables.tf and updated the Terraform files to your liking you can simply run terraform plan -input=false and see how Terraform will create the ASG for you.

If everything looks good, you can run terraform apply -input=false and Terraform will go out and start building your new ECS infrastructure for you.  After a few minutes check the EC2 console and your launch config and autoscaling group should be in there.  If that stuff all looks okay, check the ECS console and your new servers should show up and be ready to go to work for you!
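
You can also sanity check an individual instance by hitting the agent’s introspection API, which the unit file above publishes on localhost port 51678.

# run this from one of the new instances
curl -s http://127.0.0.1:51678/v1/metadata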

NOTE: If you are starting from scratch, it is possible to do all of the infrastructure provisioning via Terraform but it is too far out of the scope of this post to cover because there are a lot of steps to it.


Graphite threshold alerting with Sensu

Instrumenting your code to report application level metrics is definitely one of the most powerful monitoring tasks you can accomplish.  It is damn satisfying to get working the first time as well.  Having the ability to look at your application and how it is performing at a granular level can help identify potential issues or bottlenecks but can also give you a greater understanding of how people are interacting with the application at a broad scale.  Everybody loves having these types of metrics to talk about their apps and products so this style of monitoring is a great win for the whole team.

I don’t want to dive in to the specifics of WHAT you should monitor here, that will be unique to every environment.  Instead of covering the what and how of instrumenting the code to report specific metrics, I will be running through an example of what the process might look like for instrumenting a check and alarm for monitoring and alerting purposes at an operations level.  I am not a developer, so I don’t spend a lot of time thinking about what types of things are important to collect metrics on.  Usually my job instead is to figure out how to monitor and alert effectively, based on the metrics that developers come up with.

Sensu has a great plugin to check Graphite thresholds in their plugin repo.  If you haven’t looked already, take a minute to glance over the options a little bit and see how the plugin works.  It is a pretty simple plugin but has been able to do everything I need it to.

One common monitoring task is to check how long requests are taking.  So in this example, we are querying the Graphite server and reporting a critical status (exit status 2) if the request averages more than 7 seconds for a response time.

Here is the command you would run manually to check this threshold.  Make sure to download the script if you haven’t already; you can just copy the code directly or clone the repo if you are doing this manually.  If you are using Sensu you can use the sensu_plugin LWRP to grab the script (more on that below).

./check-data.rb -s <servername:port> -t '<graphite query>' -c 7000 -u user -p password
./check-data.rb -s graphite.example.com -t "alias(stats.timer.server.response_time.mean, 'Mean')" -c 7000 -u myuser -p awesomepassword

There are a few things to note.  The -s flag specifies which Graphite server or endpoint to hit, -t specifies the target (the Graphite query to run the script against), the -c flag sets the critical threshold, and -u and -p are used if your Graphite server uses authentication.  If your Graphite instance is public it should probably use auth; if it is internal only, it is probably not as important.  Obviously these are just dummy values, included to give you a better idea of what a real command should look like.  Use your own values in their place.

The query we’re running is against a statsd metric for the mean response time of a request, which gets recorded by the code (this is the developer-instrumenting-their-code part I mentioned).  This check is specific to my environment, so you will need to modify your queries to make sure you alert on a useful metric and threshold in your own environment.

Here’s an example of what the graphite graph (rendered in Grafana) looks like.

Sensu Graph

Obviously this is just a sample but it should give you the general idea of what to look for.

If you examine the script, there are a few Ruby gem requirements to get it to run, which you will need to install if you haven’t already.  They are sensu-plugin, json, json-uri and openssl.  You don’t need the sensu-plugin gem if you are just running the check manually, but you WILL need to have it installed on the Sensu client that will be running the scheduled check.  That can be done manually or with the Sensu Chef recipe (specifically by turning on Sensu embedded ruby and ruby gems), which I recommend using anyway if you plan on doing any type of deployments at scale using Sensu.

Here is what the Chef code looks like if you use Sensu to deploy this check automatically.

sensu_check "check_request_time" do 
  command "#{node['sensu']['plugindir']}/check-data.rb -s graphite.example.com -t \"alias(stats.timers.server.facedetection.response_time.mean, 'Mean')\" -c 7000 -a 360 -u myuser -p awesomepassword"
  handlers ["pagerduty", "slack"] 
  subscribers ["core"] 
  interval 60 
  standalone true 
  additional(:notification => "Request time above threshold", :occurrences => 5)
end

This should look familiar if you have any background using the Sensu Chef cookbook.  Basically we are using the sensu_check LWRP to execute the script with the different parameters we want, using the pagerduty and slack handlers, which are just fancy ways to pipe out the results of the check.  We are also saying we want to run this on a 60 second interval as a standalone check, which means it will be executed on the client node (not the Sensu server itself).  Finally, we are saying that after 5 failed checks we want to append a message to the handler that says exactly what is going wrong.

You can stick this logic in an existing recipe or create a new one that handles your metrics threshold checks.  I’m not sure what the best practice is for where to put the check, but I have a recipe that runs standalone threshold checks that I stuck this logic into and it seems to work.  Once the logic has been added, you should be able to run chef-client for the new check to get picked up.
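
For reference, if you aren’t managing Sensu with Chef, a standalone check definition dropped into /etc/sensu/conf.d/ on the client accomplishes the same thing.  This is a sketch of the equivalent JSON for the check above.

{
  "checks": {
    "check_request_time": {
      "command": "check-data.rb -s graphite.example.com -t \"alias(stats.timers.server.facedetection.response_time.mean, 'Mean')\" -c 7000 -a 360 -u myuser -p awesomepassword",
      "handlers": ["pagerduty", "slack"],
      "interval": 60,
      "standalone": true
    }
  }
}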


Intro to Systemd

I have a rocky relationship with Systemd.  On the one hand, I love how powerful and extensive it is.  On the other hand, I hate how cumbersome and clunky it can sometimes feel.  There are a TON of moving components and it is very confusing to use if you have no experience with it.  The aim of this post is NOT to debate the relative merits of Systemd, but instead to take users through a few basic examples of how to accomplish tasks with Systemd and get familiar with managing systems with this framework.

My background is primarily with Debian/Ubuntu so moving over to this init system has been a learning curve.  The problem I had when I first made the transition is that there aren’t a lot of great resources currently for making the change.

Most of my knowledge has been pieced together from various blog posts and sites, the best of which is over at the Arch wiki.  As Systemd continues to mature it is becoming increasingly easier to find good guides and resources, but the Arch wiki is still the de facto, go-to place to find resources about how to use Systemd.

Another thing that has helped is forcing myself to use CoreOS, which, for better or worse, has forced me to learn how to use Systemd.  It seems that most modern Linux distros are moving towards Systemd, so it’s probably worth it to at least begin experimenting with it at this point.  Ubuntu 15.04 as well as Debian 8 have both made the jump, along with Red Hat based distros, with more to follow.

While the learning curve can be a little steep to start with, you can learn about 80% of the things that Systemd can do with about 20% of the effort.  Getting the basics down takes a little bit of effort but will be more than enough to manage a system.  Before starting, as a disclaimer, I do not claim to be an expert, but I have learned the hard way how things work and sometimes don’t work with Systemd.

Basics of Systemd

So now that we have a little bit of background stuff out of the way, we can look at some of the meat and potatoes of Systemd.  The first thing to do is get an idea of what services are running on your system.

systemctl

This will give you a listing of every unit running on your system, and usually there is a very large number of units listed.  Say, for example, there is a failed unit.  You can filter systemctl to only spit out failed units.

systemctl --failed

That’s pretty cool.  If, for example, there is a failed unit file, you can drill down into that unit specifically.

systemctl status <unit>

This will give you process and service information for the unit file and the last 10 lines of logs for the unit as well, which can be really handy for troubleshooting.

One of the easier ways to work with unit files is to use the edit subcommand.  This method removes much of the complexity of having to know exactly where all of the unit files live on the system.

systemctl edit --full <unit>

You can manage and interact with Systemd and its unit files in a straightforward way.  We will examine a few examples.

systemctl start <unit> - start a stopped service
systemctl stop <unit> - stop a running service
systemctl restart <unit> - restart a service
systemctl disable <unit> - stop unit from loading on boot
systemctl enable <unit>  - load unit on boot

If you make a change to a service or unit file (we will go over this later) you need to reload the Systemd daemon for it to pick up the changes to the file.

systemctl daemon-reload

This is super important (on CoreOS at least) when troubleshooting because if you don’t run the reload your process will not update and will not change its behavior.

There are many, many more tips and tricks for working with systemctl, but this should give users a basic grasp of interacting with components of the system.  Again, for lots more details I advise that you check out the wiki.

Journalctl

If you are coming from older Debian/Ubuntu distros you will need to get familiar with using journalctl to view and manipulate your log files.

One of the first changes that people notice right away is that the beloved log files in /var/log aren’t there any more with Systemd.  Well, there are still logs, but it isn’t what most folks are used to.

The files haven’t gone away; they are simply stored as a binary journal and are accessible via journald.  To check how much space your logs are taking up on disk you can use the following command.

journalctl --disk-usage

Journalctl is a powerful tool for doing just about everything you can think of with log files.  Because journald writes logs as a binary file, you need this tool to interact with them.  That’s okay though, because journalctl has pretty much all of the features needed to look through logs with its built-in flags.  For example, if you want to tail a log file, you can use the -f flag to follow the output and the -u flag to specify which systemd unit to follow, as illustrated below (this is a CoreOS system so your results may vary on a different OS).

journalctl -f -u motdgen

To follow all the entries that journalctl is picking up (the equivalent of tailing all the logs at once) you can just use this.

journalctl -f

Sometimes you only want to view the last X entries in a log file.  To limit the output to the most recent entries you can use the -n flag.

journalctl -n 100 -u motdgen

You can filter logs to only show messages since the last boot.

journalctl -b

There are many more options, and the filtering capabilities of journalctl are very powerful.  Additionally, journalctl offers JSON output, filtering by log priority, granular filtering by time with the --since and --until flags, filtering on different boots, and more.

I suggest exploring on your own a little bit to see how far the capabilities can go.  Here is a link to all of the various flags that journalctl has, as well as a good writeup on some of its more advanced capabilities.

Drop-in Units

Writing unit files is somewhat of an art form.  It can quickly become a complicated topic, so I won’t discuss it in full detail here.

By default, Systemd unit files are written to /usr/lib/systemd/system and custom defined unit files are written to /etc/systemd/system.  Systemd looks at both locations for which files to load up.  Because of this, you can extend base units that live in the /usr/lib path by augmenting them in /etc/systemd.

This has a few implications.  First, it makes management of unit files a little bit easier to work with.  It also makes extending units a little bit easier.  You don’t need to worry about manipulating the system-level units; you just add functionality on top of them in /etc/systemd.

Drop-in units are a handy feature if you need to extend a basic unit.  To make the drop-in work, it must follow the format of /etc/systemd/system/<unit>.d/override.conf, where <unit> is the full name of the unit that you are extending, including its suffix (e.g. etcd2.service.d).

The following example extends the CoreOS baked-in etcd2 service.

/etc/systemd/system/etcd2.service.d/30-configuration.conf

[Service]
# General settings
Environment=ETCD_DEBUG=true
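
After adding or changing a drop-in, reload the daemon and verify that the override is being picked up.  systemctl cat shows the base unit along with any drop-ins that apply to it.

sudo systemctl daemon-reload
systemctl cat etcd2.service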

Replacing Cron

Well not exactly.  Systemd doesn’t use cron jobs so if you want to get something to run on a schedule you will need to use a timer unit and associate a service with it.

[Unit]
Description=Log file cleaner (runs daily at midnight)

[Timer]
OnCalendar=daily

Notice that there is a log-cleaner.timer as well as a log-cleaner.service (below).  The easiest way I have found is to create a service/timer pair, which ensures that the timer runs the service based on its schedule.

[Unit]
Description=Log file cleaner

[Service]
ExecStart=/usr/bin/bash -c "truncate -s 1m /data/*.log"

The only thing that the service is doing is running a shell command which will get run based on the timer directive in the timer unit file.
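
To activate the pair, drop both files into /etc/systemd/system and start the timer.  Note that if you want the timer to start on boot, you will also need to give the timer unit an [Install] section with WantedBy=timers.target so it can be enabled.

sudo systemctl daemon-reload
sudo systemctl start log-cleaner.timer
# confirm the timer is scheduled
systemctl list-timers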

The timer directive in the timer unit is pretty flexible.  For example, you can tell the timer to run based on the calendar either by day, week or month or alternately you can have the unit run services based on the system time by seconds, minutes or hours.

One of the downsides of timers is that there is not an easy way to email results; with cron this is easily configurable.  Likewise, timers don’t have a random delay for jobs like cron has.  So timers are definitely no replacement for cron, but they have a lot of usable functionality; it is just confusing to learn at first, especially if you are used to cron.

Here is a more involved tutorial for working with timers.  Also, the Arch wiki once again is great and has a dedicated section for timers.

Conclusion

This introduction is really just the beginning; there are many more pieces to the Systemd puzzle.  I have just covered the basics; in fact, I have deliberately chosen not to cover many of the topics and concepts because they can convolute the learning process.  I chose to focus instead on the most basic ideas that are used most frequently in day-to-day administration.  Feel free to go do your own research and play around with the more advanced topics.

Systemd aims to solve a lot of problems, but it brings a lot of complexity as well to accomplish this task.  In my opinion it is definitely worth investing the time and energy to learn this system and service manager; pretty much all of the Linux distros are moving towards it, and you would be doing yourself a disservice by not learning it.

Other areas of Systemd that aren’t covered in this post but are worth researching and playing with are:

  • Targets (analogous to runlevels)
  • Managing NTP with timedatectl
  • Network management with networkd
  • Hostname management with hostnamectl
  • Login and session management with loginctl

One of the easiest ways to get started with Systemd is by grabbing a Vagrant box that leverages it.  I like the CoreOS Vagrant box but there are many other options for getting started.  One of the best ways to learn is by doing, so take some of these commands and resources and go start playing around with Systemd.  Feel free to reach out if you have any additions or need help getting started with Systemd.
