ELK stack

Running ELK on Docker

I wrote a post a while back about how to get the ElasticSearch + Logstash + Kibana stack set up and recently have been very involved with Docker, so I thought it would be appropriate to update that post with the new Docker way of doing things.

Update (5/9/15) – I have created a github repo containing configs for running this.  Reader Sergio has also posted a similar solution on github, you can check it out here if you want to try it out.

I found a surprising lack of posts describing how to run the ELK stack with Docker and docker-compose.  This post is much longer and more detailed than usual so feel free to jump around to different sections for details on different components.  I am planning on doing a follow-up to this post with instructions about how to configure Logstash and the logstash-forwarder client as well as Kibana to do interesting things with logs stored in ElasticSearch.

There are a lot of other posts about how to get the stack to work but they are either out of date already, since the Docker world changes so fast, or don’t cover specific details of how the different bits work.  The other thing I have observed is that most of the other guides are not done with Docker, which is something that makes life easier.

So the first thing that I’ll cover is how to build the Docker images.  If you are interested I can make this stuff available on the Docker hub as images or public Dockerfiles.  However, for this article and in general I strongly prefer to write my own Dockerfiles so I will be posting my custom configs and files here rather than pulling other (sometimes official) prebuilt images.

Logstash

The first component we will get set up is the Logstash server.  This setup is also using the log-courier input plugin.  Log-courier is a more customizable and flexible client for forwarding logs to Logstash.  I use both logstash-forwarder and log-courier in this configuration to allow for a more flexible setup.

The following is a Dockerfile that will build a Logstash 1.5.0 image.  One thing to note about this approach is that you can swap out the LOGSTASH_VER and the image will be updated to the correct version automatically and will be ready to be deployed when the image gets rebuilt.

FROM ubuntu:14.04

ENV DEBIAN_FRONTEND noninteractive
ENV LOGSTASH_VER 1.5.0.rc2
WORKDIR /opt

# Dependencies
RUN apt-get update -qq && \
 apt-get install -y -qq \
 wget \
 python \
 openjdk-7-jre-headless

# Install Logstash
RUN wget --quiet "https://download.elasticsearch.org/logstash/logstash/logstash-$LOGSTASH_VER.tar.gz" -O "/opt/logstash-$LOGSTASH_VER.tar.gz" --no-check-certificate && \
 tar zxf logstash-$LOGSTASH_VER.tar.gz && \
 mv logstash-$LOGSTASH_VER logstash

# Install plugins
RUN /opt/logstash/bin/plugin update logstash-output-zeromq
RUN /opt/logstash/bin/plugin install logstash-input-log-courier

# Config files
ADD server.conf /etc/logstash/server.conf
ADD logstash-forwarder.key /etc/logstash/logstash-forwarder.key
ADD logstash-forwarder.crt /etc/logstash/logstash-forwarder.crt

# lumberjack port
EXPOSE 4545
# log-courier port
EXPOSE 4546

CMD /opt/logstash/bin/logstash -f /etc/logstash/server.conf

This will install Logstash 1.5.0.rc2 and the logstash-input-log-courier input plugin, add certificates for the forwarding clients and start Logstash with the server configuration.

In addition to this Dockerfile you will need to generate some certificates for logstash-forwarder clients and the Logstash server itself to use, as well as the server configuration used by the Logstash server.  Below I have a sample but extremely barebones server.conf configuration file.

input {
  lumberjack {
    port => 4545
    ssl_certificate => "/etc/logstash/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/logstash-forwarder.key"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }

  courier {
    port => 4546
    ssl_certificate => "/etc/logstash/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/logstash-forwarder.key"
  }
}

output {
  elasticsearch {
    cluster => "elasticsearch"
  }
}

I will thicken this config up in the next post on how to customize Logstash and do interesting things with Kibana.  For now, we are defining a courier and a lumberjack input, used to ingest logs in to Logstash, as well as one output, which tells Logstash what to do with the logs; in this example it is just stuffing them in to ES.

To generate the certificates needed for logstash/logstash-forwarder you can either follow the instructions on the logstash-forwarder github page or you can use the following command to generate the certs and subsequently inject them in to the Docker image.  It should probably go without saying, but make sure the version of openssl used to generate these is an up to date and secure version.

openssl req -x509 -nodes -sha256 -days 1095 -newkey rsa:2048 -keyout logstash-forwarder.key -out logstash-forwarder.crt

You will need to follow a few prompts to fill out the certificate details, again you can reference the logstash-forwarder github page if you get stuck or are unsure of how to configure the certificate.
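If you would rather skip the interactive prompts, openssl also accepts the subject on the command line via the -subj flag.  A minimal sketch, assuming your Logstash server answers at logstash.domain.com (swap in your own hostname):

openssl req -x509 -nodes -sha256 -days 1095 -newkey rsa:2048 -subj "/CN=logstash.domain.com" -keyout logstash-forwarder.key -out logstash-forwarder.crt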

After the certs are generated make sure that the names of the output files match up to the names in the above Dockerfile, and that is pretty much it for getting Logstash ready.
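Since these certs are what the forwarding clients verify against, here is a minimal sketch of what a logstash-forwarder client config pointing at the lumberjack port above might look like.  The hostname, cert path and log paths are placeholders; see the logstash-forwarder github page for the full set of options.

{
  "network": {
    "servers": [ "logstash.domain.com:4545" ],
    "ssl ca": "/etc/logstash/logstash-forwarder.crt",
    "timeout": 15
  },
  "files": [
    {
      "paths": [ "/var/log/syslog" ],
      "fields": { "type": "syslog" }
    }
  ]
}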

ElasticSearch

The ElasticSearch image is also pretty straightforward.  This will build the version specified by the ES_PKG_NAME variable, which is 1.4.4 currently, and will configure a few things.

# Pull base image.
FROM dockerfile/java:oracle-java7

# Set install version
ENV ES_PKG_NAME elasticsearch-1.4.4

# Install ElasticSearch
RUN \
 cd / && \
 wget https://download.elasticsearch.org/elasticsearch/elasticsearch/$ES_PKG_NAME.tar.gz && \
 tar xvzf $ES_PKG_NAME.tar.gz && \
 rm -f $ES_PKG_NAME.tar.gz && \
 mv /$ES_PKG_NAME /elasticsearch

# Define mountable directories
VOLUME ["/data"]

# Define working directory
WORKDIR /data

# Custom ES config
ADD elasticsearch.yml /elasticsearch/config/elasticsearch.yml

# Define default command
CMD ["/elasticsearch/bin/elasticsearch"]

# Expose ports
EXPOSE 9200
EXPOSE 9300

The main key to getting ES to work is getting the configuration file set up correctly.  In this example we are mounting local storage (/data) from the host OS in to the container so that if the container dies the indexes and other data aren’t wiped out.  There are also a few other security configuration settings that get set here to lock things down and also to make Kibana 4 happy.

http.cors.allow-origin: "/.*/"
http.cors.enabled: true
cluster.name: elasticsearch
node.name: "logstash.domain.com"
path:
 data: /data/index
 logs: /data/log
 plugins: /data/plugins
 work: /data/work

ES is very straightforward: you just set it up and it runs.
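Once the container is running you can do a quick sanity check against the API, assuming port 9200 is published on the host as it is in the compose file further down:

curl http://localhost:9200/_cluster/health?pretty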

Kibana

This will build the newest iteration of Kibana, which is 4.0.0 as of this writing.  If you aren’t living on the bleeding edge and want to know how to get Kibana 3.x.x working let me know and I will post the configuration for it.

FROM ubuntu:14.04

# Dependencies
RUN apt-get update -qq
RUN apt-get install -y -qq nginx-full wget vim

# Kibana
RUN mkdir -p /opt/kibana
RUN wget https://download.elasticsearch.org/kibana/kibana/kibana-4.0.0-linux-x64.tar.gz -O /tmp/kibana.tar.gz && \
 tar zxf /tmp/kibana.tar.gz && mv kibana-4.0.0-linux-x64/* /opt/kibana/

# Configs
ADD kibana.yml /opt/kibana/config/kibana.yml

EXPOSE 5601

CMD /opt/kibana/bin/kibana

So the Dockerfile is pretty straightforward but there are a few tidbits to be aware of.  Kibana 4.x.x is significantly different in how it works than 3.x.x so you will need to make a few adjustments if you are familiar with the old version.

You will need to pick and choose the bits out of the following configuration to suit your needs.  For example, you will need to adjust the elasticsearch_url, username, password and will need to decide whether to turn ssl on or off.  There are obviously more options but most of them probably don’t need to be adjusted for now.  Here is what the sample config looks like.

port: 5601
host: "0.0.0.0"
elasticsearch_url: "http://logstash.domain.com:9200"
elasticsearch_preserve_host: true
kibana_index: ".kibana"
default_app_id: "discover"
request_timeout: 300000
shard_timeout: 0
verify_ssl: false

# Plugins that are included in the build, and no longer found in the plugins/ folder
bundled_plugin_ids:
 - plugins/dashboard/index
 - plugins/discover/index
 - plugins/doc/index
 - plugins/kibana/index
 - plugins/markdown_vis/index
 - plugins/metric_vis/index
 - plugins/settings/index
 - plugins/table_vis/index
 - plugins/vis_types/index
 - plugins/visualize/index

That’s pretty much it.  Most of the difficulty of getting the new version of Kibana working is in the configuration, so if you want to tweak things or if something isn’t working that is the first place to look.

Docker Compose (glue the pieces together)

This is an integral part of the setup.  This is what controls the different containers and what glues everything together.  Luckily it is easy to get set up and working.  If you aren’t familiar, docker-compose was recently rebranded from the old “fig” tool, which was billed as a Docker orchestration tool for running complex Docker applications easily.

The official docs are pretty good and detailed so you can visit their site if you have any questions about how to install or how to get any of the components working here.

The first step is to download and install docker-compose.  Here I am using an Ubuntu system.

sudo pip install -U docker-compose
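You can verify that the install worked with a quick version check:

docker-compose --version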

There are a few docker-compose command line commands to be familiar with which we’ll get to next, but first I will post the sample docker-compose configuration file to test out your ELK stack.

kibana:

 build: /home/<user>/elk/kibana/4.0.0
 restart: always
 ports:
 - "5601:5601"
 links:
 - "elasticsearch:elasticsearch"

elasticsearch:

 build: /home/<user>/elk/elasticsearch/1.4.4
 restart: always
 ports:
 - "9200:9200"
 - "9300:9300"
 volumes:
 - "/data:/data"

logstash:

 build: /home/<user>/elk/logstash/logstash-1.5.0
 restart: always
 ports:
 - "4545:4545"
 - "4546:4546"

Most of the configuration is straightforward.  Here are the main commands to get everything stitched together and working.

  • docker-compose build (from the directory the docker-compose.yml file is in)
  • docker-compose up (to test the stack)
  • docker-compose kill (bring it down)

After you have ironed out all the build and startup issues and the stack is stable with no errors on start, you can start up the stack in detached mode.

  • docker-compose up -d

Additionally, you can look at the logs if something smells fishy.

  • docker-compose logs

Design considerations

One thing that readers might wonder about is the scalability of this setup.  This will scale up very easily but not out.  However, this should be able to handle up to 100k events/second on the Logstash end so there will be other bottlenecks before the components (ES and Logstash) fall down.  I haven’t pushed my own setup this far yet but have been able to get to around 30k/sec before Logstash dies, which I’m still investigating.  Even with that amount of activity and Logstash choking, ES and Kibana don’t get affected.

So if you use this as a guide for a production setup I would recommend that you use a decently sized server, at least 4 CPUs and 8 GB of memory, and adjust the memory and CPU options for the Logstash component if you plan on throwing a lot of logs at it (>30k/s).  I will revisit this once I have worked out all the performance issues with some best practices for making Logstash run more smoothly.
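As a rough sketch of what that could look like, resource limits can go straight in to the compose file.  The mem_limit and cpu_shares keys come from docker-compose itself, and LS_HEAP_SIZE is the environment variable the Logstash 1.5 startup scripts use for the Java heap; treat the values below as examples rather than tuned recommendations.

logstash:

 build: /home/<user>/elk/logstash/logstash-1.5.0
 restart: always
 mem_limit: 4g
 cpu_shares: 1024
 environment:
 - "LS_HEAP_SIZE=2g"
 ports:
 - "4545:4545"
 - "4546:4546"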

I would be interested in knowing how to scale this out if anybody has any recommendations but this setup should scale up decently at least for most scenarios.  I have not played around with ES sharding across hosts but I imagine that it wouldn’t be super complicated, especially using container volume mounts to store the data and indexes at the host OS level.


Mount an external EBS volume in AWS

Creating and attaching external volumes is one of those things in administration that is really nice to know how to do, but for me it is also one that doesn’t happen every day, so it is really easy to forget the steps, which makes it a little more painful, especially with deadlines and people watching over your shoulder.  Having said that, I think it is worth writing a post about how to do it because it happens just often enough that I have trouble keeping everything straight, and I’m sure others run in to this as well, so that’s what I will be writing about today.

There is good documentation for how to do this but there are a lot of separate steps, so consolidating the components might be helpful to readers who stumble across this.  I’m sure there are other ways to accomplish this but I don’t think it is necessary to cover everything here.

Create your “floating volume”

This step is straight forward.  In the AWS EC2 console choose the type of volume this will be (SSD or magnetic), availability zone, and any other options here.

create ebs volume

After your volume has been created you will want to attach it to an instance.  This part is important because the changes could break your OS volume if you write to your fstab file incorrectly.  In this example I am choosing to attach the EBS volume as /dev/xvdf, but you could name it differently if it corresponds to your setup.

attach ebs volume

After the volume has been mounted you can check that it has been picked up by the OS by either checking the /dev directory or by running fdisk -l and looking for the size of the disk you just attached.

It is worth pointing out that all of the steps in the AWS console can alternatively be done with the aws-cli tool.  It is probably easier, but for the sake of time and illustration I am sticking with the console here and will only give a rough sketch of the CLI.  Feel free to reach out if you are interested in more detail on the cli tool and I can update this post.
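For reference, the CLI equivalents of creating and attaching the volume look roughly like the following; the size, availability zone and IDs are placeholders you would swap for your own values.

# create a 100 GB general purpose SSD volume in us-east-1a
aws ec2 create-volume --size 100 --volume-type gp2 --availability-zone us-east-1a

# attach it to an instance as /dev/xvdf
aws ec2 attach-volume --volume-id vol-12345678 --instance-id i-12345678 --device /dev/xvdf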

If you run fdisk -l you will notice that the device is empty, so you will need to format the disk.  In this instance I am formatting the disk as ext4.  So use the following command to format it.

sudo mkfs.ext4 /dev/xvdf

After the volume has been formatted you can mount it to your OS.

Mounting the volume

sudo mkdir /data
sudo mount -t ext4 /dev/xvdf /data

If you need to resize the filesystem for whatever reason, you can use the resize2fs command.

sudo resize2fs <device>
sudo resize2fs /dev/xvdf

Here you will create the directory (if it doesn’t already exist) to mount the volume to and then mount it.  At this point it would be fine to be done if you just needed temporary access to the storage on this device.  But if you want your mount to persist and to survive a reboot then you just add an entry to your /etc/fstab file to make sure the /data directory gets the volume mounted to it after a reboot.  Something like the following would work.

/dev/xvdf       /data   ext4    defaults,nobootwait        0       0

The entry is pretty easy to follow but may be confusing for those who are not familiar with how fstab works.  I will break down the various components here.

The first parameter is the location of the volume (/dev/xvdf) and is referred to as the file_system field.

The second parameter specifies where to mount the volume to (/data) and is referred to as the dir field.

The third field is the type.  This is where you specify the file system type of the device to be mounted.  If you didn’t format this volume previously it would create problems for the OS when it tried to load in your volume from this file.

The fourth section is the options for the mount.  Here, the defaults,nobootwait section is very important.  If you don’t have the nobootwait option specified here then your OS could potentially hang on boot up if it couldn’t find the specified volume, so this option helps escape it if there are any problems.

The fifth field is to either enable or disable the dump option.  Unless you are familiar with or use the dump command you will almost always set this to 0.

The last section is the pass section.  This simply tells the OS if it should run an fsck or not on this volume.  Here I have it set to 0 so it doesn’t get checked but for OS volumes, this could be important to turn on.
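Before relying on the entry to survive a reboot it is worth testing it; mount -a attempts to mount everything listed in fstab, so a typo shows up now rather than at boot time.

# unmount the volume, then remount it using only the fstab entry
sudo umount /data
sudo mount -a
df -h /data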

Next steps

There are many more things you can do with fstab so if you are interested in other options for how to mount volumes you can look at the fstab documentation for more insights and information.

If you ever wanted to float this volume to another host it would be easy to do, and would not require any new or special formatting since that was already taken care of here.  The steps would look similar to the following.

  • Unmount volume from current OS
  • Remove entry in /etc/fstab for volume mount
  • Detach mount in AWS console from current OS
  • Attach mount to new OS
  • Mount volume manually in new OS to test if it works
  • If the mount works add a new entry in /etc/fstab
  • Done

So that’s pretty much it.  Hopefully this is useful for everybody.


boot2docker

Introduction to boot2docker

If you work on a Mac (or Windows) and use Docker then you probably have heard of boot2docker.  If you haven’t heard of it before, boot2docker is basically a super lightweight Linux VM that is designed to run Docker containers.  Unfortunately there is no support (yet) in Mac OS X or Windows kernels for running Docker natively so this lightweight VM must be used as an intermediary layer that allows the host Operating Systems to communicate with the Docker daemon running inside the VM.  This solution really is not that limiting once you get introduced to and become comfortable with boot2docker and how to work around some of the current limitations.

Because Docker itself is such a new piece of software, the ecosystem and surrounding environment is still expanding and growing rapidly.  As such, the tooling has not had a great deal of time to mature.  So with pretty much anything that’s new, especially in the software and Open Source world, there are definitely some nuances and some things to be aware of when working with boot2docker.

That being said, the boot2docker project bridges a gap and does a great job of bringing Docker to an otherwise incompatible platform as well as making it easy to use across platforms, which is especially useful for furthering the adoption of Docker among Mac and Windows users.

When getting started with boot2docker, it is important to note that there are a few different things going on under the hood.

Components

The first component is VirtualBox.  If you are familiar with virtual machines, there’s pretty much nothing new here.  It is the underpinning of running the VM and is a common tool for creating and managing VM’s.  One important note about VBox: it is currently the key to making volume sharing work with boot2docker, allowing a user to pass local directories and files in to containers using its shared folder implementation.  Unfortunately it has been pretty well documented that vboxsf (shared folders) does not have great performance when compared to other solutions.  This is something that the boot2docker team is aware of and working on for a future release.  I have a workaround that I will outline below if anyone happens to hit some of these performance issues.

The next component is the VM.  This is a super lightweight image based on Tiny Core Linux and the 3.16.4 Linux kernel with AUFS to leverage Docker.  Other than that there is pretty much nothing else to it.  The TCL image is about 27MB and boots up in about 5 seconds, which makes it very fast to get going with.  There are also instructions on the boot2docker site for creating custom .iso’s if you are interested in building your own customized TCL image.

The final component is called boot2docker-cli, which is normally referred to as the management tool.  This tool does a lot of the magic to get your host talking to the VM with minimal interaction.  It is basically the glue or the duct tape that allows users to pass commands from a local shell in to the container and get Docker to do stuff.

Installation

It is pretty dead simple to get boot2docker set up and configured.  You can download everything in one shot from the links on their site http://boot2docker.io or you can install manually on OSX with brew and a few other tools.  I highly recommend the packaged installer, it is very straightforward and easy to follow and there is a good video depiction of the process on the boot2docker site.

If you choose to install everything with brew you can use the following commands as a reference.  Obviously it will be assumed that brew is already installed and set up on your OSX system.  The first step is to install boot2docker.

brew install boot2docker

You might need to install Virtualbox separately using this method, depending on whether or not you already have a good version of Virtualbox to use with boot2docker.

The following commands will assume you are starting from scratch and do not have VBox installed.

brew update
brew tap phinze/homebrew-cask
brew install brew-cask
brew cask install virtualbox

That is pretty much it for installation.

Usage

The boot2docker CLI is pretty straightforward to use.  There are a bunch of commands to help users interface with the boot2docker VM from the command line.  The most basic usage, initializing and creating a vanilla boot2docker VM, can be done with the following command.

boot2docker init

This will pull down the correct image and get the environment set up.  Once the VM has been created (see the tricks sections for a bit of customization) you are ready to bring up the VM.

boot2docker start

This command will simply start up the boot2docker VM and run some behind the scenes tasks to help make using the VM seamless.  Sometimes you will be asked to set ENV variables here; just go ahead and follow the instructions to add them.

There are a few other nice commands that help you interact with the boot2docker VM.  For example if you are having trouble communicating with the VM you can run the ip command to gather network information.

boot2docker ip

If the VM somehow gets shut off or you cannot access it you can check its status.

boot2docker status

Finally there is a nice help command that serves as a good guide for interacting with the VM in various ways.

boot2docker help

The commands listed in this section will for the most part cover 90% of interaction and usage of the boot2docker VM.   There is a little bit of advanced usage with the cli covered below in the tricks section.

Tricks

You can actually modify some of the default behavior of your boot2docker VM by altering some of the underlying boot2docker configuration.  For example, boot2docker will look in $HOME/.boot2docker/profile for any custom settings you may have.  If you want to change any network settings, adjust memory or cpu, or tweak a number of other settings, you would simply change the profile to reflect the updated changes.

You can also override the defaults when you create your boot2docker VM by passing some arguments in.  If you want to change the memory or disk size by default, you would run something like

boot2docker init --memory=4096 --disksize=60000

Notice the --disksize=60000.  Docker likes to take up a lot of disk space for some of its operations, so if you can afford to, I would very highly recommend that you adjust the default disk size for the VM to avoid any strange running out of disk issues.  Most Macbooks or Windows machines have plenty of extra resources and big disks so usually there isn’t a good reason not to leverage the extra horsepower for your VM.

Troubleshooting

One very useful command for gathering information about your boot2docker environment is the boot2docker config command.  This command will give you all the basic information about the running config.  This can be very valuable when you are trying to troubleshoot different types of errors.

If you are familiar with boot2docker already you might have noticed that it isn’t a perfect solution and there are some weird quirks and nuances.  For example, if you put your host machine to sleep while the boot2docker VM is still running and then attempt to run things in Docker, you may get some quirky results or things just won’t work.  This is due to the time skew that occurs when you put the machine to sleep and wake it up again; you can check the github issue for details.  You can quickly check if the boot2docker VM is out of sync with this command.

date -u; boot2docker ssh date -u

If you notice that the times don’t match up then you know to update your time settings.  The best fix I have found for now is to basically reset the time settings by wrapping the following commands in to a script.

#!/bin/sh
 
# Fix NTP/Time
boot2docker ssh -- sudo killall -9 ntpd
boot2docker ssh -- sudo ntpclient -s -h pool.ntp.org
boot2docker ssh -- sudo ntpd -p pool.ntp.org

For about 95% of the time skew issues you can simply run sudo ntpclient -s -h pool.ntp.org to take care of the issue.

Another interesting boot2docker oddity is that sometimes you will not be able to connect to the Docker daemon or will receive other strange errors.  Usually this indicates that the environment variables that get set by boot2docker have disappeared, for example if you close your terminal window or something similar.  Both of the following errors indicate the issue.

dial unix /var/run/docker.sock: no such file or directory

or

Cannot connect to the Docker daemon. Is 'docker -d' running on this host?

The solution is to either add the ENV variables back in to the terminal session by hand or just as easily modify your bashrc config file to read the values in when the terminal loads up.  Here are the variables that need to be reset, or appended to your bashrc.

export DOCKER_CERT_PATH=/Users/jmreicha/.boot2docker/certs/boot2docker-vm
export DOCKER_TLS_VERIFY=1
export DOCKER_HOST=tcp://192.168.59.103:2376

Assuming your boot2docker VM has an address of 192.168.59.103 and a port of 2376 for communication.
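Rather than hard coding the values, you can also let boot2docker regenerate them; the shellinit command prints the correct export lines for the current VM, so something like the following in your bashrc should do the trick.

eval "$(boot2docker shellinit)"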

Shared folders

This has been my biggest gripe so far with boot2docker, as I’m sure it has been for others as well.  Mostly I am upset that vboxsf is horrible, though in all fairness the boot2docker people have been awesome getting this far and getting things working with vboxsf as of release 1.3.  Another caveat to note if you aren’t aware is that currently, if you pass volumes to docker with “-v”, the directory you share must be located within the “/Users” directory on OSX.  Obviously not a huge issue but something to be aware of if you happen to have problems with volume sharing.

The main issue with vboxsf is that it does not do any sort of caching, so when you are attempting to share a large amount of small files (big git repos) or anything that is filesystem read heavy (grunt) performance becomes a factor.  I have been exploring different workarounds because of this limitation but have not found very many that I could convince our developers to use.  Others have had luck by creating a container that runs SMB or have been able to share a host directory in to the boot2docker vm with sshfs, but I have not had great success with these options.  If anybody has these working please let me know, I’d love to see how to get them working.

The best solution I have come up with so far is using Vagrant with a customized version of boot2docker with NFS support enabled, which requires very little “hacking” to get working, which is nice.  A good enough selling point for me is the speed increase from using NFS instead of vboxsf, it’s pretty staggering actually.

This is the project that I have been using https://vagrantcloud.com/yungsang/boxes/boot2docker.  Thanks to @yungsang for putting this project together.  Basically it uses a custom vagrant-box based off of the boot2docker iso to accomplish its folder sharing with the awesome customization that Vagrant provides.

To get this workaround to work, grab the Vagrantfile from the link provided above and put it in the location you would like to run Vagrant from.  The magic sauce for the volume sharing is in this line.

config.vm.synced_folder ".", "/vagrant", type: "nfs"

This tells Vagrant to share your current directory in to the boot2docker VM at the /vagrant directory, using NFS.  I would suggest modifying the default CPU and memory as well if your machine is beefy enough.

config.vm.provider "virtualbox" do |v|
  v.cpus = 4
  v.memory = 4096
end

After you make your adjustments, you just need to spin up the yungsang version of boot2docker and jump in to the VM.

vagrant up
vagrant ssh

From within the VM you can run your docker commands just like you normally would.  Ports get forwarded through to your local machine like magic and everybody is happy.


docker developer tools

Useful tools for Docker development

Docker is still a young project, and as such the ecosystem around it hasn’t matured to the point that many people feel comfortable using it yet.  It is nice to have such a fast growing set of tools, however the downside to all of this is that many of the tools are not production ready.  I think as the ecosystem solidifies and Docker adoption grows we will see a healthy set of solid, production ready tools built off of the current generation of tools.

Once you get introduced to the concepts and ideas behind Docker you quickly realize the power and potential that it holds.   Inevitably though, there comes a “now what?” moment where you basically realize that Docker can do some interesting things but get stuck because there are barriers to simply dropping Docker into a production environment.

One problem is that you can’t simply “turn on” Docker in your environment, so you need tools to help manage images and containers, handle orchestration, support development, etc.  So there are a number of challenges to taking Docker and doing useful and interesting things with it once you get past the introductory novelty of building an image and deploying simple containers.

I will attempt to make sense of the current state of Docker and to help take some of the guesswork out of which tools to use in which situations and scenarios for those that are hesitant to adopt Docker.  This post will focus mostly around the development aspects of the Docker ecosystem because that is a nice gateway to working with and getting acquainted with Docker.

Boot2Docker

As you may be aware, Docker does not (yet) support MacOSX or Windows.  This can definitely be a hindrance for adopting and building Docker acceptance amongst developers.  Boot2Docker massively simplifies this issue by essentially creating a sandbox to work with Docker as a thin layer between Docker and Mac (or Windows) via the boot2docker VM.

You can check it out here, but essentially you will download a package and install it and you are ready to start hacking away on Docker on your Mac.  Definitely a must for Mac OS as well as Windows users that are looking to begin their Docker journey, because the complexity is completely removed.

Behind the scenes, a number of things get abstracted away and simplified with Boot2docker, like setting up SSH keys, managing network interfaces, setting up VM integrations and guest additions, etc.  Boot2docker also bundles a cli for managing the VM that runs docker, so it is easy to manage and configure the VM from the terminal.

CoreOS

It would take many blog posts to try to describe everything that CoreOS and its tooling can do.  The reason I am mentioning it here is because CoreOS is one of those core building blocks that is quickly becoming necessary in any Docker environment.  Docker as it is today is not specifically designed for distributed workloads and as such doesn’t provide much of the tooling around how to solve the challenges that accompany distributed systems.  However, CoreOS bridges this gap very well.

CoreOS is a minimal Linux distribution that aims to help with a number of Docker related tasks and challenges.  It is distributed by design, so it can do some really interesting things with images and containers using etcd, systemd, fleetd, confd and others as the platform continues to evolve.

Because of this tooling and philosophy, CoreOS machines can be rebooted on the fly without interrupting services and clustered processes across machines.  This means that maintenance can occur whenever and wherever, which makes the resiliency factor very high for CoreOS servers.

Another highlight is its security model, which is a push based model.  For example, instead of manually updating servers with security patches, the CoreOS maintainers periodically push updates to servers, alleviating the need to update all the time.  This was very nice when the latest shellshock vulnerability was released because within a day or so, a patch was automatically pushed to all CoreOS servers, automating the otherwise tedious process of updating servers, especially without config management tools.

Fig

Fig is a must have for anybody that works with Docker on a regular basis, ie developers.  Fig allows you to define your environment in a simple YAML config file and then bring up an entire development environment in one command, with fig up.

Fig works very well for a development workflow because you can rapidly prototype and test how Docker images will work together and catch issues that would otherwise be hard to test for.  For example, if you are working on an application stack you can simply define how the different containers should work and interact together in the fig file, as sketched below.
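As a small, hypothetical sketch, a fig.yml for a web app backed by a database might look something like this (the build context, image name and ports are just examples):

web:
  build: .
  ports:
    - "8000:8000"
  links:
    - db
db:
  image: postgres

From there, fig up brings the whole stack up and fig up -d runs it in the background.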

The downside to fig is that in its current form, it isn’t really equipped to deal with distributed Docker hosts, something that you will find a large number of projects are attempting to solve and simplify.  This shouldn’t be an issue though, if you are aware of its limitations beforehand and know that there are some workloads that fig is not built for.

Panamax

This is a cool project out of CenturyLink Labs that aims to solve problems around Docker app development and orchestration.  Panamax is similar to Fig in that it stitches containers together logically, but is slightly different in a few regards.  First, Panamax builds off of CoreOS to leverage some of its built in tools (etcd, fleetd, etc).  Another thing to note is that currently Panamax only supports single host deployments.  The creators of Panamax have stated that clustered support and multi host tenancy is in the works but for now you will have to use Panamax on a single host.

Panamax simplifies Docker image and application orchestration (kind of) in the background and additionally places a nice layer of abstraction on top of this process so that managing the Docker image “stack” becomes even easier, through a slick GUI.  With the GUI you can set environment variables, link containers together, and bind ports and volumes.

Panamax draws a number of its concepts from Fig.  It uses templates as the underlying way to compose containers and applications, which is similar to the Fig config files, as both use YAML files to compose and orchestrate Docker container behavior.  Another cool thing about Panamax is that there is a public template repo for getting different application and container stacks up and running, so the community participation is a really nice aspect of the project.

If setting up a command line config file isn’t an ideal solution in your environment, this tool is definitely worth a look.  Panamax is a great way to quickly develop and prototype Docker containers and applications.

Flocker

This is a very young but interesting project.  It looks interesting because of the way that it handles volume management.  Right now one of the biggest challenges to widespread Docker adoption is exactly the problem that Flocker solves with its ability to persist storage across distributed hosts.

From their github page:

Flocker is a data volume manager and multi-host Docker cluster management tool. With it you can control your data using the same tools you use for your stateless applications by harnessing the power of ZFS on Linux.

Basically, Flocker is using some ZFS magic behind the scenes to allow volumes to float between servers, allowing for persistent storage across machines and containers.  That is a huge win for building distributed systems that require persistent data and storage, eg databases.

Definitely keep an eye on this project for improvements and look for them to push this area in the future.  The creators have said it isn’t quite production ready yet but it is a great tool to use in a test or staging environment.

Flynn

Flynn touts itself as a Platform as a Service (PaaS) built on Docker, in a very similar vein to Heroku.  Having a Docker PaaS is a huge win for developers because it simplifies developer workflow.  There are some great benefits to having a PaaS in your environment; the subject could easily expand to be its own topic of conversation.

The approach that Flynn takes (and PaaS in general) is that operations should be a product team.  With Flynn, ops can provide the platform and developers can focus on their tasks much more easily: developing software, testing, and generally spending their time on development instead of fighting operations.  Flynn does a nice job of decoupling operations tasks from dev tasks so that developers don’t need to rely on operations to do their work and operations don’t need to concern themselves with development tasks, which can cause friction and create efficiency issues.

Flynn works by tying together a number of different tools, created specifically to solve the challenges of building a PaaS (scheduling, persistent storage, orchestration, clustering, etc), so that they perform their workloads via Docker as one single entity.

Currently its developers state that Flynn is not quite suitable for production use yet, but it is still mature enough to use and play around with and even deploy apps to.

Deis

Deis is another PaaS for Docker, aiming to solve the same problems and challenges that Flynn does, so there is definitely some overlap in the projects as far as end users are concerned.  There is a nice CLI tool for managing and interacting with Deis and it offers much of the same functionality that either Heroku or Flynn offer.  Deis can do things like horizontal application scaling, supports many different application frameworks and is Open Source.

Deis is similar in concept to Flynn in that it aims to solve PaaS challenges but they are quite different in their implementation and how they actually achieve their goals.

Both Flynn and Deis aim to create platforms to build Docker apps on top of, but they go about it by somewhat different means.  As the creator of Deis explains, Deis is much more practical in its approach to solving PaaS issues: it takes a number of technologies and tools that have already been created, fits them together, and only builds the pieces that are missing.  Flynn is much more ambitious in its approach, implementing a number of its own tools and solutions, including its own scheduler, registration service, etc, and relying on only a few tools that already exist.  So while Flynn builds all of these different things itself, Deis leverages CoreOS to do many of the tasks it needs to operate and work correctly, minimally bolting on the tooling it needs to function.

Conclusion

As the Docker ecosystem continues to evolve, more and more options seem to be sprouting up.  There are already a number of great tools in the space but as the community continues to evolve I believe that the current tools will continue to improve and new and useful tools will be built for Docker specific workloads.  It is really cool to see how the Docker ecosystem is growing and how the tools and technologies are disrupting traditional views on a number of areas in tech including virtualization, DevOps, development, deployments and application development, among others.

I anticipate the adoption of Docker to continue growing for the foreseeable future as the core Docker project continues to improve and stabilize, along with the tools built around it that I have discussed here.  It will be interesting to see where things are even six months from now in regards to the adoption and use cases that Docker has created.


Autosnap AWS snapshot and volume management tool

This is my first serious attempt at a Python tool on github.  I figured it was about time; I’ve been leveraging Open Source tools for a long time, so I might as well try to give a little bit back.  Please check out the project and leave feedback by emailing, opening a github issue or commenting here, I’d love to see what can be done with this tool, there are lots of bugs to shake out and things to improve.  Even better if you have some code you’d like to contribute, this is very much a work in progress!

Here is the project – https://github.com/jmreicha/autosnap.

Introduction

Essentially, this tool is designed to ease the management of the snapshot and volume lifecycle in an AWS environment.  I have discovered that snapshots and volumes can be used together to form a simple backup management system, so by simplifying the management of these resources and utilizing the power of the AWS API, you can easily manage backups of your AWS data.

While this obviously isn’t a full blown backup tool, it can do a few handy things like leverage tags to create and destroy backups based on custom expiration dates and create snapshots based on a few other criteria, all managed with tags.  Another cool thing about handling backups this way is that you get amazing resiliency by storing snapshots to S3, as well as dirt cheap storage.  Obviously if you have a huge number of servers and volumes your mileage will vary, but this solution should scale up in to the hundreds, if not thousands, pretty easily.  The last big bonus is that you get nice granularity for backups.

For example, if you wanted to keep a week’s worth of backups across all your servers in a region, you would simply use this tool to set an expiration tag of 7 days and voila, you will have rolling backups, based on snapshots, for the previous seven days.  You can get the backup schedule fairly granular, because the snapshots are tagged down to the hour.  It would be easy to get them down to the second if that is something people would find useful; I could see DB snapshots being important enough, but for now it is set to the hour.

The one drawback is that this needs to be run on a daily basis, so you would need to add it to a cron job or some other tool that runs tasks periodically.  Not really a drawback, as much as a side note to be aware of.
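As a hypothetical example, a daily cron entry that snapshots the managed volumes and prunes expired snapshots might look like the following; the install path is a placeholder and the flags are the ones described in the usage section below.

# run autosnap once a day at 1am
0 1 * * * /usr/local/bin/autosnap --create-snaps --remove-snaps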

Configuration

There is a tiny bit of overhead to get started, so I will show you how to get going.  You will need to either set up a config file or let autosnap build you one.  By default, autosnap will help create one the first time you run it, so you can use this command to build it:

autosnap

If you would like to provide your own config, create a file called ‘.config‘ in the base directory of this project.  Check the README on the github page for the config variables and for any clarifications you may need.

Usage

Use the --help flag to get a feeling for some of the functions of this tool.

$ autosnap --help

usage: autosnap [--config] [--list-vols] [--manage-vols] [--unmanage-vols]
 [--list-snaps] [--create-snaps] [--remove-snaps] [--dry-run]
 [--verbose] [--version] [--help]

optional arguments:
 --config          create or modify configuration file
 --list-vols       list managed volumes
 --manage-vols     manage all volumes
 --unmanage-vols   unmanage all volumes
 --list-snaps      list managed snapshots
 --create-snaps    create a snapshot if it is managed
 --remove-snaps    remove a snapshot if it is managed
 --version         show program's version number and exit
 --help            display this help and exit

The first thing you will need to do is let autosnap manage the volumes in a region:

autosnap --manage-vols

This command will simply add some tags to help with the management of the volumes.  Next, you can take a look and see what volumes got picked up and are now being managed by autosnap:

autosnap --list-vols

To take a snapshot of all the volumes that are being managed:

autosnap --create-snaps

And you can take a look at your snapshots:

autosnap --list-snaps

Just as easily you can remove snapshots older than the specified expiration date:

autosnap --remove-snaps

There are some other useful features and flags but the above commands are pretty much the meat and potatoes of how to use this tool.

Conclusion

I know this is not going to be super useful for everybody but it is definitely a nice tool to have if you work with AWS volumes and snapshots on a semi regular basis.  As I said, this can easily be improved so I’d love to hear what kinds of things to add or change to make this a great tool.  I hope to start working on some more interesting projects and tools in the near future, so stay tuned.
