Recently our Graphite server needed to be overhauled, which I was not looking forward to. Luckily, Docker makes building identical, reproducible images for a new server much easier and less painful than other methods.
Introduction
If you don’t know what Graphite is, you can check out the documentation for more info. Basically, it is a tool for collecting and aggregating metrics of pretty much any kind into a central location. It is a great complement to something like statsd, which I will go over later.
The setup I will be describing today relies on a handful of components. The first and most important is Graphite itself. This includes all of the parts that make up Graphite: the carbon aggregator and carbon cache for collecting and processing metrics, and the whisper database for storing them.
There are several alternative backends, but I don’t have any experience with them so I won’t go into detail. If you are interested, InfluxDB and OpenTSDB both look like promising alternatives to whisper for storing metrics.
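To make the collection side a little more concrete, here is a minimal sketch of how a metric can be handed to carbon using its plaintext protocol, which is one line per data point in the form "path value timestamp". The metric name demo.deploys.count, the localhost host, and the helper function names are assumptions for illustration only; port 2003 is the carbon line receiver port used later in this post.

```python
import socket
import time


def format_carbon_line(path, value, timestamp=None):
    """Build one line of carbon's plaintext protocol: '<path> <value> <timestamp>\\n'."""
    if timestamp is None:
        # Carbon expects seconds since the Unix epoch.
        timestamp = int(time.time())
    return "{} {} {}\n".format(path, value, int(timestamp))


def send_to_carbon(line, host="localhost", port=2003):
    """Send a single plaintext line to the carbon line receiver over TCP.

    The host and port defaults are assumptions; point them at your own server.
    """
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(line.encode("ascii"))


# Example (requires a running carbon line receiver):
# send_to_carbon(format_carbon_line("demo.deploys.count", 1))
```

Anything that can open a TCP socket can feed Graphite this way, which is part of why it pairs so well with other collection tools.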
The Problem
Graphite is notoriously difficult to install and configure properly. If you haven’t set up Graphite before, give it a try and you will see what I mean.
Another argument that I hear quite a bit is that the Graphite workload doesn’t really fit the Docker model. In a distributed or highly available architecture that might be the case, but in the example I cover here we are taking a different approach.
The design keeps the data on a separate EBS volume, which is a durable storage resource, so it doesn’t matter if the server itself has problems. With our approach we can reprovision the server and have everything up and running in less than 5 minutes.
The benefit of doing it this way is that the metric data outlives any individual server. Another benefit of our approach is that we are leveraging the graphite-api package, so we get all of the Graphite goodness without having to run the rest of the bloat, and we proxy it through nginx/uWSGI, which helps with performance. I will go over this setup in a little bit. No Graphite server would be complete without Grafana, which turns out to be stupidly easy to add using the Docker approach.
If we were ever to try to expand this architecture I think a distributed model using EFS (currently in preview) along with some type of load balancer in front to distribute requests evenly may be a possibility. If you have experience running Graphite across many nodes I would love to hear what you are doing.
The Solution
There are a few components to our architecture. The first is a tool I have been writing about recently called Terraform. We use this with some custom scripting to build the server, configure it and attach our Graphite data volume to the server.
Here is what a sample Terraform config might look like to provision the server with the tools we want. This server is provisioned in an AWS environment and relies on a number of variables. You can check the docs on how variables work, or if there is too much confusion I can post an example.
provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region     = "${var.region}"
}

resource "aws_instance" "graphite" {
  ami               = "${lookup(var.amis, var.region)}"
  availability_zone = "us-east-1e"
  instance_type     = "c3.xlarge"
  subnet_id         = "${var.public-1e}"
  security_groups   = ["${var.graphite}"]
  key_name          = "XXX"
  user_data         = "${file("../cloud-config/graphite.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "20"
  }

  connection {
    user     = "username"
    key_file = "${var.key_path}"
  }

  # mount EBS
  provisioner "local-exec" {
    command = "aws ec2 attach-volume --region=us-east-1 --volume-id=${var.graphite_data_vol} --instance-id=${aws_instance.graphite.id} --device=/dev/xvdf"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! -e /dev/xvdf ]; do sleep 1; done",
      "echo '/dev/xvdf /data ext4 defaults 0 0' | sudo tee -a /etc/fstab",
      "sudo mkdir /data && sudo mount -t ext4 /dev/xvdf /data"
    ]
  }
}
Optionally, if you have an Elastic IP to use, you can tack that on to your config:
resource "aws_eip" "graphite" {
  instance = "${aws_instance.graphite.id}"
  vpc      = true
}
The Graphite server uses a mostly standard config and installs the components that we need to run it: Docker, Python, pip, docker-compose, etc. Here is what a sample cloud config for the Graphite server might look like.
#cloud-config

# Make sure OS is up to date
apt_update: true
apt_upgrade: true
disable_root: true

# Connect to private repo
write_files:
  - path: /home/<user>/.dockercfg
    owner: user:group
    permissions: 0755
    content: |
      {
        "https://index.docker.io/v1/": {
          "auth": "XXX",
          "email": "email"
        }
      }

# Capture all subprocess output for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

packages:
  - python-dev
  - python-pip

# Install latest Docker version
runcmd:
  - apt-get -y install linux-image-extra-$(uname -r)
  - curl -sSL https://get.docker.com/ubuntu/ | sudo sh
  - usermod -a -G docker <user>
  - sg docker
  - sudo pip install -U docker-compose

# Reboot for changes to take effect
power_state:
  mode: reboot
  delay: "+1"

ssh_authorized_keys:
  - <put your ssh public key here>
Docker
This is where most of the magic happens. As noted above, we are using Docker and a few of its tools to get everything working. All the logic to get Graphite running is contained in the Dockerfile, which will require some customizing but is similar to the following.
# Building from Ubuntu base
FROM ubuntu:14.04.2

# This suppresses a bunch of annoying warnings from debconf
ENV DEBIAN_FRONTEND noninteractive

# Install all system dependencies
RUN \
  apt-get -qq install -y software-properties-common && \
  add-apt-repository -y ppa:chris-lea/node.js && \
  apt-get -qq update -y && \
  apt-get -qq install -y build-essential curl \
    # Graphite dependencies
    python-dev libcairo2-dev libffi-dev python-pip \
    # Supervisor
    supervisor \
    # nginx + uWSGI
    nginx uwsgi-plugin-python \
    # StatsD
    nodejs

# Install StatsD
RUN \
  mkdir -p /opt && \
  cd /opt && \
  curl -sLo statsd.tar.gz https://github.com/etsy/statsd/archive/v0.7.2.tar.gz && \
  tar -xzf statsd.tar.gz && \
  mv statsd-0.7.2 statsd

# Install Python packages for Graphite
RUN pip install graphite-api[sentry] whisper carbon

# Optional install graphite-api caching
# http://graphite-api.readthedocs.org/en/latest/installation.html#extra-dependencies
# RUN pip install graphite-api[cache]

# Configuration
# Graphite configs
ADD carbon.conf /opt/graphite/conf/carbon.conf
ADD storage-schemas.conf /opt/graphite/conf/storage-schemas.conf
ADD storage-aggregation.conf /opt/graphite/conf/storage-aggregation.conf
# Supervisord
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# StatsD
ADD statsd_config.js /etc/statsd/config.js
# Graphite API
ADD graphite-api.yaml /etc/graphite-api.yaml
# uwsgi
ADD uwsgi.conf /etc/uwsgi.conf
# nginx
ADD nginx.conf /etc/nginx/nginx.conf
ADD basic_auth /etc/nginx/basic_auth

# nginx
EXPOSE 80 \
# graphite-api
8080 \
# Carbon line receiver
2003 \
# Carbon pickle receiver
2004 \
# Carbon cache query
7002 \
# StatsD UDP
8125 \
# StatsD Admin
8126

# Launch stack
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]
The other component we need is Grafana, which we don’t actually build ourselves. Instead we pull the image from the Docker Hub registry and mount our custom volume into it. This is all captured in our docker-compose.yml file listed below.
graphite:
  build: ./docker-graphite
  restart: always
  ports:
    - "8080:80"
    - "8125:8125/udp"
    - "8126:8126"
    - "2003:2003"
    - "2004:2004"
  volumes:
    - "/data/graphite:/opt/graphite/storage/whisper"

grafana:
  image: grafana/grafana
  restart: always
  ports:
    - "80:3000"
  volumes:
    - "/data/grafana:/var/lib/grafana"
  links:
    - graphite
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=password123
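Since the compose file publishes StatsD on UDP port 8125, a quick way to exercise that path is to fire a counter at it. The sketch below uses the standard StatsD line formats ("name:value|c" for counters, "name:value|ms" for timers); the metric names, the localhost host, and the helper function names are hypothetical and only there to illustrate the idea.

```python
import socket


def format_statsd_counter(name, count=1):
    """Format a StatsD counter increment, e.g. 'demo.page_views:1|c'."""
    return "{}:{}|c".format(name, count)


def format_statsd_timer(name, millis):
    """Format a StatsD timing sample in milliseconds, e.g. 'demo.render_time:320|ms'."""
    return "{}:{}|ms".format(name, millis)


def send_statsd(payload, host="localhost", port=8125):
    """Send one StatsD payload over UDP (fire and forget, no reply is expected).

    The host and port defaults are assumptions that match the port mapping above.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(payload.encode("ascii"), (host, port))
    finally:
        sock.close()


# Example (the packet is sent regardless of whether anything is listening):
# send_statsd(format_statsd_counter("demo.page_views"))
```

Because this is UDP, the sender never blocks on the metrics server, which is the main reason to put StatsD in front of carbon for application metrics.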
We have open sourced our configuration and placed it on GitHub so you can take a look at the configs and see how everything fits together with some working examples. The GitHub repo is a quick way to try out the stack without having to provision and build an environment to run it on. If you are just interested in kicking the tires, I suggest starting there.
The build directive above corresponds to the repo on GitHub.
The last component is actually running the Docker containers. As you can see, we use docker-compose, but we also need a way to start the containers automatically after a disruption like a reboot. That is actually pretty easy. On Ubuntu (or any system using upstart) you can create an init script that starts docker-compose and restarts it automatically if it has problems. Here I have created a file called /etc/init/graphite.conf with the following configuration.
description "Graphite"

start on filesystem and started docker
stop on runlevel [!2345]

respawn

chdir /home/user
exec docker-compose up
A systemd service would achieve a similar goal but the version of Ubuntu used here doesn’t leverage systemd.
After everything has been dropped in place and configured, you can check your work by pointing a browser at the public IP address of your server. If you see the Grafana splash page, everything should be working!
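If you also want to confirm that graphite-api itself is answering, you can ask its render endpoint for a metric as JSON. The following is a rough sketch that only builds the URL. It assumes the graphite container's nginx is reachable on host port 8080 as in the compose file above, and that a metric named demo.deploys.count exists; both the metric name and the build_render_url helper are made up for illustration. Since the nginx config includes basic auth, you will likely need to supply credentials when you actually fetch it.

```python
from urllib.parse import urlencode


def build_render_url(base_url, target, since="-1h"):
    """Build a graphite-api /render URL that returns JSON for one target.

    'target', 'from' and 'format' are standard render parameters; the
    base URL and the metric name passed in are up to you.
    """
    query = urlencode({"target": target, "from": since, "format": "json"})
    return "{}/render?{}".format(base_url.rstrip("/"), query)


# Example: paste the result into curl or a browser.
# build_render_url("http://localhost:8080", "demo.deploys.count")
```

A JSON response with a list of datapoints means the whole chain is working end to end, from carbon through whisper to graphite-api.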
Conclusion
There are many pieces to this puzzle, and honestly we don’t have a requirement for Graphite to be 100% available and redundant, so we can get away with a single server for our needs. A separate EBS volume and Terraform allow us to rebuild the server quickly and automatically if something were to happen to it. Also, the way we have designed Graphite to run, it should be able to handle a substantial workload without falling over. But if you are doing anything cool with Graphite HA or resiliency, I would like to hear how you are doing it. There is always room for improvement.
If you are just interested in trying out the Graphite stack, I highly suggest going over to the GitHub repo and running the container stack to play around with the components, especially if you are interested in learning how statsd and Graphite collect metrics. The Grafana interface gives you a nice way to tap into the metrics that get pumped into Graphite.