
Composing a Graphite server with Docker

Recently our Graphite server needed to be overhauled, which I was not looking forward to.  Luckily, Docker makes the process of building identical, reproducible images for configuring a new server much easier and less painful than other methods.

Introduction

If you don’t know what Graphite is you can check out the documentation for more info.  Basically, it is a tool to collect and aggregate metrics of pretty much any kind into a central location.  It is a great complement to something like statsd for metric collection and aggregation, which I will go over later.
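To make that relationship a little more concrete, here is a quick sketch of what sending a metric looks like once the stack described later in this post is up.  The metric name is made up, and the ports are the defaults used below (2003 for the carbon line receiver, 8125 for statsd over UDP).

# Write a raw datapoint straight to the carbon line receiver (plaintext protocol)
echo "test.deploys.example 1 $(date +%s)" | nc -q0 localhost 2003

# Or increment a counter through statsd over UDP; statsd aggregates and flushes to carbon on an interval
echo "test.deploys.example:1|c" | nc -u -w1 localhost 8125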

The setup I will be describing today relies on a handful of components.  The first and most important is Graphite itself.  This covers all of the parts that make up Graphite, including the carbon aggregator and carbon cache for the collection and processing of metrics, as well as the whisper database for storing them.
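How long whisper keeps metrics, and at what resolution, is controlled by storage-schemas.conf (one of the files added in the Dockerfile later on).  Here is a minimal sketch of what one might look like; the retention values are just an illustration, not our production settings.

# storage-schemas.conf - minimal sketch, adjust retentions to your needs
[carbon]
pattern = ^carbon\.
retentions = 60s:90d

[default]
pattern = .*
retentions = 10s:6h,1m:7d,10m:1y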

There are several alternative backends, but I don’t have any experience with them so I won’t be posting any details.  If you are interested, InfluxDB and OpenTSDB both look like interesting alternatives to whisper for storing metrics.

The Problem

Graphite is notoriously difficult to install and configure properly.  If you haven’t tried to set up Graphite before, give it a try and you will quickly see what I mean.

Another argument that I hear quite a bit is that the Graphite workload doesn’t really fit the Docker model.  In a distributed or highly available architecture that might be the case, but in the example I cover here we are taking a different approach.

The design and implementation separate the data onto an EBS volume, which is a durable storage resource, so it doesn’t matter if the server itself runs into problems.  With our approach and process we can reprovision the server and have everything up and running again in less than 5 minutes.

The benefit of doing it this way is obvious.  Another benefit of our approach is that we leverage the graphite-api package, which gives us access to all of the Graphite goodness without having to run the rest of the graphite-web bloat, and then proxy it through nginx/uWSGI, which helps with performance.  I will go over this setup in a little bit.  No Graphite server would be complete without Grafana, which turns out to be stupidly easy to add using the Docker approach.
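To give an idea of what graphite-api buys you, once everything described below is running it exposes the same render API that the full graphite-web stack does, so you can pull data straight over HTTP.  A rough example follows; the host, credentials and metric name are placeholders, and the basic auth depends on the basic_auth file referenced in the Dockerfile below.

# Ask the render API (proxied through nginx on port 8080 in the compose file below) for the last hour of a metric as JSON
curl -u admin:password "http://<public-ip>:8080/render?target=stats.counters.test.deploys.example.count&from=-1h&format=json"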

If we were ever to try to expand this architecture I think a distributed model using EFS (currently in preview) along with some type of load balancer in front to distribute requests evenly may be a possibility.  If you have experience running Graphite across many nodes I would love to hear what you are doing.

The Solution

There are a few components to our architecture.  The first is a tool I have been writing about recently called Terraform.  We use it, along with some custom scripting, to build the server, configure it, and attach our Graphite data volume.

Here is what a sample Terraform config might look like to provision the server with the tools we want.  This server is provisioned into an AWS environment and leverages a number of variables.  You can check the docs on how variables work, and there is a rough example of passing them in after the configs below.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

resource "aws_instance" "graphite" {
  ami = "${lookup(var.amis, var.region)}"
  availability_zone = "us-east-1e"
  instance_type = "c3.xlarge"
  subnet_id = "${var.public-1e}"
  security_groups = ["${var.graphite}"]
  key_name = "XXX"
  user_data = "${file("../cloud-config/graphite.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "20"
  }

  connection {
    user = "username"
    key_file = "${var.key_path}"
  }

  # mount EBS
  provisioner "local-exec" {
    command = "aws ec2 attach-volume --region=us-east-1 --volume-id=${var.graphite_data_vol} --instance-id=${aws_instance.graphite.id} --device=/dev/xvdf"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! -e /dev/xvdf ]; do sleep 1; done",
      "echo '/dev/xvdf /data ext4 defaults 0 0' | sudo tee -a /etc/fstab",
      "sudo mkdir /data && sudo mount -t ext4 /dev/xvdf /data"
    ]
  }

}

And optionally, if you have an Elastic IP to use, you can tack that onto your config.

resource "aws_eip" "graphite" {
  instance = "${aws_instance.graphite.id}"
  vpc = true
}
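As for the variables referenced above, each one (access_key, region, amis, and so on) needs a corresponding variable declaration, and the values can either live in a terraform.tfvars file, which Terraform loads automatically, or be passed on the command line.  Something along these lines; the values shown are placeholders.

# Plan and apply with a couple of variables passed explicitly; the rest come from terraform.tfvars
terraform plan -var "region=us-east-1" -var "key_path=~/.ssh/graphite.pem"
terraform apply -var "region=us-east-1" -var "key_path=~/.ssh/graphite.pem"

# Tear the instance back down if you are just experimenting
terraform destroy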

The Graphite server uses a mostly standard config and installs a few of the components that we need to run the server: Docker, Python, pip, docker-compose, etc.  Here is what a sample cloud-config for the Graphite server might look like.

#cloud-config

# Make sure OS is up to date
apt_update: true
apt_upgrade: true
disable_root: true

# Connect to private repo
write_files:
 - path: /home/<user>/.dockercfg
   owner: user:group
   permissions: 0755
   content: |
     {
       "https://index.docker.io/v1/": {
         "auth": "XXX",
         "email": "email"
       }
     }

# Capture all subprocess output for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

packages:
 - python-dev
 - python-pip

# Install latest Docker version
runcmd:
 - apt-get -y install linux-image-extra-$(uname -r)
 - curl -sSL https://get.docker.com/ubuntu/ | sudo sh
 - usermod -a -G docker <user>
 - sg docker
 - sudo pip install -U docker-compose

# Reboot for changes to take effect
power_state:
 mode: reboot
 delay: "+1"

ssh_authorized_keys:
 - <put your ssh public key here>
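Once the instance comes back up from the reboot at the end of the cloud-config, it is worth a quick sanity check that everything landed before moving on.  Something like this, where /data is the mount point created by the Terraform provisioner above.

# Confirm Docker and docker-compose are installed and the data volume is mounted
docker --version
docker-compose --version
df -h /data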

Docker

This is where most of the magic happens.  As noted above, we are using Docker and a few of its tools to get everything working.  All the logic to get Graphite running is contained in the Dockerfile, which will require some customizing but is similar to the following.

# Building from Ubuntu base
FROM ubuntu:14.04.2

# This suppresses a bunch of annoying warnings from debconf
ENV DEBIAN_FRONTEND noninteractive

# Install all system dependencies
RUN \
 apt-get -qq install -y software-properties-common && \
 add-apt-repository -y ppa:chris-lea/node.js && \
 apt-get -qq update -y && \
 apt-get -qq install -y build-essential curl \
 # Graphite dependencies
 python-dev libcairo2-dev libffi-dev python-pip \
 # Supervisor
 supervisor \
 # nginx + uWSGI
 nginx uwsgi-plugin-python \
 # StatsD
 nodejs

# Install StatsD
RUN \
 mkdir -p /opt && \
 cd /opt && \
 curl -sLo statsd.tar.gz https://github.com/etsy/statsd/archive/v0.7.2.tar.gz && \
 tar -xzf statsd.tar.gz && \
 mv statsd-0.7.2 statsd

# Install Python packages for Graphite
RUN pip install graphite-api[sentry] whisper carbon

# Optional install graphite-api caching
# http://graphite-api.readthedocs.org/en/latest/installation.html#extra-dependencies
# RUN pip install graphite-api[cache]

# Configuration
# Graphite configs
ADD carbon.conf /opt/graphite/conf/carbon.conf
ADD storage-schemas.conf /opt/graphite/conf/storage-schemas.conf
ADD storage-aggregation.conf /opt/graphite/conf/storage-aggregation.conf
# Supervisord
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# StatsD
ADD statsd_config.js /etc/statsd/config.js
# Graphite API
ADD graphite-api.yaml /etc/graphite-api.yaml
# uwsgi
ADD uwsgi.conf /etc/uwsgi.conf
# nginx
ADD nginx.conf /etc/nginx/nginx.conf
ADD basic_auth /etc/nginx/basic_auth

# nginx
EXPOSE 80 \
# graphite-api
8080 \
# Carbon line receiver
2003 \
# Carbon pickle receiver
2004 \
# Carbon cache query
7002 \
# StatsD UDP
8125 \
# StatsD Admin
8126

# Launch stack
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]
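If you want to poke at the image on its own before involving docker-compose, you can build and run it directly.  The image tag here is arbitrary, and the port mappings mirror the compose file shown next.

# Build the image from the directory holding the Dockerfile and config files
docker build -t graphite ./docker-graphite

# Run it in the foreground and watch supervisord start carbon, statsd, nginx and graphite-api
docker run --rm -p 8080:80 -p 2003:2003 -p 8125:8125/udp graphite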

The other component we need is Grafana, which we don’t actually build but instead pull from the Docker Hub registry and inject our custom volume into.  This is all captured in our docker-compose.yml file listed below.

graphite:
  build: ./docker-graphite
  restart: always
  ports:
    - "8080:80"
    - "8125:8125/udp"
    - "8126:8126"
    - "2003:2003"
    - "2004:2004"
  volumes:
    - "/data/graphite:/opt/graphite/storage/whisper"

grafana:
  image: grafana/grafana
  restart: always
  ports:
    - "80:3000"
  volumes:
    - "/data/grafana:/var/lib/grafana"
  links:
    - graphite
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=password123
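Before wiring up the init script described below, it is worth bringing the stack up by hand from the directory containing docker-compose.yml to make sure everything builds and starts cleanly.

# Build the graphite image, pull grafana, and start both containers in the background
docker-compose up -d

# Make sure both containers stay up, and check the logs if anything looks off
docker-compose ps
docker-compose logs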

We have open sourced our configuration and placed it on GitHub so you can take a look at the configs and see how everything fits together, with some working examples.  The GitHub repo is a quick way to try out the stack without having to provision and build an environment to run it on.  If you are just interested in kicking the tires, I suggest starting with the GitHub repo.

The build directive above corresponds to the repo on GitHub.

The last component is actually running the Docker containers.  As you can see we use docker-compose, but we also need a way to start the containers automatically after a disruption like a reboot.  That is actually pretty easy.  On Ubuntu (or any system using upstart) you can create an init script to start docker-compose or restart it automatically if it has problems.  Here I have created a file called /etc/init/graphite.conf with the following configuration.

description "Graphite"
start on filesystem and started docker
stop on runlevel [!2345]
respawn
chdir /home/user
exec docker-compose up
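With that file in place, upstart treats the stack like any other job.

# Start the job by hand the first time (it will also start automatically on boot)
sudo start graphite

# Check on it or restart it later
sudo status graphite
sudo restart graphite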

A systemd service would achieve a similar goal but the version of Ubuntu used here doesn’t leverage systemd.
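For reference, on a distribution that does use systemd, a rough equivalent might look something like the unit below; the unit name, paths and docker-compose location are assumptions for your own setup.

# /etc/systemd/system/graphite.service - rough equivalent, not used on this server
[Unit]
Description=Graphite
Requires=docker.service
After=docker.service

[Service]
Restart=always
WorkingDirectory=/home/user
ExecStart=/usr/local/bin/docker-compose up
ExecStop=/usr/local/bin/docker-compose stop

[Install]
WantedBy=multi-user.target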

After everything has been dropped in place and configured, you can check your work by hitting the public IP address of your server in a browser.  If you see the Grafana splash page, everything should be working!
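From the command line, a quick smoke test looks something like this; you should get back a 200 or a redirect to the Grafana login page.

# Substitute your server's public IP or Elastic IP
curl -I http://<public-ip>/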

Grafana dashboard

Conclusion

There are many pieces to this puzzle, and honestly we don’t have the requirement of having Graphite be 100% available and redundant, so we can get away with a single server for our needs.  A separate EBS volume and Terraform allow us to rebuild the server quickly and automatically if something were to happen to it.  Also, the way we have designed Graphite to run should be able to handle a substantial workload without falling over.  But if you are doing anything cool with Graphite HA or resiliency I would love to hear how you are doing it; there is always room for improvement.

If you are just interested in trying out the Graphite stack, I highly suggest heading over to the GitHub repo and running the container stack to play around with the components, especially if you are interested in learning about how statsd and Graphite collect metrics.  The Grafana interface gives you a nice way to tap into the metrics that get pumped into Graphite.

Josh Reichardt

Josh is the creator of this blog, a system administrator and a contributor to other technology communities such as /r/sysadmin and Ops School. You can also find him on Twitter and Facebook.