Category Archives: Sysadmin

DevOps Conferences

I wrote a post quite a while ago that highlighted some of the cooler system admin and operations oriented conferences that were on my radar at the time.  Since then I have changed jobs and am now in a DevOps oriented position, so I'd like to revisit the subject and update that list to reflect some of the cool conferences in the DevOps space.

I’d like to start off by saying that even if you can’t make it to the bigger conferences, local groups and meetups are an excellent way to get out and meet other professionals who do what you do.  Local groups are also an excellent way to stay in the loop on what’s current and to learn about what others are doing.  If you are interested in eventually becoming a presenter or speaker, local meetups and groups can be a great way to get started.  There are numerous opportunities and communities (especially in bigger cities), so check here for information or to see if there is a DevOps meetup near you.  If there is nothing nearby, start one!  If you can’t find any DevOps groups, look for Linux groups or developer groups and network from there; DevOps is becoming popular in broader circles.

After you get your feet wet with meetups, the next place to start looking is conferences that sound interesting to you.  There are about a million different opportunities to choose from: security conferences, developer conferences, server and network conferences, all the way down the line.  I am sticking strictly with DevOps related conferences because that is what I am currently interested in and know best.

Feel free to comment if I missed any DevOps related conferences that you think should be added to the list.

DevOps Days (Multiple dates)

Perhaps the most DevOps centric conference on this list.  These conferences are a great way to meet and network with fellow DevOps professionals.  The space and industry are changing constantly, and staying on top of all of the changes is crucial to being successful.  Another nice thing about DevOps Days is that they are spread out around the country (and world) and throughout the year, so they are very accessible.  WARNING:  DevOps Days are not tied to any one set of DevOps tools but rather to the principles and techniques and how to apply them to different environments.  If you are looking for super in depth technical talks, this one may not be for you.

ChefConf (March)

The main Chef conference.  There are large conferences for all of the main configuration management tools, but I chose to highlight Chef because that’s what we use at my job.  There are lots of good talks with a Chef centered theme that are also great because the practices can be applied with other tools.  For example, ChefConf covers many DevOps themes, including continuous integration and deployment, how to scale environments, tying different tools together, and general configuration management techniques.  Highly recommended for Chef users; feel free to substitute the other big configuration management tool conferences here if Chef isn’t your cup of tea (Salt, Puppet, Ansible).

CoreOS Fest (May)

  • 2015 videos haven’t been posted yet

Admittedly, this is a much smaller and more niche conference but is still awesome.  It is the first conference put on by the folks at CoreOS and was designed to help the community keep up with what is going on in the CoreOS and container world.  The venue is pretty small, but the content at this year’s conference was very good.  There were some epic announcements and talks, including Tectonic announcements and Kubernetes deep dives, so if container technology is something you’re interested in, this conference is definitely worth checking out.

Velocity (May)

This one just popped up on my DevOps conference radar.  I have been hearing good things about this conference for a while now but have not had the opportunity to go.  It always has interesting speakers and topics, and a number of the DevOps thought leaders show up for this event.  One cool thing about this conference is that there are a variety of different topics at any one time, so it offers a nice, wide spectrum of information; for example, there are technical tracks covering different areas of DevOps.

DockerCon (June)

Docker has been growing at a crazy pace, so this seems like the big conference to check out if you are in the container space.  It is similar to CoreOS Fest but focuses more heavily on Docker (obviously).  I haven’t had a chance to go to one of these yet, but containers and Docker have so much momentum that they are very difficult to avoid.  Many people also believe that container technologies are the path to the future, so it is a good idea to be as close to the action as you can.

Monitorama (June)

I think this is one of the coolest conferences, but that is probably just because I am so obsessed with monitoring and metrics collection.  Monitoring is one of those topics that isn’t always fun to deal with, but the talks and technologies at this conference actually make me excited about it.  To most, monitoring is a necessary evil, and a lot of the content from this conference can help make your life easier in all aspects of monitoring, from new trends and tools to how to correctly monitor and scale infrastructures.  Talks can be technical but are well worth it if monitoring is something that interests you.

AWS Re:Invent (November)

This one is a monster.  It is the big conference that AWS puts on every year to announce new products and technologies they have been working on, as well as to provide some incredibly helpful technical talks.  I believe it is one of the pricier and more exclusive conferences but offers a lot in the way of content and details.  It features some of the best, most technical topics of discussion I have seen and has been invaluable as a learning resource.  All of the videos from the conference are posted on YouTube, so you can get access to this information for free.  Obviously the content is AWS related, but I have found it to be a great way to learn.

Conclusion

Even if you don’t have a lot of time to travel to these conferences, nearly all of them post videos from the event so you can watch them whenever you want.  This is an INCREDIBLE learning tool and resource that is FREE.  The only downside to the videos is that you can’t ask questions, but it is easy to find a presenter’s contact info if you feel like reaching out.

That being said, you tend to get a lot more out of attending in person.  The main benefit of going to conferences over watching the videos alone is that you get to meet and talk to others in the space, get a feel for what everybody else is doing, and check out many cool tools that you might otherwise never hear about.  At every conference I attend, I learn about some incredibly useful new tech that I had never heard of, and I always run into interesting people that I would otherwise not have the opportunity to meet.

So if you can, definitely get out to these conferences, meet and talk to people, and get as much out of them as you can.  If you can’t make it, check out the videos afterwards for some really great nuggets of information; they are a great way to keep your skills sharp and current.

If you have any more conferences to add to this list I would be happy to update it!  I am always looking for new conferences and DevOps related events.

Composing a Graphite server with Docker

Grafana dashboard

Recently our Graphite server needed to be overhauled, which I was not looking forward to.  Luckily, Docker makes the process of building identical and reproducible images for configuring a new server much easier and less painful than other methods.

Introduction

If you don’t know what Graphite is, you can check out the documentation for more info.  Basically, it is a tool to collect and aggregate metrics of pretty much any kind into a central location.  It is a great complement to something like statsd for metric collection and aggregation, which I will go over later.
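To make this concrete, here is roughly what shipping a metric looks like on the wire.  The hostname and metric names below are made up, and the ports are the stock carbon line receiver (2003) and statsd UDP (8125) ports used later in this post.

# Carbon's plaintext protocol: "<metric.path> <value> <unix-timestamp>"
echo "app.requests.count 42 $(date +%s)" | nc graphite.example.com 2003

# statsd counter over UDP: "<metric>:<value>|<type>"
echo "app.requests.count:1|c" | nc -u -w 1 graphite.example.com 8125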

The setup I will be describing today leverages a handful of components to work.  The first and most important part is Graphite.  This includes all of the parts that make up Graphite, including the carbon aggregator and carbon cache for the collection and processing of metrics as well as the whisper db for storing metrics.

There are several alternative backends, but I don’t have any experience with them so I won’t be posting any details.  If you are interested, InfluxDB and OpenTSDB both look like interesting alternatives to whisper for storing metrics.

The Problem

Graphite is notoriously difficult to install and configure properly.  If you haven’t tried to set up Graphite before, give it a try.

Another argument that I hear quite a bit is that the Graphite workload doesn’t really fit the Docker model.  In a distributed or highly available architecture that might be the case, but in the example I cover here we are taking a different approach.

The design and implementation separate the data onto an EBS volume, which is a durable storage resource, so it doesn’t matter if the server itself has problems.  With our approach and process we can reprovision the server and have everything back up and running in less than 5 minutes.

The benefit of doing it this way is obvious.  Another benefit of our approach is that we are leveraging the graphite-api package, so we get access to all of the Graphite goodness without having to run all of the other bloat, and then proxying it through nginx/uwsgi, which helps with performance.  I will go over this setup in a little bit.  No Graphite server would be complete without Grafana, which turns out to be stupidly easy to add using the Docker approach.

If we were ever to expand this architecture, I think a distributed model using EFS (currently in preview) along with some type of load balancer in front to distribute requests evenly could be a possibility.  If you have experience running Graphite across many nodes I would love to hear what you are doing.

The Solution

There are a few components to our architecture.  The first is a tool I have been writing about recently called Terraform.  We use this with some custom scripting to build the server, configure it and attach our Graphite data volume to the server.

Here is what a sample Terraform config might look like to provision the server with the tools we want.  This server is provisioned into an AWS environment and leverages a number of variables.  You can check the docs on how variables work; I have also included a minimal example of the variables file after the config below.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

resource "aws_instance" "graphite" {
  ami = "${lookup(var.amis, var.region)}"
  availability_zone = "us-east-1e"
  instance_type = "c3.xlarge"
  subnet_id = "${var.public-1e}"
  security_groups = ["${var.graphite}"]
  key_name = "XXX"
  user_data = "${file("../cloud-config/graphite.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "20"
  }

  connection {
    user = "username"
    key_file = "${var.key_path}"
  }

  # mount EBS
  provisioner "local-exec" {
    command = "aws ec2 attach-volume --region=us-east-1 --volume-id=${var.graphite_data_vol} --instance-id=${aws_instance.graphite.id} --device=/dev/xvdf"
  }

  provisioner "remote-exec" {
    inline = [
      "while [ ! -e /dev/xvdf ]; do sleep 1; done",
      "echo '/dev/xvdf /data ext4 defaults 0 0' | sudo tee -a /etc/fstab",
      "sudo mkdir /data && sudo mount -t ext4 /dev/xvdf /data"
    ]
  }

}
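The ${var.*} references above come from a separate variables.tf file.  Here is a minimal sketch of what that file might look like; the variable names match the references above and every value is a placeholder for your own environment.

variable "access_key" {
 description = "AWS access key"
 default = "XXX"
}

variable "secret_key" {
 description = "AWS secret access key"
 default = "XXX"
}

variable "region" {
 default = "us-east-1"
}

variable "public-1e" {
 description = "Subnet ID for the us-east-1e public subnet"
 default = "XXX"
}

variable "graphite" {
 description = "Security group ID for the Graphite server"
 default = "XXX"
}

variable "key_path" {
 description = "Path to the SSH private key"
 default = "XXX"
}

variable "graphite_data_vol" {
 description = "EBS volume ID for the Graphite data"
 default = "XXX"
}

variable "amis" {
 description = "Base AMI"
 default = {
 us-east-1 = "ami-XXX"
 }
}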

And optionally, if you have an Elastic IP to use, you can tack that onto your config.

resource "aws_eip" "graphite" {
  instance = "${aws_instance.graphite.id}"
  vpc = true
}

The Graphite server uses a mostly standard cloud-config and installs the components that we need to run the server: Docker, Python, pip, docker-compose, etc.  Here is what a sample cloud-config for the Graphite server might look like.

#cloud-config

# Make sure OS is up to date
apt_update: true
apt_upgrade: true
disable_root: true

# Connect to private repo
write_files:
 - path: /home/<user>/.dockercfg
   owner: user:group
   permissions: '0755'
   content: |
     {
       "https://index.docker.io/v1/": {
         "auth": "XXX",
         "email": "email"
       }
     }

# Capture all subprocess output for troubleshooting cloud-init issues
output: {all: '| tee -a /var/log/cloud-init-output.log'}

packages:
 - python-dev
 - python-pip

# Install latest Docker version
runcmd:
 - apt-get -y install linux-image-extra-$(uname -r)
 - curl -sSL https://get.docker.com/ubuntu/ | sudo sh
 - usermod -a -G docker <user>
 - sg docker
 - sudo pip install -U docker-compose

# Reboot for changes to take
power_state:
 mode: reboot
 delay: "+1"

ssh_authorized_keys:
 - <put your ssh public key here>
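Since the config above tees all subprocess output to a log (the output directive), debugging a server that doesn’t come up cleanly is usually just a matter of reading that file on the new instance:

# Follow the cloud-init output captured by the config above
sudo tail -f /var/log/cloud-init-output.log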

Docker

This is where most of the magic happens.  As noted above, we are using Docker and a few of its tools to get everything working.  All the logic to get Graphite running is contained in the Dockerfile, which will require some customizing but is similar to the following.

# Building from Ubuntu base
FROM ubuntu:14.04.2

# This suppresses a bunch of annoying warnings from debconf
ENV DEBIAN_FRONTEND noninteractive

# Install all system dependencies
RUN \
 apt-get -qq update -y && \
 apt-get -qq install -y software-properties-common && \
 add-apt-repository -y ppa:chris-lea/node.js && \
 apt-get -qq update -y && \
 apt-get -qq install -y build-essential curl \
 # Graphite dependencies
 python-dev libcairo2-dev libffi-dev python-pip \
 # Supervisor
 supervisor \
 # nginx + uWSGI
 nginx uwsgi-plugin-python \
 # StatsD
 nodejs

# Install StatsD
RUN \
 mkdir -p /opt && \
 cd /opt && \
 curl -sLo statsd.tar.gz https://github.com/etsy/statsd/archive/v0.7.2.tar.gz && \
 tar -xzf statsd.tar.gz && \
 mv statsd-0.7.2 statsd

# Install Python packages for Graphite
RUN pip install graphite-api[sentry] whisper carbon

# Optional install graphite-api caching
# http://graphite-api.readthedocs.org/en/latest/installation.html#extra-dependencies
# RUN pip install graphite-api[cache]

# Configuration
# Graphite configs
ADD carbon.conf /opt/graphite/conf/carbon.conf
ADD storage-schemas.conf /opt/graphite/conf/storage-schemas.conf
ADD storage-aggregation.conf /opt/graphite/conf/storage-aggregation.conf
# Supervisord
ADD supervisord.conf /etc/supervisor/conf.d/supervisord.conf
# StatsD
ADD statsd_config.js /etc/statsd/config.js
# Graphite API
ADD graphite-api.yaml /etc/graphite-api.yaml
# uwsgi
ADD uwsgi.conf /etc/uwsgi.conf
# nginx
ADD nginx.conf /etc/nginx/nginx.conf
ADD basic_auth /etc/nginx/basic_auth

# nginx
EXPOSE 80 \
# graphite-api
8080 \
# Carbon line receiver
2003 \
# Carbon pickle receiver
2004 \
# Carbon cache query
7002 \
# StatsD UDP
8125 \
# StatsD Admin
8126

# Launch stack
CMD ["/usr/bin/supervisord", "-c", "/etc/supervisor/supervisord.conf"]
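To sanity check the image by itself before wiring it into docker-compose, a manual build and run looks something like this (the image and container names are arbitrary):

# Build the image from the directory containing the Dockerfile and configs
docker build -t docker-graphite ./docker-graphite

# Run it manually, exposing the carbon line receiver and the nginx front end
docker run -d --name graphite-test -p 2003:2003 -p 8080:80 docker-graphite

# Check that supervisord brought all of the processes up
docker logs graphite-test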

The other component we need is Grafana, which we don’t actually build but pull from the Docker Hub registry, injecting our custom volume into it.  This is all captured in our docker-compose.yml file listed below.

graphite:
  build: ./docker-graphite
  restart: always
  ports:
    - "8080:80"
    - "8125:8125/udp"
    - "8126:8126"
    - "2003:2003"
    - "2004:2004"
  volumes:
    - "/data/graphite:/opt/graphite/storage/whisper"

grafana:
  image: grafana/grafana
  restart: always
  ports:
    - "80:3000"
  volumes:
    - "/data/grafana:/var/lib/grafana"
  links:
    - graphite
  environment:
    - GF_SECURITY_ADMIN_PASSWORD=password123
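With the compose file in place, building and starting the whole stack is just a couple of commands, run from the directory containing docker-compose.yml:

# Build the graphite image and pull down the grafana image
docker-compose build

# Start both containers in the background
docker-compose up -d

# Tail the combined output if anything looks off
docker-compose logs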

We have open sourced our configuration and placed it on github, so you can take a look at it to get a better idea of the configs and how everything works, with some working examples.  The github repo is a quick way to try out the stack without having to provision and build an environment to run it on.  If you are just interested in kicking the tires, I suggest starting with the github repo.

The build directive above corresponds to the repo on github.

The last component is actually running the Docker containers.  As you can see we use docker-compose, but we also need a way to start the containers automatically after a disruption like a reboot.  That is actually pretty easy.  On Ubuntu (or another system using upstart) you can create an init script to start docker-compose, or restart it automatically if it has problems.  Here I have created a file called /etc/init/graphite.conf with the following configuration.

description "Graphite"
start on filesystem and started docker
stop on runlevel [!2345]
respawn
chdir /home/user
exec docker-compose up
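With that file in place, upstart manages the stack like any other service:

sudo start graphite
sudo status graphite
sudo stop graphite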

A systemd service would achieve a similar goal but the version of Ubuntu used here doesn’t leverage systemd.

After everything has been dropped in place and configured, you can check your work by hitting the public IP address of your server in a browser.  If you hit the Grafana splash page, everything should be working!
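If you prefer the command line, a couple of quick curl checks (substitute your server’s address) will confirm both halves are answering.  Note that the graphite-api endpoint sits behind the basic auth configured in nginx above, so you will need to pass those credentials:

# Grafana splash page on port 80
curl -I http://<public-ip>/

# graphite-api behind nginx on port 8080; /metrics/find lists metric paths
curl -u user:password "http://<public-ip>:8080/metrics/find?query=*"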

Grafana dashboard

Conclusion

There are many pieces to this puzzle, and honestly we don’t have the requirement of Graphite being 100% available and redundant, so we can get away with a single server for our needs.  A separate EBS volume and Terraform allow us to rebuild the server quickly and automatically if something were to happen to it.  Also, the way we have designed Graphite to run should handle a substantial workload without falling over.  But if you are doing anything cool with Graphite HA or resiliency I would like to hear how you are doing it; there is always room for improvement.

If you are just interested in trying out the Graphite stack, I highly suggest going over to the github repo and running the container stack to play around with the components, especially if you are interested in learning how statsd and Graphite collect metrics.  The Grafana interface gives you a nice way to tap into the metrics that get pumped into Graphite.

Create a Kubernetes cluster on AWS and CoreOS with Terraform

Up until my recent discovery of Terraform, the process I had been using to test CoreOS and Kubernetes was somewhat cumbersome and manual.  There are still some manual steps in the bootstrap and cluster creation process that need to get sorted out, but now I can bring environments up and down quickly and automatically.  This is a HUGE time saver and also makes testing easier, because changes can happen in a matter of minutes rather than hours and can all be self documented for others to reference in a Github repo.  Great success.

NOTE:  This method seems to be broken as of the 0.14.2 release of Kubernetes.  The latest version I could get to work reliably was v0.13.1.  I am following the development and looking forward to the v1.0 release but won’t revisit this method until something stable has been shipped because there are still just too many changes going on.  With that said, v0.13.1 has a lot of useful functionality and this method is actually really easy to get working once you have the groundwork laid out.

Another benefit is that as the project develops and matures, the only thing that will need to be modified are the cloud configs I am using here.  So if you follow along, feel free to use my configs as a base and modify them to get this working with a newer release.  As I said, I will be revisiting the configs once things slow down a little and a v1 has been released.

Terraform

So the first component that we need to enable in this workflow is Terraform.  From their site, “Terraform is a tool for building, changing, and combining infrastructure safely and efficiently.”  Basically, Terraform is a command line tool that lets you implement your infrastructure as code across a variety of different infrastructure providers.  It should go without saying that being able to test environments across different platforms and cloud providers is a gigantic benefit.  It doesn’t lock you in to any one vendor and greatly simplifies the process of creating complex infrastructures across different platforms.

Terraform is still a young project but has been maturing nicely, and it currently supports most of the functionality needed for this method to work (the missing pieces are in the dev pipeline and will be released in the near future).  Another benefit is that Terraform is much easier to use and understand than CloudFormation, a proprietary cloud provisioning tool available to AWS customers, which could be used instead if you are in a strictly AWS environment.

The first step is to download and install Terraform.  In this example I am using OSX but the instructions will be similar on Linux or other platforms.

cd /tmp
wget https://dl.bintray.com/mitchellh/terraform/terraform_0.3.7_darwin_amd64.zip
unzip terraform_0.3.7_darwin_amd64.zip
mv terraform* /usr/local/bin

After you have moved the binary you will need to source your shell.  I use zsh so I just ran “source ~/.zshrc” to update the path for terraform.

To test terraform out you can check the version to make sure it works.

terraform version

Now that Terraform is installed you will need to get some Terraform files set up.  I suggest making a local terraform directory on your machine so you can create a repo out of it later if desired.  I like to split “services” up by creating different directories, so within the terraform directory I have created an etcd directory as well as a kubernetes directory, each with its own variables file (which should be very similar).  I don’t know if this approach is a best practice, but it has been working well for me so far.  I will gladly update this workflow if there is a better way to do it.

Here is a sample of what the file and directory layout might look like.

cloud-config
  etcd-1.yml
  etcd-2.yml
  etcd-3.yml
  kube-master.yml
  kube-node.yml
etcd
  dns.tf
  etcd.tf
  variables.tf
kubernetes
  dns.tf
  kubernetes.tf
  variables.tf

As you can see there is a directory for Etcd as well as Kubernetes specific configurations.  You may also notice that there is a cloud-config directory.  This will be used as a central place to put configurations for the different services.

Etcd

With Terraform set up, the next component needed for this architecture to work is a functioning etcd cluster.  I chose to use a separate 3 node cluster (spread across 3 AZ’s) for improved performance and resiliency.  With a 3 node cluster, if one of the nodes goes down or away the cluster will still be operational, whereas if a 1 node cluster goes away you will be in much more trouble.  Additionally, if you have other services or servers that need to leverage etcd you can just point them at this cluster.

Luckily, with Terraform it is dead simple to spin new clusters up and down once you have your initial configurations set up correctly.

At the time of this writing I am using the current stable version of CoreOS, 633.1.0, which ships version 0.4.8 of etcd.  According to the folks at CoreOS, the cloud configs for old versions of etcd should keep working once the new version has been released, so moving to the new 2.0 release should be easy once it hits the release channel, though some tweaks or additional changes to the cloud configs may be needed.

Configuration

Before we get into the details of how all of this works, I would like to point out that many of the settings in these configuration files will be specific to your environment.  For example, I am using an AWS VPC in the “us-east-1” region for this setup, so you may need to adjust some of the settings in these files to match your own scenario.  Other custom components may include security groups, subnet IDs, ssh keys, availability zones, etc.

Terraform offers resources for basically all network components on AWS so you could easily extend these configurations to build out your initial network and environment if you were starting a project like this from scratch.  You can check all the Terraform resources for the AWS provider here.

Warning: This guide assumes a few subtle things in order to work correctly.  The address scheme we are using for this environment is 192.168.x.x, leveraging 3 subnets (b, c, e) in the US-East-1 AWS region to spread the nodes across for additional availability.  Anything in the configuration that has been filled in with “XXX” represents a custom value that you will need to either create or obtain in your own environment and fill in to the configuration files.

Finally, you will need to provide AWS credentials to allow Terraform to communicate with the API for creating and modifying resources.  You can see where these credentials should be filled in below in the variables.tf file.

variables.tf

variable "access_key" { 
 description = "AWS access key"
 default = "XXX"
}

variable "secret_key" { 
 description = "AWS secret access key"
 default = "XXX"
}

variable "region" {
 default = "us-east-1"
}

/* CoreOS AMI - 633.1.0 */

variable "amis" {
 description = "Base CoreOS AMI"
 default = {
 us-east-1 = "ami-d6033bbe" 
 }
}

Here is what an example etcd config looks like.

etcd.tf

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Etcd cluster */

resource "aws_instance" "etcd-01" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.1.10"
 user_data = "${file("../cloud-config/etcd-1.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-02" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.2.10"
 user_data = "${file("../cloud-config/etcd-2.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-03" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.3.10"
 user_data = "${file("../cloud-config/etcd-3.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

Below I have created a configuration file as a simple way to create DNS records dynamically when spinning up the etcd cluster nodes.

dns.tf

 resource "aws_route53_record" "etcd-01" {
 zone_id = "XXX"
 name = "etcd-01.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-01.private_ip}"]
}

resource "aws_route53_record" "etcd-02" {
 zone_id = "XXX"
 name = "etcd-02.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-02.private_ip}"]
}

resource "aws_route53_record" "etcd-03" {
 zone_id = "XXX"
 name = "etcd-03.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-03.private_ip}"]
}

Once all of the configurations have been put in place and all look right you can test out what your configuration will look like with the “plan” command:

cd etcd
terraform plan

Make sure to change into your etcd directory first.  This will examine your current configuration and calculate any changes.  If your environment is completely unconfigured then this command will return output that explains what Terraform is planning to do.

If you don’t want the input prompts when you run your plan command, you can append the “-input=false” flag to bypass the prompts.

If everything looks okay with the plan, you can tell Terraform to “apply” your configs with the following:

terraform apply
OR
terraform apply -input=false

If everything goes according to plan, after a few minutes you should have a new 3 node etcd cluster running on the latest stable version of CoreOS, with DNS records for interacting with the nodes!  To double check that the servers are being created you can look at the AWS console to see if your newly defined servers have shown up.  The console is a great way to verify that things all work okay and that the right values were created.
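For a quick sanity check of the cluster itself, you can hit the etcd client API directly.  This assumes the etcd 0.4.x default client port of 4001 and the DNS records created above:

# Each node should list itself and its two peers
curl -L http://etcd-01.example.domain:4001/v2/machines

# And report the running version
curl http://etcd-01.example.domain:4001/version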

If you are having trouble with the cloud configs check the end of the post for the link to all of the etcd and Kubernetes cloud configs.

Kubernetes

The Kubernetes configuration is very similar to etcd.  It uses a variables.tf, kubernetes.tf and dns.tf file to configure the Kubernetes cluster.

The following configurations will build a v0.13.1 Kubernetes cluster with 1 master and 3 worker nodes to begin with.  This config can easily be extended to scale the number of worker nodes to basically as many as you want (I could easily imagine hundreds or thousands) simply by changing a few numbers in the configuration, barely adding any overhead to our current process and workflow, which is nice.  Because of these possibilities, Terraform allows for a large amount of flexibility in how you manage your infrastructure.

This configuration is using c3.large instances so be aware that your AWS bill may be affected if you spin nodes up and fail to turn them off when you are done testing.

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Kubernetes cluster */

resource "aws_instance" "kube-master" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.1.100"
 user_data = "${file("../cloud-config/kube-master.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "kube-e" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-b" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-c" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

And our DNS configuration.

resource "aws_route53_record" "kube-master" {
 zone_id = "XXX"
 name = "kube-master.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-master.private_ip}"]
}

resource "aws_route53_record" "kube-e" {
 zone_id = "XXX"
 name = "kube-e-test.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-e.0.private_ip}"]
}

resource "aws_route53_record" "kube-b" {
 zone_id = "XXX"
 name = "kube-b.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-b.0.private_ip}"]
}

resource "aws_route53_record" "kube-c" {
 zone_id = "XXX"
 name = "kube-c.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-c.0.private_ip}"]
}

The variables file for Kubernetes should be identical to the etcd configuration so I have chosen not to place it here.  Just refer to the previous etcd/variables.tf file.
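Since the cluster runs on c3.large instances (see the cost warning above), it is worth knowing that tearing a test environment back down is just as automated as bringing it up:

# Preview what Terraform will remove, then remove it (run from the kubernetes directory)
terraform plan -destroy
terraform destroy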

Resources

Since each cloud-config is slightly different (and would take up a lot more space) I have included those files in the gist below.  You will need to populate the “ssh_authorized_keys:” section with your own SSH public key and update any of the IP addresses to reflect your environment.  I apologize if there are any typos; there was a lot of cutting and pasting.

Cloud configs – https://gist.github.com/jmreicha/7923c295ab6110151127

Much of the configuration I am using is based on the Kubernetes docs, as well as some specific cloud configs that I have adapted, which can be found here.

Another great place to get help with Kubernetes is IRC: the #google-containers channel on irc.freenode.net.  The folks that hang out there are super friendly and can almost always answer any questions you have.

As I said, development is still pretty crazy.  You can check the releases page to check out all the latest stuff.

Conclusion

Yes, this can seem very convoluted at first, but if everything works how it should, you now have a quick and easy way to spin identical etcd and/or Kubernetes environments up and down at will, which is pretty powerful.  This method is also dramatically easier than most of the methods I have come across so far in my own adventures and testing.

Once you get through the initial confusion and learning curve this workflow becomes a huge timesaver for testing out different versions of Kubernetes and also for experimenting with etcd.  I haven’t quite automated the entire process but I imagine that it would be easy to spin entire environments up and down by gluing all of these pieces together with some simple shell scripts.

If you need to make any configuration updates, for example to put a new version of Kubernetes in place, you will first need to update your Kubernetes master/node cloud configs and then rerun terraform apply to have it recreate your environment.

The cloud config changes will destroy any nodes that rely on the old configuration, so make sure that if you make any changes to your cloud config files you are prepared to deal with the consequences!  Ideally you should get your etcd cluster to a good spot and then leave it alone and just play around with the Kubernetes components; the two have been separated precisely so that they can be changed out independently.

With this workflow you can already start to see the power of Terraform, even with just this one example.  Terraform is quickly becoming one of my favorite automation and cloud tools, providing a very easy way to define and build infrastructure through code and configuration.

Performance tuning ELK stack

ELK stack

Building on my previous post describing how to quickly set up a centralized logging solution using the ElasticSearch + Logstash + Kibana (ELK) stack, we have a fully functioning, Docker based ELK stack aggregating and handling our logs.  The only problem is that performance is either waaay too slow or the stack seems to crash.  As I worked through this problem myself, I found a few settings that vastly improved the stability and performance of my ELK stack.

So in this post I will share some of the useful tweaks and changes that worked in my own environment to help squeeze additional performance out of the setup.

Hopefully these adjustments will help others!

Adjusting Logstash

Out of the box, Logstash does a pretty good job of setting things up with reasonable default values.  The main adjustment that I have found useful is setting the number of Logstash “workers” when the Logstash process starts.  A good rule of thumb is one worker per CPU, so if the server has 4 CPUs your Logstash startup command would look similar to the following.

/opt/logstash/bin/logstash --verbose -w 4 -f /etc/logstash/server.conf

The important bit to note is the “-w 4” part.  A poorly configured server.conf file may also lead to performance issues, but that will be very specific to the user.  For the most part, unless the configuration contains many conditionals, expensive codec calls, or excessive filtering, performance here should be stable.

If you are concerned about utilization, I recommend watching cpu and memory consumption by the Logstash process; a sign that there could be a configuration issue is the cpu maxing out.
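Nothing Logstash specific is needed to keep an eye on this; standard Linux tooling does the job:

# One worker per core is the starting point, so check the core count
nproc

# Watch cpu and memory usage of the running Logstash process
top -p "$(pgrep -o -f logstash)"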

The main thing to be aware of with a custom number of workers is that some codecs may not work properly because they are not thread safe.  If you are using the “multiline” codec in any of your inputs then you will not be able to leverage multiple workers until the thread safety problems have been fixed.  The good news is that this is a known issue and is being worked on, hopefully to be fixed by the time 1.5.0 is released.  It tripped me up initially, so I thought I would mention it.

Increase Java heap memory

It turns out that ElasticSearch is a bit of a memory hog once you start actually sending data through Logstash for ES to consume.  In my own testing, I discovered that logs would mysteriously stop being recorded into ES and consequently would fail to show up in my dashboards.

The first tweak to make is to increase the amount of memory available to Java to process ES indices.  There is a script that ES uses to load up Java when it starts, which passes in a value of 1GB of RAM by default.

After some digging, I discovered that the default ES configuration I was using was quickly running out of memory and was crashing because the ES heap memory couldn’t keep up with the load (mostly indexes).

Here is a sample of the errors I was seeing when ES and Logstash stopped processing logs.

message [out of memory][OutOfMemoryError[Java heap space]]

This was a good starting point for investigating.  Basically, what this means is that the ES container had a Java heap setting of 1GB, which was exhausting the memory allocated to ES even though there was much more memory available on the server.

To increase this memory limit, we will override this script with our own custom values.

This script is called “elasticsearch.sh.in” and we will be overriding the default ES_MAX_MEM value with “4g”, as shown below.  The general guideline is to give the heap about 50% of the total system memory, so if your server has 8GB of memory then setting 4GB here is the 50% we are looking for.

There are many other options that can be overridden, but the most important value is the max memory value that we have updated.

We can inject this custom value as an environment variable in our Dockerfile which makes managing custom configurations much easier if we need to make additions or adjustments later on.

ENV ES_HEAP_SIZE=4g

I am posting the script that sets these values below as a reference in case there are other values you need to override.  Again, we can use environment variables to set these up in our Dockerfile if needed.

#!/bin/sh

ES_CLASSPATH=$ES_CLASSPATH:$ES_HOME/lib/elasticsearch-1.5.0.jar:$ES_HOME/lib/*:$ES_HOME/lib/sigar/*

if [ "x$ES_MIN_MEM" = "x" ]; then
 ES_MIN_MEM=256m
fi
if [ "x$ES_MAX_MEM" = "x" ]; then
 ES_MAX_MEM=1g
fi
if [ "x$ES_HEAP_SIZE" != "x" ]; then
 ES_MIN_MEM=$ES_HEAP_SIZE
 ES_MAX_MEM=$ES_HEAP_SIZE
fi

# min and max heap sizes should be set to the same value to avoid
# stop-the-world GC pauses during resize, and so that we can lock the
# heap in memory on startup to prevent any of it from being swapped
# out.
JAVA_OPTS="$JAVA_OPTS -Xms${ES_MIN_MEM}"
JAVA_OPTS="$JAVA_OPTS -Xmx${ES_MAX_MEM}"

# new generation
if [ "x$ES_HEAP_NEWSIZE" != "x" ]; then
 JAVA_OPTS="$JAVA_OPTS -Xmn${ES_HEAP_NEWSIZE}"
fi

# max direct memory
if [ "x$ES_DIRECT_SIZE" != "x" ]; then
 JAVA_OPTS="$JAVA_OPTS -XX:MaxDirectMemorySize=${ES_DIRECT_SIZE}"
fi

# set to headless, just in case
JAVA_OPTS="$JAVA_OPTS -Djava.awt.headless=true"

# Force the JVM to use IPv4 stack
if [ "x$ES_USE_IPV4" != "x" ]; then
 JAVA_OPTS="$JAVA_OPTS -Djava.net.preferIPv4Stack=true"
fi

JAVA_OPTS="$JAVA_OPTS -XX:+UseParNewGC"
JAVA_OPTS="$JAVA_OPTS -XX:+UseConcMarkSweepGC"

JAVA_OPTS="$JAVA_OPTS -XX:CMSInitiatingOccupancyFraction=75"
JAVA_OPTS="$JAVA_OPTS -XX:+UseCMSInitiatingOccupancyOnly"

# GC logging options
if [ "x$ES_USE_GC_LOGGING" != "x" ]; then
 JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCDetails"
 JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCTimeStamps"
 JAVA_OPTS="$JAVA_OPTS -XX:+PrintClassHistogram"
 JAVA_OPTS="$JAVA_OPTS -XX:+PrintTenuringDistribution"
 JAVA_OPTS="$JAVA_OPTS -XX:+PrintGCApplicationStoppedTime"
 JAVA_OPTS="$JAVA_OPTS -Xloggc:/var/log/elasticsearch/gc.log"
fi

# Causes the JVM to dump its heap on OutOfMemory.
JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError"
# The path to the heap dump location, note directory must exists and have enough
# space for a full heap dump.
#JAVA_OPTS="$JAVA_OPTS -XX:HeapDumpPath=$ES_HOME/logs/heapdump.hprof"

# Disables explicit GC
JAVA_OPTS="$JAVA_OPTS -XX:+DisableExplicitGC"

# Ensure UTF-8 encoding by default (e.g. filenames)
JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"

Then when Elasticsearch starts up it will look for this custom configuration script and start Java with the desired 4GB of memory.  This is one easy way to squeeze performance out of your server without making any other changes or modifying your server.

Modify Elasticsearch configuration

This one is also very easy to put into place.  We are already using elasticsearch.yml, so the only thing that needs to be done is to add some additional configuration to this file, rebuild the container, and restart the ES container with the updated values.

A good setting to configure to help control ES memory usage is the indices field data cache size.  Limiting this cache size makes sense because you rarely need to retrieve logs that are older than a few days.  By default ES will hold old indices in memory and never let them go, so unless you have unlimited memory it makes sense to limit the cache in this scenario.

To limit the cache size, simply add the following value anywhere in your custom elasticsearch.yml configuration file.  This setting, along with adjusting the Java heap memory size, should be enough to get started, but there are a few other things that might be worth checking.

indices.fielddata.cache.size:  40%

If you only make one change, add this line to your ES configuration!  This setting will let go of the oldest indices first, so you won’t be dropping new indices; 9 times out of 10 this is probably what you want when accessing data in Logstash.  More information about controlling memory usage can be found here.

Another idea worth looking at for an easy performance boost is disabling swap if it has been enabled.  In most cloud environments and images swap is turned off, but it is always a setting worth checking.

To bypass the OS swap setting you can simply configure a no swap value in ES by adding the following to your elasticsearch.yml configuration file.

bootstrap.mlockall: true

To check that this value has been configured properly you can run this command.

curl http://localhost:9200/_nodes/process?pretty
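In the JSON that comes back, each node should report “mlockall” : true.  A quick way to filter for it:

curl -s 'http://localhost:9200/_nodes/process?pretty' | grep mlockall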

This may cause memory warnings when ES starts up (e.g. “Unable to lock JVM memory (ENOMEM). This can result in part of the JVM being swapped out. Increase RLIMIT_MEMLOCK (ulimit).”) but you should be able to ignore these warnings.  If you are concerned, turn these limits off at the OS level, which is demonstrated below.

Misc

Other low hanging fruit includes raising the open file limits on the OS.  ES can run into problems if there is a cap on the number of files that its processes can have open at a time.  I have run into open file limit issues before and they are never fun to deal with.  This shouldn’t be an issue if you are running ES in a Docker container with the Ubuntu 14.04 base image.

If you aren’t sure about the open file limits for ES you can run the following command to get a better idea of the current limits.

ulimit -n

Make sure both the soft and hard limits are either set to unlimited or to an extremely high number.
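To check the soft and hard limits separately, and to raise them persistently at the OS level if they are too low (the limits.conf values below are a common starting point, not a mandate):

# Soft and hard open file limits for the current user
ulimit -Sn
ulimit -Hn

# To raise them permanently, add lines like these to /etc/security/limits.conf
# (this assumes ES runs as an "elasticsearch" user) and log in again:
#   elasticsearch soft nofile 65536
#   elasticsearch hard nofile 65536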

This should take care of most if not all of the stability issues.  After putting these changes in place in my own environment, I went from multiple crashes per day to none so far in over a week.  If you are still experiencing issues you might want to take a look at the ES production deployment guide for help, or the #logstash and #elasticsearch IRC channels on freenode.

Running ELK on Docker

ELK stack

I wrote a post a while back about how to get the ElasticSearch + Logstash + Kibana stack set up, and recently I have been very involved with Docker, so I thought it would be appropriate to update that post with the new Docker way of doing things.

Update (5/9/15) – I have created a github repo containing configs for running this.  Reader Sergio has posted a similar solution on github; you can check it out here if you want to try it out.

I found a surprising lack of posts describing how to run the ELK stack with Docker and docker-compose.  This post is much longer and more detailed than usual, so feel free to jump around to different sections for details on different components.  I am planning a follow up to this post with instructions on how to configure Logstash and the logstash-forwarder client, as well as Kibana, to do interesting things with logs stored in ElasticSearch.

There are a lot of other posts about how to get the stack to work, but they are either already out of date, since the Docker world changes so fast, or don’t cover the specific details of how the different bits work.  The other thing I have observed is that most of the other guides are not done with Docker, which is something that makes life easier.

So the first thing I’ll cover is how to build the Docker images.  If you are interested I can make this stuff available on the Docker hub as images or public Dockerfiles.  However, for this article, and in general, I strongly prefer to write my own Dockerfiles, so I will be posting my custom configs and files here rather than pulling other (sometimes official) prebuilt images.

Logstash

The first component we will get set up is the Logstash server.  This setup also uses the log-courier input plugin; log-courier is a more customizable and flexible client for forwarding logs to Logstash.  I use both logstash-forwarder and log-courier in this configuration to allow for a more flexible setup.

The following is a Dockerfile that will build a Logstash 1.5.0 image.  One thing to note about this approach is that you can swap out the LOGSTASH_VER value and the image will be updated to the correct version automatically, ready to be deployed when the image gets rebuilt.

FROM ubuntu:14.04

ENV DEBIAN_FRONTEND noninteractive
ENV LOGSTASH_VER 1.5.0.rc2
WORKDIR /opt

# Dependencies
RUN apt-get update -qq && \
 apt-get install -y -qq \
 wget \
 python \
 openjdk-7-jre-headless

# Install Logstash
RUN wget --quiet "https://download.elasticsearch.org/logstash/logstash/logstash-$LOGSTASH_VER.tar.gz" -O "/opt/logstash-$LOGSTASH_VER.tar.gz" --no-check-certificate && \
 tar zxf logstash-$LOGSTASH_VER.tar.gz && \
 mv logstash-$LOGSTASH_VER logstash

# Install plugins
RUN /opt/logstash/bin/plugin update logstash-output-zeromq
RUN /opt/logstash/bin/plugin install logstash-input-log-courier

# Config files
ADD server.conf /etc/logstash/server.conf
ADD logstash-forwarder.key /etc/logstash/logstash-forwarder.key
ADD logstash-forwarder.crt /etc/logstash/logstash-forwarder.crt

# lumberjack port
EXPOSE 4545
# log-courier port
EXPOSE 4546

CMD /opt/logstash/bin/logstash -f /etc/logstash/server.conf

This will install Logstash version 1.5.0.rc2 and the logstash-input-log-courier plugin, add certificates for the forwarding clients, and start Logstash with the server configuration.

In addition to this Dockerfile you will need to generate some certificates for logstash-forwarder clients and the Logstash server itself to use, as well as the server configuration used by the Logstash server.  Below I have a sample but extremely barebones server.conf configuration file.

input {
  lumberjack {
    port => 4545
    ssl_certificate => "/etc/logstash/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/logstash-forwarder.key"
    codec => plain {
      charset => "ISO-8859-1"
    }
  }

  courier {
    port => 4546
    ssl_certificate => "/etc/logstash/logstash-forwarder.crt"
    ssl_key => "/etc/logstash/logstash-forwarder.key"
  }
}

output {
  elasticsearch {
    cluster => "elasticsearch"
  }
}

I will thicken this config up in the next post on customizing Logstash and doing interesting things with Kibana.  For now, we are defining a courier and a lumberjack input, used to ingest logs into Logstash, as well as one output, which tells Logstash what to do with the logs; in this example it is just stuffing them into ES.

To generate the certificates needed for logstash/logstash-forwarder, you can either follow the instructions on the logstash-forwarder github page or use the following command to generate the certs and subsequently inject them into the Docker image.  It should probably go without saying, but make sure the version of openssl used to generate these is an up to date and secure one.

openssl req -x509 -nodes -sha256 -days 1095 -newkey rsa:2048 -keyout logstash-forwarder.key -out logstash-forwarder.crt

You will need to follow a few prompts to fill out the certificate details, again you can reference the logstash-forwarder github page if you get stuck or are unsure of how to configure the certificate.

After the certs are generated, make sure that the names of the output files match up to the names in the above Dockerfile, and that is pretty much it for getting Logstash ready.

ElasticSearch

The ElasticSearch image is also pretty straightforward.  This will build the version specified by the ES_PKG_NAME variable, currently 1.4.4, and configure a few things.

# Pull base image.
FROM dockerfile/java:oracle-java7

# Set install version
ENV ES_PKG_NAME elasticsearch-1.4.4

# Install ElasticSearch
RUN \
 cd / && \
 wget https://download.elasticsearch.org/elasticsearch/elasticsearch/$ES_PKG_NAME.tar.gz && \
 tar xvzf $ES_PKG_NAME.tar.gz && \
 rm -f $ES_PKG_NAME.tar.gz && \
 mv /$ES_PKG_NAME /elasticsearch

# Define mountable directories
VOLUME ["/data"]

# Define working directory
WORKDIR /data

# Custom ES config
ADD elasticsearch.yml /elasticsearch/config/elasticsearch.yml

# Define default command
CMD ["/elasticsearch/bin/elasticsearch"]

# Expose ports
EXPOSE 9200
EXPOSE 9300

The main key to getting ES to work is getting the configuration file set up correctly.  In this example we are mounting local storage (/data) from the host OS into the container so that if the container dies, the data, indexes, and everything else aren’t wiped out.  There are also a few other security configuration settings that get set here to lock things down and to make Kibana 4 happy.

http.cors.allow-origin: "/.*/"
http.cors.enabled: true
cluster.name: elasticsearch
node.name: "logstash.domain.com"
path:
 data: /data/index
 logs: /data/log
 plugins: /data/plugins
 work: /data/work

ES is very straightforward to set up; you just set it up and it runs.
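Once the container is running, the cluster health endpoint is the quickest way to confirm that:

# Expect "status" : "green" (or "yellow" on a single node with replicas configured)
curl 'http://localhost:9200/_cluster/health?pretty'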

Kibana

This will build the newest iteration of Kibana, which is 4.0.0 as of this writing.  If you aren’t living on the bleeding edge and want to know how to get Kibana 3.x.x working let me know and I will post the configuration for it.

FROM ubuntu:14.04

# Dependencies
RUN apt-get update -qq
RUN apt-get install -y -qq nginx-full wget vim

# Kibana
RUN mkdir -p /opt/kibana
RUN wget https://download.elasticsearch.org/kibana/kibana/kibana-4.0.0-linux-x64.tar.gz -O /tmp/kibana.tar.gz && \
 tar zxf /tmp/kibana.tar.gz && mv kibana-4.0.0-linux-x64/* /opt/kibana/

# Configs
ADD kibana.yml /opt/kibana/config/kibana.yml

EXPOSE 5601

CMD /opt/kibana/bin/kibana

So the Dockerfile is pretty straightforward, but there are a few tidbits to be aware of.  Kibana 4.x.x is significantly different in how it works than 3.x.x, so you will need to make a few adjustments if you are familiar with the old version.

You will need to pick and choose the bits out of the following configuration to suit your needs.  For example, you will need to adjust the elasticsearch_url, username, password and will need to decide whether to turn ssl on or off.  There are obviously more options but most of them probably don’t need to be adjusted for now.  Here is what the sample config looks like.

port: 5601
host: "0.0.0.0"
elasticsearch_url: "http://logstash.domain.com:9200"
elasticsearch_preserve_host: true
kibana_index: ".kibana"
default_app_id: "discover"
request_timeout: 300000
shard_timeout: 0
verify_ssl: false

# Plugins that are included in the build, and no longer found in the plugins/ folder
bundled_plugin_ids:
 - plugins/dashboard/index
 - plugins/discover/index
 - plugins/doc/index
 - plugins/kibana/index
 - plugins/markdown_vis/index
 - plugins/metric_vis/index
 - plugins/settings/index
 - plugins/table_vis/index
 - plugins/vis_types/index
 - plugins/visualize/index

That’s pretty much it; most of the difficulty of getting the new version of Kibana working is in the configuration, so if you want to tweak things or if something isn’t working, that is the first place to look.

Docker Compose (glue the pieces together)

This is an integral part of the setup.  It is what controls the different containers and glues everything together.  Luckily it is easy to get set up and working.  If you aren’t familiar, docker-compose was recently rebranded from the old “fig” tool, which was billed as a Docker orchestration tool for running complex Docker applications easily.

The official docs are pretty good and detailed so you can visit their site if you have any questions about how to install or how to get any of the components working here.

The first step is to download and install docker-compose.  Here I am using an Ubuntu system.

sudo pip install -U docker-compose

There are a few docker-compose command line commands to be familiar with which we’ll get to next, but first I will post the sample docker-compose configuration file to test out your ELK stack.

kibana:

 build: /home/<user>/elk/kibana/4.0.0
 restart: always
 ports:
 - "5601:5601"
 links:
 - "elasticsearch:elasticsearch"

elasticsearch:

 build: /home/<user>/elk/elasticsearch/1.4.4
 restart: always
 ports:
 - "9200:9200"
 - "9300:9300"
 volumes:
 - "/data:/data"

logstash:

 build: /home/<user>/elk/logstash/logstash-1.5.0
 restart: always
 ports:
 - "4545:4545"
 - "4546:4546"

Most of the configuration is straightforward.  Here are the main commands to get everything stitched together and working.

  • docker-compose build (from the directory the docker-compose.yml file is in)
  • docker-compose up (to test the stack)
  • docker-compose kill (bring it down)

After you have all of the build and run issues ironed out and the stack starts with no errors, you can bring the stack up in detached mode.

  • docker-compose up -d

Additionally, you can look at the logs if something smells fishy.

  • docker-compose logs

Design considerations

One thing that readers might wonder about is the scalability of this setup.  It will scale up very easily but not out.  However, it should be able to handle up to 100k events/second on the Logstash end, so there will be other bottlenecks before the components (ES and Logstash) fall down.  I haven’t pushed my own setup that far yet but have been able to get to around 30k/sec before Logstash dies, which I’m still investigating.  Even with that amount of activity and Logstash choking, ES and Kibana don’t get affected.

So if you use this as a guide for a production setup I would recommend that you use a decently sized server, at least 4 CPU, 8 GB memory and adjust the memory and cpu options for the Logstash component if you plan on throwing a lot of logs at it (>30k/s).  I will revisit once I have worked out all the performance issues with some best practices for making Logstash run more smoothly.

I would be interested to hear how to scale this out if anybody has any recommendations, but this setup should scale up decently for most scenarios at least.  I have not played around with ES sharding across hosts, but I imagine it wouldn’t be super complicated, especially using container volume mounts to store the data and indexes at the host OS level.