Export SNMP metrics with the Prometheus Operator

There are quite a few use cases for monitoring outside of Kubernetes, especially for previously built infrastructure and otherwise legacy systems.  Additional monitoring adds an extra layer of complexity to your monitoring setup and configuration, but fortunately Prometheus makes this extra complexity easier to manage and maintain, inside of Kubernetes.

In this post I will describe a nice clean way to monitor things that are internal to Kubernetes using Prometheus and the Prometheus Operator.  The advantage of this approach is that it allows the Operator to manage and monitor infrastructure, and it allows Kubernetes to do what it’s good at; make sure the things you want are running for you in an easy to maintain, declarative manifest.

If you are already familiar with the concepts in Kubernetes then this post should be pretty straight forward.  Otherwise, you can pretty much copy/paste most of these manifests into your cluster and you should have a good way to monitor things in your environment that are external to Kubernetes.

Below is an example of how to monitor external network devices using the Prometheus SNMP exporter.  There are many other exporters that can be used to monitor infrastructure that is external to Kubernetes but currently it is recommended to set up these configurations outside of the Prometheus Operator to basically separate monitoring concerns (which I plan on writing more about in the future).

Create the deployment and service

Here is what the deployment might look like.

apiVersion: apps/v1beta1
kind: Deployment
metadata:
  name: snmp-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: snmp-exporter
  template:
    metadata:
    labels:
      app: snmp-exporter
  spec:
    containers:
    - image: oakman/snmp-exporter
    command: ["/bin/snmp_exporter"]
    args: ["--config.file=/etc/snmp_exporter/snmp.yml"]
    name: snmp-exporter
    ports:
    - containerPort: 9116
      name: metrics

And the accompanying service.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: snmp-exporter
  name: snmp-exporter
spec:
  ports:
  - name: http-metrics
    port: 9116
    protocol: TCP
    targetPort: metrics
  selector:
    app: snmp-exporter

At this point you would have a pod in your cluster, attached to a static IP address.  To see if it worked you can check to make sure a service IP was created.  The service is basically what the Operator uses to create targets in Prometheus.

kubectl get sv

From this point you can 1) set up your own instance of Prometheus using Helm or by deploying via yml manifests or 2) set up the Prometheus Operator.

Today we will walk through option 2, although I will probably cover option 1 at some point in the future.

Setting up the Prometheus Operator

The beauty of using the Prometheus Operator is that it gives you a way to quickly add or change Prometheus specific configuration via the Kubernetes API (custom resource definition) and some custom objects provided by the operator, including AlertManager, ServiceMonitor and Prometheus objects.

The first step is to install Helm, which is a little bit outside of the scope of this post but there are lots of good guides on how to do it.  With Helm up and running you can easily install the operator and the accompanying kube-prometheus manifests which give you access to lots of extra Kubernetes metrics, alerts and dashboards.

helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
helm install --name prometheus-operator --set rbacEnable=true --namespace monitoring coreos/prometheus-operator
helm install coreos/kube-prometheus --name kube-prometheus --namespace monitoring

After a few moments you can check to see that resources were created correctly as a quick test.

kubectl get pods -n monitoring

NOTE: You may need to manually add the “prometheus” service account to the monitoring namespace after creating everything.  I ran into some issues because Helm didn’t do this automatically.  You can check this with kubectl get events.

Prometheus Operator configuration

Below are steps for creating custom objects (CRDs) that the Prometheus Operator uses to automatically generate configuration files and handle all of the other management behind the scenes.

These objects are wired up in a way that configs get reloaded and Prometheus will automatically get updated when it sees a change.  These object definitions basically convert all of the Prometheus configuration into a format that is understood by Kubernetes and converted to Prometheus configuration with the operator.

First we make a servicemonitor for monitoring the the snmp exporter.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: snmp-exporter
    prometheus: kube-prometheus # tie servicemonitor to correct Prometheus
  name: snmp-exporter
spec:
  jobLabel: k8s-app
  selector:
    app: snmp-exporter
  namespaceSelector:
    matchNames:
    - monitoring

  endpoints:
  - interval: 60s
    port: http-metrics
    params:
      module:
      - if_mib # Select which SNMP module to use
      target:
      - 1.2.3.4 # Modify this to point at the SNMP target to monitor
    path: "/snmp"
    targetPort: 9116

Next, we create a custom alert and tie it our Prometheus Operator.  The alert doesn’t do anything useful, but is a good demo for showing how easy it is to add and manage alerts using the Operator.

Create an alert-example.yml configuration file, add it as a configmap to k8s and mount it in as a configuration with the ruleSelector label selector and the prometheus operator will do the rest. Below shows how to hook up a test rule into an existing Prometheus (kube-prometheus) alert manager, handled by the prometheus-operator.

kind: ConfigMap
apiVersion: v1
metadata:
 name: josh-test
 namespace: monitoring
 labels:
 role: alert-rules # Standard convention for organizing alert rules
 prometheus: kube-prometheus # tie to correct Prometheus
data:
 test.rules.yaml: |
 groups:
 - name: test.rules # Top level description in Prometeheus
 rules:
 - alert: TestAlert
 expr: vector(1)

Once you have created the rule definition via configmap just use kubectl to create it.

kubectl create -f alert-example.yml -n monitoring

Testing and troubleshooting

You will probably need to port forward the pod to get access to the IP and port in the cluster

kubectl port-forward snmp-exporter-<name> 9116

Then you should be able to visit the pod in your browser (or with curl).

localhost:9116

The exporter itself does a lot more so you will probably want to play around with it.  I plan on covering more of the details of other external exporters and more customized configurations for the SNMP exporter.

For example, if you want to do any sort of monitoring past basic interface stats, etc. you will need to generate and build your own set of MIBs to gather metrics from your infrastructure and also reconfigure your ServiceMonitor object in Kubernetes to use the correct MIBs so that the Operator updates the configuration correctly.

Conclusion

The amount of options for how to use Prometheus is one area of confusion when it comes to using Prometheus, especially for newcomers.  There are lots of ways to do things and there isn’t much direction on how to use them, which can also be viewed as a strength since it allows for so much flexibility.

In some situations it makes sense to use an external (non Operator managed Prometheus) when you need to do things like manage and tune your own configuration files.  Likewise, the Prometheus Operator is a great fit when you are mostly only concerned about managing and monitoring things inside Kubernetes and don’t need to do much external monitoring.

That said, there is some support for external monitoring using the Prometheus Operator, which I want to write about in a different post.  This support is limited to a handful of different external exporters (for the time being) so the best advice is to think about what kind of monitoring is needed and choose the best solution for your own use case.  It may turn out that both types of configurations are needed, but it may also end up being just as easy to use one method or another to manage Prometheus and its configurations.

Read More

Test Rancher 2.0 using Minikube

If you haven’t heard yet, Rancher recently revealed news that they will be building out a new v2.0 of their popular container orchestration and management platform to be built specifically to run on top of Kubernetes.  In the container realm, Kubernetes has recently become a clear favorite in the battle of orchestration and container management.  There are still other options available, but it is becoming increasingly clear that Kubernetes has the largest community, user base and overall feature set so a lot of the new developments are building onto Kubernetes rather than competing with it directly.  Ultimately I think this move to build on Kubernetes will be good for the container and cloud community as companies can focus more narrowly now on challenges tied specifically around security, networking, management, etc, rather than continuing to invent ways to run containers.

With Minikube and the Docker for Mac app, testing out this new Rancher 2.0 functionality is really easy.  I will outline the (rough) process below, but a lot of the nuts and bolts are hidden in Minikube and Rancher.  So if you’re really interested in learning about what’s happening behind the scenes, you can take a look at the Minikube and Rancher logs in greater detail.

Speaking of Minkube and Rancher, there are a few low level prerequisites you will need to have installed and configured to make this process work smoothly, which are listed out below.

Prerequisites

  • Tested on OSX
  • Get Minikube working – use the Kubernetes/Minikube notes as a reference (you may need to bump memory to 4GB)
  • Working version of kubectl
  • Install and configure docker for mac app

I won’t cover the installation of these perquisites, but I have blogged about a few of them before and have provided links above for instructions on getting started if you aren’t familiar with any of them.

Get Rancher 2.0 working locally

The quick start guide on the Rancher website has good details for getting this working – http://rancher.com/docs/rancher/v2.0/en/quick-start-guide/.  On OSX you can use the Docker for Mac app to get a current version of Docker and compose.  After Docker is installed, the following command will start the Rancher container for testing.

docker run -d --restart=unless-stopped -p 8080:8080 --name rancher-server rancher/server:preview

Check that you can access the Rancher 2.0 UI by navigating to http://localhost:8080 in your browser.

If you wanted to dummy a host name to make access a little bit easier you could just add an extra entry to /etc/hosts.

Import Minikube

You can import an existing cluster into the Rancher environment.  Here we will import the local Minikube instance we got going earlier so we can test out some of the new Rancher 2.0 functionality.  Alternately you could also add a host from a cloud provider.

In Rancher go to Hosts, Use Existing Kubernetes.

Use existing Kubernetes

Then grab the IP address that your local machine is using on your network.  If you aren’t familiar, on OSX you can reach into the terminal and type “ifconfig” and pull out the IP your machine is using.  Also make sure to set the port to 8080, unless you otherwise modified the port map earlier when starting Rancher.

host registration url

Registering the host will generate a command to run that applies configuration on the Kubernetes cluster.  Just copy this kubectl command in Rancher and run it against your Minikube machine.

kubectl url

The above command will join Minikube into the Rancher environment and allow Rancher to manage it.  Wait a minute for the Rancher components (mainly the rancher-agent continer/pod) to bootstrap into the Minikube environment.  Once everything is up and running, you can check things with kubectl.

kubectl get pods --all-namespaces | grep rancher

Alternatively, to verify this, you can open the Kubernetes dashboard with the “minikube dashboard” command and see the rancher-agent running.

kubernetes dashboard

On the Rancher side of things, after a few minutes, you should see the Minikube instance show up in the Rancher UI.

rancher dashboard

That’s it.  You now have a working Rancher 2.0 instance that is connected to a Kubernetes cluster (Minikube).  Getting the environment to this point should give you enough visibility into Rancher and Kubernetes to start tinkering and learning more about the new features that Rancher 2.0 offers.

The new Rancher 2.0 UI is nice and simplifies a lot of the painful aspects of managing and administering a Kubernetes cluster.  For example, on each host, there are metrics for memory, cpu, disk, etc. as well as specs about the server and its hardware.  There are also built in conveniences for dealing with load balancers, secrets and other components that are normally a pain to deal with.  While 2.0 is still rough around the edges, I see a lot of promise in the idea of building a management platform on top Kubernetes to make administrative tasks easier, and you can still exec to the container for the UI and check logs easily, which is one of my favorite parts about Rancher.  The extra visualization is a nice touch for folks that aren’t interested in the CLI or don’t need to know how things work at a low level.

When you’re done testing, simply stop the rancher container and start it again whenever you need to test.  Or just blow away the container and start over if you want to start Rancher again from scratch.

Read More

Create a Kubernetes cluster on AWS and CoreOS with Terraform

Up until my recent discovery of Terraform, the process I had been using to test CoreOS and Kubernetes was somewhat cumbersome and manual.  There are still some manual steps and processes involved in the bootstrap and cluster creation process that need to get sorted out, but now I can bring environments up and down, quickly and automatically.  This is a HUGE time saver and also makes testing easier because these changes can happen in a matter of minutes rather than hours and can all be self documented for others to reference in a Github repo.  Great success.

NOTE:  This method seems to be broken as of the 0.14.2 release of Kubernetes.  The latest version I could get to work reliably was v0.13.1.  I am following the development and looking forward to the v1.0 release but won’t revisit this method until something stable has been shipped because there are still just too many changes going on.  With that said, v0.13.1 has a lot of useful functionality and this method is actually really easy to get working once you have the groundwork laid out.

Another benefit is that as the project develops and matures, the only thing that will need modified are the cloud configs I am using here.  So if you follow along you can use my configs as a template, feel free to use this as a base and modify the configs to get this working with a newer release.  As I said I will be revisiting the configs once things slow down a little and a v1 has been released.

Terraform

So the first component that we need to enable in this workflow is Terraform.  From their site, “Terraform is a tool for building, changing, and combining infrastructure safely and efficiently.”  Basically, Terraform is a command line tool for allowing you to implement your infrastructure as code across a variety of different infrastructure providers.  It should go without saying, being able to test environments across different platforms and cloud providers is a gigantic benefit.  It doesn’t lock you in to any one vendor and greatly helps simplify the process of creating complex infrastructures across different platforms.

Terraform is still a young project but has been maturing nicely and currently supports most of the functionality needed for this method to work (the missing stuff is in the dev pipeline and will be released in the near future).  Another benefit is that Terraform is much easier to use and understand than CloudFormation, which is  a propriety cloud provisioning tool available to AWS customers, which could be used if you are in a strictly AWS environment.

The first step is to download and install Terraform.  In this example I am using OSX but the instructions will be similar on Linux or other platforms.

cd /tmp
wget https://dl.bintray.com/mitchellh/terraform/terraform_0.3.7_darwin_amd64.zip
unzip terraform_0.3.7_darwin_amd64.zip
mv terraform* /usr/local/bin

After you have moved the binary you will need to source your shell.  I use zsh so I just ran “source ~/.zshrc” to update the path for terraform.

To test terraform out you can check the version to make sure it works.

terraform version

Now that Terraform is installed you will need to get some terraform files set up.  I suggest making a local terraform directory on your machine so you can create a repo out of it later if desired.  I like to split “services” up by creating different directories.  So within the terraform directory I have created a coreos directory as well as a kubernetes directory, each with their own variables file (which should be very similar).  I don’t know if this approach is a best practice but has been working well for me so far.  I will gladly update this workflow if there is a better way to do this.

Here is a sample of what the file and directory layout might look like.

cloud-config
  etcd-1.yml
  etcd-2.yml
  etcd-3.yml
  kube-master.yml
  kube-node.yml
etcd
  dns.tf
  etcd.tf
  variables.tf
kubernetes
  dns.tf
  kubernetes.tf
  variables.tf

As you can see there is a directory for Etcd as well as Kubernetes specific configurations.  You may also notice that there is a cloud-config directory.  This will be used as a central place to put configurations for the different services.

Etcd

With Terraform set up, the next component needed for this architecture to work is a functioning etcd cluster. I chose to use a separate 3 node cluster (spread across 3 AZ’s) for improved performance and resliency.  If one of the nodes goes down or away with a 3 node cluster it will still be operational, where if a 1 node cluster goes away you will be in much more trouble.  Additionally if you have other services or servers that need to leverage etcd you can just point them to this etcd cluster.

Luckily, with Terraform it is dead simple to spin up and down new clusters once you have your initial configurations set up and configured correctly.

At the time of this writing I am using the current stable version of CoreOS, which is 633.1.0, which uses version 0.4.8 of etcd.  According to the folks at CoreOS, the cloud configs for old versions of etcd should work once the new version has been released so moving to a the new 2.0 release should be easy once it hits the release channel but some tweaks or additional changes to the cloud configs may need to occur.

Configuration

Before we get in to the details of how all of this works, I would like to point out that many of the settings in these configuration files will be specific to users environments.  For example I am using an AWS VPC in the “us-east-1” region for this set up, so you may need to adjust some of the settings in these files to match your own scenario.  Other custom components may include security groups, subnet id’s, ssh keys, availability zones, etc.

Terraform offers resources for basically all network components on AWS so you could easily extend these configurations to build out your initial network and environment if you were starting a project like this from scratch.  You can check all the Terraform resources for the AWS provider here.

Warning: This guide assumes a few subtle things to work correctly.  The address scheme we are using for this environment is a 192.168.x.x, leveraging 3 subnets to spread the nodes out across for additional availability (b, c, e) in the US-East-1 AWS region.  Anything in the configuration that has been filled in with “XXX” represents a custom value that you will need to either create or obtain in your own environment and modify in the configuration files.

Finally, you will need to provide AWS credentials to allow Terraform to communicate with the API for creating and modifying resources.  You can see where these credentials should be filled in below in the variables.tf file.

variables.tf

variable "access_key" { 
 description = "AWS access key"
 default = "XXX"
}

variable "secret_key" { 
 description = "AWS secret access key"
 default = "XXX"
}

variable "region" {
 default = "us-east-1"
}

/* CoreOS AMI - 633.1.0 */

variable "amis" {
 description = "Base CoreOS AMI"
 default = {
 us-east-1 = "ami-d6033bbe" 
 }
}

Here is what an example CoreOS configs look like.

etcd.tf

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Etcd cluster */

resource "aws_instance" "etcd-01" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = XXX"
 private_ip = "192.168.1.10"
 user_data = "${file("../cloud-config/etcd-1.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-02" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.2.10"
 user_data = "${file("../cloud-config/etcd-2.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "etcd-03" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "t2.micro"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.3.10"
 user_data = "${file("../cloud-config/etcd-3.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

Below I have created a configuration file as a simaple way to create DNS records dynamically when spinning up the etcd cluster nodes.

dns.tf

 resource "aws_route53_record" "etcd-01" {
 zone_id = "XXX"
 name = "etcd-01.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-01.private_ip}"]
}

resource "aws_route53_record" "etcd-02" {
 zone_id = "XXX"
 name = "etcd-02.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-02.private_ip}"]
}

resource "aws_route53_record" "etcd-03" {
 zone_id = "XXX"
 name = "etcd-03.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.etcd-03.private_ip}"]
}

Once all of the configurations have been put in place and all look right you can test out what your configuration will look like with the “plan” command:

cd etcd
terraform plan

Make sure to change in to your etcd directory first.  This will examine your current configuration and calculate any changes.  If your environment is completely unconfigured then this command will return some output that explains what terraform is planning to do.

If you don’t want the input prompts when you run your plan command you can append the “-input=false” flag to bypass the configurations.

If everything looks okay with the plan you can tell Terraform to “apply” your conifgs with the following:

terraform apply
OR
terraform apply -input=false

If everything goes accordingly, after a few minutes you should have a new 3 node etcd cluster running on the lastest stable version of CoreOS with DNS records for interacting with the nodes!  To double check that the servers are being created you can check the AWS console to see if your newly defined servers have been created.  The console is a great way to double check that things all work okay and that the right values were created.

If you are having trouble with the cloud configs check the end of the post for the link to all of the etcd and Kubernetes cloud configs.

Kubernetes

The Kubernetes configuration is very similar to etcd.  It uses a variables.tf, kubernetes.tf and dns.tf file to configure the Kubernetes cluster.

The following configurations will build a v0.13.1 Kubernetes cluster with 1 master, and 3 worker nodes to begin with.  This config can be extended easily to scale the number of worker nodes to basically as many as you want (I could easily image the hundreds or thousands), simply by changing a few number in the configuration, barely adding any overhead to our current process and workflow, which is nice.  Because of these possibilities, Terraform allows for a large amount of flexibility in how you manage your infrastructure.

This configuration is using c3.large instances so be aware that your AWS bill may be affected if you spin nodes up and fail to turn them off when you are done testing.

provider "aws" {
 access_key = "${var.access_key}"
 secret_key = "${var.secret_key}"
 region = "${var.region}"
}

/* Kubernetes cluster */

resource "aws_instance" "kube-master" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 private_ip = "192.168.1.100"
 user_data = "${file("../cloud-config/kube-master.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "20"
 } 
}

resource "aws_instance" "kube-e" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1e" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-b" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1b" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

resource "aws_instance" "kube-c" {
 ami = "${lookup(var.amis, var.region)}"
 availability_zone = "us-east-1c" 
 instance_type = "c3.large"
 subnet_id = "XXX"
 security_groups = ["XXX"]
 key_name = "XXX"
 count = "1"
 user_data = "${file("../cloud-config/kube-node.yml")}"

 root_block_device = {
 device_name = "/dev/xvda"
 volume_type = "gp2"
 volume_size = "100"
 } 
}

And our DNS configuration.

resource "aws_route53_record" "kube-master" {
 zone_id = "XXX"
 name = "kube-master.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-master.private_ip}"]
}

resource "aws_route53_record" "kube-e" {
 zone_id = "XXX"
 name = "kube-e-test.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-e.0.private_ip}"]
}

resource "aws_route53_record" "kube-b" {
 zone_id = "XXX"
 name = "kube-b.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-b.0.private_ip}"]
}

resource "aws_route53_record" "kube-c" {
 zone_id = "XXX"
 name = "kube-c.example.domain"
 type = "A"
 ttl = "300"
 records = ["${aws_instance.kube-c.0.private_ip}"]
}

The variables file for Kubernetes should be identical to the etcd configuration so I have chosen not to place it here.  Just refer to the previous etcd/variables.tf file.

Resources

Since each cloud-config is slightly different (and would take up a lot more space) I have included those files in the below gist.  You will need to populate the “ssh_authorized_keys:” section with your own SSH public key and update any of the IP addresses to reflect your environment.  I apologize if there are any typo’s, there was a lot of cut and paste.

Cloud configs – https://gist.github.com/jmreicha/7923c295ab6110151127

Much of the configurations that I am using are based on the Kubernetes docs, as well as some of the specific cloud configs that I have adapted, which can be found here.

Another great place to get help with Kubernetes is the IRC channel which can be found on irc.freenode.net in the #google-containers channel.  The folks that hang out there are super friendly and can almost always answer any questions you have.

As I said, development is still pretty crazy.  You can check the releases page to check out all the latest stuff.

Conclusion

Yes this can seem very convoluted at first but if everything works how it should, you now have a quick and easy way to spin up identical etcd and/or a Kubernetes environments up or down at will, which is pretty powerful.  Also, this method is dramatically easier than most of the methods I have come across so far in my own adventures and testing.

Once you get through the initial confusion and learning curve this workflow becomes a huge timesaver for testing out different versions of Kubernetes and also for experimenting with etcd.  I haven’t quite automated the entire process but I imagine that it would be easy to spin entire environments up and down by gluing all of these pieces together with some simple shell scripts.

If you need to make any configuration updates, for example to put a new version of Kubernetes in place, you will need first update your Kubernetes master/node cloud configs and then rerun terraform apply to have it recreate your environment.

The cloud config changes will destroy any nodes that rely on the old configuration.  Therefore, you will need to make sure that if you make any changes to your cloud config files you are prepared to deal with the consequences!  Ideally you should get your etcd cluster to a good spot and then leave it alone and just play around with the Kubernetes components since both of those components have been separated in order to change the components out independently.

With this workflow you can already start to see the power of terraform even with this one example.  Terraform is quickly becoming one of my favorite automation and cloud tools and is providing a very easy way to define and build infrastructure though code and configurations.

Read More

Kubernetes resize and rolling updates

If you haven’t heard of or used Kubernetes yet, I highly recommend taking a look (see the link below).  I won’t take too much time here today to talk about the Kubernetes project because there is just too much to cover.  Instead I will be writing a series of posts about how to work with Kubernetes and share some tricks and tips that I have discovered in my experiences so far with the tool.  Since the project is still very young and moving incredibly quickly, the best place to get information is either the IRC channel (#google-containers), the mailing list, or their github project.  Please go look at the github project if you are new to Kubernetes, or are interested in learning more about it, especially their docs and examples sections.

As I said, updates and progress have been extremely fast paced, so it isn’t uncommon for things in the Kubernetes project to seem obselete before they have even been implemented.  For example, the command line tool for interacting with a Kubernetes cluster has already changed faces a few times, which was confusing to me when I first started out.  Kubecfg is on the way out and the project maintainers are working on removing old references to it.  On the flip side, the kubectl command is maturing quite nicely and will be around for awhile, along with the subcommand that I will be describing.

Now that I have all the basic background stuff out of the way; the version of kubectl I am using for this demonstration is v0.9.1.  If you just discovered Kubernetes or have been using kubecfg (as explained above) you will want to make sure to get more familiar with kubectl because it is the preferred tool going forward, at least at this point.

There are a few handy subcommands that come baked in to the kubectl command.  The first is the resize command.  This command allows you to scale the number of running containers being managed by Kubernetes up or down on the fly.  Obviously this can be really powerful!  The syntax is pretty straight forward and below I have an example listed.

kubectl resize –current-replicas=6 –replicas=0 rc test-rc

The –current-replicas argument is optional, the –replicas defines the *desired* number of replicas to have runing, rc specifies this is a replication controller, and finally, test-rc is the name of the replication controller to scale.   After you scale your replication controller you can check out the status quickly via the following command.

kubectl get pod

Another handy tool to have when working with Kubernetes is the ability to deploy new images as a rolling update.

 kubectl rollingupdate test-rc -f test-rc-2.yml –update-period=”10s”

The rollingupdate command takes a few arguments.  The first is the name of the current replication controller that you would like to update.  The second is to replace it with the yml file of the new replication controller and the third optional argument is the –update-period, which allows a user to override the default time that it takes to spin up a new container and spin down an old.

Below is an example of what your test-rc-2.yml file may look like.

kind: ReplicationController
apiVersion: v1beta1
id: test-rc-2
namespace: default
desiredState:
 replicas: 1
 replicaSelector:
   name: test-rc
   version: v2
 podTemplate:
 labels:
   name: test-rc
   version: v2
 desiredState:
 manifest:
 version: v1beta1
 id: test-rc
 containers:
   - name: test-image
   image: test/test:new-tag
   imagePullPolicy: PullAlways
 ports:
   - name: test-port
   containerPort: 8080

There are a few important things to notice.  The first is that the id must be unique, it can’t be a name that is already in use by another replication controller.  All of the label names should remain the same except for the version.  The version is used to signify the new replication controller is a running a new docker image.  The version number should be unique, which will help keep track of which image version is running.

Another thing to note.  If your original replication controller did not contain a unique key (like version) then you will need to update the original replication controller first, adding a unique key, before attempting to run the rolling update.

If both replication controllers don’t have the same format you will get an error similar to this.

test-rc.yml must specify a matching key with non-equal value in Selector for <selector name>

So that’s pretty much it for now.  I will revisit this post again in the future as new flags and subcommands are added to kubectl for managing and updating replication controllers.  I also plan on writing a few more posts about other aspects and areas of kubectl and running Kubernetes, so please check back soon!

Read More