Easily login to Rancher containers locally

Sometimes managing containers through the Rancher web console can be tedious and painful.  Especially if you need to copy/paste things into or out of the terminal.  I recently discovered a nice little project on Github called Rancher SSH which allows you to connect to a container running in your Rancher environment as if it was local to the machine you are working on, much like SSH and hence the name.

I am still playing around with the functionality but so far it has been very nice and is very easy to get started with.  To get started you can either install it via Homebrew or with Golang.  I chose to use the homebrew option.

brew install fangli/dev/rancherssh

After it is finished installing (it might take a minute or two), you should have access to the rancherssh command from the CLI.  You might need to source your shell in order to pick up tab completion for the command but you should be able to run the command and get some output.

rancherssh

In order to do anything useful with this tool, you will first need to create an API key for rancherssh in Rancher.  Navigate to the environment you’d like to create the key for and then click the API tab in Rancher.  Then click  the “Add Environment API Key” to bring up the dialogue to create a new key.

add api key

After you create your key make not of the Access key (username)  and Secret key (password).  You will need these to configure rancherssh in the step below.  First, create a file somewhere that is easy to remember, called config.yml and populate it, similar to the following, updating the endpoint, access key and secret key.

endpoint: https://your.rancher.server/v1
user: access_key
password: secret_key

That’s pretty much it.  Make sure the endpoint matches your environment correctly, otherwise you should now be able to connect to a container in your Rancher environment.  You’ll need to make sure you run the rancherssh command from the same directory that you configured your config.yml file, but otherwise it should just work.

rancherssh my-stack_container_1

Optionally you can provide all of the configuration information to the CLI and just skip the config file completely.

rancherssh --endpoint="https://your.rancher.server/v1" --user="access_key" --password="secret_key" my-test-container_1

There is one last thing to mention.  rancherssh provides a nice fuzzy matching mechanism for connecting to containers.  For example, if you can’t remember which containers are available to a stack in Rancher you can run a pattern to match the stack, and rancherssh will tell you which containers are running in the stack and allow you to choose which one to connect to.

ranchserssh %my-stack%

If there are multiple containers this command will allow you to pick which one to connect to.

Searching for container %my-stack%
We found more than one containers in system:
[1] my-stack_container_1, Container ID 1i91308 in project 1a216, IP Address 10.42.154.115
[2] my-stack_container_2, Container ID 1i94034 in project 1a216, IP Address 10.42.119.103
[3] my-stack_container_3, Container ID 1i94036 in project 1a216, IP Address 10.42.146.57

I didn’t have any issues at all getting started with this tool, I would definitely recommend checking it out.  Especially if you do a lot of work in your Rancher containers.  It is fast, easy to use and is really useful for the times that using the Rancher UI is too cumbersome.

Read More

My take on the NoOps movement

I recently attended DevOps Days Portland, where Kelsey Hightower gave a nice Keynote about NoOps.  I had heard of the terms NoOps in passing before the conference but never really thought much about it or its implications. Kelsey’s talk started to get me thinking more and more about the idea and what it means to the DevOps world.

For those of you who aren’t familiar, NoOps is a newer tech buzzword that has emerged to describe the concept that an IT environment can become so automated and abstracted from the underlying infrastructure that there is no need for a dedicated team to manage software in-house.

Obviously the term NoOps has caused some friction between the development world and operations/DevOps world because of its perceived meaning along with a very controversial article entitled “I Don’t Want DevOps.  I Want NoOps.” that kicked the whole movement off and sparked the original debate back in 2011.  The main argument from people who work in operations is that there will always be servers running somewhere, as a developer you can’t just magically make servers go away, which I agree with 100%.  It is incredibly short sighted to assume that any environment can work in a way where operations in some form need not exist.

Interestingly though, if you dig into the goals and underlying meaning of NoOps, they are actually fairly reasonable to me when boiled down.  Here are just a few of them, borrowed from the article and Kelsey’s talk:

  • Improve the process of deploying apps
  • Not just VM’s, release management as well
  • Developers don’t want to deal with operations
  • Developers don’t care about hardware

All of these goals seem reasonable to me as an operations person, especially not having to work with developers.  Therefore, when I look at NoOps I don’t necessarily take the ACTUAL underlying meaning of it be to work against operations and DevOps, I look at it as developers trying to find a better way to get their jobs done, however misguided their wording and mindset.  I also see NoOps, from an operations perspective as a shift in the mindset of how to accomplish goals, to improve processes and pipelines, which is something that is very familiar to people who have worked in DevOps.

Because of this perspective, I see an evolution in the way that operations and DevOps works that takes the best ideas from NoOps and applies them in practical ways.  Ultimately, operations people want to be just as productive as developers and NoOps seems like a good set of ideas to get on the same page.

To be able to incorporate ideas from NoOps as cloud and distributed technologies continue to advance, operations folks need to embrace the idea of programming and automation in areas that have been traditionally managed manually as part of the day to day by operation folks in order to abstract away complicated infrastructure and make it easier for developers to accomplish their goals. Examples of these types of things may include automatically provisioning networks and VLAN’s or issuing and deploying certificates by clicking a button.  As more of the infrastructure gets abstracted away, it is important for operations to be able to automate these tasks.

If anything, I think NoOps makes sense as a concept for improving the lives of both developers and operations, which is one facet that DevOps aims to help solve.  So to me, the goals of NoOps are a good thing, even though there has been a lot of stigma about it.  Just to reiterate, I think it is absurd for anybody to say that jobs of operations will going away anytime soon, the job and responsibilities are just evolving to fit the direction other areas of the business are moving.  If anything, the skills of managing cloud infrastructure, automation and building robust systems will be in higher demand.

As an operations/DevOps person just remember to stay curious and always keep working on improving your skill set.

Read More

Bootstrap servers to a Rancher environment

If you’re not familiar already, Rancher is an orchestration and scheduling tool for containers.  I have written a little bit about Rancher in the past but haven’t covered much on the specifics about how to manage a Rancher environment.  One cool thing about Rancher is its “single pane of glass” approach to managing servers and containers, which allows users and admins to quickly and easily manage complicated environments.  In this post I’ll be covering how to quickly and automatically add servers to your Rancher environment.

One of the manual steps that can(and in my opinion should) be automated is the server bootstrapping process.  The Rancher web interface allows users to add hosts across different cloud providers (AWS, Azure, GCE, etc) and importantly the ability to add a custom host.  This custom host registration is the piece that allows us to automate the host addition process by exposing a registration token via the Rancher API.  One important thing to note if you are going to be adding hosts automatically is that you will need to actually create the entries necessary in the environment that you bootstrap servers to.  So for example, if you create a new environment you will either need to programatically hit the API or in the web interface navigate to Infrastructure -> Add Host to populate the necessary tokens and entries.

Once you have populated the API with the values needed, you will need to create an API token to allow the server(s) that are bootstrapping to connect to the Rancher server to add themselves.  If you haven’t done this before, in the environment you’d like to allow access to navigate to API -> Add Environment API Key -> name it and make a note of key that gets generated.

rancher api

That’s pretty much all of the prep work you need to do to your Rancher environment for this method to work.  The next step is to make a script to bootstrap a server when it gets created.  The logic for this bootstrap process can be boiled down to the following snippet.

#!/bin/bash

INTERNAL_IP=$(ip add show eth0 | awk '/inet/ {print $2}' | cut -d/ -f1 | head -1)
SERVER="https://example.com"
TOKEN="access_key:secret_key"
PROJID="unique_environment"
AGENT_VER="v1.0.1"

RANCHER_URL=$(curl -su $TOKEN $SERVER/v1/registrationtokens?projectId=$PROJID | head -1 | grep -nhoe 'registrationUrl[^},]*}' | egrep -hoe 'https?:.*[^"}]')

docker run \
  -e CATTLE_AGENT_IP=$INTERNAL_IP \
  -e CATTLE_HOST_LABELS='your=label' \
  -d --privileged --name rancher-bootstrap \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:$AGENT_VER $RANCHER_URL

The script is pretty straight forward.  It attempts to gather the internal IP address of the server being created, so that it can add it to the Rancher environment with a unique name.  Note that there are a number of variables that need to get set to reflect.   One that uses the DNS name of the Rancher server, one for the token that was generated in the step above and one for the project ID, which can be found by navigating to the Environment and then looking at the URL for /env/xxxx.

After we have all the needed information and updated the script, we can curl the Rancher server (this won’t work if you didn’t populate the API in the steps above or if your keys are invalide) with the registration token.  Finally, start a docker container with the agent version set (check your Rancher server version and match to that) along with the URL obtained from the curl command.

The final step is to get the script to run when the server is provisioned.  There are many ways to do this and this step will vary depending a number of different factors,  but in this post I am using Cloud-init for CoreOS on AWS.  Cloud-init is used to inject the script into the server and then create a systemd service to run the script the first time the server boots and use the result of the script to run the Rancher agent which allows the server to be picked up by the Rancher server and its environment.

Here is the logic to run the script when the server is booted.

coreos:

  units:
  - name: rancher-agent.service
    command: start
    content: |
      [Unit]
      Description=Rancher Agent
      After=docker.service
      Requires=docker.service
      After=network-online.target
      Requires=network-online.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/rancher-agent

The full version of the cloud-init file can be found here.

After you provision your server and give it a minute to warm up and run the script, check your Rancher environment to see if your server has popped up.  If it hasn’t, the first place to start looking is on the server itself that was just created.  Run docker logs -f rancher-agent to get information about what went wrong.  Usually the problem is pretty obvious.

A brand new server looks something like this.

bootstrapped server

I typically use Terraform to provision these servers but I feel like covering Terraform here is a little bit out of scope.  You can image some really interesting possibilities with auto scaling groups and load balancers that can come and go as your environment changes, one of the beauties of disposable infrastructure as well as infrastructure as code.

If you are interested in seeing how this Rancher bootstrap process fits in with Terraform let me know and I’ll take a stab at writing up a little piece on how to get it working.

Read More

Easy Prometheus Monitoring in Rancher

Docker monitoring and container monitoring in general is an area that has historically been difficult.  There has been a lot of movement and progress in the last year or so to beef up container monitoring tools but in my experience the tools have either been expensive or difficult to configure and complicated to use.  The combination or Rancher and Prometheus has finally given me hope.  Now it is easy easy to setup and configure a distributed monitoring solution without paying a high price.

Prometheus has recently added support for Rancher via the Rancher exporter, which is great news.  This is by far the easiest method I have discovered thus far for experimenting with Prometheus.

For those that don’t know much about Prometheus, it is an up and coming project created by engineers at Soundcloud which is hosted on Github.  Prometheus is focused on monitoring, specifically focusing on container and Docker monitoring.  Prometheus uses a polling based model for “scraping” metrics out of predefined endpoints.  The Prometheus Rancher exporter enables Prometheus to scrape Rancher server specific metrics, which are very useful to have.  To build on that, one other point worth mentioning here is that Prometheus has a very nice, flexible design built upon different client libraries in a similar way to Graphite, so adding support and instrumenting code for different types of platforms is easy to implement.  Check out the list of exporters in the Prometheus docs for idea on how to get started exporting metrics.

This post won’t cover setting up Rancher server or any of the Rancher environment since it is well documented in other places.  I won’t touch on alerting here either because I honestly haven’t had much time to dig into it much yet.  So with that said, the first step I will focus on in this post is getting Prometheus set up and running.  Luckily it is extremely easy to accomplish this using the Rancher catalog and the Prometheus template.

prometheus stack

Once Prometheus has been bootstrapped and everything is up, test it out by navigating to the Grafana home dashboard created by the bootstrap process.  Since this is a simple demo, my dashboard is located at the IP of the server using port 3000 which is the only port that should need to be publicly exposed if you are interested in sharing the Grafana dashboard.

The default Grafana credentials for this catalog template are admin/admin for the username and password, which is noted in the catalog notes found here.  The Prometheus tools ship with some nice preconfigured dashboards, so after you have things set up, it is definitely worth checking out some of them.

grafana dashboard

If you look around the dashboards you will probably notice that metrics for the Rancher server aren’t available by default.  To enable these metrics we need to configure Prometheus to connect to the Rancher API, as noted in the Rancher monitoring guide.

Navigate to http://<SERVER_IP>:8080/v1/settings/graphite.host on your Rancher server, then in the top right click edit, and then update the value there to point to the server address where InfluxDB was deployed to.

influxdb host

After this setting has been configured, restart the Rancher server container, wait a few minutes and then check Grafana.

rancher server metrics

As you can see, metrics are now flowing in the the dashboard.

Now that we have the basics configured, we can drill down in to individual containers to get a more granular view of what is happening in the environment.  This type of granularity is great because it gives a very detailed view of what exactly is going on inside our environment and gives us an easy way to share visuals with other team members.  Prometheus offers a web interface to interact with the query language and visual results, which is useful to help figure out what kinds of things to visualize in Grafana.

Navigate to the server that the Prometheus server container is deployed to on port 9090.  You should see a screen similar to the following.

promdash

There is  documentation about how to get started with using this tool, so I recommend taking a look and playing around with it yourself.  Once you find some useful metrics, visualized in the graph view, grab the query used to generate the graph and add a new dashboard to Grafana.

Prometheus offers a lot of power and flexibility and is a great tool for monitoring.  I am still very new to Prometheus but so far it looks very promising and I have to say I’m really impressed with the amount of polish and detail I was able to get in just an afternoon of experimenting.  I will be updating this post as I get more exposure to Prometheus and get more metrics and monitoring set up so stay tuned.

Read More

Set up SSL for Rancher Server

One issue you will probably run across if you start to use Rancher to manage your Docker containers is that it doesn’t serve pages over an encrypted connection by default.  If you are looking to put Rancher in to a production scenario, it is a good idea to serve encrypted pages.  HA is another topic, but at this point I have not attempted to set it up yet because it is a much more complicated process currently.  The Rancher folks are working on making HA easier in the near future (if you know an easy way to do it I would love to hear about it).  I would argue though that if you can set up SSL for your Rancher server you are over half way to a full production set up.

The process of getting Rancher to proxy through an encrypted connection is straight forward, assuming you already have some certs to use.  If you don’t already have any official certificates issued *I think* you should be okay with self signed certs, but you won’t get that green lock that everybody loves.  Definitely if you are just testing this set up you should be fine to start out with some self signed certs.  Here is a reference for creating some certs for Nginx to test with.

Another important thing to be aware of is that these instructions are specific to the Nginx method outline above.  I have not tried the Apache method, though I would guess it should be very easy to adapt.

Take a look at the Rancher docs as a starting point for getting started, they are very good and will get you most of the way there.  However, when I went through this process there were a few pieces of information that I had to piece together myself, which is the bulk of what I will be sharing today.

The first step is to adapt the configuration in the docs in to a full Nginx config that can be dropped in to the official Nginx image from Dockerhub.  Here is the config I used.

upstream rancher {
    server rancher-server:8080;
}

server {
    listen 443 ssl;
    server_name test.com;
    ssl_certificate /etc/rancher/test.com.crt;
    ssl_certificate_key /etc/rancher/test.com.key;

    access_log /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://rancher;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # This allows the ability for the execute shell window to remain open for up to 15 minutes. Without this parameter, the default is 1 minute and will automatically close.
        proxy_read_timeout 900s;
    }
}

server {
    listen 80;
    server_name test.com;
    return 301 https://$server_name$request_uri;
}

There are a few important things to note about this config.   One is naming the upstream the same name as what the rancher server container is named, in this case rancher-server.

Note that I have used test.com as the server name and so the certs and names are all reflective of that value.  Obviously that will need to be updated with your own values.

Finally, we have added an additional logging section to the config that will pipe the logs to stdout/stderr so we can easily look at the requests from the host OS via the “docker logs” command.

To get the following Docker run command to work correctly you will want to create a directory called /etc/rancher or something easy to remember, and place this config (named as rancher-nginx.conf), along with the certs you have created in to this location.  Alternately you can modify the Docker run command and simply have the volume mounts pointed at the location you store the configuration and certs.  For me, it makes the most sense to group these items together in /etc/rancher.

docker run -d --restart=always --name nginx 
    -v /etc/rancher/rancher-nginx.conf:/etc/nginx/conf.d/default.conf
    -v /etc/rancher/test.com.crt:/etc/rancher/test.com.crt
    -v /etc/rancher/test.com.key:/etc/rancher/test.com.key
    -p 80:80 -p 443:443 --link=rancher-server nginx

This will mount in the correct configuration and certificates to the Nginx docker container, expose port 80 and 443 for web traffic (make sure to adjust any firewall rules you need to get traffic to pass through these ports), and link to the rancher-server container so that the traffic can be proxied.

Additionally, you will need to update any reference to the old address that was using http://<rancher-name>:8080/ to point to https://<rancher-name>/.  Namely the host registration configuration in the Rancher server, but if you were relying on any other outside tools to hit that endpoint they will also need to be updated to use https instead.

Read More