Setting up a Jenkins 2.0 pipeline

Introduction

The new pipeline integration in Jenkins 2.0 is pretty awesome and is a great way to add more automation to Jenkins.  In this post we will set up Jenkins so that we can write our own custom libraries for the pipeline to use.

One huge benefit of the new pipeline integration into the core components of Jenkins is the idea of a Jenkinsfile, which provides a way to execute Jenkins jobs automatically against a repo, similar to the way TravisCI works.  Simply place a Jenkinsfile in the root of the repo that you wish to automate, set up your webhook so that GitHub events automatically trigger the Jenkins job, and let Jenkins take care of the build.

There are a few very good guides for getting this functionality set up.

Unfortunately, while these guides/docs are very informative and useful, they are somewhat confusing to new users and gloss over a few important steps and details that took me longer than it should have to figure out and understand.  Therefore the focus here will be on some of the details that the mentioned guides lack, especially setting up the built-in Jenkins Git repo for accessing and working with the custom pipeline libraries.

Setup

There are many great resources out there already for getting a Jenkins server up and running.  For the purposes of this post it will be assumed that you have already created and set up a Jenkins instance, using one of the other DigitalOcean [tutorials](https://www.digitalocean.com/community/tutorials/?q=jenkins).

The first step to getting the pipeline set up, again assuming you have already installed Jenkins 2.0 from one of the guides linked above, is to enable the Jenkins custom Git repo for housing the unique pipeline libraries that we will write a little later on.  There is a section in the workflow plugin tutorial that explains it, but it is one of the confusing areas and is buried in the documentation, so it will be covered in more detail below.

In order to be able to clone, read, and write to the built-in Jenkins Git repo, you will need to add your SSH public key to the server.  To do this, the first step is to configure the SSH plugin with a port to connect to so that the repo can be cloned.  This setting can be found in Jenkins -> Configuration.  In this example I used port 2222.

ssh port configuration

As always, security should be a concern, so make sure you have authentication turned on.  In this example I am using GitHub authentication, but the process will be similar for other authentication methods.  Also make sure you follow best practices in locking down any external access by hardening SSH or using other ways of blocking access.

After the Git server has been configured, you will need to add the public key for your user in Jenkins.  This was another one of the confusing parts initially, since the section about adding your personal SSH public key is buried in docs that are easy to gloss over.  The location in Jenkins to add this key is:

https://jenkins.example.com/user/<username>/configure

public ssh key

If you don't know where your public SSH key is located, you can get it with the following command.

cat ~/.ssh/id_rsa.pub

Just copy that key and paste it into the SSH Public Keys field on the Jenkins user configuration page shown above.
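
If you don't have an SSH key pair at all yet, you can generate one first and then repeat the steps above.  A minimal example (the email comment is just a label and can be anything):

ssh-keygen -t rsa -b 4096 -C "you@example.com"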

Now that you have the built-in Jenkins Git repo configured, you can clone it via the instructions from the above tutorial.

git clone ssh://<jenkins_username>@jenkins.example.com:2222/workflowLibs.git

Notice that we are using port 2222.  This is the port we configured via the SSH plugin above.  The port number can be any port you like; just make sure to keep it consistent between that configuration and here.

Working with the pipeline

With the pipeline library repo cloned, we can write a simple function and then add it back into Jenkins.  By default the repo is called workflowLibs.  The basic structure of the repo is to place your custom functions in the vars directory.  Within the vars directory you will create a .groovy file for the function and a matching .txt file for any documentation of the command you want to add.  In our example let's create a hello function.

Create a hello.groovy and a hello.txt file.  Inside the hello.groovy file, add something similar to the following.

echo 'Hello world!'

Go ahead and commit the hello.groovy file; don't worry about the hello.txt file.

git add hello.groovy
git commit -m "Hello world example"
git push   # you may need to set your upstream first

Obviously this function won’t do much.  Once you have pushed the new function you should be able to view it in the dropdown of pipeline functions in the Jenkins build configuration screen.
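
As a rough sketch of how this is usually wired up: the common convention for scripts in the vars directory is to wrap the body in a call() method, which lets the function be invoked by name from any pipeline script.  That version of the example would look something like this:

// vars/hello.groovy
def call() {
    echo 'Hello world!'
}

And calling it from a pipeline script or Jenkinsfile:

node {
    // Runs the custom library function defined above
    hello()
}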

NOTE:  There is a nice little script testing tool built in as part of the pipeline now, called the snippet generator.  If you want to test out small scripts on your Jenkins server before committing and pushing any local changes, you can try them out directly in Jenkins.  To do this, open up any of your Jenkins job configurations and then click the link that says “Pipeline syntax” towards the bottom of the page.

Pipeline configuration

Here’s what the snippet generator looks like.

snippet generator

From the snippet generator you can build out quite a bit of the functionality that you might want to use as part of your pipeline.

Configuring the job

There are a few different options for setting up Jenkins pipelines.  The most common types are the Pipeline and the Multibranch Pipeline.  The Pipeline implements all of the scripting capabilities that have been covered, so the power of the Groovy scripting language can be leveraged in the Jenkins job, along with things like application life cycles (via stages), which makes CI/CD much easier.

The Multibranch Pipeline augments the functionality of the Pipeline by adding the ability to index branches from SCM so that different branches can be easily built and tested.  The Multibranch Pipeline is especially useful for larger projects that have many developers and feature branches being worked on, because each one of the branches can be tracked separately and developers don't need to worry as much about overlapping with others' work.

Taking the pipeline functionality one step further, it is possible to create a Jenkinsfile with all of the needed pipeline code inside a repo so that it can be built automatically.  The Jenkinsfile is basically a blueprint that describes the how and what of a project's build process, and it can leverage any custom Groovy functions you have written that live on the Jenkins server.

Using the combination of a GitHub webhook and a Jenkinsfile in a Git repo, it is easy to tell your Jenkins server to automatically kick off a build every time a commit is pushed or a PR is opened in GitHub.

Let’s take a look at what an example Jenkinsfile might look like.

node {
    stage 'Checkout'
    // Checkout logic goes here

    stage 'Build'
    // Build logic goes here

    stage 'Test'
    // Test logic goes here

    stage 'Deploy'
    // Deploy logic goes here
}

This Jenkinsfile defines various “stages”, which will run through a set of functions described in each stage every time a commit has been pushed or a PR has been opened for a given project.  One workflow, as shown above, is to segment the job into checkout, build, test, and deploy stages.  Different projects might require different stages, so it is nice to have granular control of what the job is doing on a per-repo basis.
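
To make that a little more concrete, here is a rough sketch of what those stages might contain for a simple project.  The shell commands are placeholders for whatever build tooling you use, checkout scm assumes the Jenkinsfile itself is being loaded from SCM, and env.BRANCH_NAME is only populated automatically for Multibranch Pipeline jobs:

node {
    stage 'Checkout'
    // Pull down the source that triggered this build
    checkout scm

    stage 'Build'
    // Placeholder build step -- substitute your own build commands
    sh 'make build'

    stage 'Test'
    sh 'make test'

    stage 'Deploy'
    // Only deploy builds of the master branch
    if (env.BRANCH_NAME == 'master') {
        sh 'make deploy'
    }
}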

Bonus: GitHub webhooks

Configuring webhooks in GitHub is pretty easy as well.  Hosted Git is fairly standard these days for storing source code and there are a number of different Git management tools, so the steps should be very similar if you are using a tool other than GitHub.  A webhook in GitHub can be configured to trigger a Jenkins pipeline build when either a commit is pushed to a branch, like master, or a PR is created for a branch.  The advantages of using webhooks should be pretty apparent: builds are triggered automatically and can be configured to send their results to various communication channels like email, Slack, or a number of other chat tools.  The webhooks are the last step in automating the new pipeline features.

If you haven’t already, you will need to enable the GitHub plugin (https://wiki.jenkins-ci.org/display/JENKINS/GitHub+Plugin) in order to use the GitHub webhooks.  No extra configuration should be needed out of the box after installing the plugin.

To configure the webhook, first make sure there is a Jenkinsfile in the root directory of the project.  After the Jenkinsfile is in place you will need to set up the webhook itself.  Navigate to the settings of the project you would like to create a webhook for and select ‘Settings’ -> ‘Webhooks & services’.  From here there is a button to add a new webhook.

adding a webhook

Change the Payload URL to point at the Jenkins server, update the Content type to application/x-www-form-urlencoded, and leave the secret section blank.  All the other defaults should be fine.
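
If you would rather script this step than click through the UI, the same webhook can be created with a quick call to the GitHub API.  This is just a sketch: the user, token, owner, and repo values are placeholders, and /github-webhook/ is the endpoint the Jenkins GitHub plugin listens on:

curl -u <github_user>:<api_token> \
  -X POST https://api.github.com/repos/<owner>/<repo>/hooks \
  -d '{
        "name": "web",
        "active": true,
        "events": ["push", "pull_request"],
        "config": {
          "url": "https://jenkins.example.com/github-webhook/",
          "content_type": "form"
        }
      }'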

After adding the webhook, create or update the associated job in Jenkins.  Make sure the new job is configured as either a pipeline or multibranch pipeline type.

pipeline job

In the job configuration, point Jenkins at the GitHub URL of the project.

configure the job

Also make sure to set the build trigger to ‘Build when a change is pushed to GitHub’.

more configuration

You may need to configure credentials if you are using private GitHub repos.  This step can be done in Jenkins by navigating to ‘Manage Jenkins’ -> ‘Credentials’ -> ‘Global’.  Then choose ‘Add Credentials’ and select the SSH key used in conjunction with GitHub.  After the credentials have been set up there should be an option when configuring jobs to use the SSH key to authenticate with GitHub.

Conclusion

Writing Jenkinsfiles and custom libraries can take a little bit of time to get the hang of initially, but they are very powerful.  If you already have experience writing Groovy, then writing these functions and files should be fairly straightforward.

The move towards pipelines brings about a number of nice features.  First, you can keep track of your Jenkins job definition simply by adding a Jenkinsfile to a repo, so you get all of the nice benefits of history and version tracking along with one central place to keep your build configurations.  Because Groovy is such a flexible language, pipelines also give developers and engineers more options and creativity in terms of what their build jobs can do.

One gotcha of this process is that there isn't a great workflow yet for working with the library functions, so there is a lot of trial and error involved in getting custom functionality working correctly.  One good way to debug is to set up a test job and watch for errors in the console output when you trigger a build for it.  With the snippet generator script tester, though, this process has become much easier.

Another thing that can be tricky is the Groovy sandbox.  It is mostly an annoyance, and I would not suggest turning it off; just be aware that it exists and often needs to be worked around.

There are many more features and things that you can do with the Pipeline, so I encourage readers to go out and explore some of the possibilities; the docs linked above are a good place to start.  As the pipeline matures, more and more plugins are adding the ability to be configured via the pipeline workflow, so if something isn't possible right now it probably will be very soon.

Happy Jenkins pipelining!

Bootstrap servers to a Rancher environment

If you’re not familiar already, Rancher is an orchestration and scheduling tool for containers.  I have written a little bit about Rancher in the past but haven’t covered many of the specifics of how to manage a Rancher environment.  One cool thing about Rancher is its “single pane of glass” approach to managing servers and containers, which allows users and admins to quickly and easily manage complicated environments.  In this post I’ll be covering how to quickly and automatically add servers to your Rancher environment.

One of the manual steps that can (and in my opinion should) be automated is the server bootstrapping process.  The Rancher web interface allows users to add hosts across different cloud providers (AWS, Azure, GCE, etc.) and, importantly, offers the ability to add a custom host.  This custom host registration is the piece that allows us to automate the host addition process by exposing a registration token via the Rancher API.  One important thing to note if you are going to be adding hosts automatically is that you will need to actually create the necessary entries in the environment that you bootstrap servers to.  So, for example, if you create a new environment you will either need to hit the API programmatically or navigate to Infrastructure -> Add Host in the web interface to populate the necessary tokens and entries.
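
For the programmatic route, a rough sketch of creating the registration token against the Rancher v1 API looks like the following.  This assumes the registrationtokens endpoint accepts a POST scoped to your project ID, and the URL, keys, and project ID are placeholders:

# Create a registration token for the environment (project) so hosts can register
curl -s -u "access_key:secret_key" -X POST \
  "https://rancher.example.com/v1/registrationtokens?projectId=1a5"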

Once you have populated the API with the values needed, you will need to create an API token to allow the server(s) that are bootstrapping to connect to the Rancher server and add themselves.  If you haven’t done this before, in the environment you’d like to allow access to, navigate to API -> Add Environment API Key, name it, and make a note of the key that gets generated.

rancher api

That’s pretty much all of the prep work you need to do to your Rancher environment for this method to work.  The next step is to make a script to bootstrap a server when it gets created.  The logic for this bootstrap process can be boiled down to the following snippet.

#!/bin/bash

INTERNAL_IP=$(ip add show eth0 | awk '/inet/ {print $2}' | cut -d/ -f1 | head -1)
SERVER="https://example.com"
TOKEN="access_key:secret_key"
PROJID="unique_environment"
AGENT_VER="v1.0.1"

RANCHER_URL=$(curl -su $TOKEN $SERVER/v1/registrationtokens?projectId=$PROJID | head -1 | grep -nhoe 'registrationUrl[^},]*}' | egrep -hoe 'https?:.*[^"}]')

docker run \
  -e CATTLE_AGENT_IP=$INTERNAL_IP \
  -e CATTLE_HOST_LABELS='your=label' \
  -d --privileged --name rancher-bootstrap \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:$AGENT_VER $RANCHER_URL

The script is pretty straightforward.  It attempts to gather the internal IP address of the server being created so that it can add it to the Rancher environment with a unique name.  Note that there are a number of variables that need to be set to reflect your environment: one for the DNS name of the Rancher server, one for the token that was generated in the step above, and one for the project ID, which can be found by navigating to the environment and looking at the URL for /env/xxxx.

After we have all the needed information and have updated the script, we can curl the Rancher server (this won't work if you didn't populate the API in the steps above or if your keys are invalid) to obtain the registration URL.  Finally, the script starts a Docker container with the agent version set (check your Rancher server version and match the agent to it) along with the URL obtained from the curl command.

The final step is to get the script to run when the server is provisioned.  There are many ways to do this and this step will vary depending on a number of different factors, but in this post I am using cloud-init for CoreOS on AWS.  Cloud-init is used to inject the script into the server and to create a systemd service that runs the script the first time the server boots; the script in turn runs the Rancher agent, which allows the server to be picked up by the Rancher server and its environment.

Here is the logic to run the script when the server is booted.

coreos:

  units:
  - name: rancher-agent.service
    command: start
    content: |
      [Unit]
      Description=Rancher Agent
      After=docker.service
      Requires=docker.service
      After=network-online.target
      Requires=network-online.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/rancher-agent

The full version of the cloud-init file can be found here.
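
The gist of the full file is a write_files entry that drops the bootstrap script from above onto the host as /etc/rancher-agent, alongside the unit shown here.  A rough sketch of that portion:

#cloud-config

write_files:
  - path: /etc/rancher-agent
    permissions: "0700"
    owner: root
    content: |
      #!/bin/bash
      # Bootstrap script from above goes here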

After you provision your server and give it a minute to warm up and run the script, check your Rancher environment to see if your server has popped up.  If it hasn't, the first place to start looking is the newly created server itself.  Run docker logs -f rancher-agent to get information about what went wrong.  Usually the problem is pretty obvious.

A brand new server looks something like this.

bootstrapped server

I typically use Terraform to provision these servers, but I feel like covering Terraform here is a little bit out of scope.  You can imagine some really interesting possibilities with auto scaling groups and load balancers that can come and go as your environment changes, which is one of the beauties of disposable infrastructure as well as infrastructure as code.

If you are interested in seeing how this Rancher bootstrap process fits in with Terraform let me know and I’ll take a stab at writing up a little piece on how to get it working.

Enable SSL for your WordPress blog

With the announcement of the public beta of the Let’s Encrypt project, it is now nearly trivial to get your site set up with an SSL certificate.  One of the best parts about the Let’s Encrypt project is that it is totally free, so there is pretty much no reason not to protect your blog with an SSL certificate.  The other nice part of Let’s Encrypt is that it is very easy to get your certificate issued.

The first step to get started is grabbing the latest source code from GitHub for the project.  Log on to your WordPress server (I’m running Ubuntu) and clone the repo.  Make sure to install git if you haven’t already.

git clone https://github.com/letsencrypt/letsencrypt.git

There is a shell script you can run that pretty much does everything for you, including installing any packages and libraries it needs as well as configuring paths and other components it needs to work.

cd letsencrypt
./letsencrypt-auto

After the bootstrap is done there should be some CLI options.  Since I am using Apache for my blog I will use the "--apache" option.

./letsencrypt-auto --apache

There will be some prompts you need to go through for setting up the certificates and account creation.

let's encrypt

This process is still somewhat error prone, so if you make a typo you can just rerun the “./letsencrypt-auto” command and follow the prompts.

The certificates will be dropped into /etc/letsencrypt/live/<website>.  Go double check them if needed.

This process will also generate a new Apache configuration file for you to use.  You can check for the file in /etc/apache2/sites-enabled.  The important part of this config should look similar to the following:

<VirtualHost *:443>
  UseCanonicalName Off
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/wordpress
  SSLCertificateFile /etc/letsencrypt/live/thepracticalsysadmin.com/cert.pem
  SSLCertificateKeyFile /etc/letsencrypt/live/thepracticalsysadmin.com/privkey.pem
  Include /etc/letsencrypt/options-ssl-apache.conf
  SSLCertificateChainFile /etc/letsencrypt/live/thepracticalsysadmin.com/chain.pem
</VirtualHost>

As a side note, you will probably want to redirect non-HTTPS requests to use the encrypted connection.  This is easy enough to do: just find your .htaccess file (mine was in /var/www/wordpress/.htaccess) and add the following rules.

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://example.com/$1 [R,L]
</IfModule>

Before we restart Apache with the new configuration let’s run a quick configtest to make sure it all works as expected.

apachectl configtest

If everything looks okay in the configtest then you can reload or restart apache.

service apache2 restart

Now when you visit your site you should get the nice shiny green lock icon in the address bar.  It is important to remember that the certificates issued by the Let’s Encrypt project are valid for 90 days, so you will need to keep them up to date and generate new certificates every so often.  The Let’s Encrypt folks are working on automating this process, but for now you will need to manually generate new certificates and reload your web server.

let's encrypt

That’s it.  Your site should now be functioning with SSL.

Updating the certificate automatically

To take this process one step further, we can make a script that can be run via cron (or manually) to update the certificate.

Here’s what the script looks like.

#!/usr/bin/env bash

dir="/etc/letsencrypt/live/example.com"
acme_server="https://acme-v01.api.letsencrypt.org/directory"
domain="example.com"
https="--standalone-supported-challenges tls-sni-01"

# Using webroot method
#/root/letsencrypt/letsencrypt-auto --renew certonly --server $acme_server -a webroot --webroot-path=$dir -d $domain --agree-tos

# Using standalone method
service apache2 stop
/root/letsencrypt/letsencrypt-auto --renew certonly --standalone $https -d $domain --agree-tos
service apache2 start

Notice that I have the “webroot” method commented out.  I run a service (Varnish) on port 80 that proxies traffic but also interferes with LE, so I chose to run the standalone renewal method.  It is pretty easy; the main difference is that you need to stop Apache before you run it, since Apache binds to ports 80/443.  The brief downtime is okay in my case.

I chose to put the script into a cron job and have it run roughly every 45 days so that I don't have to worry about logging on to my server to regenerate the certificate.
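
Cron cannot express "every 45 days" directly, so in practice this ends up as something like a bimonthly entry in root's crontab.  A sketch (the script path is a placeholder):

# Renew the certificate at 3am on the 1st of every other month
0 3 1 */2 * /root/renew-letsencrypt.sh >> /var/log/letsencrypt-renew.log 2>&1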

This is a straightforward process and will help with your search engine juice as well.

Set up SSL for Rancher Server

One issue you will probably run across if you start to use Rancher to manage your Docker containers is that it doesn't serve pages over an encrypted connection by default.  If you are looking to put Rancher into a production scenario, it is a good idea to serve encrypted pages.  HA is another topic, but at this point I have not attempted to set it up yet because it is currently a much more complicated process.  The Rancher folks are working on making HA easier in the near future (if you know an easy way to do it I would love to hear about it).  I would argue, though, that if you can set up SSL for your Rancher server you are over halfway to a full production setup.

The process of getting Rancher to proxy through an encrypted connection is straightforward, assuming you already have some certs to use.  If you don't have any officially issued certificates, *I think* you should be okay with self-signed certs, but you won't get that green lock that everybody loves.  If you are just testing this setup you should definitely be fine starting out with some self-signed certs.  Here is a reference for creating some certs for Nginx to test with.
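
If you just need throwaway self-signed certs to test with, a one-liner like the following will generate a key and certificate.  The paths and common name here simply match the test.com example used below:

sudo openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
  -keyout /etc/rancher/test.com.key \
  -out /etc/rancher/test.com.crt \
  -subj "/CN=test.com"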

Another important thing to be aware of is that these instructions are specific to the Nginx method outlined above.  I have not tried the Apache method, though I would guess it should be very easy to adapt.

Take a look at the Rancher docs as a starting point; they are very good and will get you most of the way there.  However, when I went through this process there were a few pieces of information that I had to piece together myself, which is the bulk of what I will be sharing today.

The first step is to adapt the configuration in the docs into a full Nginx config that can be dropped into the official Nginx image from Docker Hub.  Here is the config I used.

upstream rancher {
    server rancher-server:8080;
}

server {
    listen 443 ssl;
    server_name test.com;
    ssl_certificate /etc/rancher/test.com.crt;
    ssl_certificate_key /etc/rancher/test.com.key;

    access_log /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://rancher;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # This allows the ability for the execute shell window to remain open for up to 15 minutes. Without this parameter, the default is 1 minute and will automatically close.
        proxy_read_timeout 900s;
    }
}

server {
    listen 80;
    server_name test.com;
    return 301 https://$server_name$request_uri;
}

There are a few important things to note about this config.  One is that the upstream server name must match the name of the Rancher server container, in this case rancher-server.

Note that I have used test.com as the server name and so the certs and names are all reflective of that value.  Obviously that will need to be updated with your own values.

Finally, we have added a logging section to the config.  In the official Nginx image, /var/log/nginx/access.log and error.log are symlinked to stdout/stderr, so we can easily look at the requests from the host OS via the “docker logs” command.

To get the following Docker run command to work correctly you will want to create a directory called /etc/rancher, or something easy to remember, and place this config (named rancher-nginx.conf), along with the certs you have created, into this location.  Alternatively, you can modify the Docker run command and simply point the volume mounts at the location where you store the configuration and certs.  For me, it makes the most sense to group these items together in /etc/rancher.

docker run -d --restart=always --name nginx \
    -v /etc/rancher/rancher-nginx.conf:/etc/nginx/conf.d/default.conf \
    -v /etc/rancher/test.com.crt:/etc/rancher/test.com.crt \
    -v /etc/rancher/test.com.key:/etc/rancher/test.com.key \
    -p 80:80 -p 443:443 --link=rancher-server nginx

This will mount the correct configuration and certificates into the Nginx Docker container, expose ports 80 and 443 for web traffic (make sure to adjust any firewall rules you need to get traffic to pass through these ports), and link to the rancher-server container so that the traffic can be proxied.

Additionally, you will need to update any reference to the old address that was using http://<rancher-name>:8080/ to point to https://<rancher-name>/ instead, namely the host registration configuration in the Rancher server.  If you were relying on any other outside tools to hit that endpoint, they will also need to be updated to use HTTPS.

ECS cluster turnup with CoreOS and Terraform

Recently I have been evaluating different container clustering tools and technologies.  It has been a fun experience thus far; the tools and community being built around Docker have come a long way since I last looked.  So for today's post I'd like to go over ECS a little bit.

ECS is essentially the AWS version of container management.  ECS takes care of managing your Docker (container) infrastructure by handling creation, management, destruction, and scheduling, as well as providing API integration with other AWS services, which is really powerful.  To get ECS up and running, all you need to do is create an ECS cluster, either from the AWS console or from some other AWS integration like the CLI or Terraform, then install the agent on the servers that you would like ECS to schedule work on.  After setting up the agent and cluster name you are basically ready to go: start by creating a task definition and then create a service to start running containers on the cluster.  Some cool new features were announced at this year's re:Invent conference but I haven't had a chance to look at them yet.
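
For example, creating the cluster itself is a one-liner with the AWS CLI, assuming your credentials and region are already configured (the cluster name here matches the one used in the cloud-config later on):

aws ecs create-cluster --cluster-name my-cluster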

First impression of ECS

The best part about testing ECS by far has been how easy it is to get set up and running.  It took less than 20 minutes to go from nothing to a fully functioning cluster that was scheduling containers to hosts and receiving load.  I think the most powerful aspect of ECS is its integration with other AWS services.  For example, if you need to attach containers/services to a load balancer, the AWS infrastructure is already there, so the different pieces of the infrastructure really mesh well together.

The biggest downside so far is that the ECS console interface is still clunky.  It is functional, and I have been able to use it to do everything I have needed, but it just feels like it needs some polish; things are nested in menus and often not easy to find.  I'm sure there are plans to improve the interface, and as mentioned above some new features were recently announced, so I have a feeling there will be some nice improvements on the way.

I haven’t tried the CLI tool yet but it looks promising for automating containers and services.

Setting things up

Since I am a big fan of CoreOS I decided to try turning up my ECS cluster using CoreOS as the base OS and Terraform to do the heavy lifting and provisioning.

The first step is to create your cluster.  I noticed in the AWS console there was a configuration wizard that guides you through creating your first cluster, which was annoying because there wasn't a clean way to just create the cluster.  So you will need to follow the on-screen instructions to get your first environment set up.  If any of this is unclear there is a good guide for getting started with ECS here.

After your cluster has been created there is a menu that shows your ECS environments.

ECS cluster menu

Next, you will need to turn up the nodes that will be connecting to this cluster.  The first part of this is to get your cloud-config set up to connect to the cluster.  I used the CoreOS docs to set up the ECS agent, making sure to change the ECS_CLUSTER= section in the config.

#cloud-config

coreos:
  units:
    - name: amazon-ecs-agent.service
      command: start
      runtime: true
      content: |
        [Unit]
        Description=Amazon ECS Agent
        After=docker.service
        Requires=docker.service
        Requires=network-online.target
        After=network-online.target

        [Service]
        Environment=ECS_CLUSTER=my-cluster
        Environment=ECS_LOGLEVEL=warn
        Environment=ECS_CHECKPOINT=true
        ExecStartPre=-/usr/bin/docker kill ecs-agent
        ExecStartPre=-/usr/bin/docker rm ecs-agent
        ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
        ExecStart=/usr/bin/docker run --name ecs-agent --env=ECS_CLUSTER=${ECS_CLUSTER} --env=ECS_LOGLEVEL=${ECS_LOGLEVEL} --env=ECS_CHECKPOINT=${ECS_CHECKPOINT} --publish=127.0.0.1:51678:51678 --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/var/lib/aws/ecs:/data amazon/amazon-ecs-agent
        ExecStop=/usr/bin/docker stop ecs-agent

Note the Environment=ECS_CLUSTER=my-cluster line; this is the most important bit for getting the server to check in to your cluster, assuming you named it “my-cluster”.  Feel free to add any other values your infrastructure may need.  Once you have the config how you want it, run it through the CoreOS cloud-config validator to make sure it checks out.  If everything looks okay there, your cloud-config should be ready to go.
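
The coreos-cloudinit binary that ships with CoreOS can do that validation locally; it looks roughly like the following (the file name is a placeholder, and at the time there was also a web-based validator on the CoreOS site):

coreos-cloudinit -validate -from-file ecs-cloud-config.yml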

You can find more info about how to configure the ECS agent in the docs here.

Once you have your cloud-config in order, you will need to get your Terraform “recipe” set up.  I used this awesome GitHub project as the base for my own project.  The Terraform logic from there basically creates an AWS launch config and autoscaling group (and uses the cloud-config from above) to launch instances into your cluster.  The ECS agent takes care of the rest once your servers are up and reporting in to the cluster.

launch_config.tf

resource "aws_launch_configuration" "ecs" {
  name = "ECS ${var.cluster_name}"
  image_id = "${var.ami}"
  instance_type = "${var.instance_type}"
  iam_instance_profile = "${var.iam_instance_profile}"
  key_name = "${var.key_name}"
  security_groups = ["${split(",", var.security_group_ids)}"]
  user_data = "${file("../cloud-config/ecs.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "40"
  }
}

Notice the user_data section.  This is where we inject the cloud config from above to provision CoreOS and launch the ECS agent.

autoscaler.tf

resource "aws_autoscaling_group" "ecs-cluster" {
  availability_zones = ["${split(",", var.availability_zones)}"]
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]
  name = "ECS ${var.cluster_name}"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  desired_capacity = "${var.desired_capacity}"
  health_check_type = "EC2"
  launch_configuration = "${aws_launch_configuration.ecs.name}"
  health_check_grace_period = "${var.health_check_grace_period}"

  tag {
    key = "Env"
    value = "${var.environment_name}"
    propagate_at_launch = true
  }

  tag {
    key = "Name"
    value = "ECS ${var.cluster_name}"
    propagate_at_launch = true
  }
}

There are a few caveats I'd like to highlight with this approach.  First, I already have an AWS infrastructure in place that I was testing against, so I didn't have to do any of the extra work to create a VPC or a gateway for the VPC.  I didn't have to create the security groups and subnets either; I just added their IDs to the Terraform code.

The other caveat is that if you want to use the GitHub project I linked to, you will need to make sure that you populate the variables with your own environment-specific values.  That is why having the VPC, subnets, and security groups already in place was handy for me.  Be sure to browse through the variables.tf file and substitute in your own values.  As an example, I had to update the variables to use the CoreOS 766.4.0 image.  This AMI will be specific to your AWS region, so make sure to look up the AMI first.
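
One way to look up the CoreOS AMI for your region is with the AWS CLI; a sketch, assuming the name filter below matches CoreOS's published naming scheme for the 766.4.0 stable release:

aws ec2 describe-images \
  --filters "Name=name,Values=CoreOS-stable-766.4.0*" \
  --query "Images[*].[ImageId,Name]" \
  --output text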

variable "ami" {
  /* CoreOS 766.4.0 */
  default = "ami-dbe71d9f"
  description = "AMI id to launch, must be in the region specified by the region variable"
}

Another part I had to modify to get the GitHub project to work was adding in my AWS credentials, which look similar to the following.  Make sure to update these variables with your own ID and secret.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

variable "access_key" {
  description = "AWS access key"
  default = "XXX"
}

variable "secret_key" {
  description = "AWS secret access key"
  default = "xxx"
}

Make sure to also copy/edit the autoscaling.tf and launch_config.tf files to reflect anything that is specific to your environment (Terraform will complain if there are issues).

After you have combed through the variables.tf and updated the Terraform files to your liking you can simply run terraform plan -input=false and see how Terraform will create the ASG for you.

If everything looks good, you can run terraform apply -input=false and Terraform will go out and start building your new ECS infrastructure for you.  After a few minutes check the EC2 console and your launch config and autoscaling group should be in there.  If that all looks okay, check the ECS console and your new servers should show up and be ready to go to work for you!

NOTE: If you are starting from scratch, it is possible to do all of the infrastructure provisioning via Terraform but it is too far out of the scope of this post to cover because there are a lot of steps to it.
