Enable SSL for your WordPress blog

December 5, 2015 (updated November 18, 2016), Josh Reichardt

Updated: 11/18/16

The Let’s Encrypt client was recently renamed to “certbot”. I have updated the post to use the new name, but if I missed something, use certbot or let me know.

With the announcement of the public beta of the Let’s Encrypt project, it is now nearly trivial to get your site set up with an SSL certificate. One of the best parts about the Let’s Encrypt project is that it is totally free, so there is pretty much no reason not to protect your blog with an SSL certificate. The other nice part of Let’s Encrypt is that it is very easy to get your certificate issued.

The first step to get started is grabbing the latest source code from GitHub for the project. Log on to your WordPress server (I’m running Ubuntu) and clone the repo. Make sure to install git if you haven’t already.

git clone https://github.com/letsencrypt/certbot.git

There is a shell script you can run to pretty much do everything for you, including installing any packages and libraries it needs and configuring the paths and other components it needs to work.

cd certbot
./certbot-auto

After the bootstrap is done there should be some CLI options. Run the command with the -h flag to print out help.

./certbot-auto -h

Since I am using Apache for my blog I will use the “--apache” option.

./certbot-auto --apache

There will be some prompts you need to go through for setting up the certificates and account creation.

This process is still somewhat error prone, so if you make a typo you can just rerun the “./certbot-auto” command and follow the prompts.

The certificates will be dropped in to /etc/letsencrypt/live/<website>. Go double check them if needed.

This process will also generate a new Apache configuration file for you to use. You can check for the file in /etc/apache2/sites-enabled. The important part of this config should look similar to the following:

<VirtualHost *:443>
  UseCanonicalName Off
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/wordpress
  SSLCertificateFile /etc/letsencrypt/live/thepracticalsysadmin.com/cert.pem
  SSLCertificateKeyFile /etc/letsencrypt/live/thepracticalsysadmin.com/privkey.pem
  Include /etc/letsencrypt/options-ssl-apache.conf
  SSLCertificateChainFile /etc/letsencrypt/live/thepracticalsysadmin.com/chain.pem
</VirtualHost>

As a side note, you will probably want to redirect non https requests to use the encrypted connection. This is easy enough to do, just go find your .htaccess file (mine was in /var/www/wordpress/.htaccess) and add the following rules.

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{SERVER_PORT} ^80$
  RewriteRule ^(.*)$ https://example.com/$1 [R,L]
</IfModule>

Before we restart Apache with the new configuration let’s run a quick configtest to make sure it all works as expected.

apachectl configtest

If everything looks okay in the configtest then you can reload or restart apache.

service apache2 restart

Now when you visit your site you should get the nice shiny green lock icon on the address bar. It is important to remember that the certificates issued by the Let’s Encrypt project are valid for 90 days so you will need to make sure to keep up to date and generate new certificates every so often. The Let’s Encrypt folks are working on automating this process but for now you will need to manually generate new certificates and reload your web server.
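Since the 90 day lifetime is easy to forget, a small check of how many days a certificate has left can be handy. The following is a minimal Python sketch and not part of the original setup: the function name days_until_expiry and the sample date are made up for illustration. It uses only the standard library's ssl.cert_time_to_seconds, and it expects the date in the text format that "openssl x509 -enddate" prints after the "notAfter=" prefix.

```python
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Return whole days between `now` and a certificate's notAfter time.

    `not_after` uses the OpenSSL text format, e.g. "Feb 16 12:00:00 2017 GMT".
    `now` is a Unix timestamp; it defaults to the current time.
    """
    if now is None:
        now = time.time()
    expires = ssl.cert_time_to_seconds(not_after)
    return int((expires - now) // SECONDS_PER_DAY)


if __name__ == "__main__":
    # Hypothetical expiry date, e.g. copied from the output of:
    #   openssl x509 -enddate -noout -in /etc/letsencrypt/live/<website>/cert.pem
    print(days_until_expiry("Feb 16 12:00:00 2017 GMT"))
```

A negative result means the certificate has already expired.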

That’s it. Your site should now be functioning with SSL.

Updating the certificate automatically

To take this process one step further, we can make a script that can be run via cron (or manually) to update the certificate.

Here’s what the script looks like.

#!/usr/bin/env bash

dir="/etc/letsencrypt/live/example.com"
acme_server="https://acme-v01.api.letsencrypt.org/directory"
domain="example.com"
https="--standalone-supported-challenges tls-sni-01"

# Using webroot method
#/root/letsencrypt/certbot-auto --renew certonly --server $acme_server -a webroot --webroot-path=$dir -d $domain --agree-tos

# Using standalone method
service apache2 stop
# Previously you had to specify options to renew the cert but this has been deprecated
#/root/letsencrypt/certbot-auto --renew certonly --standalone $https -d $domain --agree-tos
# In newer versions you can just use the renew command
/root/letsencrypt/certbot-auto renew --quiet
service apache2 start

Notice that I have the “webroot” method commented out. I run a service (Varnish) on port 80 that proxies traffic but also interferes with LE, so I chose to run the standalone renewal method. It is pretty easy; the main difference is that you need to turn off Apache before you run it, since Apache binds to ports 80/443. But the downtime is okay in my case.

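I chose to put the script in a cron job so that I don’t have to worry about logging on to my server to regenerate the certificate. One catch: cron cannot express “every 45 days” directly. A day-of-month step such as */45 only ever matches the 1st, so it really means “once a month”. Since certbot renew only replaces certificates that are within 30 days of expiring, it is safe to run more often; a weekly run leaves plenty of margin. Here’s what a sample crontab for this job might look like.

0 0 * * 0 /root/renew_cert.sh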
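Cron step values are easy to misread, so here is a small Python sketch of how a step expression in the day-of-month field expands. It assumes the common Vixie cron behaviour, where */N starts at the first value of the field's range and counts up by N. The helper name expand_step is made up for this example.

```python
def expand_step(step, first=1, last=31):
    """Expand a cron '*/step' expression over the inclusive range first..last.

    Assumes Vixie cron semantics: start at `first` and add `step` each time.
    The defaults describe the day-of-month field (1-31).
    """
    if step < 1:
        raise ValueError("step must be a positive integer")
    return list(range(first, last + 1, step))


if __name__ == "__main__":
    print(expand_step(45))  # day-of-month field with */45
    print(expand_step(10))  # day-of-month field with */10
```

Under these assumptions a step of 45 expands to the 1st only, while a step of 10 expands to the 1st, 11th, 21st and 31st.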

This is a straightforward process and will help with your search engine juices as well.

Set up SSL for Rancher Server

December 1, 2015 (updated February 22, 2019), Josh Reichardt

One issue you will probably run across if you start to use Rancher to manage your Docker containers is that it doesn’t serve pages over an encrypted connection by default. If you are looking to put Rancher in to a production scenario, it is a good idea to serve encrypted pages. HA is another topic, but at this point I have not attempted to set it up yet because it is a much more complicated process currently. The Rancher folks are working on making HA easier in the near future (if you know an easy way to do it I would love to hear about it). I would argue though that if you can set up SSL for your Rancher server you are over half way to a full production set up.

The process of getting Rancher to proxy through an encrypted connection is straightforward, assuming you already have some certs to use. If you don’t already have any official certificates issued *I think* you should be okay with self-signed certs, but you won’t get that green lock that everybody loves. If you are just testing this setup you should definitely be fine to start out with some self-signed certs. Here is a reference for creating some certs for Nginx to test with.

Another important thing to be aware of is that these instructions are specific to the Nginx method outlined above. I have not tried the Apache method, though I would guess it should be very easy to adapt.

Take a look at the Rancher docs as a starting point; they are very good and will get you most of the way there. However, when I went through this process there were a few pieces of information that I had to piece together myself, which is the bulk of what I will be sharing today.

The first step is to adapt the configuration in the docs in to a full Nginx config that can be dropped in to the official Nginx image from Dockerhub. Here is the config I used.

upstream rancher {
    server rancher-server:8080;
}

server {
    listen 443 ssl;
    server_name test.com;
    ssl_certificate /etc/rancher/test.com.crt;
    ssl_certificate_key /etc/rancher/test.com.key;

    access_log /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://rancher;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # This allows the ability for the execute shell window to remain open for up to 15 minutes. Without this parameter, the default is 1 minute and will automatically close.
        proxy_read_timeout 900s;
    }
}

server {
    listen 80;
    server_name test.com;
    return 301 https://$server_name$request_uri;
}

There are a few important things to note about this config. One is naming the upstream the same name as what the rancher server container is named, in this case rancher-server.

Note that I have used test.com as the server name and so the certs and names are all reflective of that value. Obviously that will need to be updated with your own values.

Finally, we have added an additional logging section to the config that will pipe the logs to stdout/stderr so we can easily look at the requests from the host OS via the “docker logs” command.

To get the following Docker run command to work correctly you will want to create a directory called /etc/rancher or something easy to remember, and place this config (named as rancher-nginx.conf), along with the certs you have created in to this location. Alternately you can modify the Docker run command and simply have the volume mounts pointed at the location you store the configuration and certs. For me, it makes the most sense to group these items together in /etc/rancher.

docker run -d --restart=always --name nginx \
    -v /etc/rancher/rancher-nginx.conf:/etc/nginx/conf.d/default.conf \
    -v /etc/rancher/test.com.crt:/etc/rancher/test.com.crt \
    -v /etc/rancher/test.com.key:/etc/rancher/test.com.key \
    -p 80:80 -p 443:443 --link=rancher-server nginx

This will mount in the correct configuration and certificates to the Nginx docker container, expose port 80 and 443 for web traffic (make sure to adjust any firewall rules you need to get traffic to pass through these ports), and link to the rancher-server container so that the traffic can be proxied.

Additionally, you will need to update any reference to the old address that was using http://<rancher-name>:8080/ to point to https://<rancher-name>/. Namely the host registration configuration in the Rancher server, but if you were relying on any other outside tools to hit that endpoint they will also need to be updated to use https instead.
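If you have scripts or tool configs that still carry the old address, the rewrite is mechanical. Here is a rough Python sketch of it using only the standard library. The function name to_https and the host rancher.example.com are placeholders for illustration, and it assumes the old URLs all used plain http on port 8080 as described above.

```python
from urllib.parse import urlsplit, urlunsplit


def to_https(url, old_port=8080):
    """Rewrite http://host:8080/path to https://host/path.

    URLs that are not plain http on `old_port` are returned unchanged.
    """
    parts = urlsplit(url)
    if parts.scheme != "http" or parts.port != old_port:
        return url
    # Drop the port so the default HTTPS port (443) is used.
    return urlunsplit(("https", parts.hostname, parts.path, parts.query, parts.fragment))


if __name__ == "__main__":
    # Placeholder host name, not a real Rancher server.
    print(to_https("http://rancher.example.com:8080/"))
```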

ECS cluster turnup with CoreOS and Terraform

October 9, 2015 (updated December 5, 2015), Josh Reichardt

Recently I have been evaluating different container clustering tools and technologies. It has been a fun experience thus far; the tools and community being built around Docker have come a long way since I last looked. So for today’s post I’d like to go over ECS a little bit.

ECS is essentially the AWS version of container management. ECS takes care of managing your Docker (container) infrastructure by handling creation, management, destruction and scheduling as well as providing API integration with other AWS services, which is really powerful. To get ECS up and running all you need to do is create an ECS cluster, either from the AWS console or from some other AWS integration like the CLI or Terraform, then install the agent on servers that you would like ECS to schedule work on. After setting up the agent and cluster name you are basically ready to go: start by creating a task and then create a service to start running containers on the cluster. Some cool new features have been announced at this year’s re:Invent conference but I haven’t had a chance to look at them yet.

First impression of ECS

The best part about testing ECS by far has been how easy it is to get set up and running. It took less than 20 minutes to go from nothing to a fully functioning cluster that was scheduling containers to hosts and receiving load. I think the most powerful aspect of ECS is its integration with other AWS services. For example, if you need to attach containers/services to a load balancer, the AWS infrastructure is already there so the different pieces of the infrastructure really mesh well together.

The biggest downside so far is that the ECS console interface is still clunky. It is functional, and I have been able to use it to do everything I have needed, but it feels like it needs some polish: things are nested in menus and usually not easy to find. I’m sure there are plans to improve the interface and as mentioned above some new features were recently announced, so I have a feeling there will be some nice improvements on the way.

I haven’t tried the CLI tool yet but it looks promising for automating containers and services.

Setting things up

Since I am a big fan of CoreOS I decided to try turning up my ECS cluster using CoreOS as the base OS and Terraform to do the heavy lifting and provisioning.

The first step is to create your cluster. I noticed in the AWS console there was a configuration wizard that guides you through your first cluster which was annoying because there wasn’t a clean way to just create the cluster. So you will need to follow the on screen instructions for getting your first environment set up. If any of this is unclear there is a good guide for getting started with ECS here.

After your cluster has been created there is a menu that shows your ECS environments.

Next, you will need to turn on the nodes that will be connecting to this cluster. The first part of this is to get your cloud-config set up to connect to the cluster. I used the CoreOS docs to set up the ECS agent, making sure to change the ECS_CLUSTER= section in the config.

#cloud-config

coreos:
  units:
    -
      name: amazon-ecs-agent.service
      command: start
      runtime: true
      content: |
        [Unit]
        Description=Amazon ECS Agent
        After=docker.service
        Requires=docker.service
        Requires=network-online.target
        After=network-online.target

        [Service]
        Environment=ECS_CLUSTER=my-cluster
        Environment=ECS_LOGLEVEL=warn
        Environment=ECS_CHECKPOINT=true
        ExecStartPre=-/usr/bin/docker kill ecs-agent
        ExecStartPre=-/usr/bin/docker rm ecs-agent
        ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
        ExecStart=/usr/bin/docker run --name ecs-agent --env=ECS_CLUSTER=${ECS_CLUSTER} --env=ECS_LOGLEVEL=${ECS_LOGLEVEL} --env=ECS_CHECKPOINT=${ECS_CHECKPOINT} --publish=127.0.0.1:51678:51678 --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/var/lib/aws/ecs:/data amazon/amazon-ecs-agent
        ExecStop=/usr/bin/docker stop ecs-agent

Note the Environment=ECS_CLUSTER=my-cluster line. This is the most important bit for getting the server to check in to your cluster, assuming you named it “my-cluster”. Feel free to add any other values your infrastructure may need. Once you have the config how you want it, run it through the CoreOS cloud-config validator to make sure it checks out. If everything looks okay there, your cloud-config should be ready to go.

You can find more info about how to configure the ECS agent in the docs here.
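A quick way to confirm that the agent picked up the right cluster name is the agent's local introspection endpoint, which the unit above publishes on 127.0.0.1:51678. The Python sketch below is an illustration rather than part of the original setup. The function names are invented, and it assumes the /v1/metadata response is a JSON object with a Cluster key, which is how the ECS agent documentation describes it.

```python
import json
from urllib.request import urlopen

# Port published by the ecs-agent container in the cloud-config above.
METADATA_URL = "http://127.0.0.1:51678/v1/metadata"


def joined_cluster(metadata, expected):
    """Return True if the agent metadata reports the expected cluster name."""
    return metadata.get("Cluster") == expected


def fetch_metadata(url=METADATA_URL, timeout=5):
    """Fetch and decode the agent's metadata document (run this on the instance)."""
    with urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

On an instance, joined_cluster(fetch_metadata(), "my-cluster") should come back True once the agent has registered with a cluster named “my-cluster”.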

Once you have your cloud-config in order, you will need to get your Terraform “recipe” set up. I used this awesome github project as the base for my own project. The Terraform logic from there basically creates an AWS launch config and autoscaling group (and uses the cloud-config from above) to launch instances in to your cluster. And the ECS agent takes care of the rest, once your servers are up and the agent is reporting in to the cluster.

launch_config.tf

resource "aws_launch_configuration" "ecs" {
  name = "ECS ${var.cluster_name}"
  image_id = "${var.ami}"
  instance_type = "${var.instance_type}"
  iam_instance_profile = "${var.iam_instance_profile}"
  key_name = "${var.key_name}"
  security_groups = ["${split(",", var.security_group_ids)}"]
  user_data = "${file("../cloud-config/ecs.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "40"
  }
}

Notice the user_data section. This is where we inject the cloud config from above to provision CoreOS and launch the ECS agent.

autoscaler.tf

resource "aws_autoscaling_group" "ecs-cluster" {
  availability_zones = ["${split(",", var.availability_zones)}"]
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]
  name = "ECS ${var.cluster_name}"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  desired_capacity = "${var.desired_capacity}"
  health_check_type = "EC2"
  launch_configuration = "${aws_launch_configuration.ecs.name}"
  health_check_grace_period = "${var.health_check_grace_period}"

  tag {
    key = "Env"
    value = "${var.environment_name}"
    propagate_at_launch = true
  }

  tag {
    key = "Name"
    value = "ECS ${var.cluster_name}"
    propagate_at_launch = true
  }
}

There are a few caveats I’d like to highlight with this approach. First, I already have an AWS infrastructure in place that I was testing against. So I didn’t have to do any of the extra work to create a VPC, or a gateway for the VPC. I didn’t have to create the security groups and subnets either; I just added them to the Terraform code.

The other caveat is that if you want to use the Github project I linked to you will need to make sure that you populate the variables with your own environment specific values. That is why having the VPC, subnets and security groups was handy for me. Be sure to browse through the variables.tf file and substitute in your own values. As an example, I had to update the variables to use the CoreOS 766.4.0 image. This AMI will be specific to your AWS region so make sure to look up the AMI first.

variable "ami" {
  /* CoreOS 766.4.0 */
  default = "ami-dbe71d9f"
  description = "AMI id to launch, must be in the region specified by the region variable"
}

Another part I had to modify to get the Github project to work was adding in my AWS credentials which look similar to the following. Make sure to update these variables with your ID and secret.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

variable "access_key" {
  description = "AWS access key"
  default = "XXX"
}

variable "secret_key" {
  description = "AWS secret access key"
  default = "xxx"
}

Make sure to also copy/edit the autoscaler.tf and launch_config.tf files to reflect anything that is specific to your environment (Terraform will complain if there are issues).

After you have combed through the variables.tf and updated the Terraform files to your liking you can simply run terraform plan -input=false and see how Terraform will create the ASG for you.

If everything looks good, you can run terraform apply -input=false and Terraform will go out and start building your new ECS infrastructure for you. After a few minutes check the EC2 console and your launch config and autoscaling group should be in there. If that stuff all looks okay, check the ECS console and your new servers should show up and be ready to go to work for you!

NOTE: If you are starting from scratch, it is possible to do all of the infrastructure provisioning via Terraform but it is too far out of the scope of this post to cover because there are a lot of steps to it.

Thoughts on Working Remotely

September 24, 2015, Josh Reichardt

I’d like to share a few nuggets that I have learned so far as a remote employee. I have been working from home for around a year and a half. While I absolutely recommend trying the remote option if possible, there are a few things that are important to know.

Working remotely is definitely not for everybody. In order to be an effective remote employee you have to have a certain amount of discipline, internal drive and self motivation. Additionally, you need to be a good communicator (covered below). If you have trouble staying on task or finding things to do at work or even have issues working by yourself in an isolated environment, you will quickly discover that working remotely may be more stressful than working in an office where you get the daily interactions and guidance from others.

Benefits

That being said, I feel that in most cases, the positives outweigh the negatives. Below are a few of the biggest benefits that I have discovered.

There is little to no commute. Long commutes, especially in big cities create a certain amount of stress and strain that you simply don’t have to deal with when working from home. As a bonus you save some cash on gas and miles of wear and tear on your vehicle.
It is easier to avoid distractions. This of course depends on how you handle your work but if you are disciplined it becomes much easier to get work done with fewer distractions. At home, if you manage to separate home from work (more on that topic below) then you don’t need to worry about random people stopping over to your desk to shoot the shit or bother you. By avoiding simple distractions you can become much more productive in shorter periods of time.
No dress code. This is a surprisingly simple but powerful bonus to working from home. Having a dress code was actually stressful for me in previous jobs. I always disagreed with having a dress code and didn’t understand why I couldn’t wear a t-shirt and jeans to work. Now that I can wear whatever I want I feel more comfortable and more relaxed, which leads to better productivity.
Schedule can be more flexible. I can pick my own hours for the most part. Obviously it is best to get in to a routine of working the same hours each day but if something comes up I can step out for a few hours and just make the hours up in the evening in most cases, and it won’t be a big deal to coworkers. This flexibility is a great perk to working remotely and it allows you much more time to yourself when needed because you aren’t restricted to a set schedule.

Achieving a Work/life balance

The line between life at home and life at work can get very blurry when working from home as a telecommuter. I would argue that finding a balance between personal life and work is the number one most important thing to work towards when making the transition from on-site employment because it directly leads to your happiness (or sorrow), which in turn influences all other aspects of your life, including activities and relationships outside of work.

It is super easy to get in to the habit of “always being around” and working extra and oftentimes crazy hours when you are at home. One thing that has helped in my own experience to improve the work/life balance and alleviate this always-working thing is creating routines.

I try to start work and end work at the same time of the day each day during the week. Likewise, I make a point to take breaks throughout the day to break up the time. A few things I like to do are take a 30ish minute walk around the same time every day and I also have a coffee ritual in the morning that always precedes work time. These daily cues help me get in to the flow of the day and to get my day started the same way every day.

Another mechanism I have discovered to help cope with the work hours is to leave work at work. Find a way to create clear distinctions between home and work, either by creating an office at home where work stays or by finding a coffee shop or co-working space. As a side note, I have found 2-3 days working at a coffee shop/co-working space to be the best middle ground for me, but everybody is different so if you are new to remote work you will need to experiment. That way you can have a place that represents what a workplace should be, and when you leave that place, the work stays there. It is very important to separate home from work if you don’t have a clear distinction between the two.

Some folks mention that it can get lonely. I definitely agree with this sentiment. On the up side, working in this type of environment can sort of force you to find ways to interact with people. It can feel uncomfortable at first, but finding social activities will help alleviate the loneliness. Coffee shops and co-working spaces are a great place to start. I find that working in an environment with others helps mix things up and having the extra interaction really helps feeling like you are a part of a community. These environments are a great solution if you are introverted and have a hard time getting out and meeting people.

Regardless of what exactly you do, it is absolutely critical to get out of your house. This should be a no brainer but I can’t stress the importance enough. Even if you’re just taking walks or going to the store, you need to make sure that you find things to do to get out of the house. I have found some things that work but it is something again that you will need to experiment with.

If you are ambitious then I suggest getting involved in some other communities outside of work. Meeting new people (outside of a work environment) is a very powerful tool in managing your work/life balance. Obviously this advice applies to more scenarios than working remotely, but I think it becomes much more important there. If you want some ideas for ways to get out or communities to join, feel free to email or comment and I can let you know what has worked for me.

Communicate

Another important piece of the social aspect that I have discovered is that it is VERY important to have many open communication channels with coworkers. Google Hangouts, Slack, Screenhero, WebEx, Skype, email, IRC and any other collaboration tools you can find are super important for communicating with coworkers and for building relationships and culture in distributed work environments. In my experience, if you are working as part of a team and aren’t a great communicator, relationships with coworkers can quickly become strained.

Also, having regular meetings with key members of your team is important. A nice once a week check in with any managers is a good starting point. It helps you keep track of what you’re doing and it helps others on your team understand the type of work you’re doing so you’re not as isolated. Gaining the trust of your coworkers is always very important.

Conclusion

The most difficult balance to achieve when transitioning to a work from home opportunity for me, was maintaining a good work/life balance. You are 100% responsible for how you choose to spend your time so it becomes important to make the right decisions when it comes to how to prioritize.

For example, one thing I have struggled with is how to work the right amount of time. There was a stretch where I was working 12-14 hour days just because I kept finding more and more things to do. While that is good for your employer, it is not good for you or anyone around you. The work will always be there, so you have to find strategies to help you step away from work when you have put in enough hours for the day.

Everybody is different so if you are new to telecommuting/working remotely I encourage you to experiment with different techniques for managing your work/life balance. While I feel that working remotely is for the most part a bonus, it still has its own set of issues so please be careful and don’t work too much, and especially don’t expend extra energy or get too stressed out about things you can’t control.

Graphite threshold alerting with Sensu

September 8, 2015, Josh Reichardt

Instrumenting your code to report application level metrics is definitely one of the most powerful monitoring tasks you can accomplish. It is damn satisfying to get working the first time as well. Having the ability to look at your application and how it is performing at a granular level can help identify potential issues or bottlenecks but can also give you a greater understanding of how people are interacting with the application at a broad scale. Everybody loves having these types of metrics to talk about their apps and products so this style of monitoring is a great win for the whole team.

I don’t want to dive in to the specifics of WHAT you should monitor here, that will be unique to every environment. Instead of covering the what and how of instrumenting the code to report specific metrics, I will be running through an example of what the process might look like for instrumenting a check and alarm for monitoring and alerting purposes at an operations level. I am not a developer, so I don’t spend a lot of time thinking about what types of things are important to collect metrics on. Usually my job instead is to figure out how to monitor and alert effectively, based on the metrics that developers come up with.

Sensu has a great plugin to check Graphite thresholds in their plugin repo. If you haven’t looked already, take a minute to glance over the options a little bit and see how the plugin works. It is a pretty simple plugin but has been able to do everything I need it to.

One common monitoring task is to check how long requests are taking. So in this example, we are querying the Graphite server and reporting a critical status (exit status 2) if the response time averages more than 7 seconds.

Here is the command you would run manually to check this threshold. Make sure to download the script if you haven’t already, you can just copy the code directly or clone the repo if you are doing this manually. If you are using Sensu you can use the sensu_plugin LWRP to grab the script (more on that below).

./check-data.rb -s <servername:port> -t <graphite query> -c 7000 -u user -p password
./check-data.rb -s graphite.example.com -t "alias(stats.timers.server.response_time.mean, 'Mean')" -c 7000 -u myuser -p awesomepassword

There are a few things to note. The -s flag specifies which Graphite server or endpoint to hit, -t specifies the target (the Graphite query to run the check against), -c sets the critical threshold, and -u and -p supply credentials if your Graphite server uses authentication. If your Graphite instance is public it should probably use auth; if it is internal only, that is probably less important. Note that the query needs to be quoted so the shell passes it through as a single argument. Obviously these are just dummy values, included to give you a better idea of what a real command should look like. Use your own values in their place.

The query we’re running is against a statsd metric for the mean response time of a request, which gets recorded from the code (this is the developer instrumenting their code part I mentioned). This check is specific to my environment so you will need to modify any of your queries to make sure to alert on a useful metric and threshold in your own environment.
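To make the threshold logic concrete, here is a minimal Python sketch of the kind of comparison such a check performs. It is not the plugin's actual code: the function name, the choice to average the non-null datapoints, and the sample series are assumptions for illustration. The input shape matches what Graphite's render API returns with format=json (a list of series, each with a datapoints list of [value, timestamp] pairs), and the exit codes follow the usual Nagios/Sensu convention of 0 for OK, 2 for critical and 3 for unknown.

```python
OK, CRITICAL, UNKNOWN = 0, 2, 3


def check_threshold(series, critical):
    """Return an exit status for Graphite render-API JSON and a threshold.

    The mean of all non-null datapoint values is compared to `critical`.
    """
    values = [
        value
        for item in series
        for value, _timestamp in item.get("datapoints", [])
        if value is not None
    ]
    if not values:
        return UNKNOWN  # no data to judge
    mean = sum(values) / len(values)
    return CRITICAL if mean > critical else OK


if __name__ == "__main__":
    # Made-up sample: response times in milliseconds.
    sample = [
        {
            "target": "Mean",
            "datapoints": [[6500.0, 1441700000], [None, 1441700060], [8100.0, 1441700120]],
        }
    ]
    print(check_threshold(sample, 7000))
```

Treating an empty result as unknown rather than OK is a design choice here, so that a metric that stops reporting does not silently pass.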

Here’s an example of what the graphite graph (rendered in Grafana) looks like.

Obviously this is just a sample but it should give you the general idea of what to look for.

If you examine the script, there are a few Ruby gem requirements to get the script to run, which you will need to install if you haven’t already. They are sensu-plugin, json, json-uri and openssl. You don’t need the sensu-plugin gem if you are just running the check manually but you WILL need to have it installed on the Sensu client that will be running the scheduled check. That can be done manually or with the Sensu Chef recipe (specifically for turning on Sensu embedded ruby and ruby gems), which I recommend using anyway if you plan on doing any type of deployments at scale using Sensu.

Here is what the Chef code looks like if you use the Sensu Chef cookbook to deploy this check automatically.

sensu_check "check_request_time" do 
  command "#{node['sensu']['plugindir']}/check-data.rb -s graphite.example.com -t \"alias(stats.timers.server.facedetection.response_time.mean, 'Mean')\" -c 7000 -a 360 -u myuser -p awesomepassword"
  handlers ["pagerduty", "slack"] 
  subscribers ["core"] 
  interval 60 
  standalone true 
  additional(:notification => "Request time above threshold", :occurrences => 5)
end

This should look familiar if you have any background using the Sensu Chef cookbook. Basically we are using the sensu_check LWRP to execute the script with the different parameters we want, using the pagerduty and slack handlers, which are just fancy ways to pipe out the results of the check. We are also saying we want to run this on a scheduled interval of 60 seconds as a standalone check, which means it will be scheduled and executed on the client node (not by the Sensu server itself). Finally, the additional attributes attach a notification message that says what exactly is going wrong and set occurrences to 5, so that handlers hold off until the check has failed five times in a row.
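The effect of the occurrences setting can be sketched in a few lines of Python. This is a simplified model for illustration, not Sensu's implementation: the function names are invented, and it assumes a handler should only fire once the most recent results contain at least that many consecutive non-zero exit statuses.

```python
def consecutive_failures(history):
    """Count trailing non-zero exit statuses in `history` (newest result last)."""
    count = 0
    for status in reversed(history):
        if status == 0:
            break
        count += 1
    return count


def should_notify(history, occurrences=5):
    """Return True once the latest failure streak reaches `occurrences`."""
    return consecutive_failures(history) >= occurrences


if __name__ == "__main__":
    print(should_notify([0, 2, 2, 2, 2, 2]))  # streak of five failures
    print(should_notify([2, 2, 0, 2, 2]))     # streak was reset by an OK result
```

With a 60 second interval, this means a slow response time has to persist for roughly five minutes before anybody gets paged.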

You can stick this logic in an existing recipe or create a new one that handles your metrics threshold checks. I’m not sure what the best practice is for where to put the check but I have a recipe that runs standalone threshold checks that I stuck this logic in to and it seems to work. Once the logic has been added you should be able to run chef-client for the new check to get picked up.