Export SNMP metrics with the Prometheus Operator

There are quite a few use cases for monitoring outside of Kubernetes, especially for previously built infrastructure and otherwise legacy systems.  This external monitoring adds an extra layer of complexity to your setup and configuration, but fortunately Prometheus makes that extra complexity easier to manage and maintain from inside of Kubernetes.

In this post I will describe a nice clean way to monitor things that are external to Kubernetes using Prometheus and the Prometheus Operator.  The advantage of this approach is that it allows the Operator to manage and monitor infrastructure, and it allows Kubernetes to do what it's good at: making sure the things you want are running for you in an easy to maintain, declarative manifest.

If you are already familiar with the concepts in Kubernetes then this post should be pretty straightforward.  Otherwise, you can pretty much copy/paste most of these manifests into your cluster and you should have a good way to monitor things in your environment that are external to Kubernetes.

Below is an example of how to monitor external network devices using the Prometheus SNMP exporter.  There are many other exporters that can be used to monitor infrastructure that is external to Kubernetes, but currently it is recommended to set up these configurations outside of the Prometheus Operator in order to separate monitoring concerns (which I plan on writing more about in the future).

Create the deployment and service

Here is what the deployment might look like.

apiVersion: apps/v1
kind: Deployment
metadata:
  name: snmp-exporter
spec:
  replicas: 1
  selector:
    matchLabels:
      app: snmp-exporter
  template:
    metadata:
      labels:
        app: snmp-exporter
    spec:
      containers:
      - image: oakman/snmp-exporter
        command: ["/bin/snmp_exporter"]
        args: ["--config.file=/etc/snmp_exporter/snmp.yml"]
        name: snmp-exporter
        ports:
        - containerPort: 9116
          name: metrics
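
The args above point the exporter at /etc/snmp_exporter/snmp.yml, so the image is assumed to ship a usable default config.  If you would rather supply your own snmp.yml, one rough sketch is to put it in a ConfigMap (the snmp-config name is just an example) and mount it over that path:

apiVersion: v1
kind: ConfigMap
metadata:
  name: snmp-config
data:
  snmp.yml: |
    # contents of your generated snmp.yml go here

Then add a configMap volume named snmp-config to the pod spec above, along with a matching volumeMount at /etc/snmp_exporter on the container.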

And the accompanying service.

apiVersion: v1
kind: Service
metadata:
  labels:
    app: snmp-exporter
  name: snmp-exporter
spec:
  ports:
  - name: http-metrics
    port: 9116
    protocol: TCP
    targetPort: metrics
  selector:
    app: snmp-exporter

At this point you would have a pod running in your cluster, fronted by a stable service IP address.  To see if it worked you can check to make sure the service was created.  The service is basically what the Operator uses to create targets in Prometheus.

kubectl get svc
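
If the service exists but Prometheus later shows no targets, it is also worth confirming that the service selector actually matched the exporter pod; an empty ENDPOINTS column here usually means a label mismatch.

kubectl get endpoints snmp-exporter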

From this point you can either 1) set up your own instance of Prometheus using Helm or by deploying via YAML manifests, or 2) set up the Prometheus Operator.

Today we will walk through option 2, although I will probably cover option 1 at some point in the future.

Setting up the Prometheus Operator

The beauty of using the Prometheus Operator is that it gives you a way to quickly add or change Prometheus-specific configuration via the Kubernetes API (custom resource definitions) and some custom objects provided by the operator, including Alertmanager, ServiceMonitor and Prometheus objects.

The first step is to install Helm, which is a little bit outside the scope of this post, but there are lots of good guides on how to do it.  With Helm up and running you can easily install the operator and the accompanying kube-prometheus manifests, which give you access to lots of extra Kubernetes metrics, alerts and dashboards.

helm repo add coreos https://s3-eu-west-1.amazonaws.com/coreos-charts/stable/
helm install --name prometheus-operator --set rbacEnable=true --namespace monitoring coreos/prometheus-operator
helm install coreos/kube-prometheus --name kube-prometheus --namespace monitoring

After a few moments you can check to see that resources were created correctly as a quick test.

kubectl get pods -n monitoring

NOTE: You may need to manually add the “prometheus” service account to the monitoring namespace after creating everything.  I ran into some issues because Helm didn’t do this automatically.  You can check this with kubectl get events.
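
For example, something like the following will show the failures and patch the account up manually (this assumes the missing account is named prometheus; check the events output for the actual name):

kubectl get events -n monitoring
kubectl create serviceaccount prometheus -n monitoring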

Prometheus Operator configuration

Below are steps for creating custom objects (CRDs) that the Prometheus Operator uses to automatically generate configuration files and handle all of the other management behind the scenes.

These objects are wired up in a way that configs get reloaded and Prometheus will automatically be updated when it sees a change.  The object definitions essentially express Prometheus configuration in a format that Kubernetes understands, which the operator then converts back into native Prometheus configuration.

First we make a ServiceMonitor for monitoring the SNMP exporter.

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  labels:
    k8s-app: snmp-exporter
    prometheus: kube-prometheus # tie servicemonitor to correct Prometheus
  name: snmp-exporter
spec:
  jobLabel: k8s-app
  selector:
    matchLabels:
      app: snmp-exporter
  namespaceSelector:
    matchNames:
    - monitoring
  endpoints:
  - interval: 60s
    port: http-metrics
    params:
      module:
      - if_mib # Select which SNMP module to use
      target:
      - 1.2.3.4 # Modify this to point at the SNMP target to monitor
    path: "/snmp"
    targetPort: 9116

Next, we create a custom alert and tie it to our Prometheus Operator.  The alert doesn't do anything useful, but it is a good demo for showing how easy it is to add and manage alerts using the Operator.

Create an alert-example.yml configuration file, add it to k8s as a configmap, and label it so that it matches the ruleSelector label selector; the Prometheus Operator will do the rest.  Below shows how to hook a test rule into an existing Prometheus (kube-prometheus) alerting setup, handled by the prometheus-operator.

kind: ConfigMap
apiVersion: v1
metadata:
  name: josh-test
  namespace: monitoring
  labels:
    role: alert-rules # Standard convention for organizing alert rules
    prometheus: kube-prometheus # tie to correct Prometheus
data:
  test.rules.yaml: |
    groups:
    - name: test.rules # Top level description in Prometheus
      rules:
      - alert: TestAlert
        expr: vector(1)

Once you have created the rule definition via configmap just use kubectl to create it.

kubectl create -f alert-example.yml -n monitoring
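
To verify the rule was picked up, check that the configmap exists and then look for TestAlert on the /alerts page of the Prometheus UI (the pod name below is an assumption based on the operator's prometheus-<name>-0 naming convention; check kubectl get pods -n monitoring for the real one):

kubectl get configmap josh-test -n monitoring
kubectl port-forward prometheus-kube-prometheus-0 9090 -n monitoring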

Testing and troubleshooting

You will probably need to port forward the pod to get access to the IP and port in the cluster.

kubectl port-forward snmp-exporter-<name> 9116

Then you should be able to visit the pod in your browser (or with curl).

localhost:9116
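
From here you can also exercise the exporter the same way Prometheus does, by passing the module and target as query parameters (using the same example target as the ServiceMonitor above):

curl 'http://localhost:9116/snmp?module=if_mib&target=1.2.3.4'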

The exporter itself does a lot more, so you will probably want to play around with it.  I plan on covering more of the details of other external exporters and more customized configurations for the SNMP exporter.

For example, if you want to do any sort of monitoring beyond basic interface stats, you will need to generate and build your own set of MIBs to gather metrics from your infrastructure, and also reconfigure your ServiceMonitor object in Kubernetes to use the correct modules so that the Operator updates the configuration correctly.
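
The snmp_exporter project ships a generator for exactly this purpose: you feed it a generator.yml describing which MIB objects to walk and it produces the snmp.yml the exporter consumes.  A minimal sketch, mirroring the stock if_mib module (swap in the MIBs for your own hardware):

modules:
  if_mib:
    walk:
    - ifTable
    - ifXTable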

Conclusion

The number of options for how to deploy and run Prometheus is one area of confusion, especially for newcomers.  There are lots of ways to do things and there isn't much direction on how to use them, which can also be viewed as a strength since it allows for so much flexibility.

In some situations it makes sense to use an external (non Operator-managed) Prometheus, when you need to do things like manage and tune your own configuration files.  Likewise, the Prometheus Operator is a great fit when you are mostly concerned with managing and monitoring things inside Kubernetes and don't need to do much external monitoring.

That said, there is some support for external monitoring using the Prometheus Operator, which I want to write about in a different post.  This support is limited to a handful of different external exporters (for the time being) so the best advice is to think about what kind of monitoring is needed and choose the best solution for your own use case.  It may turn out that both types of configurations are needed, but it may also end up being just as easy to use one method or another to manage Prometheus and its configurations.


Configure a Rancher HAProxy health check

If you are familiar with ELB/ALB you will know that there are slight idiosyncrasies between the two.  For example, ELB allows you to health check a back end server by TCP port, basically allowing the user to check if a back end comes up and is listening on a specified port.  ALB is slightly different in its method for health checking: it uses HTTP checks (layer 7) to ensure back end instances are up and listening.

This becomes a problem in Rancher, when you have multiple stacks in a single environment that are fronted by the Rancher HAProxy load balancer.  By default, the HAProxy config does not have a health check endpoint configured, so ALB is never able to know if the back end server is actually up and listening for requests.

A colleague and I recently discovered a neat trick for solving this problem if you are fronting your environment with an ALB.  The solution to this conundrum is to sprinkle a little bit of custom configuration into the Rancher HAProxy config.

In Rancher, you can modify the live settings without downtime.  Click on the load balancer that sits behind the ALB and navigate to the Custom haproxy.cfg tab.

haproxy config

Modify the HAProxy config by adding the following:

# Used to report haproxy's status
defaults
    mode http
    monitor-uri /_ping

Click the “Edit” button to apply these changes and you should be all set.
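
A quick way to confirm the change took effect is to hit the new endpoint directly; with monitor-uri in place, HAProxy answers /_ping itself with a 200 before any routing happens (substitute the address of your Rancher load balancer):

curl -i http://<rancher-lb-address>/_ping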

Next, find the health check configuration for the associated ALB in the AWS console and add a check against the /_ping path on port 80 (or whichever port you are exposing/plan to listen on).  It should look similar to the following example.

Health checks

Below is an example that maps a DNS name to an internal Nginx container that is listening for requests on port 80.

HAProxy configuration

The check in ALB ensures that the HAProxy load balancer in Rancher is up and running before allowing traffic to be routed to it.  You can verify that your Rancher load balancer is working if the instances behind your ALB start showing a status of healthy in the AWS console.

NOTE: If you don’t have any apps initially behind the Rancher load balancer (or that are listening on the port specified in the health check) the AWS instances behind ALB will remain unhealthy until you add configuration in Rancher for the stacks to be exposed, as pictured above.

After setting up HAProxy, publicly accessible services in private Rancher environments can easily be managed by updating the HAProxy config.  Just add a DNS name and a service to link to, and HAProxy is able to figure out how and where to route requests.  To map other services that aren't listening on port 80, the process is very similar: use the above as a guideline and simply update the target port to whichever port the app is listening on internally.


Set up SSL for Rancher Server

One issue you will probably run across if you start to use Rancher to manage your Docker containers is that it doesn't serve pages over an encrypted connection by default.  If you are looking to put Rancher into a production scenario, it is a good idea to serve encrypted pages.  HA is another topic, but at this point I have not attempted to set it up because it is currently a much more complicated process.  The Rancher folks are working on making HA easier in the near future (if you know an easy way to do it I would love to hear about it).  I would argue, though, that if you can set up SSL for your Rancher server you are over halfway to a full production setup.

The process of getting Rancher to proxy through an encrypted connection is straightforward, assuming you already have some certs to use.  If you don't have any officially issued certificates, I think you should be okay with self-signed certs, but you won't get that green lock that everybody loves.  If you are just testing this setup you should definitely be fine starting out with some self-signed certs.  Here is a reference for creating some certs for Nginx to test with.
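
If you need a quick self-signed pair for testing, something like this works (the test.com file names line up with the Nginx config below; substitute your own domain):

openssl req -x509 -nodes -days 365 -newkey rsa:2048 \
    -keyout test.com.key -out test.com.crt -subj '/CN=test.com'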

Another important thing to be aware of is that these instructions are specific to the Nginx method outlined above.  I have not tried the Apache method, though I would guess it should be very easy to adapt.

Take a look at the Rancher docs as a starting point; they are very good and will get you most of the way there.  However, when I went through this process there were a few pieces of information that I had to piece together myself, which is the bulk of what I will be sharing today.

The first step is to adapt the configuration in the docs into a full Nginx config that can be dropped into the official Nginx image from Dockerhub.  Here is the config I used.

upstream rancher {
    server rancher-server:8080;
}

server {
    listen 443 ssl;
    server_name test.com;
    ssl_certificate /etc/rancher/test.com.crt;
    ssl_certificate_key /etc/rancher/test.com.key;

    access_log /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://rancher;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # This allows the ability for the execute shell window to remain open for up to 15 minutes. Without this parameter, the default is 1 minute and will automatically close.
        proxy_read_timeout 900s;
    }
}

server {
    listen 80;
    server_name test.com;
    return 301 https://$server_name$request_uri;
}

There are a few important things to note about this config.  One is that the upstream is named the same as the rancher server container, in this case rancher-server.

Note that I have used test.com as the server name and so the certs and names are all reflective of that value.  Obviously that will need to be updated with your own values.

Finally, we have added an additional logging section to the config that will pipe the logs to stdout/stderr so we can easily look at the requests from the host OS via the “docker logs” command.

To get the following Docker run command to work correctly you will want to create a directory called /etc/rancher, or something easy to remember, and place this config (named rancher-nginx.conf), along with the certs you have created, into this location.  Alternately you can modify the Docker run command and simply point the volume mounts at the location where you store the configuration and certs.  For me, it makes the most sense to group these items together in /etc/rancher.

docker run -d --restart=always --name nginx \
    -v /etc/rancher/rancher-nginx.conf:/etc/nginx/conf.d/default.conf \
    -v /etc/rancher/test.com.crt:/etc/rancher/test.com.crt \
    -v /etc/rancher/test.com.key:/etc/rancher/test.com.key \
    -p 80:80 -p 443:443 --link=rancher-server nginx

This will mount the correct configuration and certificates into the Nginx docker container, expose ports 80 and 443 for web traffic (make sure to adjust any firewall rules you need to get traffic through these ports), and link to the rancher-server container so that traffic can be proxied.
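
Once the container is up, a couple of quick checks will confirm that everything is wired together (the -k flag tells curl to accept a self-signed cert; swap test.com for your real server name):

docker logs nginx
curl -ik https://test.com/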

Additionally, you will need to update any reference to the old address that was using http://<rancher-name>:8080/ to point to https://<rancher-name>/.  Namely the host registration configuration in the Rancher server, but if you were relying on any other outside tools to hit that endpoint they will also need to be updated to use https instead.


Setting up a pfSense NAT instance in AWS

One important aspect of cloud deployments that often gets overlooked, especially at startups, is security.  So I thought I would take some time to go through the process of setting up a NAT instance on AWS with full firewall capabilities.  There are instructions and documentation for this process which are very good but aren't completely clear, so I will attempt to fill in some of the gaps I ran into when setting this up myself.

There is one thing to take note of if you have used pfSense before: this firewall isn't free.  There is a slight hourly charge that comes out to about $42/month, or roughly $500/yr.  If you look at other commercial solutions with similar functionality you are looking at thousands of dollars per month.  Long story short, the cloud image of pfSense has a tiny cost associated with it but is very much worth it.

Just for reference I put together a few comparison prices.

  • Barracuda web app firewall – ($1.04-1.76/hr) (up to ~$1300/month)
  • Vyatta  ($0.30-1.50/hr) (up to ~$1100/month)
  • Sophos UTM ($0.35-$2.80/hr) (up to ~$2000/month)
  • pfSense ($0.07/hr) ($42/month)

As you can see, pfSense is very reasonable compared to some of the other bigger players.  You can build an r3.8xlarge instance and the software price won’t change which doesn’t seem to be the case with others.  One bonus to choosing pfSense is that you automatically qualify for support by agreeing to the ToS when getting the pfSense AMI set up.

Finally, pfSense is rock solid, being built on top of FreeBSD, and is thoroughly tested.  I have been running pfSense on other projects outside of AWS for 5+ years and have never had an issue with it outside of a dead hard drive one time.  Other added benefits of choosing pfSense are that updates are frequent and thoroughly tested, there are tons of add-ons (including IDS/IPS and VPN packages) so additional functionality can be built on top, and there is great community support as well.

Getting started

There are a few good resources that I found to be useful when working through this problem, which got me most of the way to a working setup.  One is my question about how to do this on Server Fault; there is some good detail in the post over there.

Setting up the NAT in pfSense

The first issue that was confusing was getting the network interfaces set up and configured.  For this setup you will need two interfaces, preferably with static IP addresses.  You will also need to make sure that you disable source/destination checks for the interface that will be acting as the LAN interface that the NAT goes through.  Disabling source and destination checks is pretty straightforward and is detailed in pretty much all of the guides.
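
As a side note, if you prefer the AWS CLI to clicking through the console, disabling the check is a one-liner per instance (the instance ID here is just a placeholder):

aws ec2 modify-instance-attribute --instance-id i-0123456789abcdef0 --no-source-dest-check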

You should note that there will be firewall tabs for LAN as well as WAN; if you can keep these two straight it will be much easier to troubleshoot and configure your pfSense machine.  Out of the box, the firewall on pfSense will not be configured to allow your LAN interface to do any sort of NATing, so you will need to manually create rules to get started.  If you check the WAN firewall tab you should notice some access rules, but the LAN tab should be empty.  Most of the work we will be doing will be on the LAN firewall.

The first rule to set up to make things easier to troubleshoot is a ping rule.  There is a WAN rule for ping but not for LAN.  You can essentially copy the WAN rule into a new one and modify it to look similar to the following.

LAN firewall rule

This rule will serve as the template for the other rules that need to be put into place.  The other rules will be for outbound web access.  Just copy this rule into a new rule, change the protocol to TCP, and make one rule that allows port 80 and another that allows 443.  The result should look similar to what I have listed below.

firewall rules for nat

Just a quick note: if at any point you are having trouble seeing traffic or are getting stuck in your troubleshooting, an excellent way to figure out what is going on is the logging that is provided by pfSense.  You can access all of the various logs by selecting Status -> System Logs and then highlighting the Firewall tab.

Modifying your outbound NAT

Here is what your outbound NAT rule should look like.

outbound nat

Notice the “Networks_to_NAT” value in the source section.  This is a pfSense alias that can be used as a sort of variable to help ease management.  You can either use this alias or specify the local subnet you want to use here.  To check the values in your alias you can go to Firewall -> Aliases.

Conclusion

This setup will provide you with a nice, easy way to manage your network in AWS.  The guides for setting up a NAT are a good first step, but with a firewall in place you can do so many other things, especially auditing, that just aren't available or viable with a straight AWS NAT instance, or that are way out of your price range with some of the other commercial solutions available.

pfSense also provides the capability to add more advanced tools like IDS/IPS, VPN and high availability if you choose, so there is nice room for expansion.  Even if you don't take advantage of all of the additional components of pfSense you will still have a rock solid firewall and NAT instance that is suitable for production workloads at a fraction of the cost of other commercial solutions.


Fix Xbox Strict NAT on PFSense

Out of the box, it turns out that pfSense is not configured to handle some connection settings for Xbox Live.  Unfortunately I couldn't find much of an explanation as to what this message actually means in terms of degraded online performance, but I noticed that I would randomly get kicked out of games, get disconnected from Xbox Live and have communication issues every once in a while, so I decided to take a look at what was actually going on once the mentioned issues started to get annoying.

I figured it should be easy enough to fix, but I couldn't find a definitive guide on how to fix this issue, so I figured I would make sure it is clear for those who find this post and are having the same issue.  I tried a few different combinations, including port forward combinations mentioned in some forums, firewall rule changes, various UPnP settings, etc., but none of these combos worked, and the instructions weren't very clear either.

Eventually I found this guide, which works and is great but doesn't depict how to set everything up.  There are a few steps to get this working correctly, so I will briefly describe them below.

Verify the IP address of your Xbox 360.  There is documentation around for finding it, but essentially go to System -> Network -> Advanced and it should give you the information.  You may want to set a static IP for your Xbox, but I won't cover that here.  Ask me if you have issues.

Now you will need to modify your firewall settings (Firewall -> NAT).  Choose the "Outbound" tab and change the mode to Manual Outbound NAT rule generation.  After you have saved the settings, create an entry for your Xbox with its IP address and a mask of /32.

Firewall rule

Once this rule has been created, move it up to the top of the rule list.  You should have something similar to the following when done.

Firewall rules

Next, modify UPnP settings (Services -> UPnP & NAT-PMP) and select the following settings.

  • Enable UPnP & NAT-PMP
  • Allow UPnP port mapping
  • External Interface -> WAN
  • Interfaces -> LAN
  • User specified permissions 1 -> allow 88-65535 192.168.39.17/32 88-65535

It should look something like this.

UPnP settings

Go ahead and save the settings and restart your Xbox (just turn it off and on) to make sure the settings get picked up, and that should be it.  I'm not entirely sure the user permissions need to be this wide open, but it works, so it is there for now.  I will update the post if I find any evidence that the settings need to be modified.
