Introduction to Grafana

Grafana is a beautiful dashboard for displaying Graphite metrics through a web browser.  It is simple to set up and maintain, easy to use, and presents metrics in a very nice Kibana-like style.  I would like to walk readers through the basics of this tool because, although it is a very new project (it appeared at the beginning of 2014), it has an enormous amount of potential and I honestly believe there is already enough functionality to put it into production.

The guys over at Hosted Graphite have recently released hosted Grafana as part of their standard plan.  If you are interested, go check it out on their site.

One very nice feature of this offering is that you do not need to worry about any of the behind-the-scenes details or intricacies of how all of the Graphite components work together.  You just click a few buttons and you are ready to go.  Highly recommended if you just need something that works.

Anyway, Grafana gives you the ability to bolt all kinds of bells and whistles onto your graphs.  For example, there is a nice little Node.js tool called StatsD, written by the guys over at Etsy, that expands the capabilities of Graphite.  And since it hooks right into Graphite, it makes it very simple to represent and display the various metrics in Grafana.
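
As a quick illustration of what that looks like in practice, here is about the simplest way to push a test metric into StatsD from the command line (the port is the StatsD default and the metric name is just an example):

# StatsD listens for plain UDP packets on port 8125 by default; this sends one counter increment
echo "grafana.test.hits:1|c" | nc -u -w1 localhost 8125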

If you do choose to roll your own Grafana solution, there are a few gotchas that I was not aware of that I’d like to cover.  The first, which should seem easy enough, is that you need to run both Graphite and Grafana simultaneously.  That means you either need to create two virtual hosts (depending on which web server you choose to use) or two different port bindings.  The Grafana documentation has a few examples of how to create the new port bindings using Apache as the web server, but it is pretty easy to find example configs on GitHub and other sites if you are interested in using nginx as your web server instead.  The important thing to remember is that you will want two sites listed in the sites-enabled directory inside your Apache directory: one for Graphite (which I moved to port 8080 for simplicity) and one for Grafana (which I stuck on port 80).
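
For reference, here is a rough sketch of what that looks like on a Debian/Ubuntu style Apache install (the vhost file names are just placeholders for your own configs):

# Tell Apache to listen on the extra port for Graphite (Grafana stays on 80)
echo "Listen 8080" | sudo tee -a /etc/apache2/ports.conf

# With your two vhost configs dropped into sites-available, enable them and reload
sudo a2ensite graphite grafana
sudo service apache2 reload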

If you choose to use authentication, there are a few extra headers that need to be set as well so that Grafana works correctly.  This is easy to add to either your nginx or your Apache config.

Here is the adapted Apache configuration necessary to add authentication, where WEBSITENAME is the globally reachable DNS name of your web server:

Header set Access-Control-Allow-Origin "http://WEBSITENAME"
Header set Access-Control-Allow-Methods "GET, OPTIONS"
Header set Access-Control-Allow-Headers "origin, authorization, accept"
Header set Access-Control-Allow-Credentials true

<Location />
  AuthName "graphs restricted"
  AuthType Basic
  AuthUserFile /etc/apache2/htpasswd
  <LimitExcept OPTIONS>
    require valid-user
  </LimitExcept>
</Location>

That is almost the entire configuration that I have for the Grafana web server.  You will need the htpasswd utility (part of the apache2-utils package) to generate the hash for your AuthUserFile; there is a quick example of this below, after the Graphite config.  The Graphite configuration itself is only a little bit more complex and was created by Chef, so I will go ahead and post it here as well:

<VirtualHost *:8080>
  ServerName SERVER
  ServerAdmin [email protected]
  DocumentRoot "/opt/graphite/webapp"
  ErrorLog /opt/graphite/storage/log/webapp/error.log
  CustomLog /opt/graphite/storage/log/webapp/access.log common

  Header set Access-Control-Allow-Origin "*"
  Header set Access-Control-Allow-Methods "GET, OPTIONS"
  Header set Access-Control-Allow-Headers "origin, authorization, accept"

  WSGIScriptAlias / /opt/graphite/conf/wsgi.py

  <Directory /opt/graphite>
  <Files wsgi.py>
    Require all granted
  </Files>
  </Directory>

  <Location "/content/">
    SetHandler None
  </Location>

  <Location "/media/">
    SetHandler None
  </Location>

  # NOTE: In order for the django admin site media to work you
  # must change @DJANGO_ROOT@ to be the path to your django
  # installation, which is probably something like:
  # /usr/lib/python2.6/site-packages/django
  Alias /media/ "@DJANGO_ROOT@/contrib/admin/media/"

</VirtualHost>

Again, pretty straightforward.  If you choose to add authentication, it is done in a similar way.
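
For completeness, here is roughly how the password file referenced by AuthUserFile gets generated (htpasswd ships with the apache2-utils package on Debian/Ubuntu; the username is just an example):

sudo apt-get install apache2-utils
# -c creates the file; drop the flag when adding additional users later
sudo htpasswd -c /etc/apache2/htpasswd someuser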

The only other step is to get your metrics into Graphite.  I am using Sensu, which I have written a bit about and plan to write more about in the future.  Reference some of those writings to get an idea of metric gathering, and if you have any questions let me know.  I will be writing a post in the near future about gathering and aggregating metrics with Sensu.  For now I will just assume that you already have a way to collect metrics.

Once you have everything set up you should be able to start playing around with the Grafana GUI.  After you have your configurations ironed out, pretty much everything else is done through the GUI, which is nice.

To illustrate the power of Grafana, here are a few example dashboards I have built recently:

[Grafana dashboard: Memory used]
[Grafana dashboard: CPU usage]
[Grafana dashboard: Disk space used]

If you want to check out the Grafana project you can find more information on either their GitHub page or their website.  The docs page on the main website is a great resource, as is the IRC channel.  In fact, the IRC channel is probably the best place to go for help because it isn’t overcrowded and the creator of Grafana is in there quite a bit.


Set up PagerDuty alerts in Sensu

I am currently in the midst of rolling out a monitoring solution using Sensu and a handful of other tools, which I will be covering sporadically in the future.  One important facet of any good monitoring solution is a reliable alerting method.  Sensu uses a distributed approach to monitoring, so all of the components are spread out rather than run as one monolithic system.  Following this principle, Sensu integrates nicely with the awesome PagerDuty tools for alerting.  You can find more information about Sensu and its architecture over at the docs page on their website.

“The Sensu way” involves using what is called a handler (for the uninitiated) to trigger an alert.  For example, my setup involves a number of checks, which are run on each of my clients.  These checks have associated subscribers and handlers that report back to the Sensu server.  From there the Sensu server will run the specified handler(s) and do something with the results of the check that was run on the Sensu client.

For my project I am using PagerDuty to generate alerts if disk space gets low or a process dies.  I will briefly run through the steps of setting up the PagerDuty integration because there were a few roadblocks that I encountered when I set this up the first time.

This set of instructions assumes that you already have a PagerDuty account created and configured.  The first step is to create a new service in PagerDuty for Sensu: pick a suitable name and choose Use our API directly.  It should look similar to the following:

[Screenshot: PagerDuty service API key]

Now that we have an API key set up in PagerDuty, we can jump on the Sensu server and add the appropriate JSON to configure the Sensu handler to communicate with PagerDuty.  Place the following contents in /etc/sensu/conf.d/handlers/pagerduty.json:

{
  "pagerduty": {
    "api_key": "xxxxxx"
  },
  "handlers": {
    "pagerduty": {
      "type": "pipe",
      "command": "/etc/sensu/plugins/pagerduty.rb",
      "severities": [
        "critical",
        "ok"
      ]
    }
  }
}

I learned (the hard way) that the pagerduty.rb script won’t work out of the box.  It relies on a Ruby gem called redphone.  It is easy enough to install and get working: just do a gem install redphone and you should be all set.
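
For reference, that step looks something like this (the embedded gem path below is an assumption that only applies if you installed Sensu from the omnibus package):

sudo gem install redphone
# or, for omnibus-packaged Sensu installs, use the embedded Ruby instead:
# sudo /opt/sensu/embedded/bin/gem install redphone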

Next, go ahead and download the pagerduty.rb script to the appropriate location on the Sensu server:

cd /etc/sensu/plugins
wget -O /etc/sensu/plugins/pagerduty.rb https://raw.github.com/sensu/sensu-community-plugins/master/handlers/notification/pagerduty.rb
chmod +x /etc/sensu/plugins/pagerduty.rb

That should be it.  One good way to check that the checks and handler are actually firing correctly is to tail the log file on both the client and the server.  On the server the log is located at /var/log/sensu/sensu-server.log and on the client machine at /var/log/sensu/sensu-client.log.
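
Something as simple as the following, run on each box, will usually tell you whether the handler is firing or erroring out:

# On the Sensu server
tail -f /var/log/sensu/sensu-server.log
# On the Sensu client
tail -f /var/log/sensu/sensu-client.log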

Bonus:  Chef integration

Of course, all of this can be automated using Chef, which is ultimately what I ended up doing, so I will share some of the things that I learned in the process.  For starters, I am using the Sensu Chef cookbook, created by the maintainer of the Sensu project.  This cookbook exposes a few useful options for configuring Sensu.  You will need to clone the cookbook directly from the GitHub repository to get the newest features that we need, as the Opscode version has not yet been updated to incorporate them.

Just add this line to your recipe before you call any of the Sensu resources/providers.

include_recipe "sensu::default"

The Sensu cookbook exposes a number of nice resources that we can use in our recipes to deploy Sensu.  As an example, if you wanted to pull the PagerDuty handler down with Chef you would use something like the following in your recipe:

sensu_plugin "https://raw.githubusercontent.com/sensu/sensu-community-plugins/master/handlers/notification/pagerduty.rb"

This will place the pagerduty.rb script into the appropriate directory automatically.  There are other options as well, but this should do the trick.  You can find some more examples here.

Define your pagerduty handler:

sensu_handler "pagerduty" do 
  type "pipe" 
  command "/etc/sensu/plugins/pagerduty.rb" 
  severities ["ok", "critical"] 
end

You will need to add this handler to each check that you want to receive alerts for, and you will also need to subscribe your host to that check.  Here is what an example check might look like:

sensu_check "check_ntp" do 
   command "/etc/sensu/plugins/check-procs.rb -p ntpd -C 1" 
   handlers ["pagerduty"] 
   subscribers ["core"] 
   interval 60 
   additional(:notification => "NTP is not running", :occurrences => 5) 
end

That’s all I have for now.  So far Sensu has been amazing: it is very flexible, the IRC channel is an excellent resource, and the docs are nice as well.  Again, props to Sean Porter for creating an awesome new way to do monitoring.  I am still just scratching the tip of the iceberg as far as the capabilities of Sensu go, and I will be revisiting this subject in the future.


Set up PEM key authentication

Many times it is useful to use keys to authenticate to your servers.  This can dramatically improve security and is a great way to manage servers in bulk as well; you just need to keep track of your keys rather than having to remember a large number of passwords.  The steps to set up PEM key authentication are fairly straightforward, but it never hurts to walk through the process of getting it right.

Side note: I’d also like to mention briefly that I have these steps set up to work with Chef, so every server that gets deployed using Chef will use PEM keys out of the box, which works out very nicely.  If you’re interested I can expound on that topic a little more, just let me know.

The first step in the process is to generate some keys using openssl.  If you don’t have openssl, go download and install it.  If you do have openssl but haven’t updated it in a while, please update to avoid the Heartbleed vulnerability that was recently disclosed (nearly all distributions have released a patched version at this point, so it should be trivial).
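
A quick way to check what you are running and pull in the fixed package on Ubuntu/Debian (package names assume a 12.04/13.10-era system):

openssl version -a    # 1.0.1g, or a build date of April 7 2014 or later, means the fix is in
sudo apt-get update && sudo apt-get install --only-upgrade openssl libssl1.0.0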

We want to generate our key and create a PEM file out of it.  Here are the steps:

cd ~/.ssh
ssh-keygen -t dsa -b 1024                       # generates the id_dsa / id_dsa.pub key pair
openssl dsa -in id_dsa -outform pem > test.pem  # converts the private key to PEM format
cat id_dsa.pub >> authorized_keys               # authorizes the new public key on this host

You can leave the values blank (default) in the ssh-keygen.

Now you should see listings similar to the following in your ~/.ssh directory:

[Screenshot: ~/.ssh directory listing]

  • authorized_keys – holds the public key(s) that the PEM file gets authenticated against
  • id_dsa – the private portion of the key pair we generated in the steps above
  • id_dsa.pub – the public portion of the key pair, used when authenticating
  • test.pem – the file that will be used to authenticate; essentially the private key minus the passphrase
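
One thing to double check before moving on: ssh is picky about permissions on key material, so make sure the private pieces are locked down:

chmod 600 ~/.ssh/id_dsa ~/.ssh/test.pem   # private keys must not be readable by anyone else
chmod 644 ~/.ssh/id_dsa.pub
chmod 600 ~/.ssh/authorized_keys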

Now you just need to copy the test.pem file that was just generated (using scp or rsync) over to the host you will be connecting from.  Once that is done, the command to connect to the remote host using your key should look similar to the following:

ssh -i /path/to/pem user@server-name

Next steps: at this point you should have working PEM key authentication on your server.  It is probably a good idea to start looking at hardening the security of the host as well as its SSH configuration.  Small things can go a long way; for example, disabling root login and disabling password authentication will stop a very large number of attacks from hitting your server now that you are authenticating with PEM keys.
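
As a rough sketch (keep an existing session open while you test so you don’t lock yourself out), those two changes look something like this:

# Disable root login and password authentication in /etc/ssh/sshd_config
sudo sed -i 's/^#\?PermitRootLogin.*/PermitRootLogin no/' /etc/ssh/sshd_config
sudo sed -i 's/^#\?PasswordAuthentication.*/PasswordAuthentication no/' /etc/ssh/sshd_config
sudo service ssh reload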


Introduction to Logstash+ElasticSearch+Kibana

There are a few problems with the current state of logging.  The first is that there is no real unified or agreed-upon standard for how to do logging across software platforms, so it is typically left up to the software designer to choose how to design and output logs.  Because of this non-standardized approach, logs come in many, many different formats, which is obviously an issue if you are attempting to gather useful and meaningful data from a variety of sources.  Because of this large number of log types and formats, numerous logging tools have been created, each trying to solve a certain type of logging problem, so selecting one tool that offers everything can quickly become a chore.  The other big problem is that logs can produce an overwhelming amount of information, and many of the traditional tools do nothing to correlate and represent the data that they collect, so narrowing down specific issues can become very difficult.

Logstash solves both of these problems in its own way.  First, it does a great job of abstracting away a lot of the difficulty of log collection and management.  Say, for example, you need to collect MySQL logs, Apache logs, and syslogs on a system.  Logstash doesn’t discriminate: you just tell it what to expect and it will go ahead and process those logs for you.  With ElasticSearch and Kibana, you can then quickly gather useful information by searching through the logs and identifying patterns and anomalies in your data.

The goal of this post is to take readers through the process of getting up and running, starting from scratch all the way to a working example.  Feel free to skip through any of the sections if you are looking for something specific.  I’d like to mention quickly that this post covers the steps for configuring Logstash 1.4.2 on an Ubuntu 13.10 system, with a log forwarding client on anything you’d like.  You *may* run into issues if you try these steps on different versions or Linux distributions.

Installing the pieces:

We will start by installing all of the various pieces that work together to create our basic centralized logging server.  The architecture can be a little bit confusing at first, so here is a diagram from the Logstash docs to help.

[Diagram: Logstash architecture, from the Logstash docs]

Each of the following components do a specific task:

  • Java – Runtime Environment that Logstash uses to run.
  • Logstash – Collects and processes the logs coming into the system.
  • ElasticSearch – This is what stores, indexes and allows for searching the logs.
  • Redis – This is used as a queue and broker to feed messages and logs to logstash.
  • Kibana – Web interface for searching and analyzing logs stored by ES.

Java

sudo apt-get install openjdk-7-jre

Logstash

cd ~
curl -O https://download.elasticsearch.org/logstash/logstash/logstash-1.4.2.tar.gz
tar zxvf logstash-1.4.2.tar.gz

ElasticSearch (Logstash is picky about which version gets installed)

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.3.2.deb
sudo dpkg -i elasticsearch-1.3.2.deb

Redis

sudo apt-get install redis-server

Kibana

cd ~
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.0.0.tar.gz
tar xvfz kibana-3.0.0.tar.gz

Nginx

sudo apt-get install nginx

Configuring the pieces:

This is an important part of a successful setup because there are a lot of different moving parts here.  If something isn’t working correctly you will want to double check that you have all of your configs set up correctly.

Redis

This step binds Redis to your public interface so that other servers can connect to it.  Find the following line in the /etc/redis/redis.conf file

bind 127.0.0.1

and change it to the following:

bind 0.0.0.0

We need to restart Redis for it to pick up our change:

sudo service redis-server restart
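
A quick sanity check that Redis is actually listening on the public interface (run this from the Logstash server or any other box):

redis-cli -h 192.168.1.200 ping   # should come back with PONG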

ElasticSearch

Find the lines in the /etc/elasticsearch/elasticsearch.yml file and change them to the following:

cluster.name: elasticsearch
node.name: "logstash test"

and restart elasticsearch:

sudo service elasticsearch restart

You can test your config and ElasticSearch by browsing to the name/IP of the host on port 9200, for example http://192.168.1.200:9200
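
Or, from the command line:

curl http://192.168.1.200:9200
curl 'http://192.168.1.200:9200/_cluster/health?pretty'   # status should be green or yellow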

Logstash server

We need to create a config here for the central Logstash.  Let’s create a file called /etc/logstash/server.conf and input the following:

input {
  redis {
    host => "192.168.1.200"
    type => "redis"
    data_type => "list"
    key => "logstash"
  }
}
output {
  stdout { }
  elasticsearch {
    cluster => "elasticsearch"
  }
}

Replace host with the local IP address of your Redis server; in this case it is on the same host as Logstash.

Finally, fire up the Logstash server with the following command:

~/logstash-1.4.2/bin/logstash --verbose -f /etc/logstash/server.conf

Kibana

You will need to navigate to your Kibana files.  From the installation steps above that is ~/kibana-3.0.0.  To get everything working we need to edit the config.js file in the Kibana directory and point it at the correct host.  Change it from:

elasticsearch: "http://"+window.location.hostname+":9200"

to

elasticsearch: "http://192.168.1.200:9200"

Nginx

We are almost done.  We just have to configure nginx to point at our Kibana site.  To do this we copy the Kibana directory to the default web server root.

sudo mkdir -p /var/www
sudo cp -R ~/kibana-3.0.0 /var/www/kibana

Finally we edit the /etc/nginx/sites-enabled/default file and find the following:

root /usr/share/nginx/www;

and change it to read as follows:

root /var/www/kibana;

now restart Nginx:

sudo service nginx restart

Now you should be able to open a browser, navigate to either http://localhost or your server’s IP address, and get a nice web GUI for Kibana.
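
If the browser gives you trouble, a quick header check against nginx will confirm whether Kibana is being served at all:

curl -I http://localhost   # expect an HTTP/1.1 200 OK response from nginx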

Logstash client

We’re almost finished.  We just need to configure a client to forward some logs over to our central Logstash server.  Follow the instructions for downloading Logstash as we did earlier on the centralized logging server.  Once you have the files ready to go you need to create a config for the client server.

Again we will create a config file.  This time it will be /etc/logstash/agent.conf and we will use the following configuration:

input {
  file {
    type => "apache-access"
    path => "/var/log/apache2/other_vhosts_access.log"
  }

  file {
    type => "apache-error"
    path => "/var/log/apache2/error.log"
  }
}

filter {
  grok {
    match => { "message" => "%{COMBINEDAPACHELOG}" }
  }
  date {
    match => [ "timestamp" , "dd/MMM/yyyy:HH:mm:ss Z" ]
  }
}

output {
  stdout { }
  redis {
    host => "192.168.1.200"
    data_type => "list"
    key => "logstash"
  }
}

Let’s fire up our client with the following command:

~/logstash-1.4.2/bin/logstash --verbose -f /etc/logstash/agent.conf

If you switch back to your browser and wait a few minutes, you should start seeing some logs being displayed.  If logs are coming in and showing up on the event chart, you have it working!

[Screenshot: Kibana interface showing incoming events]
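
You can also ask ElasticSearch directly whether the forwarded events are being indexed (the type value matches what we set in agent.conf):

curl 'http://192.168.1.200:9200/_search?q=type:apache-access&size=1&pretty'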

Conclusion

As I have learned, there are always caveats.  For example, I was getting some strange errors on my client endpoint whenever I ran the Logstash agent to forward logs to the central Logstash server.  It turned out that Java didn’t have enough RAM and CPU available.  Be aware that you may run into seemingly random problems if you don’t allocate enough resources to the machine.

Another quick tip for when things just aren’t working correctly: turn on verbosity (which we have done in our example), which will give you some clues to help identify the more specific problems you are having.

Resources

http://www.slashroot.in/logstash-tutorial-linux-central-logging-server
http://logstash.net/docs/1.4.0/


Selecting a Cloud Provider

Well, the deed is done.  I have finally migrated the blog off my home server and onto a hosted provider.  The site is finally starting to get big enough that I felt a migration would be a step in the right direction.  When I originally created this blog it was more of an experiment than anything else, but over the past two years I have seen the blog and my writing grow in directions I really didn’t anticipate, which is very exciting for me.

Due to this growth, one of my main concerns is stability.  At this stage of the project I just want a place that has good bandwidth and is available 24×7 whenever I want to write something.  I don’t want a power outage or a disruption to my home internet connection to make the site unreachable.  My traffic numbers are relatively low compared to some other popular websites, but they aren’t anything to sneeze at either now that the blog and my writing have become more established, so it is important that the site is available to people all the time.  If you’re interested in the migration, comment or send me a note and I can do a quick write-up or go into more depth, but I figured I’d spare the details because there are already a number of good guides out there on how to do it.  The only real issue was upgrading from Apache 2.2 to 2.4.

I’d like to take some time to talk about moving to a cloud provider.  It may be an unfamiliar process to some, so there might be a few good takeaways from covering the topic briefly.  Cloud hosting and cloud technologies are evolving into much more than a fad, and a lot of companies are trying to position themselves for the next generation of cloud computing.  Choosing a cloud provider offers a number of benefits, including lower overhead and maintenance costs compared to running your own servers and data center.  It also alleviates infrastructure maintenance, stability issues, and dealing with hardware failures and troubleshooting.

There are a very large number of cloud providers out there currently and the competition is fierce.  In fact, the competition is so fierce right now that Google and Amazon have recently begun a price war.  Here are just a few of the names out there:

  • Heroku
  • Amazon AWS
  • Joyent
  • Azure
  • Rackspace
  • Linode
  • DreamHost
  • Google Cloud Platform
  • Digital Ocean

In my current view they are all great in their own way; each of these providers can deliver value in its own niche.  If you are a business looking to move to the cloud, then AWS is the way to go.  The Amazon cloud has been tested and vetted by some of the largest cloud companies (Netflix, Pinterest, LinkedIn).  It has been around for a very long time in cloud years, so Amazon has been able to work out most of the issues and set the gold standard.  The trade-off is that for somebody new to cloud computing the services and interface can be confusing.  There are a lot of bells and whistles, which many people do not need.

If you’re a Microsoft shop you might take a look at the Azure platform.  Azure does a really good job of integrating with other Microsoft products and services.  This would be a logical move for anybody that leverages Microsoft technologies.

Rackspace builds its public cloud on OpenStack, which is open source software, so there are some really interesting things happening around that platform and technology.  (Joyent, for its part, runs its own SmartOS-based stack rather than OpenStack.)

Ultimately I decided to go with Digital Ocean for this project, for a couple of reasons.  First, the price was right.  My blog doesn’t require a lot of horsepower, so I was able to spin up a 1GB RAM, 1 CPU, 30GB disk, 1TB bandwidth Ubuntu server on DO for $10/month.  The second reason that I like DO, and why many other people are moving towards it, is that the setup and configuration process is stupidly simple and easy.  Create an account, set up a credit card, and away you go, up into the clouds.  The process from start to having a new server up literally took me 10 minutes.

That is the beauty of DO’s approach to the cloud: make things as simple as possible for people to get up and going.  Certainly there aren’t nearly as many features as on many other platforms, but in many scenarios people just want a server to play around with, and DO does a great job of making that possible.

The point I’m trying to get at is that there are different tools for different jobs.  You need to evaluate what is out there and how it will suit your needs.  If you aren’t a Microsoft shop then you probably don’t need Azure.  The good news is that with all of this competition and rivalry, prices are dropping and more options and niches are becoming available as products mature and new providers enter the scene.  The cloud introduces some pretty neat features and technologies, but ultimately you need to decide what you or your business is looking for before you make a decision.
