Category Archives: Scripting

Autosnap AWS snapshot and volume management tool

This is my first serious attempt at a Python tool on github.  I figured it was about time, as I’ve been leveraging Open Source tools for a long time, I might as well try to give a little bit back.  Please check out the project and leave feedback by emailing, opening a github or issue or commenting here, I’d love to see what can be done with this tool, there are lots of bugs to shake out and things to improve.  Even better if you have some code you’d like to contribute, this is very much a work in progress!

Here is the project – https://github.com/jmreicha/autosnap.

Introduction

Essentially, this tool is designed to ease the management of the snapshot and volume lifecycle in an AWS environment.  I have discovered that snapshots and volumes can be used together to form a simple backup management system, so by simplifying the management of these resources, by utilizing the power of the AWS API, you can easily manage backups of your AWS data.

While this obviously isn’t a full blown backup tool, it can do a few handy things like leverage tags to create and destroy backups based on custom expiration dates and create snapshots based on a few other criteria, all managed with tags.  Another cool thing about handling backups this way is that you get amazing resiliency by storing snapshots to S3, as well as dirt cheap storage.  Obviously if you have a huge number of servers and volumes your mileage will vary, but this solution should scale up in to the hundreds, if not thousands pretty easily.  The last big bonus is that you can nice granularity for backups.

For example, if you wanted to keep a weeks worth of backups across all your servers in a region, you would simply use this tool to set an expiration tag of 7 days and voila.  You will have rolling backups, based on snapshots for the previous seven days.  You can get the backup schedule fairly granular, because the snapshots are tagged down to the hour. It would be easy to get them down to the second if that is something people would find useful, I could see DB snapshots being important enough but for now it is set to the hour.

The one drawback is that this needs to be run on a daily basis so you would need to add it to a cron job or some other tool that runs tasks periodically.  Not a drawback really as much of a side note to be aware of.

Configuration

There is a tiny bit of overhead to get started, so I will show you how to get going.  You will need to either set up a config file or let autosnap build you one.  By default, autosnap will help create one the first time you run it, so you can use this command to build it:

autosnap

If you would like to provide your own config, create a file called ‘.config‘ in the base directory of this project.  Check the README on the github page for the config variables and for any clarifications you may need.

Usage

Use the –help flag to get a feeling for some of the functions of this tool.

$ autosnap --help

usage: autosnap [--config] [--list-vols] [--manage-vols] [--unmanage-vols]
 [--list-snaps] [--create-snaps] [--remove-snaps] [--dry-run]
 [--verbose] [--version] [--help]

optional arguments:
 --config          create or modify configuration file
 --list-vols       list managed volumes
 --manage-vols     manage all volumes
 --unmanage-vols   unmanage all volumes
 --list-snaps      list managed snapshots
 --create-snaps    create a snapshot if it is managed
 --remove-snaps    remove a snapshot if it is managed
 --version         show program's version number and exit
 --help            display this help and exit

The first thing you will need to do is let autosnap manage the volumes in a region:

autosnap --manage-vols

This command will simply add some tags to help with the management of the volumes.  Next, you can take a look and see what volumes got  picked up and are now being managed by autosnap

autosnap --list-vols

To take a snapshot of all the volumes that are being managed:

autosnap --create-snaps

And you can take a look at your snapshots:

autosnap --list-snaps

Just as easily you can remove snapshots older than the specified expiration date:

autosnap --remove-snaps

There are some other useful features and flags but the above commands are pretty much the meat and potatoes of how to use this tool.

Conclusion

I know this is not going to be super useful for everybody but it is definitely a nice tool to have if you work with AWS volumes and snapshots on a semi regular basis.  As I said, this can easily be improved so I’d love to hear what kinds of things to add or change to make this a great tool.  I hope to start working on some more interesting projects and tools in the near future, so stay tuned.

Setting up a private git repo in Chef

As a Chef newbie I really might have bit off more than I could chew originally when I was looking at how to get private github repo’s working but am glad I pushed through and got a solution working.  I have no idea if this is the preferred method or if there are any easier ways but this worked for me and so I am hoping that if any other Chef newbies stumble across this problem then they can use this post as a guide or reference.

First, I’d like to give credit where it is due.  I used this post as a template as well as the SSH wrapper section in the deploy documentation on the Chef website.

The first issue I had problems with, is that when you connect to github via SSH it wants the Chef client to accept its public fingerprint.  By default, if you don’t modify anything SSH will just sit there waiting for the fingerprint to be accepted.  That is why the SSH Git wrapper is used, it tells SSH on the Chef client that we don’t care about the authentication to the github server, just accept the key.  Here’s what my ssh git wrapper looks like:

 #!/usr/bin/env bash 
 /usr/bin/env ssh -o "StrictHostKeyChecking=no" -i "/home/vagrant/.ssh/id_rsa" $1 $2

You just need to tell your Chef recipe to use this wrapper script:

# Set up github to use SSH authentication 
cookbook_file "/home/vagrant/.ssh/wrap-ssh4git.sh" do 
  source "wrap-ssh4git.sh" 
  owner "vagrant" 
  mode 00700 
end

The next problem is that when using key authentication, you must specify both a public and a private key.  This isn’t an issue if you are running the server and configs by hand because you can just generate a key on the fly and hand that to github to tell it who you are.  When you are spinning instances up and down you don’t have this luxury (actually you might but it seemed like a pain in the ass).

To get around this, we create a couple of templates in our cookbook to allow our Chef client to connect to github with an already established public and private key, the id_rsa and id_rsa.pub files that are shown.  Here’s what the configs look like in Chef:

# Public key 
template "/home/vagrant/.ssh/id_rsa.pub" do 
  source "id_rsa.pub" 
  owner "vagrant" 
  mode 0600 
end 
 
# Private key 
template "/home/vagrant/.ssh/id_rsa" do 
  source "id_rsa" 
  owner "vagrant" 
  mode 0600 
end

After that is taken care of, the only other minor caveat is that if you are cloning a huge repo then it might timeout unless you override the default timeout value, which is set to 600 seconds (10 mins).  I had some trouble finding this information on the docs but thanks to Seth Vargo I was able to find what I was looking for. This is easy enough to accomplish, just use the following snippet to override the default value

timeout 9999

That should be it.  There are probably other, easier ways to accomplish this and so I definitely think the adage “there’s more than one way to skin a cat” applies here.  If you happen to know another way I’d love to hear it.

Gathering Exchange 2010 mail flow statistics

There are times when it can be useful and beneficial to have a good grasp on the details of what kind of mail traffic is running through your Exchange environment.  Recently I have been tasked with coming up with some environmental statistics for our Exchange 2010 servers to help size a new project we are starting soon.  There are a few different tools to help gather this information that I’d like to briefly go over today.  Before I start I’d like to point out that most of this stuff I am borrowing from others, however I think it is valuable to know how to do this type of thing.  With that said, I’m definitely not trying to take credit for any of these techniques, just trying to show the benefits.

There are a few different tools that will help to get a handle on your Exchange environment.  The first and quickest way to peer into your Exchange environment for some quick high level overview statistics is to use PowerShell.

The following command can be used to grab some basics stastics such as the total mailbox size, average maiblbox size, the max and the minimum sizes in your environment.

Get-Mailbox -Database MBDB1 | Get-MailboxStatistics | %{$_.TotalItemSize.Value.ToMB()} | Measure-Object -sum -average -max -min

It is important to note however that this command can take some time to complete and can be an intensive process because there are so many calculations going on, just be careful that you don’t crash anything.  This command may not be viable if the environment is enormous but if that is the case you probably don’t need to use any of these techniques anyway.

The next useful tool to gather up mail flow information uses the Microsoft Log Parser tool, which can be downloaded here.  The log parser basically allows us to query the Exchange message transport logs to pull out interesting information.  I found a great blog post that describes the process of using the log parser tool to query the message tracking logs to help determine daily send and receive traffic in your Exchange environment.  You can find the blog post here and I have it reference at the end of this article as well.

There are a few tricks however that I would like to mention because a few things in the blog post aren’t exactly obvious.  After downloading and installing the Log Parser you must run the command he has listed on his site using CMD, otherwise you will have to modify his commands to use PowerShell.

For this command to work correctly you must also navigate to the correct location where the transport logs are being stored.  In the default install of Exchange they are stored in:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking

So after you navigate to the correct location you run the command:

"C:\Program Files (x86)\Log Parser 2.2\logparser.exe" "SELECT TO_LOCALTIME(TO_TIMESTAMP(EXTRACT_PREFIX(TO_STRING([#Fields: date-time]),0,'T'), 'yyyy-MM-dd')) AS Date, COUNT(*) AS Hits from *.log where (event-id='RECEIVE') GROUP BY Date ORDER BY Date ASC" -i:CSV -nSkipLines:4 -rtp:-1

This will output the total number of send/receive messages for each date for the last 30 days on that particular server.  Another important thing to keep in mind is that you need to run this command on each server that has either the Hub Transport or Edge Transport role installed because each server houses a unique set of log files.

The last technique I’d like to go over for gathering interesting Exchange mail flow information is a script I found online, which can found here.  This is a very robust script that gathers a lot of specific information for a particular set of logs files.  Essentially this script functions similarly to the above Log Parser, except it grabs a lot more detail for a particular date.

This is easy to get working, just copy the script from the link into a .ps1 file and save it to a server that has the Exchange Management Shell installed on it.  If the EMS is not installed then this script will not function correctly.  The script will output some interesting details for each individual user including things like:

  • Username
  • Messages sent/received
  • Total MB sent/received
  • Internal sent/received stats
  • Unique messages sent

And output this information into a CSV file so it easy to manipulate the data at that point.  This kind of stuff is very useful in helping to determine things like average sent and received message size for example, I have not been able to provide that information to management easily until I found this script.

There are more techniques out there I’m sure, maybe even software that helps gather these sorts of stastics and information but for a quick and dirty way to grab some high level statistics you can’t really beat these techniques.  These methods are quick and will get you the information you need, which more often than not seems to be at least as detailed as the people requesting this information are looking for which is a win-win for everybody.  If you have any other input or questions about mail flow statistics feel free to let me know.

Resources:

http://exchangeserverpro.com/daily-email-traffic-message-tracking-log-parser/
http://exchangeserverpro.com/exchange-2010-message-tracking/
http://gallery.technet.microsoft.com/scriptcenter/bb94b422-eb9e-4c53-a454-f7da6ddfb5d6

Setting up Git in PowerShell

It seems like everybody is using git these days.  And for most, not everybody is stuck using Windows in their day to day workflow.  Unfortunately, I am.  So that means it is much more painful to get up and running with a lot of the coolest and best open source projects that are offered by members of github and other online code repositories being shared via git.  However, there is hope and it is possible for Windows users to join the git party.  So in this post, I would like to describe just how to do that.  And it should only take a few minutes if done correctly.  I will mention beforehand that there are a few steps that need to be completed in order for this technique to work successfully that typically are taken care of in a Linux or OSX environment.

The goal of this post is to work through these steps as best I can to get users up and running as quickly as possible and as easily as possible, reducing the amount of confusion and fumbling around with settings.  This post is designed for beginners that are just getting their feet wet with git but hopefully others can use it as a resource if they are coming from a different environment and are confused by the Windows way of doing things.

First step – Download and install the git port for Windows.

This is pretty straight forward.  Download and run the executable to install git for Windows.  If you just want to get up and running or are lazy, you can leave all of the defaults when you run through the installation wizard.

Second step – Add the git binaries to your system path variable.

This is the most important step, because out of the box git won’t work in your ordinary PowerShell command prompt, it needs to be opened separately.  So to fix this and add all the necessary binaries open up your environmental variables (in Windows 8).

Computer -> Properties -> Advanced -> Environmental Variables

environmental variables

and add the following value to the PATH variable.

C:\Program Files\Git\bin

Here is what this should look like in Windows.

path variable

Third step (optional) – Download and install posh-git for better PowerShell and git integration.

I have highlighted part of this process before in an older post but will go through the steps again because it is pretty straight forward.  To be able to get posh-git you need to have a sort of PowerShell package management tool called PsGet (instructions here).  To get this tool run the following command from your PowerShell command prompt.

(new-object Net.WebClient).DownloadString("http://psget.net/GetPsGet.ps1") | iex

Once the command has completed you should be able to simply run this install command and be finished.

install-module posh-git

That should be it.  With these simple steps you should be able to utilize git from the command line like you are accustomed to on other operating systems.  As I said, there is a tad more leg work but you can really utilize the flexibility of PowerShell to get things working.  I hope it helps, and as always let me know if you have any tips or questions.

Quickly Find Exchange Database Usage

Here is a Powershell script you can use to quickly determine the total amount of space taken up by all of your Exchange database files (edb files) on an Exchange server.  I’d like to note that this may not necessarily be a 100% accurate representation but is a great way to get a ballpark number without having to add the numbers up yourself, manually.

$dbs = Get-MailboxDatabase -Status

foreach($db in $dbs) {

$edbsize = $db.DatabaseSize.Tobytes()
$totalsize += $edbsize

}

Write-Host $totalsize

I noticed that I had no way to calculate the total amount of space being used by my Exchange databases the other day.  And even after scouring through teh Googles I was unable to find what I was looking for quickly so I wrote this script up quick to fix that problem.  Just copy the previous bit of code into a ps1 file with notepad and execute the script from your EMS.  It is a super simple way to iterate through all the databases, save their sizes to a variable and then spit that variable out when it is complete.