Category Archives: Scripting

Gathering Exchange 2010 mail flow statistics

There are times when it is useful to have a good grasp of what kind of mail traffic is running through your Exchange environment.  Recently I was tasked with coming up with some environmental statistics for our Exchange 2010 servers to help size a new project we are starting soon.  There are a few different tools that help gather this information which I’d like to briefly go over today.  Before I start I’d like to point out that most of this material is borrowed from others; I’m definitely not trying to take credit for any of these techniques, just trying to show how valuable it is to know how to do this type of thing.

There are a few different tools that will help to get a handle on your Exchange environment.  The first and quickest way to peer into your Exchange environment for some quick high level overview statistics is to use PowerShell.

The following command can be used to grab some basic statistics such as the total mailbox size, average mailbox size, and the maximum and minimum mailbox sizes in your environment.

Get-Mailbox -Database MBDB1 | Get-MailboxStatistics | %{$_.TotalItemSize.Value.ToMB()} | Measure-Object -sum -average -max -min

It is important to note, however, that this command can take some time to complete and can be fairly intensive because of all the calculations involved, so be careful that you don’t bog anything down.  It may not be viable if the environment is enormous, but if that is the case you probably don’t need to rely on these quick techniques anyway.
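If you do want to run it but spread the load out a bit, something like the following rough sketch (run from the Exchange Management Shell) collects the same numbers one database at a time instead of all at once.  The output formatting is just an example; adjust it to taste.

# Rough sketch: gather the same statistics one database at a time (EMS required).
foreach ($db in Get-MailboxDatabase) {
    $stats = Get-Mailbox -Database $db.Name -ResultSize Unlimited |
        Get-MailboxStatistics |
        ForEach-Object { $_.TotalItemSize.Value.ToMB() } |
        Measure-Object -Sum -Average -Maximum -Minimum
    "{0}: {1} mailboxes, {2} MB total, {3:N0} MB average" -f $db.Name, $stats.Count, $stats.Sum, $stats.Average
}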

The next useful tool for gathering mail flow information is the Microsoft Log Parser tool, which can be downloaded here.  Log Parser essentially lets us query the Exchange message tracking logs to pull out interesting information.  I found a great blog post that describes how to use Log Parser against the message tracking logs to determine daily send and receive traffic in your Exchange environment.  You can find the blog post here, and I have it referenced at the end of this article as well.

There are a few tricks I would like to mention, however, because a couple of things in the blog post aren’t exactly obvious.  After downloading and installing Log Parser you must run the command listed on his site from CMD; otherwise you will have to modify his commands to work in PowerShell.

For this command to work correctly you must also navigate to the location where the message tracking logs are stored.  In a default install of Exchange that is:

C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking

So after you navigate to the correct location you run the command:

"C:\Program Files (x86)\Log Parser 2.2\logparser.exe" "SELECT TO_LOCALTIME(TO_TIMESTAMP(EXTRACT_PREFIX(TO_STRING([#Fields: date-time]),0,'T'), 'yyyy-MM-dd')) AS Date, COUNT(*) AS Hits from *.log where (event-id='RECEIVE') GROUP BY Date ORDER BY Date ASC" -i:CSV -nSkipLines:4 -rtp:-1

This will output the total number of received messages for each day covered by the logs on that particular server (30 days by default); change event-id to ‘SEND’ to count sent messages.  Another important thing to keep in mind is that you need to run this command on every server that holds the Hub Transport or Edge Transport role, because each server keeps its own set of log files.
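If you aren’t sure which servers those are, a quick way to list them is sketched below.  Note that Edge Transport servers only show up in this output if they have an Edge subscription; otherwise you’ll need to check them locally since they sit outside the domain.

# List servers holding message tracking logs (Hub Transport, plus subscribed Edge servers).
Get-ExchangeServer | Where-Object { $_.ServerRole -match "HubTransport|Edge" } | Select-Object Name, ServerRole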

The last technique I’d like to go over for gathering interesting Exchange mail flow information is a script I found online, which can be found here.  This is a very robust script that gathers a lot of specific information from a particular set of log files.  Essentially it works much like the Log Parser approach above, except it grabs a lot more detail for a particular date.

This is easy to get working: just copy the script from the link into a .ps1 file and save it on a server that has the Exchange Management Shell installed.  If the EMS is not installed the script will not function correctly.  The script will output some interesting details for each individual user, including things like:

  • Username
  • Messages sent/received
  • Total MB sent/received
  • Internal sent/received stats
  • Unique messages sent

The script writes this information to a CSV file, so it is easy to manipulate the data from there.  This kind of output is very useful for determining things like average sent and received message size; I had not been able to provide that information to management easily until I found this script.
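As a rough illustration of that last point, once you have the CSV you can pull an average message size out of it with a couple of lines of PowerShell.  The file path and the column names below (“Sent” and “Sent MB”) are just placeholders; substitute whatever headers the script actually writes.

# Sketch only - hypothetical path and column names, adjust to match the CSV the script produces.
$report    = Import-Csv C:\Reports\mailflow.csv
$sentCount = ($report | Measure-Object -Property "Sent" -Sum).Sum
$sentMB    = ($report | Measure-Object -Property "Sent MB" -Sum).Sum
"Average sent message size: {0:N1} KB" -f (($sentMB * 1024) / $sentCount)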

There are more techniques out there, I’m sure, and probably even software that gathers these sorts of statistics, but for a quick and dirty way to grab some high level numbers you can’t really beat these approaches.  These methods are quick and will get you the information you need, which more often than not is at least as detailed as what the people requesting it are looking for, so it’s a win-win for everybody.  If you have any other input or questions about mail flow statistics feel free to let me know.

Resources:

http://exchangeserverpro.com/daily-email-traffic-message-tracking-log-parser/
http://exchangeserverpro.com/exchange-2010-message-tracking/
http://gallery.technet.microsoft.com/scriptcenter/bb94b422-eb9e-4c53-a454-f7da6ddfb5d6

Setting up Git in PowerShell

It seems like everybody is using git these days.  And most of them aren’t stuck using Windows in their day to day workflow.  Unfortunately, I am.  That makes it much more painful to get up and running with a lot of the coolest and best open source projects offered on GitHub and other online code repositories shared via git.  However, there is hope, and it is possible for Windows users to join the git party, so in this post I would like to describe just how to do that.  It should only take a few minutes if done correctly.  I will mention beforehand that a few steps need to be completed for this to work successfully that are typically taken care of for you in a Linux or OS X environment.

The goal of this post is to work through these steps to get users up and running as quickly and easily as possible, reducing the amount of confusion and fumbling around with settings.  It is aimed at beginners who are just getting their feet wet with git, but hopefully others coming from a different environment and confused by the Windows way of doing things can use it as a resource as well.

First step – Download and install the git port for Windows.

This is pretty straightforward.  Download and run the executable to install git for Windows.  If you just want to get up and running, you can leave all of the defaults as you run through the installation wizard.

Second step – Add the git binaries to your system path variable.

This is the most important step, because out of the box git won’t work in your ordinary PowerShell prompt; it has to be opened in its own separate shell.  To fix this and make the git binaries available everywhere, open up your environment variables (in Windows 8):

Computer -> Properties -> Advanced -> Environment Variables

environmental variables

and add the following value to the PATH variable.

C:\Program Files\Git\bin

Here is what this should look like in Windows.

path variable
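If you would rather skip the GUI, the same change can be made from an elevated PowerShell prompt.  This is just a sketch using the standard .NET environment variable calls; it assumes git went into the default C:\Program Files\Git location, and you will need to open a new console afterwards for the change to take effect.

# Append the Git bin directory to the machine-wide PATH (run as Administrator).
$gitBin  = "C:\Program Files\Git\bin"
$current = [Environment]::GetEnvironmentVariable("Path", "Machine")
if ($current -notlike "*$gitBin*") {
    [Environment]::SetEnvironmentVariable("Path", "$current;$gitBin", "Machine")
}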

Third step (optional) – Download and install posh-git for better PowerShell and git integration.

I have highlighted part of this process before in an older post but will go through the steps again because it is pretty straightforward.  To get posh-git you need a PowerShell package management tool called PsGet (instructions here).  To install PsGet, run the following command from your PowerShell prompt.

(new-object Net.WebClient).DownloadString("http://psget.net/GetPsGet.ps1") | iex

Once the command has completed you should be able to simply run this install command and be finished.

install-module posh-git
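As a quick sanity check once everything is in place, open a fresh PowerShell window (so the new PATH is picked up) and try the following:

git --version            # should print the installed git version
Import-Module posh-git   # loads the prompt and tab-completion integration installed above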

That should be it.  With these simple steps you should be able to use git from the command line just like you are accustomed to on other operating systems.  As I said, there is a bit more leg work on Windows, but you can really take advantage of the flexibility of PowerShell to get things working.  I hope this helps, and as always let me know if you have any tips or questions.

Quickly Find Exchange Database Usage

Here is a PowerShell script you can use to quickly determine the total amount of space taken up by all of the Exchange database files (.edb files) on an Exchange server.  I’d like to note that this may not be a 100% accurate representation, but it is a great way to get a ballpark number without having to add everything up yourself, manually.

# Grab every mailbox database on the server, including size information.
$dbs = Get-MailboxDatabase -Status

$totalsize = 0

# Add each database's size (in bytes) to the running total.
foreach($db in $dbs) {
    $edbsize = $db.DatabaseSize.ToBytes()
    $totalsize += $edbsize
}

# Print the combined size of all databases, in bytes.
Write-Host $totalsize

I noticed the other day that I had no way to calculate the total amount of space being used by my Exchange databases, and even after scouring the Googles I was unable to quickly find what I was looking for, so I wrote this script to fix that problem.  Just copy the code above into a .ps1 file with Notepad and execute the script from your EMS.  It is a super simple way to iterate through all the databases, add their sizes up in a variable and then spit that total out at the end.
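For what it’s worth, the same total can be had as a one-liner; this is just an alternative sketch of the loop above using Measure-Object.

# Sum of all database sizes in bytes, without an explicit loop.
(Get-MailboxDatabase -Status | ForEach-Object { $_.DatabaseSize.ToBytes() } | Measure-Object -Sum).Sum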

Getting Python Fabric setup in Windows

This has really turned into a wild goose chase.  Initially my goal was simply to get Fabric up and running so I could test out some different features on some network gear.  Python integration in Windows is very different from the Linux world, where everything is bundled up nice and neatly.  There are several separate, seemingly unrelated pieces that all need to fit together to get Python and Fabric working correctly in a Windows environment, which can be very perplexing at first; hence this post, so I don’t have to remember all this complexity next time.  I thought I might as well show people how I got this to work instead of leaving them to pick and choose different bits of information from around the internet.

The following is a list of links that I found helpful in getting everything up and going; flip back to here for the different resources and components:

There are a few steps to getting up and running.  For basic Python functionality it should be enough to download and install Python via the standard installer in your Windows environment, accepting the defaults.  I recommend going with Python 2.7 rather than 3.3 because it has much better compatibility with existing modules.  You will also want to double check that you download the correct version for your OS, either 32-bit or 64-bit.

Once you have Python installed you will want to get pip installed.  You will use this tool to fetch Python modules, because it helps tremendously with downloading, managing and installing useful Python code.

To get up and running with pip, first make sure the pip installer matches the version of Python in your environment; for example, the 2.7 pip installer will not work with a 3.3 Python installation.  Second, you will need the Distribute package installed in your Python environment as well, since that is what allows pip to work.  Once you have these installed, you will need to switch to the directory where pip lives (or add it to your PATH environment variable).  For me it was located here:

C:\Python27\Scripts\pip.exe
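If you don’t feel like permanently editing the PATH variable, you can just tack the Python directories onto the current PowerShell session instead.  A quick sketch, assuming the default install location:

# Only affects the current PowerShell session.
$env:Path += ";C:\Python27;C:\Python27\Scripts"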

So the command to install Fabric would be as follows:

pip.exe install fabric

You would think that’s all you need to get Fabric working, right?  Well, it turns out that with this method we do not end up with the correct version of PyCrypto installed.

pycrypto error

So, using the link posted above, go ahead and download and install the correct version of PyCrypto (version 2.1.0).  That still doesn’t fix it though!  It just gets us to a different error.  I used this post and this post as guides for getting the correct version of PyCrypto installed on the Windows machine.

Okay, so now we should have a fully functioning Python environment with Fabric installed.  The only major issue that remains at this point (to my knowledge, at least) is that pip still doesn’t work quite right when installing various Python packages; to fix that you will need MinGW32 installed (see the links referenced above).  That is mostly outside the scope of this post, but I can write another post about it if there is interest, or you can ask me if you run into issues as always.

The only other piece left then is to get Fabric talking to our Cisco gear.  Take a look at the docs for basic usage; getting acquainted with Fabric is fairly straightforward for the most part.

One thing I was not aware of was how the Cisco CLI and devices would behave when using Fabric to control them remotely.  I was having issues with Fabric flaking out whenever I went into config mode on a Cisco switch.  It turns out that when you enter config mode you are essentially dropped into a new shell, and Fabric doesn’t have a nice way to deal with that.  So something like this will bomb out:

from fabric.api import run

def test():
	run("conf t", shell=False)
	run("int 1/0/1", shell=False)
	run("no shut", shell=False)
	run("exit", shell=False)

The “conf t” command opens a new shell and the Cisco gear freaks out because it doesn’t know what to do with the next command.  I should also mention that shell=False is somewhat unrelated to this issue; it just gets around Fabric trying to use bash as its default shell.  The workaround?  Use Fabric’s open_shell command and separate each command with \n so that each one lands on its own line.  A sample task using this format would look something like the following:

from fabric.api import open_shell

def test():
	open_shell("conf t \n"
		   "ip name-server 1.1.1.1 \n"
		   "exit \n"
		   "exit \n"
		   )
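For completeness, once that function lives in a fabfile.py you would kick it off from the command line with something like the line below.  The host address, username and task name are just placeholders for illustration.

fab -f fabfile.py -H 10.0.0.1 -u admin test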

Yeah, this is sort of hacky, and I’m not sure it will be able to do everything I am looking for, but hey, at least it kind of works.  I am currently looking for a more robust and easier way around this limitation, so if you have any suggestions let me know.

Credit goes to markmm on Reddit for letting me know about this workaround, as well as the people who hang out in the #fabric IRC channel on Freenode.

Document storage: Part 6

Document Storage Project

This is Part 6: Tying it all together.

All that’s left to do now is write a script that will:

  • Detect when a new file’s been uploaded.
  • Turn it into a searchable PDF with OCR.
  • Put the finished PDF in a suitable directory so we can easily browse for it later.

This is actually pretty easy: inotifywait(1) will tell us whenever a file has been closed, and we can use that as our trigger to OCR the document.

Our script is therefore in two parts:

Part 1: watches the /home/incoming directory for any files that are closed.
Part 2: is called by the script in part 1 every time a file finishes being written.

Part 1

This script lives in /home/scripts and is called watch-dir.

#!/bin/bash
INCOMING="/home/incoming"
# Directory this script lives in, so we can find process-image alongside it.
DIR="$( cd "$( dirname "${BASH_SOURCE[0]}" )" && pwd )"

# Watch the incoming directory and hand each newly written file to process-image.
inotifywait -m --format '%:e %f' -e CLOSE_WRITE "${INCOMING}"  2>/dev/null | while read LINE
do
        FILE="${INCOMING}"/`echo ${LINE} | cut -d" " -f2-`
        "${DIR}"/process-image "${FILE}" &
done

Part 2

This script lives in /home/scripts and is called process-image.

#!/bin/bash

# Dead easy - at least in theory!
# Take a single argument - filename of the file to process. 
# Do all the necessary processing to make it a 
# searchable PDF.

OUTFILE="`basename "${1}"`"
TEMPFILE="`mktemp`"

if [ -s "${1}" ]
then
	# We use the first part of the filename as a classification.
	CLASSIFICATION=`echo ${OUTFILE} | cut -f1 -d"-"`
	OUTDIR="/home/http/documents/${CLASSIFICATION}/`date +%Y`/`date +%Y-%m`/`date +%Y-%m-%d`"

	if [ ! -d "${OUTDIR}" ]
	then
		mkdir -p "${OUTDIR}" || exit 1
	fi

	# We have to move our file to a temporary location right away because 
	# otherwise pdfsandwich uses the file's own location for 
	# temporary storage. Well and good - but the file's location is 
	# subject to an inotify that will call this script!

	mv "${1}" "${TEMPFILE}" || exit 1

	# Have we a colour or a mono image? Probably quicker to find out 
	# and process accordingly rather than treat everything as RGB.
	# We assume the first page is representative of everything
        COLOURDEPTH=`convert "${TEMPFILE}[0]" -verbose -identify /dev/null 2>/dev/null | grep "Depth:" | awk -F'[/-]' '{print $2}'`
	if [ "${COLOURDEPTH}" -gt 1 ]
	then
		SANDWICHOPTS="-rgb"
	fi
	pdfsandwich ${SANDWICHOPTS} -o "${OUTDIR}/${OUTFILE}" "${TEMPFILE}" > /dev/null 2>&1
	rm "${TEMPFILE}"
fi

There’s just one thing missing: pdfsandwich. This is actually something I found elsewhere on the web. It hasn’t made it into any of the major distro repositories as far as I can tell, but it’s easy enough to compile and install yourself. Find it here.

Run /home/scripts/watch-dir every time we boot – the easiest way to do this is to include a line in /etc/rc.local that calls it:

/home/scripts/watch-dir &

Get it started now (unless you were planning on rebooting):

nohup /home/scripts/watch-dir &

Now you should be able to scan in documents; they’ll be automatically OCR’d and made available on the internal website you set up in part 3.

Further enhancements are left to the reader; suggestions include:

  • Automatically notifying sphider-plus to reindex when a document is added. (You’ll need a newer version of sphider-plus to do this. Unfortunately there is a cost associated with this, but it’s pretty cheap. Get it from here).
  • There is a bug in pdfsandwich (actually, I think the bug is probably in tesseract or hocr2pdf, both of which are called by pdfsandwich): under certain circumstances which I haven’t been able to nail down, sometimes you’ll find that in the finished PDF one page of a multi-page document will only show the OCR’d layer, not the original document. Track down this bug, fix it and notify the maintainer of the appropriate package so that the upstream package can also be fixed.
  • This isn’t terribly good for bulk scanning – if you want to scan in 50 one-page documents, you have to scan them individually otherwise they’ll be treated as a single 50 page document. Edit the script so we can somehow communicate with it that certain documents should be split into their constituent pages and store the resulting PDFs in this way.
  • Like all OCR-based solutions, this won’t give you a perfect representation of the source text in the finished PDF. But I’m quite sure the accuracy can be improved, very likely without having to make significant changes to how this operates. Carry out some experiments to figure out optimum settings for accuracy and edit the scripts accordingly.