A word to the wise: if you suddenly can't send or receive email in your Exchange environment, check whether the disks on your Exchange servers are filling up. Today I ran into an interesting issue (and by interesting I mean it could have caused a serious outage) where Windows updates were being downloaded as usual for our next patch management cycle but were quietly breaking our email services in the process. I'm thankful the scenario didn't get ugly, and the event gives me the opportunity to cover a few things that I think will be useful for readers and other admins.
It turns out that this month’s wave of Windows updates quietly filled the disks on our Hub Transport servers during the day, unbeknownst to any of the admins. Under normal circumstances this download process is by design and almost never becomes an issue, but in this case it left too little free disk space for Exchange to work correctly. Had we not known the disk was filling up, we could have been chasing our tails for much longer and the situation could have escalated into something far more stressful. For some reason, the company likes being able to send and receive emails. Thank god for monitoring that works.
There are a couple of things worth investigating at this point. First, had we not known that the Windows updates were causing the disk to fill up, a logical place to start looking for clues would be the log files on the suspect servers. So let me quickly go over where to look at logs in an Exchange environment. When thinking about potential disk space issues, a few questions come to mind: are log files growing rapidly? Did somebody turn on verbose logging and forget to turn it off? There are a few places worth checking to rule the logs out. If you have ever used message tracking in Exchange you know how powerful it can be; it can also be one of the things that fills up your disk. The message tracking logs are stored here:
C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\MessageTracking
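If you suspect message tracking, the EMS makes it easy to see where the logs live, how large the directory is allowed to get, and how much space it is actually using. Here is a rough sketch of what I mean (HUB01 is a placeholder for one of your Hub Transport servers):

# Where do the tracking logs live and how big can the directory get?
Get-TransportServer HUB01 | Format-List MessageTrackingLogEnabled, MessageTrackingLogPath, MessageTrackingLogMaxDirectorySize, MessageTrackingLogMaxAge

# How much space are they actually using right now?
$path = (Get-TransportServer HUB01).MessageTrackingLogPath
"{0:N1} MB" -f ((Get-ChildItem $path -Recurse | Measure-Object Length -Sum).Sum / 1MB)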
Another location that gets used when you turn on verbose logging to troubleshoot send or receive connectors is the smtpsend and smtpreceive directories. These can fill up quickly if you forget to turn verbose logging back off on a connector when you are done troubleshooting (see the shell snippet below). They live here:
C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLog
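A quick way to catch a connector that was left in verbose mode is to list the logging level on every Send and Receive connector and flip it back to None when you are finished. Something along these lines should do it (HUB01 and the connector names are placeholders for your own):

# Spot any connectors still set to Verbose protocol logging
Get-SendConnector | Format-Table Name, ProtocolLoggingLevel -AutoSize
Get-ReceiveConnector -Server HUB01 | Format-Table Name, ProtocolLoggingLevel -AutoSize

# Turn it back off once troubleshooting is finished
Set-SendConnector "Internet Send" -ProtocolLoggingLevel None
Set-ReceiveConnector "HUB01\Default HUB01" -ProtocolLoggingLevel None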
Finally, protocol logging can also be enabled at the server level on the Hub Transport itself; those logs are written under the same ProtocolLog location:
C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\Logs\ProtocolLog
I would like to point out quickly that the behavior of all of these logging methods can be modified using the Exchange Management Shell, and some of the more detailed settings can only be changed through the EMS.
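As an example, here is a sketch of reviewing the log paths and directory size caps on a Hub Transport server and tightening them, along with the server-level protocol logging setting mentioned above, which is one of the things you can only change from the shell (HUB01 and the size values are placeholders to adjust for your environment):

# Review every log path and directory size cap the server is using
Get-TransportServer HUB01 | Format-List *LogPath, *LogMaxDirectorySize

# Tighten the caps so the logs can never eat the whole disk
Set-TransportServer HUB01 -MessageTrackingLogMaxDirectorySize 1GB -SendProtocolLogMaxDirectorySize 250MB -ReceiveProtocolLogMaxDirectorySize 250MB

# Server-level (intra-organization) protocol logging is EMS-only
Set-TransportServer HUB01 -IntraOrgConnectorProtocolLoggingLevel None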
If these quick spot checks don’t uncover an immediate problem, another good way to figure out where your disk space is going is a tool that enumerates directories and file sizes. There are a few tools available; the one I like to use is Space Sniffer. It is fast, easy to use, and gives a good visual representation of directory and file sizes. The tool can do much more, but in this case we were only interested in finding the disk hog quickly. We found that the %windir%\softwaredistribution\download folder was growing rapidly, and I happen to know that this is the temporary location where Windows stores update files before they are installed.
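If you can’t install a GUI tool on the server, plain PowerShell will get you most of the way there. This is just a sketch that sums the first-level folders under a drive and lists the ten largest, which in our case would have pointed straight at the SoftwareDistribution folder:

# List the ten largest top-level folders on C: (can be slow on big volumes)
Get-ChildItem C:\ -Force -ErrorAction SilentlyContinue |
  Where-Object { $_.PSIsContainer } |
  ForEach-Object {
    $bytes = (Get-ChildItem $_.FullName -Recurse -Force -ErrorAction SilentlyContinue | Measure-Object Length -Sum).Sum
    New-Object PSObject -Property @{ Folder = $_.FullName; SizeGB = [math]::Round($bytes / 1GB, 2) }
  } |
  Sort-Object SizeGB -Descending | Select-Object Folder, SizeGB -First 10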
There are a few things that can be done here: clear the temporary Windows update files, delete other unnecessary files, or grow your disks. We were lucky because our Hub Transport servers are VMs, so increasing their disk size is simple. That seems like the best option if it is available to you; if something like this happens again we will have the additional headroom and the Exchange servers won’t bog down.
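If growing the disk isn’t an option and you do need to clear the update cache, the contents of that download folder are safe to delete; Windows will simply re-download whatever it still needs. A minimal sketch, run from an elevated prompt:

# Stop the Windows Update service, clear the download cache, start it again
Stop-Service wuauserv
Remove-Item "$env:windir\SoftwareDistribution\Download\*" -Recurse -Force -ErrorAction SilentlyContinue
Start-Service wuauserv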
Ultimately we prevented the disaster, but the incident is a great illustration of the lesson I’d like to share: make sure you have a good monitoring and alerting solution in place. Otherwise you may not have any clue where to start looking. Our Exchange environment is large and complex, and without a reliable monitoring tool this problem would have been much harder to track down in the first place. Because we have good monitoring we were able to identify and resolve the problem quickly, before anything bad happened. On a side note, I am still thinking about how we can take this monitoring and alerting one step further and become proactive instead of reactive, but for now the tools are doing their job and we avoided a potential disaster because of it. If you have any thoughts on proactive monitoring and alerting for these kinds of disk issues, let me know; I’d love to hear how you handle it.