Due to an outage this weekend, I’d like to take a minute to describe what happened and how it was resolved. If your Exchange Transport service won’t start, you may be running into the same issue I had during the outage. Luckily, there is an easy remedy. The Exchange message queue database had become corrupted, which caused the Transport service to fail. Because the Transport service wasn’t running, the Edge Sync process failed as well, and that in turn broke external mail delivery. That is obviously a big issue, since you cannot receive any email from external domains while it is broken.
To troubleshoot this, there are a few obvious signs to look at first. The main thing to check is your disk space, which I wrote about in my previous post. If your disks are full or filling up, then you are pretty much dead in the water and will need to fix the disk issue before anything else. In my scenario disk space was not the problem, so the next place I turned to was the logs. I found a number of interesting entries in the Windows Application event log that gave me some clues. I want to detail as many of these messages as I can so that people who are having similar issues know what to look for.
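As a quick way to run the disk check described above, here is a minimal Python sketch. The function name `disk_report` and the 10% free-space threshold are my own assumptions for illustration, not anything Exchange defines; pick a threshold that makes sense for your server.

```python
import shutil


def disk_report(path, min_free_fraction=0.10):
    """Summarize disk usage for the volume that contains `path`.

    `min_free_fraction` is an illustrative threshold (an assumption),
    not a value taken from Exchange.
    """
    usage = shutil.disk_usage(path)
    # Guard against odd filesystems that report a total size of zero.
    free_fraction = usage.free / usage.total if usage.total else 0.0
    return {
        "path": path,
        "total_gb": round(usage.total / 1024**3, 1),
        "free_gb": round(usage.free / 1024**3, 1),
        "free_fraction": free_fraction,
        "low": free_fraction < min_free_fraction,
    }


# Example (on the Exchange server): print(disk_report("C:\\"))
```

If `low` comes back `True` for the volume that holds the queue database, deal with the disk first before looking at anything else.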
There are a few possible resolutions to this problem. One solution I found through some Google searches is to attempt to repair the corruption in the queue database by running it through ESEUTIL (eseutil.exe). There is no guarantee this will work, and it can take a lot of time, depending on the size of your queue database. There is some good information here about the mail queue and how it works.
If you decide to repair the database, the mail queue files are in the following directory:
C:\Program Files\Microsoft\Exchange Server\V14\TransportRoles\data\Queue
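Inside this directory is a file called mail.que. This is the queue database itself and the file that you will need to repair. (The tmp.edb file that sits next to it is only a temporary database, not the queue.)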
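If you go the repair route, the ESEUTIL calls can be sketched roughly like this. This only builds the command lines and does not run anything. `eseutil /mh` (dump the database header) and `eseutil /p` (repair) are real switches, but the helper name `eseutil_commands` is hypothetical and the database file name is passed in rather than assumed. A repair with `/p` can discard data, so working on a copy of the file with the Transport service stopped seems like the safer bet.

```python
import ntpath  # Windows path rules, regardless of where this script runs


def eseutil_commands(queue_dir, db_name):
    """Return the eseutil command lines for checking and repairing a database.

    Nothing is executed here; each entry is an argument list that could be
    handed to subprocess.run() on the Exchange server.
    """
    db_path = ntpath.join(queue_dir, db_name)
    return [
        ["eseutil", "/mh", db_path],  # dump the header to see the database state
        ["eseutil", "/p", db_path],   # attempt a repair (may discard data)
    ]


# Example, where db_name is the queue database file in that directory:
# for cmd in eseutil_commands(queue_dir, db_name):
#     print(" ".join(cmd))
```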
The other method is much simpler and was the solution I went with. Instead of attempting to repair the database corruption, make sure the Transport service is stopped, rename the Queue folder (which also keeps the old copy around), and start the Transport service again. Doing this forces the Transport service to create a fresh queue database along with all of the accompanying files it needs to get up and running. It is faster and simpler, IMO. The only problem with this approach is that any items that were stuck in the queue when the corruption occurred will be lost. For me, this was an acceptable loss. If it is not acceptable for you, you will probably have to use the first method and attempt to repair the database, or try to recover the queue from a shadow copy or backup.
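The rename approach can be sketched in Python as follows. Treat it as a rough outline under a few assumptions: the Windows service name is `MSExchangeTransport`, the script runs elevated on the Exchange server, and the helper names `set_aside_queue` and `reset_queue` are mine. The old folder is renamed rather than deleted so it is still there if you want to try a repair later.

```python
import os
import subprocess
import time

SERVICE = "MSExchangeTransport"  # assumed service name for the Transport service


def set_aside_queue(queue_dir, suffix=None):
    """Rename the queue folder out of the way and return the new path."""
    suffix = suffix or time.strftime("%Y%m%d-%H%M%S")
    queue_dir = queue_dir.rstrip("\\/")
    backup_dir = f"{queue_dir}.old-{suffix}"
    os.rename(queue_dir, backup_dir)
    return backup_dir


def reset_queue(queue_dir):
    """Stop the service, set the old queue aside, and start the service again."""
    # "net stop" returns a non-zero code if the service is already stopped,
    # which is expected in this scenario, so that result is not checked.
    subprocess.run(["net", "stop", SERVICE], check=False)
    backup_dir = set_aside_queue(queue_dir)
    # On start, the Transport service should build a fresh queue database.
    subprocess.run(["net", "start", SERVICE], check=True)
    return backup_dir
```

Calling `reset_queue` with the Queue directory shown earlier would perform the whole sequence; `set_aside_queue` on its own only does the rename.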