Preventing and recovering from MailStorm within Exchange servers

Tzahi Kolber
Jan 27, 2023

In this post I describe the signs of a MailStorm and what must be done to deal with one. A MailStorm is a runaway flood of mail that slows down sending and receiving and sometimes stops mail flow completely.

Of course, it is best to prevent a MailStorm in the first place. This can be done through proactive monitoring and testing, so that mail traffic never becomes extreme enough to affect delivery times in an unusual way.
However, if we are already in this type of event, there are two main ways of coping:

  1. Transport rules:
    Define rules tailored to the situation that caused the MailStorm, for example mail sent from a certain user to the entire organization, or mail with a certain subject that was distributed in uncontrolled quantities.
  2. PowerShell to handle the various queues:
    With a set of commands you can delete, export or move mail to clear the load from the mail routes.

Identifying an incoming MailStorm situation.

There are several ways to detect a MailStorm. The best, of course, is comprehensive and continuous monitoring of all mail services: the mail servers, the mail gateway servers, and any communication components the mail passes through, such as load balancers.

Testing at the level of mail servers.

There are several places and tests that must be performed on the mail servers to check whether we are in a trend of a problematic increase in the amount of mail passing through the servers:

  • Checking the status of the various Mail Queues:
    This test is very simple and can be implemented either by opening the PowerShell interface or by the Queue viewer interface found in the Exchange Toolbox.

Once the interface opens, select Queue Viewer.
In this interface you can see all the Queues of the server we connected to.
If there is an unusual accumulation of mail amounts, it will be possible to see it as in the following example:

  • Checking the status using PowerShell:
    Run the following command from the Exchange Management Shell on a server to see the number of messages in each queue on that local server:

Get-Queue

To get the status of all queues on all Exchange servers, run the following command:

Get-TransportService | Get-Queue
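
On a busy environment the raw queue list can be long, so it may help to keep only the queues that exceed a chosen message count. The following is a minimal sketch: the function name Select-BusyQueue and the default threshold of 250 are assumptions made for illustration, not Exchange defaults. It relies only on the Identity and MessageCount properties that Get-Queue output exposes.

```powershell
# Hypothetical helper: keeps only queues whose MessageCount exceeds a threshold.
# Works on any object that exposes Identity and MessageCount, as Get-Queue output does.
function Select-BusyQueue {
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Queue,

        # Assumed alert level; tune it to your own baseline.
        [int]$Threshold = 250
    )
    process {
        if ($Queue.MessageCount -gt $Threshold) {
            [pscustomobject]@{
                Identity     = [string]$Queue.Identity
                MessageCount = [int]$Queue.MessageCount
            }
        }
    }
}
```

A possible usage would be: Get-TransportService | Get-Queue | Select-BusyQueue -Threshold 500 | Sort-Object MessageCount -Descending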

  • Checking performance counters related to queue status:
    With this test we can check how fast mail flow is growing, the amount of mail passing through the servers, and the mail load at a given time.

\MSExchangeTransport Queues(_total)\Messages Queued For Delivery Per Second

\MSExchangeTransport Queues(_total)\Messages Queued For Delivery

  • Checking the status of free disk space:
    If less than 10% free space remains on the disks that hold the Queue Database or Queue Logging files, mail traffic will slow down and may even stop completely.
    Therefore, first find out where the Queue Database and Queue Logging files are located.
    This can be done by running the following two commands:

Get-Content 'C:\Program Files\Microsoft\Exchange Server\V15\Bin\EdgeTransport.exe.config' | Where-Object { $_.Contains("QueueDatabasePath") }

Get-Content 'C:\Program Files\Microsoft\Exchange Server\V15\Bin\EdgeTransport.exe.config' | Where-Object { $_.Contains("QueueDatabaseLoggingPath") }

You can see that both the Queue Database and Queue Logging paths are in this case on drive C:\
Therefore, we need to find out whether at least 10% of disk C:\ is free.
This can be checked by running the following command with the appropriate drive letter (C in this case):

$v = Get-Volume -DriveLetter C; ($v.SizeRemaining / $v.Size) * 100

In this case about 60% of disk C is free, so there is no problem.
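
The same arithmetic can be wrapped in a small check that returns true or false against the 10% limit mentioned above. This is only a sketch: the function names Get-FreeSpacePercent and Test-QueueDiskSpace are made up for this example, and the inputs are the SizeRemaining and Size values that Get-Volume returns.

```powershell
# Hypothetical helper: free space as a percentage, rounded to one decimal.
function Get-FreeSpacePercent {
    param(
        [Parameter(Mandatory)][double]$SizeRemaining,
        [Parameter(Mandatory)][double]$Size
    )
    if ($Size -le 0) { throw 'Size must be greater than zero.' }
    [math]::Round(($SizeRemaining / $Size) * 100, 1)
}

# Hypothetical helper: $true when free space is at or above the minimum (10% by default).
function Test-QueueDiskSpace {
    param(
        [Parameter(Mandatory)][double]$SizeRemaining,
        [Parameter(Mandatory)][double]$Size,
        [double]$MinimumPercent = 10
    )
    (Get-FreeSpacePercent -SizeRemaining $SizeRemaining -Size $Size) -ge $MinimumPercent
}
```

For example, after $v = Get-Volume -DriveLetter C, the call Test-QueueDiskSpace -SizeRemaining $v.SizeRemaining -Size $v.Size returns $false once the volume drops below 10% free.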

  • Checking the disk formatting setting
    The allocation unit size of the disk that holds the Transport files is very important for mail transfer performance.
    Like the mailbox databases, Mail.que and the Transport logs work faster when the disk they sit on is formatted with a 64 KB allocation unit size (compared with 4 KB, which is the default).
    To test this, run the following command in a CMD window.
    Note that the CMD window must be opened elevated, as Administrator:

fsutil fsinfo ntfsinfo g:\

In this example, the Transport files are on drive G:
You can see that the Bytes Per Cluster value is 4096, which means that the disk is formatted with 4 KB clusters.

According to best practice, this value should be 65536, as in the following example:

If the server is already installed and the Transport files are already configured, we need to move the Transport files to a temporary location, reformat the disk, and then move them back to the same location.
Please note that this requires a short downtime, during which the transport services are unavailable on the server being changed.
Follow this article to carry out the process in an orderly manner:

Change the location of the queue database | Microsoft Docs
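
The cluster size check described above can also be scripted by reading the Bytes Per Cluster line from the fsutil output. This is a rough sketch under two assumptions: the function name Get-BytesPerCluster is invented for this example, and the label is matched in English, so a localized Windows installation may print it differently.

```powershell
# Hypothetical helper: extracts the "Bytes Per Cluster" value from fsutil output lines.
function Get-BytesPerCluster {
    param(
        # Lines as returned by: fsutil fsinfo ntfsinfo <drive>
        [string[]]$FsutilOutput
    )
    foreach ($line in $FsutilOutput) {
        if ($line -match '^\s*Bytes Per Cluster\s*:\s*(\d+)') {
            return [int]$Matches[1]
        }
    }
    throw 'No "Bytes Per Cluster" line found in the supplied output.'
}
```

From an elevated session, (Get-BytesPerCluster -FsutilOutput (fsutil fsinfo ntfsinfo G:)) -eq 65536 would then tell whether drive G: follows the 64 KB recommendation.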

  • Checking the events in the Event Viewer
    If the server is under abnormal load and only few resources remain (CPU, memory and disk), mail traffic is affected to the point where incoming mail stops and mail already on the server cannot leave.
    This situation is called back pressure.
    In such a situation, the server is likely to log related events in the Application event log.
    The event IDs range from 15004 to 15007.
    More information about the various events and their significance can be found at the following link:

Understanding back pressure | Microsoft Learn
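
Instead of browsing the Event Viewer, the same events can be queried from PowerShell. The sketch below only builds the filter; the function name New-BackPressureEventFilter and the 24-hour default are assumptions for this example, while the log name and event IDs are the ones mentioned above.

```powershell
# Hypothetical helper: builds a Get-WinEvent filter for back pressure events 15004-15007.
function New-BackPressureEventFilter {
    param(
        # How far back to look; 24 hours is an arbitrary default.
        [int]$Hours = 24
    )
    @{
        LogName   = 'Application'
        Id        = 15004..15007
        StartTime = (Get-Date).AddHours(-$Hours)
    }
}
```

On the Exchange server it could be used as Get-WinEvent -FilterHashtable (New-BackPressureEventFilter -Hours 6) -ErrorAction SilentlyContinue | Select-Object TimeCreated, Id, Message. The -ErrorAction switch is there because Get-WinEvent reports an error when no matching events exist.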

Testing communication and transmission components

Beyond checking the mail servers, there may be communication and transmission elements involved in the mail traffic chain that should be checked as well.

  • Checking for abnormal SMTP loads on the load balancer.
    If mail traffic to and from the mail servers passes through a load balancer, it is advisable to check the volume of mail traffic daily and find out whether it deviates from other days.
    By collecting statistics and comparing current traffic with “normal” days, when there are no malfunctions or congestion in the email environment, it is possible to see whether mail volume is starting to rise and whether we are heading toward a MailStorm.
  • Checking SMTP flow at the mail gateway components.
    If the organization is connected to another organization or to the Internet, the mail flow chain will most likely include mail filters through which mail travels from the mail servers to external entities.

When there is a connection to any external entity, such as another organization or the Internet, the load and any accumulation of queues should be examined in these components as well.
As with the mail servers and any communication components in the mail transit chain, sample the queues daily, compare the current quantity with previous days, and check whether there is a significant deviation that could eventually lead to a MailStorm.
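
One simple way to decide what counts as a "significant deviation" is to compare today's count with the average of normal days plus a few standard deviations. The sketch below is an illustration only: the function name Test-MailVolumeSpike and the 3-sigma default are assumptions, and any other statistical rule could be swapped in.

```powershell
# Hypothetical helper: flags today's message count when it exceeds
# mean + Sigma * sample standard deviation of the baseline days.
function Test-MailVolumeSpike {
    param(
        [Parameter(Mandatory)][double[]]$BaselineCounts,
        [Parameter(Mandatory)][double]$TodayCount,
        # Assumed sensitivity; lower values alert earlier.
        [double]$Sigma = 3
    )
    if ($BaselineCounts.Count -lt 2) { throw 'Provide at least two baseline samples.' }

    $mean = ($BaselineCounts | Measure-Object -Average).Average
    $sumSq = 0.0
    foreach ($c in $BaselineCounts) { $sumSq += [math]::Pow($c - $mean, 2) }
    $stdDev = [math]::Sqrt($sumSq / ($BaselineCounts.Count - 1))
    $limit = $mean + $Sigma * $stdDev

    [pscustomobject]@{
        Mean    = $mean
        StdDev  = $stdDev
        Limit   = $limit
        IsSpike = $TodayCount -gt $limit
    }
}
```

The baseline counts would come from whatever daily statistics the load balancer or mail gateway already records.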

Dealing with a MailStorm situation

As mentioned at the beginning of this post, there are two ways to deal with a MailStorm:

1. Transport rules: define rules tailored to the situation that caused the MailStorm, for example mail sent from a certain user to the entire organization, or mail with a certain subject distributed in uncontrolled quantities.

2. PowerShell for handling the load on the queues:
With a set of commands you can delete, export or move mail in order to clear the resulting load from the mail routes.

Using Transport Rules

Transport rules identify mail traffic by sender, recipient, subject, destination (internal or external), attachments, message size, or message classification.
You can set what happens to messages that match the defined conditions and do not match any of the exceptions, such as rejecting, deleting, or redirecting them, adding more recipients, or adding a prefix to the subject.
If the overload or attack can be identified by a specific characteristic, such as the message subject, a certain message size or a specific targeted recipient, you can create a transport rule that stops that mail from arriving and thus lets the rest of the normal mail traffic flow.

  • Creating a transport rule
    For example, you can define a rule that deletes all mail whose subject contains “checked by”:

New-TransportRule -Name "Mail Storm" -Enabled $true -SubjectContainsWords 'checked by' -DeleteMessage $true

In the following case we define a rule that deletes any mail received from gmail.com:

New-TransportRule -FromAddressContainsWords @('gmail.com') -DeleteMessage:$true -Name 'Block-Gmail' -StopRuleProcessing:$false -Mode 'Enforce' -Comments 'This Rule blocks ALL emails from gmail' -RuleErrorAction 'Ignore' -SenderAddressLocation 'Header'

Using PowerShell to manage queues.

PowerShell commands are the easiest way to delete mail in bulk from the queues.
The load on the servers may be so great that all the queues are filled with stuck mail, so that new mail cannot enter and mail already on the server cannot leave.
In this situation, a much more drastic and surgical operation is required.

  • Deleting email with specific characteristics from a queue
    When one or more mail servers hold an unusual amount of mail but can still send and receive, you can use the following commands, which scan the queues of one or more mail servers and delete only the mail that meets the search criteria.

In this example, the command goes over all queues on server EX131 and deletes every message with the subject “Testing only”:

Get-TransportService ex131 | Get-Queue | Get-Message -ResultSize Unlimited | Where-Object {$_.Subject -eq "Testing only"} | Remove-Message -WithNDR $false -Confirm:$false

Or

Remove-Message -Server ex131 -Filter "Subject -eq 'Testing only'" -WithNDR $false -Confirm:$false

In this example, the command goes over the queues of all mail servers and deletes every message whose sender is exadm@msft.net:

Get-TransportService | Get-Queue | Get-Message -ResultSize Unlimited | Where-Object {$_.FromAddress -eq "exadm@msft.net"} | Remove-Message -WithNDR $false -Confirm:$false
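
Before deleting anything, it can be useful to see which sender or subject actually dominates the queues, so that the filter matches the storm and nothing else. The following sketch groups queued messages by a property. The function name Get-QueueTopValue is hypothetical, and it assumes the FromAddress and Subject properties that Get-Message output exposes.

```powershell
# Hypothetical helper: returns the most frequent values of a message property
# (FromAddress by default) together with how many messages carry each value.
function Get-QueueTopValue {
    param(
        [Parameter(Mandatory)][object[]]$Message,
        [string]$Property = 'FromAddress',
        [int]$Top = 5
    )
    $Message |
        Group-Object -Property $Property |
        Sort-Object -Property Count -Descending |
        Select-Object -First $Top -Property @{ Name = 'Value'; Expression = { $_.Name } }, Count
}
```

For example, collect the messages with $m = Get-TransportService | Get-Queue | Get-Message -ResultSize Unlimited, then run Get-QueueTopValue -Message $m or Get-QueueTopValue -Message $m -Property Subject.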

Clearing the affected queue and exporting the important emails

When the load on the servers is so high that all the queues are filled with stuck mail, new mail cannot enter and mail already on the server cannot be sent.
Handling this crisis requires a much more drastic and surgical operation:

1. Stop the Transport Service on the server where the emails accumulate (EX131 in this example).

2. Migrate the Queue Database and Queue Logging to another inactive mail server (EX161 in this example).

3. Restart the Transport Service on the server where the emails accumulated (EX131), in order to allow normal mail traffic again.

4. Stop the Transport Service on the inactive server (EX161 in this example).

5. Copy the files from the problematic server EX131 to EX161, the server from which we will export the mail, and start the Transport Service on it again.

6. Export the important mail that you wish to send from the queue that we moved in step 2.

7. Return the exported mail to the source server.

In the following example, the EX131 server holds a large number of emails that could not be sent.

We will use the EX161 server, which is inactive and located in another environment, to retrieve the data (mass mail) that was stuck in transit on EX131, and then return the selected mail to the original server EX131.

1. Stop the Transport Service on the server where the emails accumulate (EX131 in this example)

In this step we will stop the transport service on EX131

2. Migrate the Queue Database and Queue Logging to another inactive mail server (EX161 in this example)

Before transferring the files, check where the Queue Database and Queue Logging are.
This can be done by running the following two commands:

Get-Content 'C:\Program Files\Microsoft\Exchange Server\V15\Bin\EdgeTransport.exe.config' | Where-Object { $_.Contains("QueueDatabasePath") }

Get-Content 'C:\Program Files\Microsoft\Exchange Server\V15\Bin\EdgeTransport.exe.config' | Where-Object { $_.Contains("QueueDatabaseLoggingPath") }

You can see that the two paths of the Queue Database and the Queue Logging are in this case on drive C:\
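
Since EdgeTransport.exe.config is an XML file, both paths can also be read as values rather than as raw text lines. This is a sketch under the assumption that the two keys sit under configuration/appSettings as add elements with key and value attributes. The function name Get-QueuePathSetting is made up for this example.

```powershell
# Hypothetical helper: reads the two queue path settings from the parsed config file.
# Assumes the keys live under /configuration/appSettings as <add key="..." value="..." />.
function Get-QueuePathSetting {
    param(
        [Parameter(Mandatory)][xml]$Config
    )
    $result = [ordered]@{}
    foreach ($key in 'QueueDatabasePath', 'QueueDatabaseLoggingPath') {
        $node = $Config.SelectSingleNode("/configuration/appSettings/add[@key='$key']")
        # A missing key yields $null instead of an error.
        $result[$key] = if ($node) { $node.GetAttribute('value') } else { $null }
    }
    [pscustomobject]$result
}
```

A possible call is Get-QueuePathSetting -Config ([xml](Get-Content -Raw 'C:\Program Files\Microsoft\Exchange Server\V15\Bin\EdgeTransport.exe.config')).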

  • Now we will move all the files from the source folder, which in this case holds both Mail.que and the logs:

C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Data\Queue

To a different path at the following location:

C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Data\Queue\SRC
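
The move itself can be scripted as follows. This is a minimal sketch, assuming the Transport Service is already stopped; the function name Move-QueueFile is invented for this example. Only files are moved, so the SRC subfolder is never moved into itself.

```powershell
# Hypothetical helper: moves every file in the queue folder into a subfolder (SRC by default).
# Run it only while the Transport Service is stopped.
function Move-QueueFile {
    param(
        [Parameter(Mandatory)][string]$QueuePath,
        [string]$SubFolder = 'SRC'
    )
    $target = Join-Path -Path $QueuePath -ChildPath $SubFolder
    New-Item -ItemType Directory -Path $target -Force | Out-Null

    # -File skips directories, so the target folder itself is left in place.
    Get-ChildItem -LiteralPath $QueuePath -File |
        Move-Item -Destination $target -PassThru
}
```

On EX131 this could be run as Move-QueueFile -QueuePath 'C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Data\Queue', after stopping the service with Stop-Service MSExchangeTransport.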

  • After transferring the files, the original location will remain empty as in the following example:

3. Restart the Transport Service on the server where the emails accumulated, in order to allow normal mail traffic again (EX131)

Now start the Transport Service on the server where the emails had accumulated (EX131).
On startup, the server generates a new, empty queue database and log files, allowing mail traffic to flow again.

4. Stop the Transport Service on the inactive server (EX161 in this example)

Before replacing the files on the destination server, which will be used to export the desired mail from the queues, stop the Transport Service on that server (EX161):

5. Copy the files from the problematic server EX131 to EX161, the server from which we will export the mail, and start the Transport Service again.

After the Transport Service is stopped on the EX161 server, delete all Queue Database and Queue Logging files on that server.

  • Once the Transport Service is stopped and the previous files are deleted, copy all the files that we saved in step 2 from this folder on EX131:

C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Data\Queue\SRC

  • To the path that holds the Queue Database and Queue Logging files on the destination server (EX161):

Now start the Transport Service on the EX161 server again, so that in the next step we can export the desired mail from its queues.

6. Export the important mail that you wish to send from the queue that we moved in step 2

After starting the Transport Service on the EX161 server at the end of the previous step, you should see a large queue of messages (the ones that previously sat on the EX131 server), from which we will export the messages:

Get-Queue

  • To see the number of messages, or sample messages, sent to the recipient billg@align.com, run the following command:

Get-Queue -Server EX161 | Get-Message -ResultSize Unlimited | Where-Object {$_.Recipients -like "*billg@align.com*"}

  • Before exporting messages, you must suspend them in the queue.
    Therefore, run the following command before exporting:

Get-Queue -Server EX161 | Get-Message -ResultSize Unlimited | Where-Object {$_.Recipients -like "*billg@align.com*"} | Suspend-Message

Make sure that the messages are in the Suspended state:

  • To export a single sample message sent to billg@align.com to an .eml file, which can be read by Outlook or another email client, run the following command:

Get-Queue -Server EX161 | Get-Message -ResultSize Unlimited | Where-Object {$_.Recipients -like "*billg@align.com*"} | Export-Message | AssembleMessage -Path "C:\Temp\Billg.eml"

If the message(s) has not been suspended, you will receive the following error message:

  • After suspending the messages, run the command again and confirm the request:
  • The next step explains how to return the message to the mail flow so that it is sent correctly from the EX131 server.
  • If there are many messages for the same recipient, with a certain subject, or from a particular sender, you can run the following commands to export them in bulk:

$msg = Get-Queue -Server EX161 | Get-Message -ResultSize Unlimited | Where-Object {$_.Recipients -like "*billg@align.com*"}

Now we will set all messages to the Suspended state:

$msg | Suspend-Message

Now we will export all the desired mail to the C:\Temp folder on the server:

$msg | ForEach-Object {$Temp="C:\Temp\"+$_.InternetMessageID+".eml";$Temp=$Temp.Replace("<","_");$Temp=$Temp.Replace(">","_");Export-Message $_.Identity | AssembleMessage -Path $Temp}

We can see that all messages are located as eml files in the C:\Temp folder on the EX161 server.
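
The export command above replaces only the angle brackets in the message ID. A message ID may contain other characters that Windows does not accept in file names, so a slightly more defensive variant is sketched here. The function name ConvertTo-EmlFileName is hypothetical, and the character list is the usual set of characters that Windows rejects in file names.

```powershell
# Hypothetical helper: turns an InternetMessageId into a file name that is safe on Windows.
function ConvertTo-EmlFileName {
    param(
        [Parameter(Mandatory)][string]$InternetMessageId
    )
    # Replace < > : " / \ | ? * and control characters with an underscore.
    $safe = $InternetMessageId -replace '[<>:"/\\|?*\x00-\x1F]', '_'
    "$safe.eml"
}
```

Inside the ForEach-Object loop, the target path could then be built with Join-Path 'C:\Temp' (ConvertTo-EmlFileName $_.InternetMessageId).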

7. Return the exported mail to the source server.

  • After exporting the messages to .eml files in the C:\Temp folder on the EX161 server, we can return them to the mail flow on the EX131 server.
    To do this, copy the messages (the .eml files) from the C:\Temp folder to the following folder on the EX131 server:

C:\Program Files\Microsoft\Exchange Server\V15\TransportRoles\Pickup

  • After copying, the file extension changes to .tmp. This change is normal and is part of the process:
  • After a very short time, the messages leave the folder and appear in the queue of the EX131 server, where the Source column contains the value Pickup.

That’s it!

Tzahi Kolber

For the last 15 years I worked as a Senior PFE in the Exchange area at Microsoft. I am now a Senior Consultant for Azure IaaS, PowerShell & Automation.