Exchange Page Patching in action

In this blog post, I will demonstrate Exchange’s Page Patching capability, which was first introduced in Exchange Server 2010 as part of the high availability mechanism.
Page Patching is included in all Exchange versions since Exchange 2010. In my demo, I will be using Exchange 2019.

When a database is corrupted by minor disk faults, Exchange Server 2019 page patching automatically repairs the corrupted database page by using one of the other database copies that are configured for high availability.

More about the Page Patching mechanism, including the exact steps for getting the right page, can be found at the following link:
https://techcommunity.microsoft.com/t5/Exchange-Team-Blog/Database-Maintenance-in-Exchange-2010/ba-p/602257#patching

During the steps below, I will cause corruption on a page in one of the database copies. This can be done by overwriting the data on disk where the message is located.
After the “corruption” takes place, the Exchange 2019 page patching mechanism will recover from it on its own: the replication service requests a good version of the page from the other copy of the database.

Keep in mind that the data inside a database page is the same on all copies. For example, page 320 in DB01 on server MBX01 contains the same data as page 320 in DB01 on server MBX02.
The only difference between the pages is their physical location on the disks/drives of the servers that hold the copies.

After the self-recovery, the database will mount without any problems and the items on that page will be available with no issues.

Please DO NOT try this process on a live or production environment!

To trigger the page patching process, I will perform the following steps:

· Dismount a database and dump the database’s headers to view the state of the tables before sending a new message.

· Mount the database and send a new message to Tkolber’s mailbox.

· Dismount the database and dump the database’s headers to view the state of the tables after sending the new message.

· Compare the 2 dumps with WinDiff to see the page where the new message is located.

· Find the page that holds the new message and change the content of that page. This will cause a checksum error.

· Mount the database and review the event logs for the page patching process.

· Verify the mailbox is working OK and that the message can be opened.

To demonstrate the Page Patching process, I used my lab, which includes 2 Exchange 2019 servers:
- EX191
- EX192
Tkolber’s mailbox is located on DB03, which is replicated between the 2 Exchange servers.

Please DO NOT try this process on a live or production environment!

  • Dismount database DB03 and dump the database’s headers.
    This is the only way to view the state of the tables and pages before sending a new message.
    To do that, I will open CMD and navigate to the folder where the database is located: C:\DB\DB03
    Now let’s run the following command to dump the database’s headers:

eseutil /ms DB03.edb /v /f#legacy >c:\Temp\BeforeMSG.txt

  • Verify that the file was created on C:\Temp.
    The file should look like the following txt file:
  • Now mount the database DB03 on EX191.
  • In this step, I will log on to OWA as EXADM and send a new test message to Tkolber’s mailbox with the subject line “Test”.
    Later on, we will search for this message’s location inside the database and corrupt it.
  • After the message arrives in Tkolber’s mailbox, I will dismount the DB03 database to dump its headers.
    Inside the database’s headers, we will be able to view the state of the tables and pages, including the new message.
    To do that, I will run the following command:

eseutil /ms DB03.edb /v /f#legacy >c:\Temp\AfterMSG.txt
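The two dumps use the same command with a different output file, so the step can be wrapped in a small helper. The following is only a minimal Python sketch: the function name `dump_space_usage` and its `run` parameter are hypothetical, and the eseutil flags are copied as-is from the commands in this post. It assumes the database is dismounted and that it is run from the database folder.

```python
import subprocess


def dump_space_usage(edb_path, out_path, run=subprocess.run):
    """Run the dump command from this post and save its output to out_path.

    The flags are taken verbatim from the post; nothing else is assumed
    about eseutil. The database must be dismounted first.
    """
    args = ["eseutil", "/ms", edb_path, "/v", "/f#legacy"]
    # Redirect standard output to the text file, like ">file" in CMD.
    with open(out_path, "w") as out:
        run(args, stdout=out, check=True)
    return args


# Example calls (commented out so nothing runs by accident):
# dump_space_usage("DB03.edb", r"c:\Temp\BeforeMSG.txt")
# dump_space_usage("DB03.edb", r"c:\Temp\AfterMSG.txt")
```

The `run` parameter exists only so the helper can be tried with a stand-in function on a machine that has no Exchange tools installed.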

  • Now let’s check that the file was created in C:\Temp.
  • To compare the 2 dumps (C:\Temp\BeforeMSG.txt and C:\Temp\AfterMSG.txt) I will use WinDiff, but of course, it can be done with any other software that compares 2 files.
    Using WinDiff, we will be able to see the page that changed after we sent the new message.
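For readers without WinDiff, the same comparison can be approximated with Python’s standard difflib module. This is a rough sketch, not the method used in this post: the helper names are hypothetical, and the sample rows are made up and much simpler than a real dump. The only detail taken from the post is that the 4th field of a row holds the page number (PgnoFDP).

```python
import difflib


def changed_lines(before_text, after_text):
    """Return the lines that are new or different in the second dump."""
    before = before_text.splitlines()
    after = after_text.splitlines()
    # ndiff prefixes lines that exist only in the second input with "+ ".
    diff = difflib.ndiff(before, after)
    return [line[2:] for line in diff if line.startswith("+ ")]


def fourth_field(row):
    """Return the 4th whitespace-separated field of a dump row."""
    return row.split()[3]


# Made-up sample rows (hypothetical layout, for illustration only).
before_dump = "Message_100  Tbl  50  260  10\nMessage_101  Tbl  51  265  12\n"
after_dump = "Message_100  Tbl  50  260  10\nMessage_101  Tbl  51  265  13\n"

for row in changed_lines(before_dump, after_dump):
    print(row, "-> page", fourth_field(row))
```

With the real files, the two strings would come from reading C:\Temp\BeforeMSG.txt and C:\Temp\AfterMSG.txt.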

In WinDiff, first select the BeforeMSG.txt dump file.

Then select the AfterMSG.txt dump file.

  • We will be able to see the differences between the 2 files marked in red and yellow (one color for each file).
  • The 4th field, called PgnoFDP, represents the page number in each row.
    If we take a closer look, we will see that Message_101 changed between the 2 dumps after the message was sent.
    In this example, the change took place on page 265:
  • Now that we have the page number from the previous step, we can change the page. You can change the page only while the database is dismounted!
  • In this example, I will use “HexEditor” to change the data inside the DB03 database. This action will cause a checksum error and trigger the Page Patching mechanism.
    To do that, I will choose “File”, then “Open”, and select the database file DB03.edb.
    We will be able to see the raw data inside the database.
  • As you remember, the page that holds the message is page 265.
    To find page 265 and change it, we need the following calculation:

Page number: 265

Page size: 32 KB (32 x 1024 bytes)

265 x 32 x 1024 = 8,683,520 (decimal)

Now we need to convert 8,683,520 to hex.
To do that, just click on “Hex”. The result is 848000.
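The same arithmetic can be checked in a few lines of Python. This sketch simply mirrors the formula used in this post (page number times page size) and does not account for anything else in the file layout. The names `PAGE_SIZE` and `page_offset` are made up for the example.

```python
PAGE_SIZE = 32 * 1024  # 32 KB per page, as used in this post


def page_offset(page_number, page_size=PAGE_SIZE):
    """Return the byte offset computed with the formula from this post."""
    return page_number * page_size


offset = page_offset(265)
print(offset)               # 8683520
print(format(offset, "X"))  # 848000
```

The hex value is the number to type into the hex editor’s “GoTo” dialog.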

  • To navigate to page 265, which is located at offset 848000 (hex), select the “Search” menu, then “GoTo”, and enter the number we calculated, 848000:
  • The cursor will jump right to the beginning of page 265.
    Now go over about 30 lines, starting from the first line, and change all the values to “1”.
  • From the main menu, click “File”, then “Save”.
    Then, from the main menu, click “File”, then “Exit”.
  • Mount database DB03 again on EX191 and verify that the database is mounted and that replication is in a healthy state.
  • From EX191, review the Application event log and search for event 474 (checksum error):

We can see that the system detected a checksum mismatch in the database, which means that the data on disk (inside the database) and the data that should be there are not the same.
In previous versions of Exchange (pre Exchange 2010 SP1), this kind of event would mean “fixing” the database using eseutil /p or restoring the whole database from backup.
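To illustrate the idea behind the checksum error and the repair, here is a toy Python model. It is only a sketch under loud assumptions: it uses CRC32 from the standard zlib module as a stand-in, which is not the checksum Exchange actually uses, and the “pages” are just byte strings in memory. All names are hypothetical.

```python
import zlib

PAGE_SIZE = 32 * 1024


def checksum(page):
    """Stand-in checksum (CRC32). Not the algorithm Exchange uses."""
    return zlib.crc32(page)


def page_is_valid(page, stored_checksum):
    """A page is valid when its data still matches the stored checksum."""
    return checksum(page) == stored_checksum


# The same logical page exists on both copies and has the same content.
active_page = bytes(range(256)) * (PAGE_SIZE // 256)
passive_page = bytes(active_page)
stored = checksum(active_page)

# Simulate the hex-editor step: overwrite a few bytes on one copy.
damaged = bytearray(active_page)
damaged[0:4] = b"\x11" * 4
active_page = bytes(damaged)

print(page_is_valid(active_page, stored))   # False: checksum mismatch

# "Patch" the page by taking the good version from the other copy.
if not page_is_valid(active_page, stored) and page_is_valid(passive_page, stored):
    active_page = passive_page

print(page_is_valid(active_page, stored))   # True: page repaired
```

The real mechanism is far more involved, but the principle is the same: the mismatch is detected from the page itself, and the good data comes from another copy.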

  • Right after event 474, we will be able to see events 103 and 129, which start the page patching process:
  • Finally, event 905 will show up and report that the page patching process completed successfully.
  • I will log on to Tkolber’s mailbox and successfully open the message we sent earlier.
  • By the way… If we check the database header, we will be able to see that a checksum error was detected:

eseutil /mh DB03.edb
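The header dump is long, so it can help to filter it down to the checksum-related lines. The Python sketch below assumes only that the output consists of “name: value” lines. The function name is hypothetical, and the sample text is made up for illustration, so the field name in it should be treated as an assumption and not as real eseutil output.

```python
def checksum_fields(header_text):
    """Return the 'name: value' pairs whose name mentions a checksum."""
    fields = {}
    for line in header_text.splitlines():
        if ":" in line and "checksum" in line.lower():
            name, value = line.split(":", 1)
            fields[name.strip()] = value.strip()
    return fields


# Made-up sample text (hypothetical field names and values).
sample = "  State: Clean Shutdown\n  Bad Checksum Error Count: 1\n"
print(checksum_fields(sample))
```

In practice, the input would be the saved output of the eseutil /mh command above.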

Conclusions:

For the last 13 years, I have been working as a Senior Customer Success Engineer (formerly PFE) at Microsoft. My areas of expertise are Exchange, PowerShell & Azure.