The following blog post is a trimmed-down excerpt from the eBook "Exchange Server Troubleshooting Companion":
One of the primary talking points of my Storage Configuration Options for Exchange session at IT Dev Connections was JBOD with Exchange, and what that term means to various people. Since Exchange 2010 and the advent of the Database Availability Group, the idea of deploying Exchange onto JBOD has spread like wildfire. The mere idea of placing production workloads onto non-RAID-protected direct-attached storage initially drew laughs and jeers from the IT community, yet it has evolved to the point where the largest Exchange deployment in the world (Office 365) runs on JBOD storage. Not only does Exchange Online run on JBOD, but mailboxes in the service have no traditional backups performed against them; the service instead relies on Exchange Native Data Protection. Quite a shocking fact, especially if you were to present it to an Exchange Admin around the year 2009.
For the correct deployment architecture, JBOD actually makes a lot of sense once you understand the performance and High Availability improvements made in the product. Exchange 2013/2016 requires ~90% fewer IOPS than Exchange 2003, making large/slow/cheap disks such as 6TB 7.2K NL SAS a deployment option. This also removed the requirement for expensive SAN storage with optimized RAID configurations to achieve acceptable performance. Also, with a DAG providing up to 16 geographically diverse copies of your mailbox database (although 3-4 is a more practical number), there’s no need to waste drives on RAID when the application itself can handle your data availability and redundancy.
While I’m not here to tout the awesomeness that is Exchange High Availability (that actually is its own book), I did want to discuss a common misconception around the hardware requirements for an Exchange JBOD solution (which is the Preferred Architecture). I’ve seen this misconception lead many deployments to poor performance and escalations to the hardware vendor. In every case, it was not the deployed hardware that was at fault, but rather an inappropriate hardware configuration.
The definitive source of information regarding Exchange storage is the Exchange Storage Configuration Options TechNet article. It details the various supported configurations, storage media, and best practices for Exchange storage, such as:
- Media types and expected speed
- Database and Log File location recommendations
- Requirements for when JBOD is acceptable (as opposed to RAID)
- RAID array stripe size
- Controller cache settings
- Maximum database size recommendations
- Supported file systems
- Encryption supportability and recommendations
- Deduplication supportability
Within these recommendations is the following guidance around controller caching settings under the “Supported RAID types for the Mailbox server role” section:
Storage array cache settings:
- The cache settings are provided by a battery-backed caching array controller.
- Best practice: 100 percent write cache (battery or flash-backed cache) for DAS storage controllers in either a RAID or JBOD configuration. 75 percent write cache, 25 percent read cache (battery or flash-backed cache) for other types of storage solutions such as SAN. If your SAN vendor has different best practices for cache configuration on their platform, follow the guidance of your SAN vendor.
This guidance leads me to my goal: detailing the hardware requirements and configuration needed to properly deploy an Exchange solution on “JBOD.” Unfortunately, many have incorrectly assumed that Exchange on JBOD means a caching controller is not required to achieve the necessary performance. This stems from the notion that JBOD is simply a disk connected to a server, with no intelligence or sophisticated controller whatsoever. With Microsoft advertising the storage performance improvements in Exchange and its much lower IOPS requirement per mailbox, some concluded a caching controller could be skipped. This is an incorrect assumption.
While the Exchange Product Team technically supports the lack of a caching disk controller, it makes no guarantees that such a solution will provide the performance necessary to run Exchange. This is why running Jetstress is such an important step in an Exchange deployment. Personally, I suspect the only reason the lack of a caching controller is still supported is so that Storage Spaces (which require the absence of a caching controller) can remain supported; however, that’s purely my own speculation. So what’s the big deal, you might ask? If Microsoft does not require a caching controller in a JBOD solution from a supportability perspective, why is it so vital to a successful and well-performing Exchange solution? Let’s start by defining the types of cache and their recommended settings.
Hard drives have a Disk Buffer, often called a Disk Cache, used for caching writes to disk. When the operating system issues a write, the drive firmware acknowledges to the OS that the write has been committed before the data is actually written to a platter. This allows the OS to continue working instead of waiting for the data to actually be committed, a wait that can be significant on slower rotational media. This caching increases performance, with the known risk that if the system loses power after the OS has received acknowledgement of committal but before the write reaches the platter, data loss/corruption will occur. This is why a UPS is required for such a configuration.
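To illustrate why acknowledging writes before they reach the platter matters, here is a minimal sketch comparing total OS wait time under write-through versus write-back semantics. The latency constants are invented for illustration only, not measured values from any real drive:

```python
# Hypothetical per-operation latencies (illustrative only, not measured values).
PLATTER_WRITE_MS = 8.0  # time to physically commit a write to rotating media
CACHE_ACK_MS = 0.1      # time for firmware to acknowledge a write held in cache

def write_through_wait(num_writes: int) -> float:
    """Total OS wait when every write is acknowledged only after hitting the platter."""
    return num_writes * PLATTER_WRITE_MS

def write_back_wait(num_writes: int) -> float:
    """Total OS wait when writes are acknowledged from cache and destaged later."""
    return num_writes * CACHE_ACK_MS

print(f"write-through: {write_through_wait(1000):.0f} ms of OS wait")
print(f"write-back:    {write_back_wait(1000):.0f} ms of OS wait")
```

The OS-visible latency difference is the ratio of the two constants; the trade-off is the data-loss window between acknowledgement and physical commit, which is exactly what a power loss exposes.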
Unfortunately, the cache on NL SAS disks is notoriously small (typically 64-128MB) and unreliable, meaning it is really not suitable for enterprise workloads. The cache can easily be overwhelmed, and is susceptible to losing data in what is known as a lost flush. Should a customer purchase a low-end RAID controller which does not contain cache, they can rely only on this on-disk cache for write performance.
Disk Array Controllers typically have much larger (512MB-2GB) and more robust caches, capable of delivering significant write performance. This feature, in my opinion, is the single biggest contributor to proper disk write performance in enterprise server solutions. In fact, controller caching (or the lack thereof) is one of the most common call drivers in hardware vendor storage support.
It happens far too often. A customer will build a server with 96GB of RAM, 16 CPU cores, and 10TB of storage, but skimp on the RAID controller by purchasing one without cache. A low-end RAID controller may save a few hundred dollars but will turn an otherwise robust system into one incapable of sustaining a storage-intensive enterprise workload. This is because the on-disk cache, which I previously mentioned is easily overwhelmed, is all that stands between you and significantly poor storage performance.
On several occasions I’ve handled Exchange performance escalations where a low-end controller was purchased on the assumption that an Exchange “JBOD” solution did not require a caching controller. This was often discovered during Jetstress testing, but on some unfortunate occasions, the customer had already begun the deployment/migration because they chose to forgo Jetstress testing.
Even some modern high-end controllers with more than 1GB of cache can encounter this problem when not properly configured. Because of solutions like Storage Spaces, which require that no cache be present, some controllers offer a non-RAID/Pass-through/HBA/JBOD mode (the name varies by vendor). This feature presents selected disks to the OS as raw disks, with no RAID functionality whatsoever, and therefore no cache. Again because of this misconception, I’ve encountered customers who used this mode for an Exchange Server JBOD deployment, incorrectly assuming it was appropriate. What makes this most unfortunate is that failing to enable the Write-Cache, or purchasing a low-end controller, are fairly easy to recover from: you either reconfigure the cache (which does not even require a reboot) or upgrade the controller (which will import the RAID configuration from the disks), neither of which involves data destruction. However, if a customer deployed Exchange using the Pass-through option, the drives would have to be rebuilt/reformatted. This is where you really hope to discover the issue during Jetstress testing and not after migrating mailboxes.
So how do we properly configure an Exchange JBOD solution? I personally like to use the term RBOD for this, because you’re really creating a server full of single-disk RAID 0 virtual disks/arrays. This effectively functions as a JBOD solution, which also allows you to reap the benefits of the controller’s cache. Having spoken with Microsoft employees, this is how JBOD is implemented in Office 365.
When creating this array/virtual disk for Exchange JBOD, the following settings should be used:
- Write Cache: 100% of controller cache should be used for writes
- Read Cache: 0% of controller cache should be used for reads
- Stripe Size: 256K or 512K (follow vendor guidance)
- On-Disk Cache: Disabled
A few things to note regarding the above list. The on-disk cache should be disabled to avoid double-caching (caching at the controller as well as the disk) and the possibility of overwhelming the on-disk cache; if it is left enabled, the risk of a Lost Flush increases. Also, each vendor uses different terminology for caching settings. Dell, for example, does not use a percentage value for configuring its cache; each setting is simply enabled or disabled:
- Write-Back = write caching enabled
- Write-Through = write caching disabled
- Read Ahead = read caching enabled
- No Read Ahead = read caching disabled
- Disk Cache Policy Enabled/Disabled = on-disk cache
Follow your vendor’s guidance when configuring any of these settings.
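To make the checklist concrete, here is a minimal sketch that encodes the recommended settings above and flags deviations. The dictionary field names are hypothetical, not any vendor's actual management API; real settings must be configured through your vendor's tools:

```python
# Recommended controller settings for an Exchange "JBOD" (single-disk RAID 0)
# deployment, per the guidance above. Field names are illustrative only.
RECOMMENDED_STRIPE_KB = (256, 512)  # follow vendor guidance on which to use

def validate(controller: dict) -> list[str]:
    """Return a list of deviations from the recommended configuration."""
    problems = []
    if controller.get("write_cache_percent") != 100:
        problems.append("write cache should be 100% of controller cache")
    if controller.get("read_cache_percent") != 0:
        problems.append("read cache should be 0%")
    if controller.get("stripe_size_kb") not in RECOMMENDED_STRIPE_KB:
        problems.append("stripe size should be 256K or 512K")
    if controller.get("on_disk_cache") is not False:
        problems.append("on-disk cache should be disabled")
    return problems

# Example: a misconfigured controller deviates on every setting.
print(validate({"write_cache_percent": 0, "read_cache_percent": 25,
                "stripe_size_kb": 64, "on_disk_cache": True}))
```

This is only an illustration of the checklist; a controller in pass-through mode, or a low-end controller without cache, would fail the write-cache check regardless of any other setting.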
What if I don’t follow this guidance?
What if a customer were to still decide against using a controller with caching functionality? Or to use Pass-through mode? Or not to enable the write cache? Technically, such a solution would be supported by Microsoft. Realistically, however, the only solution that would pass Jetstress would be one with a very low user IO profile, perhaps a customer whose users only send/receive a few emails daily and therefore would not require many IOPS per server.
I wanted to perform a quick test on my lab server, a physical system with 64GB of RAM, 12 CPU cores, and a RAID controller with 1GB of cache. My plan was to run Jetstress with the Write-Cache enabled, note the time it took to create the test database and the achievable IOPS, then repeat the test with the Write-Cache disabled. I expected the testing to be much slower with caching disabled, but even I was surprised by how drastically different the results were. The test configuration:
- 1 Database (40GB)
- Dedicated virtual disk/Windows partition for database and logs
- Disk Subsystem Throughput Test
- Performance Test
- Duration=15min (.15)
Note: The same virtual disk/array was used for both tests; the only change was the caching setting
Results with Write-Cache Enabled:
- Time to create 40GB test database=8 minutes
- Jetstress Result=Pass
- Achieved IOPS=300
- Thread count used=6 (via Autotuning)
Results with Write-Cache Disabled:
- Time to create 40GB test database=4 hours
- Jetstress Result=Fail (due to write latency)
- Achieved IOPS=50
- Thread count used=1
Note: Tests in my lab with caching disabled would always fail with Autotuning enabled. Therefore I had to manually configure the thread count. After several tests (all failing), I configured the test to only 1 thread, which still failed due to log write latency.
Needless to say, I was surprised. The test went from taking 8 minutes to create a 40GB database file to 4 hours, a factor of 30! Now imagine this was production Exchange instead of Jetstress. Maybe the Administrator or Consultant thought the initial install was taking longer than expected, maybe the mailbox moves were much slower than anticipated, and maybe the client experience was so slow it was unusable. This is usually where users and upper management blame the product or the hardware, and it usually takes an escalation to Microsoft and/or the hardware vendor to explain the importance of a caching controller.
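The slowdown is easy to sanity-check from the two runs above:

```python
# Numbers taken from the two Jetstress runs above.
cached_create_min = 8          # minutes to create the 40GB database, cache enabled
uncached_create_min = 4 * 60   # the same task took 4 hours with cache disabled
cached_iops, uncached_iops = 300, 50

print(uncached_create_min / cached_create_min)  # slowdown factor for database creation
print(cached_iops / uncached_iops)              # factor of achieved IOPS lost
```

Database creation was 30x slower and achieved IOPS dropped 6x, all from flipping a single controller setting on the same virtual disk.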
Hopefully, for anyone designing, managing, or troubleshooting the performance of an Exchange deployment, knowing the proper caching settings will prove a valuable skill.
Note: Always follow hardware vendor guidance. Also, this guidance was specific to Exchange direct-attached storage solutions; for SAN or converged solutions, contact your vendor for guidance. Lastly, always run Jetstress to validate an Exchange storage solution before going into production.