When it comes to sizing a typical on-premises Exchange Server deployment, Microsoft has really gone out of their way to provide all the information you need. Along with the Mailbox Role Requirements Calculator, I believe Microsoft’s guidance to be one of the most complete in the industry, leaving little to the imagination and offering clear guidance on what you should and should not do.
Despite the extensive information provided in that documentation, one aspect of a potential Exchange Server scenario that is poorly documented, if not entirely missing, is how to properly size Edge Transport servers in either a pure on-premises or a hybrid deployment. In fact, the sizing of a hybrid deployment is a topic that doesn't get much attention at all. In part, this is because the sizing requirements of a hybrid deployment differ little, if at all, from those of an on-premises organization. If your Exchange servers can handle the load for your users on-premises, they will also be able to handle the load when you configure a hybrid connection.
In this article, I will describe the process I typically go through to size Edge Transport servers for hybrid deployments. It goes without saying that I am not Microsoft, and you should always use common sense when designing an environment of your own.
Why use Edge Transport servers?
Before we dive into sizing for Edge Transport servers, let’s first look at why one would bother using them in an on-premises or hybrid deployment in the first place. Truth be told, I think Edge Transport servers are a pretty horrible anti-malware/anti-spam solution. So, that can’t be the reason...
The most common use case I come across (but this is perhaps because I’m biased) is when an organization has corporate/legal/regulatory requirements that somehow stipulate that connections originating from the Internet must be terminated in the perimeter network (DMZ). Even when you are using, for instance, Exchange Online Protection for filtering purposes, without on-premises Edge Transport servers the filtered mail would travel from EOP directly to the backend transport servers. Given you cannot (should not, really!) deploy those backend servers in the DMZ, Edge Transport servers are the only supported way of terminating the connection in a DMZ before passing traffic to the internal servers. You could, of course, use some other on-premises gateway to terminate the connection before handing the mail off to the internal transport servers, but that will most likely lead you down a different rabbit hole.
Except for the fact that I believe they are awful from a mail filtering perspective, Edge Transport servers can be quite helpful in many other ways. First, they help you fulfil the requirement to terminate incoming connections (for SMTP!) in the DMZ, but you can also configure additional transport rules (and thus separate those from the internal deployment) and even do some address rewriting. Add in a customized transport agent, and you’ve got a pretty flexible mail-handling solution!
How many servers are enough?
Sizing for a hybrid deployment in general isn’t at all difficult. Although there isn’t an entire book written on how to do it, the logic is simple: size the Exchange server environment to handle the load for the on-premises users appropriately, and you’ll have no issues. There’s no need to deploy additional Exchange servers to handle regular user and/or migration load. There are some corner cases where you might want to deploy additional servers to allow for specific migration scenarios, but that has nothing to do with sizing per se.
When you already have an Edge Transport server farm in your environment, it’s very unlikely you’ll have to make changes to it. However, you must consider that the servers might get extra busy during the migration as they will handle internal mail flow for a while too; traffic flowing between recipients on-premises and recipients in Exchange Online will go through the Edge Transport servers as well. This, too, is only temporary. Once you’ve migrated most of your users to Office 365, internal mail flow will be handled entirely by Exchange Online, and you might even face excess capacity on your Edge Transport servers...
So, how do you go about it if you don’t have any Edge Transport servers to begin with? An Edge Transport server is nothing more (or less) than some sort of Hub Transport service, albeit with limited functionality and living in the DMZ. From that perspective, I lean on most of the guidance for a typical transport service. As such, the most important things to consider are storage and CPU.
The most crucial part of sizing an Edge Transport server is to ensure there’s plenty of storage for Safety Net. To understand how much disk space you need, you must calculate the (overall) requirement for Safety Net storage and then divide that number by the total number of Edge Transport servers you will deploy.
The process on how to calculate the overall transport storage requirements is available from here (section: Transport Storage Requirements). In a nutshell, it comes down to the following formula:
Overall Transport DB Size = average message size x (number of users x message profile) x (1 + (percentage of messages queued x maximum queue days) + Safety Net hold days) x 2 copies for high availability
If you have a deployment of 90,000 users with an average message size of 100 KB, sending/receiving approx. 75 messages per day, with 100% of messages queued for a maximum of 2 days and a Safety Net hold period of 7 days, the formula yields a total transport storage requirement of:
100 KB x (90,000 x 75) x (1 + (100% x 2) + 7) x 2 = 12,874 GB (or 12.57 TB)
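To make the arithmetic easy to repeat with your own numbers, here is a minimal Python sketch of the formula. The function and parameter names are my own, not Microsoft's, and I assume binary units (1 GB = 1024 x 1024 KB), which is what the figures above are based on.

```python
def transport_db_size_kb(avg_message_kb, users, messages_per_user_per_day,
                         queued_fraction, max_queue_days,
                         safety_net_hold_days, copies=2):
    """Overall transport database size in KB, following the formula above."""
    daily_messages = users * messages_per_user_per_day
    # 1 day of live traffic + queued messages + Safety Net hold period
    retention_factor = 1 + (queued_fraction * max_queue_days) + safety_net_hold_days
    return avg_message_kb * daily_messages * retention_factor * copies


size_kb = transport_db_size_kb(
    avg_message_kb=100,
    users=90_000,
    messages_per_user_per_day=75,
    queued_fraction=1.0,      # 100% of messages queued
    max_queue_days=2,
    safety_net_hold_days=7,
)
size_gb = size_kb / 1024 ** 2   # assumption: binary units
size_tb = size_gb / 1024
print(f"{size_gb:,.1f} GB ({size_tb:.2f} TB)")
```

For the example deployment this prints 12,874.6 GB (12.57 TB).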
You’ll find that the overall CPU requirement for Edge Transport servers is quite insignificant. Still, this doesn’t mean you don’t have to pay attention to it. Depending on the type of CPUs that are being used, and whether the servers are virtualized or not, I use a ratio of deployed Mailbox server cores to Edge Transport server cores of anywhere between 5:1 and 8:1.
To be honest, I prefer Edge Transport servers to be virtualized as it allows for easy scaling up/down as you take them into production and are ramping up utilization.
Imagine you have 20 Mailbox servers, each having 20 CPU cores. This means you have a total of 400 CPU cores. Using a ratio of 5:1, you need approx. 80 CPU cores for the Edge Transport servers or, theoretically, 4 Edge Transport servers with 20 cores each.
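The same rule of thumb can be written down as a small Python sketch. The helper name is hypothetical, and the ratio is my own guideline rather than official guidance; rounding up to whole cores and whole servers is an assumption on my part.

```python
import math


def edge_servers_needed(mailbox_servers, cores_per_mailbox_server,
                        ratio, cores_per_edge_server):
    """Return (Edge cores, Edge servers) for a Mailbox:Edge core ratio."""
    total_mailbox_cores = mailbox_servers * cores_per_mailbox_server
    edge_cores = math.ceil(total_mailbox_cores / ratio)
    servers = math.ceil(edge_cores / cores_per_edge_server)
    return edge_cores, servers


# Both ends of the 5:1 to 8:1 range for 20 Mailbox servers with 20 cores each
for ratio in (5, 8):
    cores, servers = edge_servers_needed(20, 20, ratio, 20)
    print(f"{ratio}:1 -> {cores} Edge cores, {servers} servers of 20 cores")
```

At 5:1 this gives 80 cores on 4 servers; at 8:1 it drops to 50 cores, which still needs 3 servers of 20 cores each.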
Bringing it all together
Given the above, you must also consider any HA requirements. If you are deploying your solution across two data centers, and you require each data center to be able to handle the entire load, you should deploy 4 Edge Transport Servers per data center... Though you might find that those servers are relatively underutilized under normal circumstances.
Now that you know that you are deploying 8 servers, you also realize that you need approx. 1.6TB of disk space for the transport database per server, assuming you are using the full capacity... Regardless of what you do, make sure that you do not exceed 2TB as a requirement for the transport database (per server). If your calculation ends up above that limit, scale out by adding more Edge Transport servers.
In large environments that process a lot of messages (and thus send many messages between the on-premises organization and Office 365), it is advisable to tweak the Edge Transport service to get the best performance and avoid queueing of messages to Office 365.
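As a rough sanity check, the per-server storage figure and the 2TB ceiling can be combined in a few lines of Python. This is only a sketch: the names are hypothetical, and it divides the total across all servers in both data centers, as the example above does.

```python
import math

MAX_TB_PER_SERVER = 2.0   # the per-server ceiling recommended above


def storage_per_server_tb(total_tb, servers):
    """Transport database space each server needs if the load is spread evenly."""
    return total_tb / servers


def min_servers_for_storage(total_tb, max_tb_per_server=MAX_TB_PER_SERVER):
    """Smallest server count that keeps every server at or below the ceiling."""
    return math.ceil(total_tb / max_tb_per_server)


total_tb = 12.57          # overall transport storage from the example
planned_servers = 8       # 4 per data center, 2 data centers

per_server = storage_per_server_tb(total_tb, planned_servers)
required = max(planned_servers, min_servers_for_storage(total_tb))
print(f"{per_server:.2f} TB per server, deploy at least {required} servers")
```

With the example numbers, storage alone would call for 7 servers, so the 8 servers that follow from the CPU and HA requirements stay comfortably under the limit.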
On the Edge Transport server, edit the EdgeTransport.exe.config file and add the following key to the <appSettings> section of the file:
<add key="SmtpConnectorQueueMessageCountThresholdForConcurrentConnections" value="5" />
Note that, by default, the key does not exist, and the setting then defaults to a value of 20. To get the best possible performance, consider lowering the value of the key to 5. If needed, you can even go as low as 2. Changing the key value doesn’t guarantee that all performance issues, if any, are solved. It’s just one of many cogs in the wheel that can have an impact.
If you ever find yourself troubleshooting slow mail delivery to Office 365, make sure to look at the SMTP protocol logs on the Edge Transport servers, as they can show you where to start looking. For example, the logs can reveal slowness on the receiving side, such as delays after the RCPT TO command. Environmental problems such as an underperforming server and slow or bad connectivity won’t necessarily show up in the protocol logs, and might require alternative ways to troubleshoot.
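Spotting such delays by eye gets tedious in a busy log, so here is a minimal Python sketch that lists the longest pauses between consecutive events in the same SMTP session. It rests on several assumptions you should check against your own logs: that the file is comma-separated, that a `#Fields:` header line names the columns, that those columns include `date-time`, `session-id` and `data`, and that timestamps look like `2024-01-01T10:00:00.000Z`. The function name is hypothetical.

```python
import csv
from datetime import datetime

TIME_FORMAT = "%Y-%m-%dT%H:%M:%S.%fZ"   # assumed timestamp layout


def slowest_steps(lines, top=5):
    """Return the largest (seconds, session, preceding event) gaps per session."""
    fields = None
    last_seen = {}   # session-id -> (timestamp, data of the previous event)
    gaps = []
    for line in lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # Column names are read from the header rather than hard-coded
            fields = [f.strip() for f in line[len("#Fields:"):].split(",")]
            continue
        if not line or line.startswith("#") or fields is None:
            continue
        row = dict(zip(fields, next(csv.reader([line]))))
        stamp = datetime.strptime(row["date-time"], TIME_FORMAT)
        session = row["session-id"]
        if session in last_seen:
            prev_stamp, prev_data = last_seen[session]
            waited = (stamp - prev_stamp).total_seconds()
            gaps.append((waited, session, prev_data))
        last_seen[session] = (stamp, row.get("data", ""))
    return sorted(gaps, reverse=True)[:top]
```

You would feed it an open log file, for instance `slowest_steps(open(path, encoding="utf-8"))`. A long gap whose preceding event is a RCPT TO command points at the receiving side, as described above.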