If you’re reading this, the odds are good that at some point in your life you’ve seen a Disney movie. Many of these movies feature a magic device, such as a magic mirror or a pool of water, that lets the characters see things they otherwise couldn’t. Think of the wicked queen in Snow White chanting “Mirror, mirror, on the wall…” and you’ll get the idea. Microsoft offers something very similar in the form of the service health dashboard (SHD) in Office 365.
The idea behind this dashboard is a good one: collect all the information a tenant administrator needs about the health and service quality of that particular tenant in an easy-to-view form so that admins can see when problems with the service are impacting, or are likely to impact, their users.
This approach makes perfect sense; after all, Microsoft has complete knowledge of everything that’s going on inside its service, so it’s logical that it would collect that information and make it visible to its customers. In practice, that’s not exactly what happens. See, it turns out that the SHD shares another trait with many Disney movies: it often depicts a fantastic world that is very different from our own more mundane reality.
Out Of Sight
The first issue is that Microsoft’s vision is necessarily turned almost entirely inwards. Like Belle’s magic mirror in Beauty and the Beast, the SHD only works in one direction. Microsoft can show you information about the health of those parts of its service that (may) affect your users, but it can’t show you (because it doesn’t know) the health of your hybrid components that may be affecting your users’ access. For example, let’s say you’re using AD FS and one of the servers behind your load balancer dies. Microsoft can’t tell that this has happened, so it can’t warn you. The same holds for Azure AD Connect, on-premises DNS servers, and other components that your users depend on but which Microsoft doesn’t control: the SHD can’t report what Microsoft doesn’t see.
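Because the SHD can’t see these on-premises pieces, checking them falls to you. Here is a minimal sketch in Python of one way to do that: it tests each AD FS server directly rather than only the load-balanced name, so a single dead node shows up. The host names are placeholders, and the functions `tcp_port_open`, `check_endpoints`, and `failed` are invented for this illustration; they are not part of any Microsoft or ENow tool.

```python
import socket
from typing import Callable, Dict, Iterable, List, Tuple


def tcp_port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_endpoints(
    endpoints: Iterable[Tuple[str, int]],
    probe: Callable[[str, int], bool] = tcp_port_open,
) -> Dict[str, bool]:
    """Probe every (host, port) pair and map "host:port" to reachable or not."""
    return {f"{host}:{port}": probe(host, port) for host, port in endpoints}


def failed(results: Dict[str, bool]) -> List[str]:
    """Return the sorted names of endpoints that did not respond."""
    return sorted(name for name, ok in results.items() if not ok)


if __name__ == "__main__":
    # Hypothetical hosts: two AD FS nodes and an on-premises DNS server.
    endpoints = [
        ("adfs01.example.internal", 443),
        ("adfs02.example.internal", 443),
        ("dns01.example.internal", 53),
    ]
    for name in failed(check_endpoints(endpoints)):
        print(f"UNREACHABLE: {name}")
```

An open port only shows that something is listening; it doesn’t prove that AD FS can actually issue tokens. It is still enough to catch the dead-server case described above.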
Mirror, Mirror, On Another Continent…
Take the recent outage that affected some Office 365 and Exchange Online users in Japan, Australia, and the Pacific Rim. Microsoft detected this outage and posted it on the SHD for some affected users. However, consider the case of one of our customers whose tenant is homed in a European data center: the customer didn’t see anything about the outage in their SHD because Microsoft didn’t “know” that their users were affected. The users’ mailboxes and other resources are all homed in Europe, after all. In fairness, the customer had chosen to set the user accounts’ location to force data residency in Europe, so it would be non-trivial for Microsoft to figure out that a user whose mailbox is in Dublin really lives and works in Singapore.
Help! I’m Drowning!
Office 365 has a scale and complexity that are pretty astonishing when you consider them carefully. From an engineering and functionality perspective, Microsoft has built something amazing. As with any other very complex system, though, it can be difficult to find the right level of abstraction when talking about it, and this is certainly true of the SHD. One very common complaint from Office 365 administrators is that the SHD is too chatty: it shows alerts for services, workloads, or outages that aren’t interesting to a given administrator because they don’t affect her users. To solve this problem, Microsoft would need much more detail about an individual tenant’s users: where they are, what they’re doing, and so on.
Super Powers Required
Self-service IT has become increasingly common, and Microsoft has been a leader in giving users the tools to provision and use Office 365 workloads without a lot of IT involvement. This isn’t true when it comes to service quality monitoring, though. The primary tool that ordinary users have to find out about Office 365 outages is Twitter, which isn’t ideal for the users, their tenant administrators, or Microsoft. The existing SHD is only visible to administrators, and it’s only usable when the admins themselves are able to log in to the SHD—something that has been impossible in a couple of recent outages.
Supplementing The SHD
To borrow a phrase from one of my college math professors, the SHD is necessary but not sufficient for tenant administrators to see what’s happening with their tenant and the users who depend on it. Mailscape 365 displays all the data that Microsoft exposes in the SHD, but it supplements that information with a wealth of other capabilities:
Mailscape 365 runs active synthetic tests against critical components, including DirSync, AD FS, and the Office 365 workloads, to verify that they’re working. By performing the same operations that clients such as Skype and Outlook do, we show you whether those clients will work.
Mailscape 365 user experience probes allow you to monitor service quality from the locations where your users are actually located. These probes show network conditions and service availability for each workload, allowing you to localize problems and take corrective action based on which users are affected, where they are, and when problems occur.
Mailscape 365 covers your Office 365 infrastructure end-to-end. Your on-premises hybrid components, network links to Microsoft’s cloud, and Microsoft’s services themselves are all continuously monitored so that you can get immediate notification when a problem occurs—before you start getting trouble tickets or phone calls from aggravated users.
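To make the idea of a synthetic test concrete, here is a minimal sketch in Python. It is not how Mailscape 365 is implemented; it only illustrates the general technique of running a client-like operation on a schedule, timing it, and classifying the outcome. The names `ProbeResult` and `run_probe`, the `slow_after` threshold, and the sample workload and location labels are all assumptions made for this example.

```python
import time
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ProbeResult:
    name: str                    # e.g. "Exchange Online from Singapore"
    status: str                  # "ok", "slow", or "failed"
    seconds: float               # how long the operation took
    error: Optional[str] = None  # exception text when status is "failed"


def run_probe(
    name: str,
    operation: Callable[[], None],
    slow_after: float = 2.0,
    clock: Callable[[], float] = time.monotonic,
) -> ProbeResult:
    """Run one synthetic operation, time it, and classify the outcome.

    `operation` should do what a real client would do (sign in, open a
    mailbox, and so on) and raise an exception if it fails.
    """
    start = clock()
    try:
        operation()
    except Exception as exc:
        return ProbeResult(name, "failed", clock() - start, str(exc))
    elapsed = clock() - start
    status = "slow" if elapsed > slow_after else "ok"
    return ProbeResult(name, status, elapsed)


if __name__ == "__main__":
    def fake_mailbox_open() -> None:
        # Stand-in for a real client operation; replace with your own check.
        time.sleep(0.1)

    result = run_probe("Exchange Online from Singapore", fake_mailbox_open)
    print(result.name, result.status, f"{result.seconds:.2f}s")
```

Running the same probe from each office where your users actually sit, and tagging the result with that location, is what lets you tell “the service is down” apart from “the service is down for Singapore.”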