Office 365 Outage Monitoring | MFA Outage
For many admins, their worst fears happened this morning: Users with Multi-factor Authentication...
On November 19, Office 365 experienced another outage. Some outages are localized and have only a minor impact on end users; this one, however, was global and affected a multitude of Office services. End users flooded help desks, and many IT Pros were left with little information to report back.
“We’re experiencing an outage” is never something an IT Pro wants to hear. However, the way IT Pros handle an outage has changed substantially over the years. Back in the on-premises days, an outage was completely on the Exchange IT Pro (although it was often a networking or storage issue). The IT Pro could easily RDP into the server, figure out the problem, and then work all hours of the night to get it up and running again.
Then the world of IT changed: many organizations adopted hybrid Office 365 deployments and cloud technologies in general. With the rise of Office 365, a wide variety of services, from Exchange Online and Skype to Teams, SharePoint, and OneDrive, came under one umbrella. The industry saw a shift in roles, as many longtime Exchange IT Pros had to skill up across all of these services. For some time, many bought the sales pitch that moving to the cloud meant the responsibility was no longer on the IT Pro. However, that’s only partially true…
During the Office 365 outage on November 19, as with most outages, only a subset of services were affected. In ENow’s tenant, we saw that network connectivity, Exchange Online, Teams, Planner, and OneDrive, among other services, were affected.
However, without an Office 365 monitoring solution in place, many IT Pros had no visibility into the scope of the problem because they couldn’t access the IT Pro center. This led many to turn to Twitter. As you can see below, many in the community were frustrated.
Over the next three hours, Microsoft periodically updated the Microsoft 365 status on Twitter as well as the Office 365 health status page. The first update on the Service Status page noted the following:
“We've determined that users may experience intermittent access issues with the Microsoft 365 IT Pro Center, Exchange Online, SharePoint Online, Microsoft Teams, Skype for Business, and Yammer at this time. We're continuing our investigation into the root cause and we'll provide more information when we’ve isolated the root cause.”
When the outage was finally rectified, Microsoft directed IT Pros to the IT Pro center to learn more. As of 11 am PDT on November 21st, IT Pros are still waiting on the post-incident report.
When you’re experiencing an Office 365 outage, sometimes you truly have to wait it out until Microsoft resolves the issue. However, as an IT Pro, you’re still often held accountable by the rest of your company. This puts many IT Pros in a catch-22 situation.
While Office 365 is a tremendous service, outages happen, and it is a question of when rather than if. With monitoring in place, IT Pros can move from a reactive to a proactive state. Today IT Pros are often left in the dark, but a monitoring solution turns the lights back on. By knowing the service status and the subset of users affected by an outage, IT can focus on putting a mitigation plan together and informing end users. For example, some outages affect only a subset of services. IT can then communicate what is disrupted and which workarounds end users can rely on. Perhaps Skype went down, and users can use Teams instead. Or maybe the next outage will only affect OneDrive, in which case you can instruct users to save locally until the issue is resolved.
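To make the communication step concrete, here is a minimal sketch of turning a service-status snapshot into an end-user notice with workarounds. It is an illustration only: the `WORKAROUNDS` table, the `build_outage_notice` function, and the shape of the status dictionary are assumptions made for this example, not part of any Microsoft or ENow API.

```python
# Hypothetical workaround table; the entries mirror the examples in the text.
WORKAROUNDS = {
    "Skype for Business": "Use Microsoft Teams for chat and meetings.",
    "OneDrive": "Save files locally until the service is restored.",
}


def build_outage_notice(status):
    """Return a plain-text notice listing disrupted services and workarounds.

    `status` maps a service name to True (healthy) or False (disrupted).
    Where this snapshot comes from is up to your monitoring source.
    """
    disrupted = sorted(name for name, healthy in status.items() if not healthy)
    if not disrupted:
        return "All monitored services are operating normally."
    lines = ["The following services are currently disrupted:"]
    for name in disrupted:
        hint = WORKAROUNDS.get(name, "No workaround yet; updates to follow.")
        lines.append(f"- {name}: {hint}")
    return "\n".join(lines)


if __name__ == "__main__":
    snapshot = {
        "Exchange Online": True,
        "Skype for Business": False,
        "OneDrive": False,
    }
    print(build_outage_notice(snapshot))
```

Keeping the workaround text in one table means the help desk can update the advice during an incident without touching the notification logic.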
In this case, the outage was clearly on Microsoft. However, not every outage is Microsoft’s fault. In fact, Microsoft once estimated that over 52% of support cases end up being issues on the client side. Just as you have no visibility into the infrastructure on Microsoft’s end, Microsoft has no visibility into your side, including hybrid servers, the network, or certificates. Over the years we have heard horror stories of IT Pros who assumed that an outage they were experiencing was on Microsoft, when the issue was really a certificate on the organization’s end…
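Certificate problems on your own side are one of the easier things to check yourself. The sketch below uses only the Python standard library to read the `notAfter` date of the certificate a server presents and compute the days remaining. The function names are made up for this example, and any host you pass in is your own choice; this is a rough self-check, not a substitute for a monitoring product.

```python
import socket
import ssl
import time

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now=None):
    """Days between `now` (epoch seconds) and a certificate's notAfter date.

    `not_after` uses the format returned by getpeercert(), for example
    "Jan  5 09:34:43 2018 GMT". A negative result means already expired.
    """
    expires = ssl.cert_time_to_seconds(not_after)
    if now is None:
        now = time.time()
    return (expires - now) / SECONDS_PER_DAY


def fetch_not_after(host, port=443, timeout=5.0):
    """Connect to host:port over TLS and return the certificate's notAfter.

    The default context verifies the certificate, so a certificate that has
    already expired raises ssl.SSLCertVerificationError during the handshake.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


# Example use against a host you operate (hostname is a placeholder):
#   remaining = days_until_expiry(fetch_not_after("mail.example.com"))
#   if remaining < 30:
#       print(f"Certificate expires in {remaining:.0f} days")
```

Splitting the date arithmetic from the network call keeps the expiry logic testable without a live connection.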
ENow Software is the leading provider of Office 365 management solutions that help organizations detect outages immediately, validate the end-user experience, and control license spend.
Let’s quickly walk through how ENow surfaces problems in real time and enables customers like Facebook and VMware to navigate outages successfully and achieve SLA transparency.
Once the November 19th outage started to affect access to services like Exchange Online, Teams, Planner, and the IT Pro center, the ENow OneLook dashboard turned red as a visual indicator for the NOC. You can see in the screenshot below that the Network, Exchange Online, and Teams & SharePoint indicators are red.
The visual cue of the red indicators quickly shows that there are issues with the Office 365 service.
IT Pros can then drill down to the root cause for each affected service. Additionally, remote probes can be installed so you can isolate the subset of users affected.
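The idea of using probes at several locations to isolate who is affected can be sketched in a few lines. The code below is a generic illustration, not ENow's implementation: the `(site, service, ok)` result tuples, the site names, and both function names are assumptions made for this example.

```python
from collections import defaultdict


def failing_sites(results):
    """Map each service to the sorted list of sites where its probe failed.

    `results` is a list of (site, service, ok) tuples from hypothetical probes.
    """
    failing = defaultdict(set)
    for site, service, ok in results:
        if not ok:
            failing[service].add(site)
    return {service: sorted(sites) for service, sites in failing.items()}


def outage_scope(results, service):
    """Classify a service as 'healthy', 'some sites', or 'all sites'."""
    probed = {site for site, svc, _ in results if svc == service}
    failed = {site for site, svc, ok in results if svc == service and not ok}
    if not failed:
        return "healthy"
    return "all sites" if failed == probed else "some sites"


if __name__ == "__main__":
    results = [
        ("London", "Exchange Online", False),
        ("Denver", "Exchange Online", True),
        ("London", "Teams", False),
        ("Denver", "Teams", False),
    ]
    print(failing_sites(results))
    print(outage_scope(results, "Exchange Online"))
    print(outage_scope(results, "Teams"))
```

A failure at every site points toward the service itself, while a failure at a single site suggests looking at that location's network or local infrastructure first.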
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.
ENow’s Office 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage by providing: