
Office 365 Monitoring: 5/2/19 Office 365 and Azure Global Outage Recap

Michael Van Horenbeeck MVP, MCSM

Microsoft's latest outage highlights several areas where Microsoft needs to improve.

This past Thursday, May 2nd, 2019, Microsoft suffered another outage affecting parts of its cloud services. It follows a series of outages earlier this year that affected a variety of online services, including Azure, SharePoint Online, OneDrive, Intune, and Microsoft Teams.


According to the recently published post-incident report, the root of the issue was a faulty DNS update, leaving thousands of users unable to connect to said services for a period of roughly two hours:

“A configuration issue occurred during planned maintenance activity related to a name server delegation change within Azure Domain Name Services (DNS). Specifically, an issue in the update to one of the name servers for DNS zones caused server records to point to a DNS server that contained blank zone data. As a result, the affected DNS infrastructure returned negative responses and users encountered connectivity issues when attempting to access Microsoft services.”
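The failure mode described in the report, where the lookup fails before the service is ever contacted, can be told apart from a fault in the service itself with a simple resolution check. The following is a minimal sketch and not Microsoft's or ENow's method: `classify_lookup` is a hypothetical helper, and the resolver is injectable so the logic can be exercised without network access.

```python
import socket
from typing import Callable


def classify_lookup(hostname: str,
                    resolver: Callable[..., list] = socket.getaddrinfo) -> str:
    """Classify a name lookup as 'resolved', 'dns_failure', or 'other_error'.

    Hypothetical helper for illustration; `resolver` defaults to the
    standard library's socket.getaddrinfo.
    """
    try:
        records = resolver(hostname, 443)
    except socket.gaierror:
        # Name resolution failed (for example, a name server returned a
        # negative response), so the service itself was never contacted.
        return "dns_failure"
    except OSError:
        # Some other OS-level error that is not a resolution failure.
        return "other_error"
    # An empty answer is treated the same as a failed lookup.
    return "resolved" if records else "dns_failure"
```

A result of `dns_failure` for a hostname that normally resolves suggests looking at name resolution first, rather than at the service behind it.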

If anything, the outage shows there are several areas for improvement for Microsoft. For example, the lack of timely, accurate communication left a lot of customers wondering what was wrong. This is an issue that keeps reappearing across outages.

During the early stages of the outage, Microsoft’s various health dashboards showed no issues, forcing customers to turn to Twitter to find out more information about the issue itself:

  Twitter screenshot
 
Another interesting fact is that Microsoft did not catch the issue before customers did, as one of the report findings outlines:

“This issue was noticed and reported by customers before our monitoring detected the issue.”

All things considered, I don’t blame Microsoft for not catching the issue immediately. After all, it was an external element that disrupted connections to the service; nothing was wrong with the service itself. It does show, however, that a holistic approach to monitoring is crucial.

That is why ENow's Office 365 Monitoring solution doesn’t rely on a single point of monitoring, but also leverages external probes to gain visibility into these types of external disturbances.

Luckily, not all customers seem to have suffered from this outage. I, for one, did not notice it at all, probably because the relevant DNS entries were cached long enough along the way. In hindsight, the impact could have been much worse.
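To illustrate the caching effect, here is a small worked sketch. The timeline and TTL values are hypothetical and chosen purely for illustration; only the roughly two-hour duration comes from the incident report. A resolver that cached a record before the outage keeps answering from cache until the record's TTL expires.

```python
def cache_outlasts_outage(cached_at: float, ttl: float, outage_end: float) -> bool:
    """True if a record cached at `cached_at` stays valid until `outage_end`.

    All values are in minutes relative to the start of the outage.
    """
    return cached_at + ttl >= outage_end


outage_end = 120   # "roughly two hours", per the post-incident report
cached_at = -30    # assumed: record was cached 30 minutes before the outage

for ttl in (60, 180):  # illustrative TTLs, not Microsoft's actual values
    covered = cache_outlasts_outage(cached_at, ttl, outage_end)
    print(f"TTL {ttl} min: cache outlasts outage = {covered}")
```

With these assumed numbers, the 60-minute record expires half an hour into the outage, while the 180-minute record remains valid for its whole duration.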

Despite Microsoft’s efforts to prevent failures like this, there is no such thing as a 100% fail-safe strategy. Issues can (and will) happen, and this outage is a perfect example. And while issues of this size are typically few and far between, detecting them early really makes a difference in how you, as an organization, can deal with them. You won’t be able to solve the issues for Microsoft, but you can get ahead of the curve and communicate more clearly with your users. Depending on the type of outage, you might even be able to proactively provide a workaround before the issue spins out of control.
 

ENow Software's Office 365 monitoring provides visibility

ENow Software is the leading provider of Office 365 management solutions that help you save money and increase end-user productivity.

Let’s take a look at how ENow’s Office 365 monitoring solution quickly surfaces problems in real time and allows our customers to successfully diagnose and troubleshoot tricky outages like the SharePoint Online and OneDrive problems.

 Shortly after the DNS problem began taking effect, we received some visual indications on the OneLook Dashboard that pointed us in the direction of the problem. The screenshot below shows that there are critical issues for Office 365 Network connectivity as well as a problem with Teams and SharePoint Online. This helps us understand immediately that there is a problem with the Office 365 service.
application dashboard

During the May 2nd outage, ENow customers saw failed status notifications for OneDrive, SharePoint Online, and Teams.

network status

Drilling down into the SharePoint Online indicator shows that we are not able to connect to the SharePoint Online service.

Additionally, ENow’s Office 365 monitoring solution performs synthetic transactions that test the functionality specific to your tenant. We can see from the image below that because we are not able to connect to the SharePoint Online service, our upload/download test fails.
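The general idea of an upload/download synthetic transaction can be sketched as follows. This is not ENow's implementation: `upload_download_check` is a hypothetical helper, and `upload` and `download` are placeholder callables standing in for whatever client talks to your tenant. The sketch only shows how such a test reports which stage failed.

```python
from typing import Callable


def upload_download_check(upload: Callable[[str, bytes], None],
                          download: Callable[[str], bytes],
                          name: str = "probe.txt",
                          payload: bytes = b"synthetic-probe") -> str:
    """Run one upload/download round trip and report where it failed, if at all."""
    try:
        upload(name, payload)
    except Exception as exc:
        # Covers the case where the service cannot be reached at all.
        return f"upload failed: {exc}"
    try:
        returned = download(name)
    except Exception as exc:
        return f"download failed: {exc}"
    # The transaction only passes if the data survives the round trip.
    return "ok" if returned == payload else "content mismatch"


# Usage with an in-memory dict standing in for the remote service.
store = {}
print(upload_download_check(store.__setitem__, store.__getitem__))  # prints "ok"
```

When the service cannot be reached, the upload step raises and the check returns the failure at that stage.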


Users who rely on the Microsoft Service Health dashboard didn’t get a concrete update for several hours. This frustration can be avoided by utilizing ENow’s OneLook Dashboard to save precious time when there is an outage.

ENow saves the day again!

ENow customers like Barclays, Facebook and VMware were able to quickly identify and drill down to the root cause of the problem as it was happening.

Watch the video below to see how this took place in real time!

 


The Importance of Office 365 Monitoring

In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.

ENow’s Office 365 Monitoring and Reporting solution enables IT pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage by providing:

  • The ability to monitor entire environments in one place with ENow’s OneLook dashboard, which makes identifying a problem fast and easy without having to scramble through Twitter and the Service Health Dashboard looking for answers.
  • A full picture of all services and subsets of services affected during an outage with ENow’s remote probes, which cover several Office 365 apps and other cloud-based collaboration services.

Identify the scope of Office 365 service outage impacts and restore workplace productivity with ENow’s Office 365 Monitoring and Reporting solution. Access your free 14-day trial today!

