
Office 365 Monitoring: Microsoft 365 Services Outage March 15, 2021

In this post, we dissect the key events of the worldwide Microsoft 365 services outage that spanned more than 9 hours on March 15, 2021.

On March 15, 2021 at 8:40 pm UTC, Microsoft reported an issue that was preventing users from accessing Microsoft 365 services. Shortly after, they confirmed that the issue could be affecting users worldwide.


IMG_3552

Many users took to Twitter to express their frustration with the major outage as well as their inability to check the Service Health Dashboard for updates.

Screen Shot 2021-03-15 at 1.19.16 PM

As you can see below, many users were able to log in to the Admin Center, but no information on the ongoing outage was available there.

Screen Shot 2021-03-15 at 2.16.40 PM

Further details could be found on https://status.office.com. According to Microsoft, any service that leverages Azure AD may be affected, including but not limited to Microsoft Teams, Forms, Exchange, Intune and Yammer.

Screen Shot 2021-03-15 at 1.49.35 PM


Microsoft reported that they had identified an issue with a recent change to an authentication system and they were rolling back the update to mitigate impact. 

IMG_3563

Shortly after, Microsoft reported that the process to roll back the change was taking longer than expected and that they would provide an ETA as soon as one became available.

IMG_3561

At ~10:15 pm UTC, Microsoft reported that they were rolling out a mitigation worldwide and that customers should begin to see recovery. They anticipated full remediation within the hour.

IMG_3559

At ~11:00 pm, Microsoft confirmed that the update had finished deployment to all impacted regions and that Microsoft 365 services were showing a decreasing error rate in telemetry.

IMG_3567

Roughly two hours later, Microsoft reported that service health had improved across multiple Microsoft 365 services. However, they were still taking steps to resolve isolated residual impact in some services.

IMG_3574

At ~2:30 am UTC, Microsoft reported that they had received confirmation that most services had recovered. They would continue to monitor the remaining impacted services until fully mitigated and provide updates via status.office.com.

IMG_3575

Finally, at ~5:30 am, Microsoft confirmed that the impact had been largely mitigated and that they would continue to provide service-specific updates.

IMG_3576

Microsoft posted another update at 11:30 am, confirming again that the majority of impacted services had recovered, with the exception of Intune and Microsoft Managed Desktop.

IMG_3592

Microsoft Root Cause Analysis (Tracking ID LN01-P8Z)

According to Microsoft, the root cause of the outage was as follows: "The preliminary analysis of this incident shows that an error occurred in the rotation of keys used to support Azure AD’s use of OpenID, and other, Identity standard protocols for cryptographic signing operations. As part of standard security hygiene, an automated system, on a time-based schedule, removes keys that are no longer in use. Over the last few weeks, a particular key was marked as “retain” for longer than normal to support a complex cross-cloud migration. This exposed a bug where the automation incorrectly ignored that “retain” state, leading it to remove that particular key."
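To make that class of bug concrete, here is a minimal Python sketch of a time-based key cleanup job. It is purely illustrative and is not Microsoft's implementation: the `SigningKey` type, the `keys_to_remove` function, the `honor_retain` switch, and the 30-day idle threshold are all assumptions invented for this example. The switch exists only to show what happens when a cleanup pass skips the "retain" check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class SigningKey:
    """Hypothetical record for a signing key tracked by a cleanup job."""
    key_id: str
    last_used: datetime
    retain: bool = False  # set when a key must outlive the normal schedule


def keys_to_remove(keys, now, max_idle, honor_retain=True):
    """Return the IDs of keys a time-based cleanup pass would delete.

    With honor_retain=False the function mimics the failure mode described
    in the root cause analysis: the "retain" marker is ignored.
    """
    removable = []
    for key in keys:
        if now - key.last_used <= max_idle:
            continue  # still within the idle window, keep it
        if honor_retain and key.retain:
            continue  # explicitly retained, keep it
        removable.append(key.key_id)
    return removable


NOW = datetime(2021, 3, 15, tzinfo=timezone.utc)
KEYS = [
    SigningKey("key-active", NOW - timedelta(days=1)),
    SigningKey("key-stale", NOW - timedelta(days=90)),
    SigningKey("key-migration", NOW - timedelta(days=90), retain=True),
]

if __name__ == "__main__":
    print(keys_to_remove(KEYS, NOW, timedelta(days=30)))
    print(keys_to_remove(KEYS, NOW, timedelta(days=30), honor_retain=False))
```

In this sketch the correct pass deletes only the stale key, while the pass that ignores the flag also deletes the key held back for the migration. A test that compares the two outputs is one simple way to guard against this kind of regression.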

You can find more information on the Azure status history here.


Office 365 Monitoring: For less than a cup of coffee

Yesterday's outage was yet another reminder that organizations are at the mercy of cloud providers like Microsoft, but IT's reputation is still on the hook. The faster the IT team can determine whether an outage is caused by Microsoft or by its own infrastructure, the better the chance that IT pros can protect workplace productivity.

ENow's OneLook dashboard monitors all of Office 365 from a single pane of glass. When an issue does occur, IT Pros are easily able to identify the services affected and drill down to the root cause. This enables IT to confidently inform upper management of issues and recommend alternative solutions until service is restored. 

 

Don't wait for the next outage. Contact us today to start monitoring Microsoft Office 365 and ensure you're prepared for the next one.

