On September 28, 2021, at ~11:05 am UTC, Microsoft posted a notification (@MSFT365status) indicating that they were investigating an issue in which some users were unable to access Microsoft 365 services due to a Multi-Factor Authentication (MFA) problem.
The incident number was MO287933.
Microsoft disclosed very little information at first, and many followers of @MSFT365status on Twitter responded with demands for more detail. Judging by the tone of the responses from system admins and IT pros, the number of impacted users appeared significant.
More than an hour after its first message, Microsoft sent out a second tweet saying that it was still investigating and could confirm that the issue impacted only users with on-prem MFA servers that connect to ADFS or NPS.
Twitter responses on @MSFT365status at this point were mixed: some demanded an ETA for service restoration, while others reported that service was already being restored.
By approximately 1:00 pm UTC, Microsoft was communicating more detailed information via status.office.com. It reconfirmed, now with more clarity, that the issue specifically impacted customers using on-prem Multi-Factor Authentication (MFA) servers to connect to Active Directory Federation Services (ADFS) or Network Policy Server (NPS). Microsoft made it clear that Cloud Authentication was NOT impacted. It also indicated that remediation efforts were already underway ("scale up activities on underlying infrastructure"), and some impacted users were already confirming that the issue was resolved. However, Microsoft did not share any meaningful information about the root cause at this time.
Microsoft's update via @MSFT365status at approximately 1:53 pm UTC, while reassuring for impacted users, still provided no detailed information about a root cause.
And by 2:42 pm UTC, some three and a half hours after the first report, it was all over. Thankfully. On Twitter and status.office.com, Microsoft confirmed that all scale-up activities on the underlying infrastructure had been completed. More importantly, it confirmed that the matter was resolved for all impacted customers. But still, no information was forthcoming as to a root cause.
The Importance of Office 365 Monitoring
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its own environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.
ENow’s Office 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected, and the root cause of the issues an organization is experiencing, during a service outage by providing:
- The ability to monitor networks and entire environments in one place with ENow’s OneLook dashboard, which makes identifying a problem fast and easy, without having to scramble through Twitter and the Service Health Dashboard looking for answers.
- A full picture of all services, and the subset of services, affected during an outage with ENow’s remote probes, which cover several Office 365 apps and other cloud-based collaboration services.