Office 365 Monitoring: Exchange Online Outage #2 November 5, 2020
On November 5th, 2020 at ~7:00pm UTC, Microsoft reported a second outage of the day preventing some users from accessing their mailboxes through Exchange Online via all connection methods.
Followers on Twitter expressed their frustration with the string of outages Microsoft has endured over recent months. Others took a lighter approach and made jokes, some riffing on the Office 365 name and suggesting "O265" may be a more fitting title.
At ~8:45 pm UTC, Microsoft reported that a recent service update to a portion of their infrastructure was impacting mailbox access via Exchange Online from any connection method. They also reported that they were preparing a fix they expected would remediate the impact.
Roughly an hour later, at ~9:30 pm UTC, Microsoft reported that further investigation had identified a network driver issue as the underlying cause of the impact. They reported that they were applying configuration updates for expedited relief while they worked on a comprehensive solution to the underlying problem, and that they would provide an ETA as soon as one became available.
At ~1:15 am UTC, Microsoft reported that their updates were taking longer than anticipated and that they were narrowing down alternate mitigation options to provide faster relief to customers.
Two hours later, they reported that they had identified and validated a solution to help mitigate the impact and had begun applying it.
At ~5:30 am UTC, Microsoft reported that they had received reports of recovery from some users who had received the fix. Finally, at ~7:30 am UTC, Microsoft reported that they had applied the fix to all remaining environments and confirmed the issue had been mitigated.
The Importance of Office 365 Monitoring
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of their environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.
ENow’s Office 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage by providing:
The ability to monitor entire environments in one place with ENow’s OneLook dashboard, which makes identifying a problem fast and easy without having to scramble through Twitter and the Service Health Dashboard looking for answers.
A full picture of all services and subsets of services affected during an outage with ENow’s remote probes, which cover several Office 365 apps and other cloud-based collaboration services.