Many admins' worst fears were realized this morning: users with Multi-factor Authentication (MFA) have been unable to sign in to services since approximately 6:30am PDT today (10/18/2019).
So, what happens during this time? First, affected users will need to relax and wait for the situation to be fixed. If you are an admin, you can plan how to handle this better next time. (Yes, I'm sure it will happen again. This is a "when, not if" situation.)
As an administrator, being locked out of the Admin portals can be a big deal. You can't get any work done or even be aware of what else might be happening in your environment. How do you solve for that?
First answer: Break Glass Account. A Break Glass Account is an account that has access without relying on things such as Phone-based MFA or Federation. Here are some of Microsoft's best practices:
The account should be a Cloud-only account that uses the *.onmicrosoft.com domain. Do not use a federated account.
The account should not be associated with an individual. Give it a generic, role-based name rather than a person's name. You don't want to have to track down the user it is tied to when an emergency happens.
Make sure the authentication method is different from the one your other accounts use.
Exclude at least one account from phone-based MFA.
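To make the checklist above concrete, here is a minimal sketch that checks an account record against those practices. The `Account` fields, the `break_glass_issues` helper, and the `contoso` tenant name are all hypothetical, made up for illustration; this is not a Microsoft API, and in practice you would populate the fields from whatever directory export you already have.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Account:
    """Hypothetical record describing one admin account."""
    upn: str                   # user principal name, e.g. name@tenant.onmicrosoft.com
    federated: bool            # True if sign-in depends on a federated identity provider
    phone_mfa_required: bool   # True if phone-based MFA is enforced on this account
    owner: Optional[str]       # named individual the account is tied to, if any


def break_glass_issues(account: Account) -> List[str]:
    """Return the checklist items this account fails (empty list means it passes)."""
    issues = []
    domain = account.upn.rsplit("@", 1)[-1].lower()
    if not domain.endswith(".onmicrosoft.com"):
        issues.append("not on the *.onmicrosoft.com domain")
    if account.federated:
        issues.append("federated account")
    if account.owner is not None:
        issues.append("tied to an individual")
    if account.phone_mfa_required:
        issues.append("relies on phone-based MFA")
    return issues


# Assumed example accounts; "contoso" is a placeholder tenant name.
good = Account("emergency-admin@contoso.onmicrosoft.com", False, False, None)
bad = Account("jane.doe@contoso.com", True, True, "Jane Doe")

print(break_glass_issues(good))
print(break_glass_issues(bad))
```

An empty list means the account meets every item in the list; anything else tells you what to fix before the next outage rather than during it.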
If you have deployed MFA for your organization, hopefully you have also deployed Conditional Access, which can help avoid issues in a situation like today's. If the policy is applied and the user is in a known location (like the office), they can still access their work. That limits the impact to users in unknown locations (such as the local café).
Have you been affected by the outage today? If so, did you have a Break Glass Account? Did you have Conditional Access set up to minimize the impact?
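The known-location idea can be sketched in a few lines. This is a simplified model of the decision only, not how Conditional Access is actually configured or evaluated; the `requires_mfa` function and the network range are assumptions for illustration (the range is a reserved documentation block, so substitute your own office ranges).

```python
import ipaddress

# Hypothetical trusted office range (a documentation-only address block).
TRUSTED_NETWORKS = [ipaddress.ip_network("203.0.113.0/24")]


def requires_mfa(client_ip: str, trusted_networks=TRUSTED_NETWORKS) -> bool:
    """Return True when the sign-in comes from outside every trusted network."""
    address = ipaddress.ip_address(client_ip)
    # Sign-ins from a known location skip the MFA prompt in this model.
    return not any(address in network for network in trusted_networks)


print(requires_mfa("203.0.113.25"))   # office address: no MFA prompt
print(requires_mfa("198.51.100.7"))   # café address: MFA prompt
```

With a rule like this in place, an MFA outage only blocks the users who would have been prompted anyway, which is the subset signing in from outside the known locations.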
The Importance of Office 365 Monitoring
In a cloud world, outages are bound to happen. While Microsoft is responsible for restoring service during outages, IT needs to take ownership of its environment and user experience. It is crucial to have visibility into the business impact of a service outage the moment it happens.
ENow’s Office 365 Monitoring and Reporting solution enables IT Pros to pinpoint the exact services affected and the root cause of the issues an organization is experiencing during a service outage by providing:
The ability to monitor entire environments in one place with ENow’s OneLook dashboard, which makes identifying a problem fast and easy without having to scramble through Twitter and the Service Health Dashboard looking for answers.
A full picture of all services and subsets of services affected during an outage with ENow’s remote probes, which cover several Office 365 apps and other cloud-based collaboration services.