As IT pros, a major part of our responsibility is to keep our organizations’ IT services up and running. Historically this was a pretty straightforward job. It’s never been an easy job, but having your software on your servers connected to your network keeps everything straightforward. Moving services to Office 365 makes things much more complicated. How do you manage an outage for a cloud service? Is there any point in monitoring a cloud service when you can’t do anything to fix an outage?
In this blog post I’m going to look at some recent Office 365 outages and talk about what we as IT pros should be doing to ensure that we’re helping the organizations we work for get the most out of their Office 365 subscriptions.
Recent Office 365 outages
As I’m writing this on the morning of July 8, if your organization is set up using Exchange Hybrid with some mailboxes on-premises on a version of Exchange earlier than 2013 SP1, you may be experiencing a Free/Busy “outage”. On Friday, July 5, Microsoft switched the certificates for the Federation Gateway, leaving some organizations with older on-premises deployments of Exchange without working Free/Busy.
The fix for this outage is simply to run the following PowerShell command on your on-premises Exchange server.
Get-FederationTrust | Set-FederationTrust -RefreshMetadata
Not a huge problem, but certainly something that could be avoided with some careful monitoring of your Office 365 deployment.
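After refreshing the metadata, it can be worth confirming that the federation trust is healthy again. The following is a minimal sketch, assuming you are working in the Exchange Management Shell on an on-premises Exchange server. The mailbox address is a placeholder and not a value from this post.

```powershell
# Show the token issuer details currently stored on the federation trust,
# so you can see which certificates the trust is using after the refresh.
Get-FederationTrust | Format-List Name, TokenIssuer*

# Run Exchange's built-in federation checks.
# The address below is a placeholder (assumption): substitute a mailbox
# that lives in one of your own federated domains.
Test-FederationTrust -UserIdentity "user@contoso.com"
```

If any of the checks report an error, Free/Busy lookups between on-premises and Exchange Online mailboxes are likely still affected.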
On July 2nd, just before 1 PM Pacific time, the Microsoft Twitter account @MSFT365Status tweeted about an outage caused by a network device within Microsoft’s infrastructure. Microsoft later provided more information about this outage under MO184196 at status.office.com.
On July 3rd, Office 365 had an outage in SharePoint Online. The incident for that outage was SP184328, and more information about that outage can also be found on status.office.com.
That’s three examples of problems within Office 365 that may have affected your users. If knowing about these outages before your end users started calling your helpdesk would have made a difference, maybe it’s time to start looking into a third-party monitoring solution.
How do I know if “the cloud” is down?
The move from on-premises IT services to cloud services can be a tough transition for organizations and IT departments alike. This transition is the largest change in how IT works that I have seen in my nearly 30-year career. Even those of us who are accustomed to working in dynamic environments will find this transition overwhelming.
When your organization moves to Office 365 and other cloud services, it can become much more difficult to identify an outage. It’s highly unlikely that Office 365 will ever be completely down. Outages do happen, but they are generally limited to a part of the service.
Microsoft does try to provide some information about outages to administrators via the Office 365 portal and the Office 365 Admin app. While both tools will give you some information about some Office 365 outages, keep in mind that Microsoft isn’t going to go out of its way to point out flaws in its service. I’m not saying that Microsoft will “hide” information about outages, but it is unlikely to call attention to outages that it thinks would otherwise go unnoticed.
If there isn’t anything I can do about an outage, why do I need to know about it?
If there is an outage in Office 365, there typically isn’t much you can do to fix it. If your organization is small with a limited IT budget, maybe you don’t really need to be the first to know about Office 365 outages.
However, many larger organizations have an internal customer base that is used to being notified of outages, rather than being the ones who report outages to the IT department. If your users need a level of IT service that does not depend on them reporting problems to you, it may be appropriate to set up a third-party monitoring system that can notify you of Office 365 outages before your end users report them.
Beyond just notifying your end users of an outage, there might be something you can do to get your user community up and working again. Some organizations deploy third-party solutions for backing up Office 365 data, or even third-party solutions that fill in an “outage gap” to bring minimal features back to the users. Often these solutions require some administrator intervention to activate. If that’s your situation, you need to know about outages before you can put your contingency plans into action.
Monitoring Office 365 Outages
The savvy readers among you have probably put together that this blog post on ENow’s website aligns well with the service that ENow sells. Of course, the good folks at ENow would love for you to sign up for their Office 365 monitoring service, but I’m not going to turn this blog post into an ad.
The fact is, some Office 365 customers could use an additional layer of monitoring to ensure that they are aware of Office 365 outages and able to respond appropriately before their end users are affected. If that is the situation your organization finds itself in, I think it’s a good idea to talk to the people at ENow about their monitoring solution.
If you think your IT department can provide better service to your user community with the help of an improved Office 365 monitoring solution, check out the link below to find out more about ENow’s solution.