Why No Lessons Learned, Microsoft?

Last week saw the first major service disruption to Office 365 in several years. A severe storm in Texas impacted the cooling system at the US South Central data centre, which resulted in protective systems in the data centre switching into containment mode and shutting down servers to prevent further damage. Many people in the immediate local area were affected, but more worryingly, so were users far outside the local area as cascading effects were felt with Azure AD across the world.

That was last week. Everything is back to normal. I decried the state of communication during the outage and asked for more human moments throughout. And after monitoring the Microsoft news sites for the past week and seeing nothing (nada, zilch, zero) about the outage and what went wrong, I’m left wondering why not.

Clearly something happened that should not have happened. Clearly something in how Azure AD (and other non-regional services like the Azure Resource Manager) is engineered / architected is not where it should be yet. What I’m looking for is an explanation and elaboration of what happened, what Microsoft is going to do to resolve it properly this time, and perhaps even some insight into what happened in the data centre last week.

Customers purchasing cloud services from Microsoft rely on those services to do their work. And when everything is working fine, everyone is happy. But when there’s a problem, the first priority is getting back to a normal state as quickly as possible. The second – and perhaps even more important – is the deep analysis of what happened, what was learnt, and what will be done / is being done to prevent a recurrence. An outage we can accept, albeit grudgingly. A failure to learn from what happened we are much less willing to tolerate.

And the unwillingness to publicly disclose the learnings from a major outage makes a post like this one highly suspect, even though I’m sure the guidance is great:

We have heard from you, our customers, that you’d like us to provide more guidance and recommendations to help you successfully deploy Azure Active Directory (AD). So today, I’m excited to share a new set of step-by-step deployment plans based on the best practices we’ve learned from working with thousands of customers to successfully roll-out Azure AD.

Deployment plans guide you through the business value, planning considerations, implementation steps, and management of Azure AD solutions. They bring together everything you need to deploy Azure AD capabilities to get the maximum value. Deployment plans include Microsoft recommended best practices, user communications, planning guides, implementation steps, test cases, and more!

In the first instance, as a consequence of what happened last week, customers across the world would be happier to know that Microsoft itself can “successfully deploy Azure Active Directory” in such a way that local outages don’t cause global meltdowns.

More Human Moments from Microsoft

After a pretty good run, Microsoft Office 365 had a major outage in San Antonio TX this week. A lightning storm during the night caused a power spike in the US South Central data centre, which negatively affected the cooling system in one part of the data centre, which triggered the automatic systems to start shutting down parts of the data centre to prevent further problems. Some Office 365 customers in the region were affected. Some Azure customers in the region were affected. And various Office 365 customers around the world were also affected – one segment due to a historical decision to host metadata for early Visual Studio Team Services (VSTS) customers out of the San Antonio data centre, regardless of where they were actually located, and another segment due to Azure Active Directory suffering from degraded availability and how the global architecture of multi-factor authentication has been configured.

During the outage, Microsoft used two main avenues for communicating status to the world: the Microsoft Azure status page, and the Microsoft Office 365 status page. The problem in US South Central was such, however, that the Azure status page was only intermittently available; it kept going up and down. The Office 365 status page was more reliable, although much less informative, with updates delivered only infrequently (every 3 hours). It would be good to see more information on the Office 365 status page, along with a more regular stream of updates (every 15 minutes even). When everything has gone south and your ability to work has been degraded, any new information about timeframes, resolutions, and current actions is highly valued.

The other avenue was Twitter. At 1.13pm Texas time on September 4, the @Office365Status Twitter account posted this update:

When I look through the responses to the tweet, I see the following: disagreement, uncertainty, and perplexity. Various people disagreed that services were restored, as they were still under outage conditions. Others were asking what the impact of the outage would be, such as on their legal holds. And still others were perplexed that Microsoft could let this happen; shouldn’t this have been designed out of the service by now?

While the above can be seen, there is something that can’t be seen: any attempt during the outage by @Office365Status to directly respond, engage, allay fears, spread hope, or give updates. Zilch. Nada. Zero. As a company with more than 100,000 employees, surely in such outage conditions, at least one person could be on hand to provide a human moment to the paying customers of Office 365 who have responded to the original post.

To ask for more insight. What are you seeing at your place? How many people is this affecting?

To offer an apology one individual at a time. Bruce, I’m sorry that Office 365 is down now. We’re working to put it back together as fast as possible.

To give updates on what the engineers in the data centre were doing. We have 25 engineers arriving from out-of-state, to help with the physical clean up. You should see the mess!

(I’m making up these answers).

To answer the hard questions. Legal holds won’t be compromised. They will stay in place. But clearly, nothing new will be added while the system is out. Or, Will your email send after we’re back online? Yes, if it is in your Outlook outbox. It depends in other situations. What is happening at your place?

To share any updates on where the team was at, and what was happening. Wow, what a day this has been for us. So unexpected. Our first responder team is about to go off-duty, and the second team is starting in 4 minutes.

A person. A human moment. Dear Microsoft, you can do this.

Satya has said that people shouldn’t join Microsoft to be cool, but to make others cool. But when the chilling winds of an outage blow across an already cool population, the warmth of a human being delivering a human moment is needed to keep the balance and prevent everyone from freezing.

Protecting Mobile Devices

Mobile devices as endpoints to corporate information have taken the world by storm. The “mobile first” mantra refers to the preferential use of a mobile device before a desktop or laptop. Have phone, will work (or even run the company). The potential of the device to enable new ways of working has to be safeguarded from that which could undermine both current execution and the integrity of long-range plans.

In the Microsoft 365 world, this is the role of Intune Mobile Threat Defense. The service looks at what’s happening on devices, with applications, with the content of messages, with the types of network traffic going through the device … and makes a determination whether all is well or starting to go rotten (slowly or quickly). When a threat is detected – which can be in collaboration with another mobile threat analysis vendor – new protections are enforced to reduce risk, stop data loss, and contain the threat. These could be conditional access policies, such that the end user has to verify that they are the person requesting access to the information through a second factor or means of authentication. Or it could be more draconian, whereby data is locked and blocked from access by anyone or anything. If the device can be remediated – via a secondary user authentication action or a device update that contains the threat – everything goes back to how it is supposed to work.

The Microsoft Intune Team just announced a new integration with BETTER Mobile for leveraging signals from BETTER ActiveShield to trigger Intune policies around conditional access and other mitigation policies. Current Intune customers can get 50 free licenses for 18 months from BETTER Mobile, to try out the integration.

Information Protection in Windows 10 and OneDrive

Paul Thurrott’s analysis of the soon-to-be-available Windows 10 update – Version 1809 (Redstone 5) – included this snippet that caught my eye:

Storage Sense now integrates with OneDrive and can automatically change any downloaded files to online-only if you haven’t used them in a configurable number of days (in Settings > System > Storage > Storage Sense).

Every vendor struggles with the balance between releasing tools that enable productivity through information availability and protecting information from too much disclosure / availability. What a person should have access to, based on their job role and their tasks, is a governance question for organisations that’s enabled by technical capabilities offered by vendors. Data loss prevention stops people from flowing information to other people when it’s sensitive or confidential and the other party doesn’t have access rights. Access control lists on collaborative workspaces, shared folders, and systems of all kinds provide another form of information protection – it lets those who need the content in, and keeps those who don’t have the right to the content out. Role-based access control goes a step further and adds the nuance of who can and cannot take specific actions within a system.

Choosing to sync your OneDrive contents to a local machine is great for productivity – everything is immediately available whether you are connected to the network or not. But the risk is that unauthorised access to your machine – directly by a person or indirectly by a security threat executing and exfiltrating the data on your disk – will enable access to content by people who do not have authorisation. To information that is sensitive, confidential, or in need of special protections. The above forthcoming integration with Storage Sense in Windows 10 will mean that content from OneDrive that is not used often can be removed from local storage, reducing the potential information protection disclosure surface. If it’s not there directly, it can’t be accessed directly … and thus there’s another action required to gain access, which can be evaluated against up-to-the-second security policies.
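
To make the idea concrete, here is a minimal sketch in Python of the policy described above: anything held locally that hasn't been used within a configurable number of days is switched to online-only. This is not how Storage Sense is implemented. The `SyncedFile` class, the `make_online_only` function, and the file names are all hypothetical, and exist only to illustrate the rule.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SyncedFile:
    """Hypothetical record for a file synced from cloud storage."""
    path: str
    last_used: datetime
    local: bool = True  # True while a copy is held on the local disk


def make_online_only(files, max_idle_days, now):
    """Mark local files unused for more than max_idle_days as online-only.

    Returns the paths that were changed.
    """
    cutoff = now - timedelta(days=max_idle_days)
    changed = []
    for f in files:
        if f.local and f.last_used < cutoff:
            f.local = False  # the local copy would be released here
            changed.append(f.path)
    return changed


files = [
    SyncedFile("budget.xlsx", datetime(2018, 9, 10)),
    SyncedFile("old-strategy.docx", datetime(2018, 7, 1)),
]
changed = make_online_only(files, max_idle_days=30, now=datetime(2018, 9, 15))
```

The shorter the idle window, the smaller the set of files sitting on the disk at any one time, which is the information protection benefit in a nutshell.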

Onboarding and Offboarding: The Hidden Processes

There’s a whole set of activities required for effectively onboarding and offboarding new employees. People to coordinate. Processes to develop and operate efficiently. Magic moments that should just happen – because first impressions count and create memories.

One of the behind-the-scenes or hidden processes involves setting up access for the new employee to the systems they require for doing their work. An email account. Access to the collaborative workspace tools being used. HR system access. And more. This can be done manually by an IT administrator with super-user privileges across systems, or driven based on policy using a directory service with provisioning (and de-provisioning) capabilities. The latter means an administrator creates a user account in one central system (the directory), adds the user to a group that has access rights to specific other systems, and the provisioning service notes the change and follows a pre-defined script for adding the new user to other connected systems.

For Office 365 and Microsoft 365 customers, the user provisioning service in Azure Active Directory enables automated, policy-based provisioning of non-Microsoft cloud apps, such as Salesforce, Slack, GoToMeeting, Dropbox, Box and more. This creates sanctioned accounts in these services, decreasing the footprint of unsanctioned apps and shadow IT services. Last week, Microsoft announced additional services can now be provisioned and deprovisioned using Azure AD – including Asana, BlueJeans, Bonusly, LucidChart, and Zendesk.

And when an employee leaves, removing them from the groups with access to other systems essentially runs the process in reverse: user accounts are revoked and thus access privileges are removed.
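
The group-driven flow described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the Azure AD provisioning service or its API: the `Directory` class, the `GROUP_APPS` mapping, and the group names are all made up for illustration. The point it shows is that group membership is the single source of truth, and accounts in connected apps are derived from it in both directions.

```python
from collections import defaultdict

# Hypothetical mapping of directory groups to the connected apps they grant.
GROUP_APPS = {
    "sales": {"Salesforce", "Slack"},
    "support": {"Zendesk", "Slack"},
}


class Directory:
    """Toy directory that derives app accounts from group membership."""

    def __init__(self):
        self.memberships = defaultdict(set)  # user -> groups
        self.accounts = defaultdict(set)     # user -> provisioned apps
        self.log = []                        # (action, user, app) tuples

    def _sync(self, user):
        # Desired accounts are the union of apps granted by the user's groups.
        desired = set()
        for group in self.memberships[user]:
            desired |= GROUP_APPS.get(group, set())
        current = self.accounts[user]
        for app in sorted(desired - current):
            self.log.append(("provision", user, app))
        for app in sorted(current - desired):
            self.log.append(("deprovision", user, app))
        self.accounts[user] = desired

    def add_to_group(self, user, group):
        self.memberships[user].add(group)
        self._sync(user)

    def remove_from_group(self, user, group):
        self.memberships[user].discard(group)
        self._sync(user)


directory = Directory()
directory.add_to_group("bruce", "sales")       # onboarding
directory.remove_from_group("bruce", "sales")  # offboarding
```

Because the accounts are recomputed from the groups each time, offboarding needs no separate checklist: remove the memberships and the revocations follow.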

Being intentional / deliberate / automated in this area is another example of what information protection looks like in practice.

Information Protection: The What – Office 365 DLP

As mentioned the other day, Microsoft uses two specific products to deal with the what of information protection: Office 365 DLP and Azure Information Protection. There are similarities between the two, but some fundamental differences as well. Let’s focus on Office 365 DLP today.

DLP is all about know and flow. Both are done specifically within the context of the DLP policies you have configured. Know is about the what – the specific sensitive information types or labeled content that exist within an email being written, a document attached to an email being written, or a document being shared from SharePoint or OneDrive.

But the know is only enacted at the point of flow, such as when the user is writing an email that has been addressed to someone not authorised to receive it (e.g., an external recipient), or a document sharing action that would share the document with someone not authorised to view or edit it.

This core idea – know and flow – aligns with the specific protection mandate of Office 365 DLP – to “prevent loss” by stopping an unauthorised someone from gaining access.
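
Here is a minimal sketch of know and flow in Python, to show how the two halves fit together. It is an illustration only, not how Office 365 DLP works internally: the patterns, the `Message` class, the `contoso.example` domain, and the rule that external recipients are the unauthorised ones are all assumptions made for the example.

```python
import re
from dataclasses import dataclass

# Hypothetical sensitive information types, each defined by a simple pattern.
SENSITIVE_TYPES = {
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
}

INTERNAL_DOMAIN = "contoso.example"  # assumed tenant domain


@dataclass
class Message:
    sender: str
    recipients: list
    body: str


def know(text):
    """Know: which sensitive information types appear in the content?"""
    return {name for name, pattern in SENSITIVE_TYPES.items() if pattern.search(text)}


def flow(message):
    """Flow: decide, at the point of sending, whether the message may leave."""
    found = know(message.body)
    external = [r for r in message.recipients
                if not r.endswith("@" + INTERNAL_DOMAIN)]
    if found and external:
        return "block"
    return "allow"


risky = Message("ann@contoso.example", ["bob@partner.example"],
                "Her SSN is 123-45-6789.")
decision = flow(risky)
```

Note that `know` on its own does nothing to the message. The sensitive content only matters once `flow` sees it heading to someone who shouldn't have it.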

Thus DLP policies – as set up in the Security & Compliance Centre – are intended for:
– preventing an internal user from sending content in an email or attached document to a recipient who should not receive it.
– preventing the sharing of a document with someone who should not receive it.
– these actions must be taken within and through Office 365.

DLP will not prevent loss in all situations, unless there are other parts of the Information Protection portfolio in use. For example, if a user downloads a file with sensitive data and then syncs it with Dropbox (or some other cloud sharing service), that content has just disappeared. It has been taken out of the boundary of Office 365, and loss prevention capabilities are blind to what happens. Ditto if it is put onto a USB thumbdrive. There are other solutions in the portfolio – Microsoft Cloud App Security and Windows Information Protection for example – that can address most of these challenges, and Azure Information Protection to a degree as well (in conjunction with those other two). We’ll leave that complexity for another day.

But for now – DLP is all about know and flow.

Information Protection: The What

When thinking about information protection, one of the key questions is what: what specific information should be protected? Some information doesn’t need to be protected at all, such as when it is common knowledge (2+2=4) or easily available (the name of the current leader of a country).

Other information does need to be protected – for a variety of reasons (the why, which we’ll talk about more fully later). Broadly speaking, information that needs to be protected is like that because its inappropriate use or disclosure could cause harm to a person, entity, or organisation. For example, disclosing someone’s credit card number and expiry date to the wrong person could result in financial harm (unauthorised transactions, lost funds, decimated credit rating, etc.). Disclosing someone’s name, address, national ID number and similar data could result in harm through identity theft; an unauthorised actor uses that valid data to masquerade as the other person, receiving benefits that the other party is entitled to or is forced to pay for without receiving the benefit. In an organisational context, disclosing financial planning documents or explanations of the forthcoming business strategy moves to a competitor can result in a weakened market position, reduced market valuation, and in the worst case, outright business failure.

The potential to cause harm is what drives the need to create mitigations through information protection, and in Microsoft’s perspective on information protection, there are two general classes:

  1. General and generic types of information that are sensitive, and that can be computationally discovered. For example, a credit card number is a credit card number is a credit card number, and if you can work out the identifying characteristics of credit card numbers, you can detect the presence of one or more. Likewise for social security numbers (US), tax numbers (pretty much everywhere), health identification numbers (ditto), and more. Information in this class exists generally, and a specific organisation could (or may have to) protect such information if they collect or handle it.
  2. Specific types of information that could cause harm to a specific business (or government agency, organisation, non-profit, etc.) if these were to fall into the wrong hands. For example, strategy documents, financial plans, employee lists, expansion ideas, current M&A targets, and more. Information in this class exists in customized forms for specific entities, and depending on the specific business / organisation / other, will need to be set up. There are of course general classes of these types of information across most entities, but the specific realisation of that is up to the specific entity.
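
As an example of what "computationally discovered" can mean for the first class, here is a minimal sketch in Python that finds likely credit card numbers in text. It combines a pattern match (13 to 19 digits, optionally separated by spaces or hyphens) with the Luhn checksum that payment card numbers satisfy. It is my own illustration, not Microsoft's detection logic, which I'd expect to weigh more evidence such as nearby keywords. The function names are made up for the example.

```python
import re

# Candidate runs of 13-19 digits, optionally separated by single spaces or hyphens.
CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits):
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text):
    """Return digit strings in text that look like card numbers and pass Luhn."""
    found = []
    for match in CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


hits = find_card_numbers("Please charge 4111 1111 1111 1111 for order 12345.")
```

The checksum step is what keeps a random 16-digit order reference from being flagged as a card number most of the time.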

Microsoft deals with the above through two specific products in its information protection solutions portfolio: Office 365 data loss prevention (DLP) and Azure Information Protection (AIP). Both products can work with the generic sensitive information types as well as specific types of information that could cause harm. DLP always works automatically (scanning, analysing, thinking), and AIP can work either by user choice (manual labeling of a document or email) or based on automated content analysis. And if something is found that goes against a policy, an automated action can be triggered – such as a user notification, an alert to an administrator, or a block action that prevents the message or document from being sent / saved / shared.

“Information Protection”

If you thought “collaboration” was a wiggly word with lots of definitions and places it could be used, you should try the phrase “information protection” on for size. Once you start enumerating the types and styles and approaches and consequences and implications and gotchas, you start to build a complex picture of requirements. Which is why Microsoft doesn’t offer an “information protection” product as such, but rather a set of solutions that apply in different situations. I need to get my head around what is actually on offer in Microsoft’s Information Protection Solutions catalog, so let’s have a talk about it. And probably not just today.

The diagram above is a common one used by Microsoft to show the breadth of its solution set. The four blue circles in the middle express the generic commonalities – detect, classify, protect, and monitor. The 11 solutions around the outside are the specific products that are [1] part of the solution set, and [2] in adherence with one or more of the four blue circles.

One immediate conclusion based on the breadth of these capabilities is that information protection is complex. There’s a lot to understand when you are dealing with a product set in Office 365 for productivity and collaboration that is as broad and deep as what Microsoft is attempting. To be the company that helps “everyone to achieve more” – a broad and all-encompassing vision if ever there was one – you have to safeguard and protect the means of achieving as much as providing tools to help with the achieving.

A second observation in looking at the diagram is that not all of these capabilities are in Office 365. Some are – Office 365 Message Encryption, Office 365 Advanced Security Management (now called Office 365 Cloud App Security), and Office 365 DLP are three obvious inclusions. And of the capabilities that are in Office 365, not all are in all plans; essentially, if you want all the Office 365 capabilities, you’ll need to purchase the E5 license. Lower licensing levels have a diminishing number of capabilities. The rest of the capabilities come from the Enterprise Mobility + Security plan – this is where you get the full version of Microsoft Cloud App Security, Conditional Access (from Azure Active Directory), Azure Information Protection, and more. One way of thinking about it is that you buy Office 365 E5 for productivity and collaboration and Enterprise Mobility + Security E5 for safeguarding that productivity and collaboration. It’s not a fully correct differentiation, but it’s a broadly accurate distinction. And if you buy the Microsoft 365 plan, you get both the Office 365 capabilities and Enterprise Mobility + Security capabilities, along with Windows 10.

So what do the above capabilities actually do? Let’s talk about that another day.

Even Microsoft Struggles With It

With the rate of change in Office 365, everyone is struggling with staying up to date on what’s available, what’s coming, what’s possible, what’s not working yet, and more. There’s the cognitive-only staying up to date, which can be done by some disciplined (but rather relentless) reading each day. But there’s also the more formalised artifacts that are produced that cause the major problems: the help videos, user guides, screenshots, scenario explorations, and much more that need to be kept up-to-date.

But since even Microsoft struggles with its own relentless cadence of change, don’t take too hard a line on yourself.

Here’s an example: Microsoft released a new approver role for the Customer Lockbox feature in Office 365 E5 (also accessible through the Advanced Compliance add-on). It’s a good addition to the service, because weighing down global admins with every small detail on running the service isn’t a good design or operational principle. And just because you are a global admin of a tenant does not equate with you having the right business knowledge to be able to judge between valid and invalid requests by Microsoft engineers during support incidents. Someone else might be better placed to do that – providing a better chain of authority and approval. So the new role is a good nuance to add, and is in line with the general proliferation of feature-specific roles in Office 365.

Anyway, in making the above announcement, Microsoft includes a video from November 2015 that explains Customer Lockbox. The talking at the beginning is fine, the animations of how the support request works are also fine, but the live demo and click through of the interface … are now old. The Office 365 admin center in the video is no more; now it’s the Microsoft 365 admin center. The way the app launcher works in the video is no more; now it’s done differently. The layout of the admin center interface is also different. So while the video was correct and proper in late 2015, it’s no longer reflective of the interface and its capabilities. For people new to Office 365, seeing the video with one interface and then experiencing a different name, layout and more is confusing.

This raises the question: what should Microsoft do with these older artifacts when something changes? If they were constantly re-recording old videos to bring them up-to-date I’m sure the cadence of change would decrease! By implication – what do you do? One of my friends in the adoption and effective use space takes the view that prepared artifacts become outdated so quickly that doing live explorations with a new business group is the only way to proceed. Don’t bother with preparing documentation; just learn in the moment, and go with the flow.


Microsoft Whiteboard

While a white background is a common starting experience in Microsoft’s applications, the specific capabilities of each tool both create and constrain what you can use it for. Word’s white background is for words, sentences, paragraphs and pages. Excel’s white background is for numbers and calculations and data modelling and charts. PowerPoint’s white background has traditionally been for words and sentences as well, albeit in a different form and for a different purpose to Word’s whiteness and wordiness. In some spheres, PowerPoint is becoming more of a structured method of telling a story with photos and pictures and minimal words. Windows Explorer – for storing and sharing files. Etcetera.

What we’ve lacked for too long – constrained by not having the tool itself nor a wide distribution of touch-enabled and pen-enabled devices – is the equivalent of a whiteboard in a meeting room. The blank canvas on which you can write words, draw lines or pictures, put numbers in a table … the do anything blank canvas for beginning a new work or idea or project. Not for finishing it – there are other and better tools for that – but for starting … there’s nothing quite like a blank whiteboard or blank sheet of paper. Oh the possibilities. Oh the opportunity for … starting afresh, anew, differently, creatively.

Now that there are many more appropriate devices on the market – the iPad crowd with their Apple Pencils, the Microsoft Surface crowd with their Surface Pens, and various others – Microsoft’s release of its new Whiteboard application for Windows 10 (and soon iOS and other device platforms) makes a lot of sense. The context is ripe, so the content can now flow in new and different ways.

From the Microsoft 365 blog:

Microsoft Whiteboard is now generally available for Windows 10, coming soon to iOS, and preview on the web. Whether meeting in person or virtually, people need the ability to collaborate in real-time. The new Whiteboard application enables people to ideate, iterate, and work together both in person and remotely, across multiple devices. Using pen, touch, and keyboard, you can jot down notes, create tables and shapes, freeform drawings, and search and insert images from the web.

Welcome to Whiteboard. I got my copy for Windows 10 from the Microsoft Store. Given where it was announced, the collaboration capabilities will require an Office 365 subscription of some kind.

It’s time to let our pursuit of the perfect begin again with the mighty pen.